NeuralSVG: An Implicit Representation for Text-to-Vector Generation

sagipolaczek.github.io ・ 749 points ・ lnyan ・ 3 days ago ・ 82 comments

vipshek ・ 2 days ago

This is excellent!

I think the utility of generating vectors is far, far greater than all the raster generation that's been a big focus thus far (DALL-E, Midjourney, etc). Those efforts have been incredibly impressive, of course, but raster outputs are so much more difficult to work with. You're forced to "upscale" or "inpaint" the rasters using subsequent generative AI calls to actually iterate towards something useful.

By contrast, generated vectors are inherently scalable and easy to edit. These outputs in particular seem to be low-complexity, with each shape composed of as few points as possible. This is a boon for "human-in-the-loop" editing experiences.

When it comes to generative visuals, creating simplified representations is much harder (and, IMO, more valuable) than creating highly intricate, messy representations.

gwern ・ 2 days ago

Have you looked at https://www.recraft.ai/ recently? The image quality of their vector outputs seems to have gotten quite good, although you obviously still wouldn't want to try to generate densely textured or photographic-like images like Midjourney excels at. (For https://gwern.net/dropcap last year or before, we had to settle for Midjourney and create a somewhat convoluted workflow through Recraft; but if I were making dropcaps now, I think the latest Recraft model would probably suffice.)
- esperent ・ 2 days ago
  
  Link to their vector page, since the main page makes them look like yet another AI image generator:
  https://www.recraft.ai/ai-image-vectorizer
  The quality does look quite amazing at first glance. How are the vectors to work with? Can you just open them in Illustrator and start editing?
  
  gwern ・ 2 days ago
  
  No, I actually was referring to their native vector AI image generator, not their vectorizer - although the vectorizer was better than any other we found, and that's why we were using it to convert the Midjourney PNG dropcaps into SVGs.
  (The editing quality of the vectorized ones was not great, but it is hard to see how it could be good given their raster-style appearance. I can't speak to the editing quality of the native-generated ones, either in the old obsolete Recraft models or the newer ones, because the old ones were too ugly to want to use, and I haven't done much with the new one yet.)
  
  brown_martin ・ 2 days ago
  
  I was under the impression that their AI Vector generator generates a PNG and vectorizes under the hood.
  
  gwern ・ a day ago
  
  Hm... I was definitely under the impression that it is generating SVGs natively, and that was consistent with its output and its recent upgrades like good text rendering, and I'm fairly sure I've said as much to the CEO and not been corrected... But I don't offhand recollect a specific reference where they say unambiguously that it's an SVG generator rather than vectorizer(raster), so maybe I'm wrong about that.
  
  brown_martin ・ 17 hours ago
  
  For me it's based on three things: vector generation is much harder than raster, Recraft has raised just over $10M (not that much in this space), and their API has no direct vector generation.
Lerc ・ 2 days ago

There is also the possibility of using these images as guidance for rasterization models. Generate easily manipulable and composable images as a first stage, then add detail once the image composition is satisfactory.
- datadrivenangel ・ 2 days ago
  
  Trivially possible with controlnets!
SillyUsername ・ 2 days ago

My little project for the highly intricate, messy representation ;) https://github.com/KodeMunkie/shapesnap (it stands on the backs of giants, original was not mine). It's also available on npm.
zidad ・ 2 days ago

I always imagine how useful Sora.ai could be if it would generate 3D models to render their animations from instead
- spyder ・ 2 days ago
  
  I agree, that's the future of these video models. For professional use you want more control and the obvious next step towards that is to generate the full 3D scene (in the form of animated gaussian splats since that's more AI friendly than the mesh based 3D). That also helps the model to be more consistent but also adds the ability for the user to have more control over the camera or the scene.
cochlear ・ 2 days ago

I couldn't agree more. I feel that the block-coding and rasterized approaches that are ubiquitous in audio codecs (even the modern "neural" ones) are a dead-end for the fine-grained control that musicians will want. They're just fine for text-to-music interfaces of course.
I'm working on a sparse audio codec that's mostly focused on "natural" sounds at the moment, and uses some (very roughly) physics-based assumptions to promote a sparse representation.
https://blog.cochlea.xyz/sparse-interpretable-audio-codec-pa...
- chaosprint ・ a day ago
  
  Interesting. I'm approaching music generation from another perspective:
  https://github.com/chaosprint/RaveForce
  RaveForce - An OpenAI Gym style toolkit for music generation experiments.
tasuki ・ 2 days ago

Ah, we should be friends!
I'm not sure what else to add, except that these are exactly the thoughts I think, and it used to feel lonely ;)

janalsncm ・ 3 days ago

I am a huge fan of this type of incremental generative approach. Language isn’t precise enough to describe a final product, so generating intermediate steps is very powerful.

I’d also like to see this in music generation. Tools like Suno are cool but I would much rather have something that generates MIDIs and instrument configurations instead.

Maybe this is a good lesson for generative tools. It’s possible to generate something that’s a good starting point. But what people actually want is long tail, so including the capability of precision modification is the difference between a canned demo and a powerful tool.

> Code coming soon

The examples are quite nice but I have no idea how reproducible they are.

kadushka ・ 2 days ago

> I’d also like to see this in music generation. Tools like Suno are cool but I would much rather have something that generates MIDIs and instrument configurations instead.
Sounds like you're looking for something like https://www.aiva.ai
- janalsncm ・ 2 days ago
  
  Honestly that site feels like they have a database of midis tagged by genre and pick them out randomly. It’s totally different from their demo song.
  I guess I’m hoping for something better. It’s also closed source, the web ui doesn’t have editing functionality, and the output is pretty disjointed. Maybe if I messed around with it enough the result would be decent.
  
  kadushka ・ 2 days ago
  
  Fair enough. Still, for what you’ve described, Aiva is the best tool available.
bufferoverflow ・ 2 days ago

MIDI isn't enough. I want MIDI + filters, plus separate voice and custom sounds tracks.
chaosprint ・ a day ago

good point.
A few days ago I was thinking about restarting this project with Glicol:
https://github.com/chaosprint/RaveForce
RaveForce - An OpenAI Gym style toolkit for music generation experiments.
Love suno but eventually I need midi or xml or some lossless samples to work with
gexaha ・ 2 days ago

microtonal midi would be super awesome

scosman ・ 2 days ago

I’ve been impressed with even applying sonnet to SVGs for animations. This looks like it could be a lot more powerful.

Fun example: https://gist.github.com/scosman/701275e737331aaab6a2acf74a52...

astrodude ・ 2 days ago

oh, wow. this actually works. I didn't know :) thanks!

intalentive ・ 2 days ago

I’ve always thought that generation of intermediate representations was the way to go. Instead of generating concrete syntax, generate AST. Instead of generating PNG, generate SVG. Instead of generating a succession of images for animation, generate wire frame or rigging plus script.

Once you have your IR, modify and render. Once you have your render, apply a final coat of AI pixie dust.

Maybe generative models will get so powerful that fine-grained control can be achieved through natural language. But until then, this method would have the advantages of controllability, interoperability with existing tools (like Intellisense, image editors), and probably smaller, cheaper models that don’t have to accommodate high dimensional pixel space.
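
A minimal sketch of the "generate AST, modify, then render" idea, using Python's standard `ast` module. The hand-built tree stands in for model output, and the `Rename` helper and the variable names are hypothetical, chosen only for illustration:

```python
import ast

# Stand-in for model output: an expression tree for `w * h`,
# built as nodes rather than as source text.
ir = ast.Expression(body=ast.BinOp(
    left=ast.Name(id="w", ctx=ast.Load()),
    op=ast.Mult(),
    right=ast.Name(id="h", ctx=ast.Load()),
))


class Rename(ast.NodeTransformer):
    """Edit step: rename a variable structurally, with no text search/replace."""

    def __init__(self, old, new):
        self.old, self.new = old, new

    def visit_Name(self, node):
        if node.id == self.old:
            node.id = self.new
        return node


# Modify the IR, then fill in the line/column info that compile() requires.
edited = ast.fix_missing_locations(Rename("w", "width").visit(ir))

# Render the IR to concrete syntax (ast.unparse needs Python 3.9+).
source = ast.unparse(edited.body)

# The same tree can also be compiled and evaluated directly.
value = eval(compile(edited, "<ir>", "eval"), {"width": 3, "h": 4})

print(source, "=", value)
```

The edit happens on the tree, so the rendered source stays syntactically valid by construction, which is the controllability argument made above.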

andy_ppp ・ 2 days ago

I’m looking forward to seeing what this makes of Simon Willison’s LLM SVG generation test prompt: “Generate an SVG of a pelican riding a bicycle”.

It’s quite amazing the progress we are seeing in AI and it will keep getting better which is somewhat terrifying.

nojvek ・ 2 days ago

I asked both Claude and ChatGPT o3 to "generate svg of mainland USA with black outline".
Tried various models and they got it hopelessly wrong. Claude does an okay job at "Generate an SVG of a pelican riding a bicycle"
- IanCal ・ a day ago
  
  How's this? https://imgur.com/a/aWQ0J49
  I might be missing something but at a first pass it looks good. Not from the US though so something may be more obviously wrong to you.

goeiedaggoeie ・ 2 days ago

This is very nice.

I had to convert a bitmask to SVG and wanted to skip the intermediary step, so I looked around for papers about segmentation models outputting SVG and found this one: https://arxiv.org/abs/2311.05276

zellyn ・ 3 days ago

The sketch generation is wild… and apparently comes for free.

airstrike ・ 3 days ago

This opens up lots of opportunities for document authoring tools. Really cool stuff, can't wait to try out the code once it's available.

lewisjoe ・ 2 days ago

Curious how this can augment document authoring! Can you toss some ideas?
- airstrike ・ 2 days ago
  
  I just think about how often professionals need placeholder images or doodles in their documents, but cliparts are generally terrible and actually making a nice looking drawing for those purposes is out of scope for business users and immensely time consuming... so this fills a nice gap.
  I'm obviously biased as a former "business user" writing document authoring software!

chestervonwinch ・ 2 days ago

I wonder if you can use an existing svg as a starting point. I would love to use the sketch approach and generate frame-by-frame animations to plot with my pen plotter.

jonathaneunice ・ 2 days ago

Nice! Looking forward to similar textual generation of diagrams. (The Pic/Pikchr for the LLM age.)

_1 ・ 2 days ago

I've had some success with converting SQL to Mermaid Markdown diagrams.
da_rob ・ 2 days ago

It’s not PIC and not really suitable for complex diagrams, yet, but you can use Vizzlo’s Chart Vizzard to create a subset of the supported chart types (let’s say a Gantt) and then continue editing it using the chart editor: https://vizzlo.com/ai

Jean-Papoulos ・ 2 days ago

This is the kind of image generation I've been waiting for. No more messing around in Inkscape (or at least, less of it) when I need a specific icon.

murtio ・ 3 days ago

This is really cool! I have been using Claude to animate SVG, and it has been great.

NiloCK ・ 2 days ago

I'd be interested to see examples and hear about process here, if you're willing to share.

CyberDildonics ・ 2 days ago

If you can generate an image you can flatten it and if you can flatten it you can cluster it, and if you can cluster the flat sections you can draw vectors around them.
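
A rough sketch of that flatten, cluster, vectorize pipeline in plain Python, under simplifying assumptions: the image is a small nested list of RGB tuples, "flatten" is per-channel posterization, "cluster" is 4-connected flood fill, and each region is emitted as an SVG path made of horizontal pixel runs rather than a traced and smoothed outline. All function names here are hypothetical.

```python
from collections import deque


def posterize(pixels, levels=2):
    """Flatten: snap each 0-255 channel to one of `levels` evenly spaced values."""
    step = 255 / (levels - 1)
    return [[tuple(int(round(c / step) * step) for c in px) for px in row]
            for row in pixels]


def label_regions(flat):
    """Cluster: group 4-connected pixels of identical color via flood fill."""
    h, w = len(flat), len(flat[0])
    seen = [[False] * w for _ in range(h)]
    regions = []  # list of (color, [(x, y), ...])
    for y in range(h):
        for x in range(w):
            if seen[y][x]:
                continue
            color, cells, queue = flat[y][x], [], deque([(x, y)])
            seen[y][x] = True
            while queue:
                cx, cy = queue.popleft()
                cells.append((cx, cy))
                for nx, ny in ((cx + 1, cy), (cx - 1, cy), (cx, cy + 1), (cx, cy - 1)):
                    if (0 <= nx < w and 0 <= ny < h
                            and not seen[ny][nx] and flat[ny][nx] == color):
                        seen[ny][nx] = True
                        queue.append((nx, ny))
            regions.append((color, cells))
    return regions


def region_to_path(color, cells):
    """Vectorize: one <path> per region, one closed sub-path per horizontal run."""
    rows = {}
    for x, y in cells:
        rows.setdefault(y, []).append(x)
    parts = []
    for y in sorted(rows):
        xs = sorted(rows[y])
        start = prev = xs[0]
        for x in xs[1:] + [None]:  # None is a sentinel that flushes the last run
            if x is None or x != prev + 1:
                width = prev - start + 1
                parts.append(f"M{start} {y}h{width}v1h-{width}z")
                start = x
            prev = x
    r, g, b = color
    d = "".join(parts)
    return f'<path fill="rgb({r},{g},{b})" d="{d}"/>'


def trace(pixels, levels=2):
    """Run the whole pipeline and return an SVG document as a string."""
    h, w = len(pixels), len(pixels[0])
    regions = label_regions(posterize(pixels, levels))
    body = "".join(region_to_path(color, cells) for color, cells in regions)
    return f'<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 {w} {h}">{body}</svg>'


image = [
    [(250, 250, 250), (10, 10, 10)],
    [(245, 255, 250), (5, 0, 20)],
]
svg = trace(image)
print(svg)
```

A real tracer would follow region boundaries and fit curves to them; the run-based paths here only show where the vector shapes come from.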

strangecasts ・ a day ago

This posterization-vectorization approach is what the Flash "Trace Bitmap" tool implemented (I'm not sure if Animate still has it?), but if your image isn't originally clipart/vector art, it gives the resulting vector art a very early 2000s look...

theckel ・ a day ago

Does anyone know how this compares to: https://github.com/ximinng/SVGDreamer ?

TeMPOraL ・ 2 days ago

Available in ComfyUI when? :).

Seriously though, this is amazing, I'm glad to see this tackled directly.

Also, I just learned from this thread that Claude is apparently usable for generating SVGs (unlike e.g. GPT-4 when I tested for it some months ago), so I'll play with that while waiting for NeuralSVG to become available.

niemandhier ・ 2 days ago

It looks as if this is not autoregressive.

It would be interesting to see a similar approach that incrementally works from simpler (fewer curves) to more complex representations.

That way one could probably apply RLHF along the trajectory too.

toisanji ・ 2 days ago

This is a group applying vector generation to animations: https://www.youtube.com/@studyturtlehq The graphic fidelity has been slowly improving over time.

gcr ・ 2 days ago

can you say more? all of these videos have less than 5 views and i can't find any explanation of their process

thomasfl ・ 2 days ago

Finally something that can benefit artists as a sketching tool.

shahzaibmushtaq ・ 2 days ago

I am really impressed with how it generates rough sketches because everything in the design world begins that way.

cyp0633 ・ 2 days ago

Claude has been doing a good job generating SVGs compared to its rivals, happy to see new models bringing image generation even further

IncreasePosts ・ 2 days ago

Shouldn't the girl with the pearl earring have an earring?

mcraiha ・ 2 days ago

No, because it is not a pearl earring. https://www.theartnewspaper.com/2023/02/08/the-girl-with-a-g...
- IncreasePosts ・ 2 days ago
  
  Okay, but shouldn't she at least have a glass teardrop-shaped bauble?

piombisallow ・ 2 days ago

This is much more useful for actual design jobs.

nikolayasdf123 ・ 2 days ago

very nice. had this idea for a while, but never had time to implement it.

glad someone actually did it! great work!

nbzso ・ 2 days ago

So designers, artists, musicians, we are done, right? Who's next, I wonder?

kelseyfrog ・ 3 days ago

Why does the fourth example show a hamburger but is labeled as a dragon?

jsheard ・ 3 days ago

American cultural bias in the training data led it to infer that dragons would be turned into burgers if they were real.
airstrike ・ 3 days ago

Most likely just a clerical error, since the dragon is two examples to the left with the same caption.
dekhn ・ 2 days ago

because hamburgers aren't made from chopped ham.

pizza ・ 3 days ago

Prompting Claude to make SVGs then dropping them into Inkscape and getting the last ~20% of it to match the picture in my head has been a phenomenal user experience for me. This, too, piques my curiosity..!

jgalt212 ・ 3 days ago

I poked around with NeoSVG a few months back. I was not happy with the results, the computation time, or the cost. That being said, I do hope they've made big progress lately because SVGs work real nice when you have an LLM and human working in tandem (as per the comment above).
https://neosvg.com/generations
xvfLJfx9 ・ 3 days ago

Claude doesn't work at all for me when generating SVGs.
- iambateman ・ 3 days ago
  
  For me it depends on whether a similar SVG already exists in its training set.
  “Make an SVG of a clock icon” is likely to work. “Make an SVG of a playground swingset with the sun setting” is not.
  
  varunneal ・ 3 days ago
  
  Your prompt verbatim turned out quite well, single-shot.
  https://claude.site/artifacts/0f696bf8-399d-42c3-93c0-296493...

fosterbuster ・ 2 days ago

It's a wasted opportunity not using SVG to show the examples.

1970-01-01 ・ 2 days ago

Aside: I've been having a very hard time prompting ChatGPT to spit out ASCII art. It really seems to not be able to do it.

     Here is an ASCII art representation of a hopping rabbit:

     ```
      (\(\  
      ( -.-)  
      o_(")(")
     ```

     This is a simple representation of a rabbit with its ears up and in a hopping stance. Let me know if you'd like me to adjust it!

jsheard ・ 2 days ago

It seems to have just pulled an ASCII rabbit from the training data verbatim
https://old.reddit.com/r/identifythisfont/comments/ytd25m/wh...
Kiro ・ 2 days ago

Pretty good if you ask me. What would a proper hopping rabbit ASCII art look like?
- jansan ・ 2 days ago
  
  Not sure, but that is a sitting rabbit.

lbj ・ 3 days ago

"Code coming soon" - I hope someone reposts this when there's more to dig into

dheera ・ 3 days ago

[flagged]

knoxg ・ 2 days ago

[flagged]
