This is excellent!
I think the utility of generating vectors is far, far greater than all the raster generation that's been a big focus thus far (DALL-E, Midjourney, etc). Those efforts have been incredibly impressive, of course, but raster outputs are so much more difficult to work with. You're forced to "upscale" or "inpaint" the rasters using subsequent generative AI calls to actually iterate towards something useful.
By contrast, generated vectors are inherently scalable and easy to edit. These outputs in particular seem to be low-complexity, with each shape composed of as few points as possible. This is a boon for "human-in-the-loop" editing experiences.
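To make the "scalable and easy to edit" point concrete, here is a minimal sketch. It is my own illustration, not anything taken from the project under discussion, and the helper names `scale_points` and `to_svg_path` are hypothetical. A shape stored as a handful of points can be scaled or edited just by changing those numbers, with no resampling step.

```python
from typing import List, Tuple

Point = Tuple[float, float]

def scale_points(points: List[Point], factor: float) -> List[Point]:
    """Scale every vertex about the origin; no pixels are resampled."""
    return [(x * factor, y * factor) for x, y in points]

def to_svg_path(points: List[Point]) -> str:
    """Serialize a closed polygon as the 'd' attribute of an SVG path."""
    head, *tail = points
    parts = [f"M {head[0]:g} {head[1]:g}"]
    parts += [f"L {x:g} {y:g}" for x, y in tail]
    parts.append("Z")  # close the shape
    return " ".join(parts)

# A low-complexity shape: three points describe the whole outline.
triangle = [(0.0, 0.0), (10.0, 0.0), (5.0, 8.0)]

# Scaling 100x only multiplies coordinates, so nothing gets blurry.
big = scale_points(triangle, 100.0)

# A "human-in-the-loop" edit: drag one vertex, leave the rest alone.
edited = triangle[:2] + [(5.0, 12.0)]

print(to_svg_path(triangle))
print(to_svg_path(big))
print(to_svg_path(edited))
```

The fewer points a generated shape has, the fewer handles a person has to drag to make a change like `edited`, which is why low-complexity output matters for editing.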
When it comes to generative visuals, creating simplified representations is much harder (and, IMO, more valuable) than creating highly intricate, messy representations.
Have you looked at https://www.recraft.ai/ recently? The image quality of their vector outputs seems to have gotten quite good, although you obviously still wouldn't want to try to generate the densely textured or photographic images that Midjourney excels at. (For https://gwern.net/dropcap last year or before, we had to settle for Midjourney and a somewhat convoluted workflow through Recraft; but if I were making dropcaps now, I think the latest Recraft model would probably suffice.)
Link to their vector page, since the main page makes them look like yet another AI image generator:
https://www.recraft.ai/ai-image-vectorizer
The quality does look quite amazing at first glance. How are the vectors to work with? Can you just open them in Illustrator and start editing?
No, I was actually referring to their native vector AI image generator, not their vectorizer, although the vectorizer was better than any other we found, and that's why we were using it to convert the Midjourney PNG dropcaps into SVGs.
(The editing quality of the vectorized ones was not great, but it is hard to see how it could be good given their raster-style appearance. I can't speak to the editing quality of the native-generated ones, either in the old obsolete Recraft models or the newer ones, because the old ones were too ugly to want to use, and I haven't done much with the new one yet.)
I was under the impression that their AI Vector generator generates a PNG and vectorizes under the hood.
Hm... I was definitely under the impression that it is generating SVGs natively, and that was consistent with its output and its recent upgrades like good text rendering, and I'm fairly sure I've said as much to the CEO and not been corrected... But I don't offhand recollect a specific reference where they say unambiguously that it's an SVG generator rather than vectorizer(raster), so maybe I'm wrong about that.
For me it's based on three things: vector generation is much harder than raster, Recraft has raised just over $10M (not that much in this space), and their API has no direct vector generation.
There is also the possibility of using these images as guidance for raster generation models. Generate easily manipulable and composable images as a first stage, then add detail once the image composition is satisfactory.
Trivially possible with ControlNets!
My little project for the highly intricate, messy representation ;) https://github.com/KodeMunkie/shapesnap (it stands on the backs of giants, original was not mine). It's also available on npm.
I always imagine how useful Sora.ai could be if it generated 3D models to render its animations from instead.
I agree, that's the future of these video models. For professional use you want more control, and the obvious next step towards that is to generate the full 3D scene (in the form of animated Gaussian splats, since that's more AI-friendly than mesh-based 3D). That helps the model be more consistent and also gives the user more control over the camera and the scene.
I couldn't agree more. I feel that the block-coding and rasterized approaches that are ubiquitous in audio codecs (even the modern "neural" ones) are a dead-end for the fine-grained control that musicians will want. They're just fine for text-to-music interfaces of course.
I'm working on a sparse audio codec that's mostly focused on "natural" sounds at the moment, and uses some (very roughly) physics-based assumptions to promote a sparse representation.
https://blog.cochlea.xyz/sparse-interpretable-audio-codec-pa...
Interesting. I'm approaching music generation from another perspective:
https://github.com/chaosprint/RaveForce
RaveForce - An OpenAI Gym style toolkit for music generation experiments.
Ah, we should be friends!
I'm not sure what else to add, except that these are exactly the thoughts I think, and it used to feel lonely ;)