If folks are interested, @antirez has opened a C implementation of Voxtral Mini 4B here: https://github.com/antirez/voxtral.c
I have my own fork here: https://github.com/HorizonXP/voxtral.c where I’m working on a CUDA implementation, plus some other niceties. It’s working quite well so far, but I haven’t got it to match Mistral AI’s API endpoint speed just yet.
hey,
how does someone get started with doing things like these (writing inference code/ cuda etc..). any guidance is appreciated. i understand one doesn't just directly write these things and this would require some kind of reading. would be great to receive some pointers.
These are good lectures and there is also a discord. https://github.com/gpu-mode/lectures
Same! Would love any resources. I'm interested more in making models run vs making the models themselves :)
There is also another Mistral implementation: https://github.com/EricLBuehler/mistral.rs Not sure what the difference is, but it seems to be just be overall better received.
mistral.rs is more like llama.cpp, it's a full inference library written in rust that supports a ton of models and many hardware architectures, not just mistral models.