Just spent the last week or so porting TheRock to stagex in an effort to get ROCm built with a native musl/mimalloc toolchain and make the build deterministic, for high-security/privacy workloads that cannot trust binaries built with only a single compiler.
It has been a bit of a nightmare and I had to package like 30+ deps and their heavily customized LLVM, but I finally got the runtime to build this morning.
Things are looking bright for high-security workloads on AMD hardware, since AMD works fully in the open, however much of a mess it may be.
I also attempted to package ROCm on musl, specifically for Alpine Linux.
It truly is a nightmare to build the whole thing. I got past the custom LLVM fork and a dozen other packages, but eventually decided it had been too much of a time sink.
I’m using llama.cpp with its Vulkan support and it’s good enough for my uses. Vulkan is already there and just works. It’s probably on your host too, since so many other things rely on it anyway.
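In case it helps anyone reading along, here's roughly what that looks like from Python. This is only a minimal sketch, assuming a llama-cpp-python build compiled against a Vulkan-enabled llama.cpp (the CMake flag has been GGML_VULKAN in recent releases, though it has changed over time) and a placeholder GGUF model path:

    # Minimal sketch: llama.cpp via the llama-cpp-python bindings on the Vulkan backend.
    # Assumes the bindings were built with Vulkan enabled, e.g.
    #   CMAKE_ARGS="-DGGML_VULKAN=ON" pip install llama-cpp-python
    from llama_cpp import Llama

    llm = Llama(
        model_path="./models/your-model.gguf",  # placeholder: any local GGUF model
        n_gpu_layers=-1,                        # offload all layers to the GPU
        n_ctx=4096,
    )

    out = llm("Explain what ROCm is in one sentence.", max_tokens=64)
    print(out["choices"][0]["text"])

If no usable GPU backend is present it just falls back to CPU, which is part of why the Vulkan path feels so low-friction.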
That said, I’d be curious to look at your build recipes. Maybe it can help power through the last bits of the Alpine port.
Interesting how Vulkan and ROCm are roughly the same age (~9 years), but one is far more stable (and sometimes even more performant) for AI use cases as a side gig, while the other has AI as its primary raison d'être. Tells you a lot about the development teams behind them.
Keep an eye out for a stable rocm PR to stagex in the next week or so if all goes well.
I've built llama.cpp against both Vulkan and ROCm on a Strix Halo dev box. I agree Vulkan is good enough, at least for my hobbyist purposes. ROCm has improved but I would say not worth the administrative overhead.
I realize it does not address the OP's security concerns, but I'm having success running ROCm containers[0] on Alpine Linux, specifically for llama.cpp. I also got vLLM to run in a ROCm container, but I didn't have time to diagnose perf problems, and llama.cpp is working well for my needs.
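For anyone wanting to reproduce that, the essential trick is passing the GPU device nodes through to the container. Here's a minimal sketch using the Docker SDK for Python; the image name and command are placeholders rather than the exact container from [0]:

    # Minimal sketch: starting a ROCm-capable container from Python.
    # The important part is exposing /dev/kfd (compute) and /dev/dri (render
    # nodes) and joining the groups that own them on most distros.
    import docker

    client = docker.from_env()
    logs = client.containers.run(
        image="rocm/dev-ubuntu-22.04",  # placeholder ROCm base image
        command="rocminfo",             # quick sanity check that the GPU is visible
        devices=["/dev/kfd:/dev/kfd:rwm", "/dev/dri:/dev/dri:rwm"],
        group_add=["video", "render"],
        remove=True,
    )
    print(logs.decode())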
FWIW, Alpine now has native packages for llama.cpp (using Vulkan).
It is sad to observe this time and time again. Last year I had the idea to run a shareholder campaign to change this; I suspended it after last year's AMD promises, but maybe this really needs to be done: https://unlockgpu.com/action-plan/
https://github.com/ROCm/TheRock/issues/3477 makes me quite sad for a variety of reasons. It shouldn't be like this. This work should be usable.
Oh I fully abandoned TheRock in my stagex ROCm build stack. It is not worth salvaging, but it was an incredibly useful reference for the rewrite.
So much about this confuses me. What do Kitty and ncurses have to do with ROCm? Why is this being built with GCC instead of clang? Why even bother building it yourself when the tarballs are so good and easy to work with?
The analysis was AI generated. This was Claude brute-forcing itself through building a library.
On the last one: OP said they were trying to get it working for a musl toolchain, so the tarballs are probably not useful to them (I assume they're built for glibc).
Agreed on the others though. Why's it even installing ncurses, surely that's just expected to be on the system?
> Hey @rektide, @apaz-cli, we bundle all sysdeps to allow to ship self-contained packages that users can e.g. pip install. That's our basic default and it allows us to tightly control what we ship. For building, it should generally be possible to build without the bundled sysdeps in which case it is up to the user to make sure all dependencies are properly installed. As this is not our default we seem to have missed some corner cases and there is more work needed to get back to allow builds with sysdeps disabled. I started #3538 but it will need more work in some other components to fully get you what you're asking with regards to system dependencies. Please note that we do not test with the unbundled, system provided dependencies but of course we want to give the community the freedom to build it that way.
I did get past that issue with ncurses & kitty! Thanks for some work there!
There is, however, quite a long list of other issues that have been blocking builds on systems with somewhat more modern toolchains/OSes than whatever the target is here (Ubuntu 24.04, I suspect). I really want to be able to engage directly with TheRock & compile & run it natively on Ubuntu 25.04, and now Ubuntu 26.04 too. The people eager to use the amazing leading-edge capabilities TheRock offers will, I suspect, also be more bleeding-edge users with more up-to-date OS choices. They are currently very blocked.
I know it's not the intent at all. There's so much good work here that seems so close & so well considered, an epic work spanning so many libraries and drivers. But this mega-thread of issues gives me such vibes of the bad awful no good Linux4Tegra, where there's really one bespoke special Linux that has to be used and nothing else works. In this case you can download the tgz and it will probably work on your system, but that means you don't have any chance to improve or iterate on or contribute to TheRock, that it's a consume-only relationship, and that feels bad and is a dangerous spot to be in, not having usable source.
I'd really, really like to see AMD run CI test matrices that we can see, showing the state of the build on a variety of Linux OSes. That would give the discipline and trust to keep situations like this one from arising. This obviously cannot hold forever; Ubuntu 24.04 is not acceptable as a build machine in perpetuity, so these problems eventually have to be tackled, but what's really needed is a commitment to avoid making the build work on only one blessed image. This situation should not have developed; for TheRock to be accepted and useful, the build needs to work on a variety of systems. We need fixes right now to make that true, and AMD needs to show that its commitment to that goal is real, ideally by running a build matrix CI, where we can see it, that proves it compiles.
Sorry if it wasn't clear - I was just copy-pasting from the github issue, in a comment further down.
Wait?
You don't trust Nvidia because the drivers are closed source?
I think Nvidia's pledged to work on the open source drivers to bring them closer to the proprietary ones.
I'm hoping Intel can catch up; at 32 GB of VRAM for around $1,000 it's very accessible.
Nvidia is opening their source code because they moved most of their source code to the binary blob they're loading. That's why they never made an open source Nvidia driver for Pascal or earlier, where the hardware wasn't set up to use their giant binary blobs.
It's like running Windows in a VM and calling it an open source Windows system. The bootstrapping code is all open, but the code that's actually being executed is hidden away.
Intel has the same problem AMD has: everything is written for CUDA or other brand-specific APIs. Everything needs wrappers and workarounds to run before you can even start to compare performance.
Huawei’s CANN is fully open and supposedly a drop-in replacement for CUDA. The latter could make it a superior option to either AMD or Intel.
In the Python ecosystem you can just replace CUDA with DirectML in at least one popular framework and it just runs. You're limited to Windows then, though.
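Concretely, that swap looks something like the sketch below with PyTorch and the torch-directml package (one of the frameworks where this works; treat the details as an assumption on my part rather than the only way to do it):

    # Minimal sketch: DirectML standing in for CUDA in PyTorch on Windows.
    # Assumes `pip install torch-directml`; any DirectX 12 capable GPU
    # (AMD, Intel, NVIDIA) can back the device.
    import torch
    import torch_directml

    dml = torch_directml.device()       # instead of torch.device("cuda")

    a = torch.randn(1024, 1024).to(dml)
    b = torch.randn(1024, 1024).to(dml)
    c = a @ b                           # the matmul runs on the DirectML device
    print(c.shape, c.device)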
Nvidia has been pledging that for years. If it ever actually happens, I am here for it.
It happened 2 years ago:
https://developer.nvidia.com/blog/nvidia-transitions-fully-t...
Their userspace is still closed. ROCm is fully open.
Provided you happen to have one of those few supported GPUs.
Thus being open source isn't of much help without it.
> Intel
For some workloads, the Arc Pro B70 actually does reasonably well when cached.
With some reasonable bring-up, it also seems to be more usable than the 32 GB R9700.
I have both of those cards. Llama.cpp with SYCL has thus far refused to work for me, and Vulkan is pretty slow. Hoping that some fixes come down the pipe for SYCL, because I have plenty of power for local models (on paper).
Hmm.
I had to rebuild llama.cpp from source with the SYCL and CPU specific backends.
Started with a barebones Ubuntu Server 24 LTS install, used the HWE kernel, pulled in the Intel dependencies for hardware support/oneapi/libze, then built llama.cpp with the Intel compiler (icx?) for the SYCL and NATIVE backends (CPU specific support).
In short, built it based mostly on the Intel instructions.
>Just spent the last week or so porting TheRock to stagex in an effort to get ROCm built with a native musl/mimalloc toolchain and make the build deterministic, for high-security/privacy workloads that cannot trust binaries built with only a single compiler.
...I have a feeling you might not be at liberty to answer, but... Wat? What the hell kind of "I must apparently resist Reflections on Trusting Trust" workloads are you working on?
And what do you mean by "binaries built with only a single compiler"? Like, how would that even work? Compile the .o's with compiler-specific suffixes, then do a tortured linker invocation to mix different .o's into a combined library/ELF? Are we talking about mixing two different C compilers? The same compiler from two different bootstraps? A regular/cross mix?
I'm sorry if I'm pushing for too much detail, but as someone who's actually bootstrapped compilers/userspaces from source, your use case intrigues me just by the phrasing.
You can get a sense of what my team and I do from https://distrust.co/threatmodel.html
For information on stagex and how we do signed deterministic compiles across independently operated hardware see https://stagex.tools
Stagex is used by governments, fintech, blockchains, AI companies, and critical infrastructure all over the internet, so our threat model must assume at least one computer or maintainer is compromised at all times and not trust any third party compiled code in the entire supply chain.
Nice! I'd thought about doing something similar, but never went so far as to get where y'all are at! I got about as far as an LFS distro and was in the process of picking apart GCC to see if I could get the thing verifiable. Can't say as I'm fond of the container-first architecture, but I understand why you did it, and my old-fartness aside, keep up the good work! Now I have another project to keep an eye on. And at least 4 other people other than me that take supply chain risk seriously! Yay!