It sounds like you don’t need immediate LLM responses and can batch process your data nightly? Have you considered running a local LLM? You may not need to pay for API calls at all. Today’s local models are quite good. I started off on CPU and even that was fine for my pipelines.
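To give a sense of how little plumbing this takes: here's a minimal nightly-batch sketch against a local OpenAI-compatible server (llama.cpp's llama-server and Ollama both expose one). The URL, model tag, and file names below are placeholders for illustration, not anything specific to your setup.

    # Minimal nightly batch sketch against a local OpenAI-compatible endpoint.
    # The URL (llama-server default port), model tag, and file names are assumptions.
    import json
    import requests

    API_URL = "http://localhost:8080/v1/chat/completions"
    MODEL = "gpt-oss-20b"

    def process(text: str) -> str:
        # One blocking request per record; fine when the job runs overnight.
        resp = requests.post(API_URL, json={
            "model": MODEL,
            "messages": [{"role": "user", "content": "Summarize:\n" + text}],
            "temperature": 0.2,
        }, timeout=600)
        resp.raise_for_status()
        return resp.json()["choices"][0]["message"]["content"]

    if __name__ == "__main__":
        with open("records.jsonl") as fin, open("out.jsonl", "w") as fout:
            for line in fin:
                rec = json.loads(line)
                fout.write(json.dumps({"id": rec.get("id"),
                                       "summary": process(rec["text"])}) + "\n")

Kick that off from cron in the evening and collect the results in the morning.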
Though I haven't done any extensive testing, I personally could easily get by with current local models. The only reason I don't is that the hosted ones all have free tiers.
Agreed, I'm pretty amazed at what I'm able to do locally just with an AMD 6700XT and 32GB of RAM. It's slow, but if you've got all night...
I haven't thought about that, but really want to dig in more now. Any places you recommend starting?
I started off using gpt-oss-120b on cpu. It uses about 60-65gb of memory or so but my workstation has 128gb of ram. If I had less ram, I would start off with the gpt-oss-20b model and go from there. Look for MoE models as they are more efficient to run.
My old Threadripper Pro was seeing about 15 tps, which was quite acceptable for the background tasks I was running.
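If you just want to poke at a model before wiring up a pipeline, something like this works against Ollama's local HTTP API (the model tag and prompt are placeholders; check what you've actually pulled with `ollama list`):

    # Quick local test via Ollama's HTTP API (default port 11434).
    # The model tag "gpt-oss:20b" is an assumption -- use whatever you've pulled.
    import requests

    resp = requests.post("http://localhost:11434/api/chat", json={
        "model": "gpt-oss:20b",
        "messages": [{"role": "user",
                      "content": "Tag this ticket as bug, feature, or question: ..."}],
        "stream": False,
    })
    resp.raise_for_status()
    print(resp.json()["message"]["content"])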
Can you suggest any good LLMs for CPU?
Following.