We’ve been benchmarking a few models on our API platform and got some interesting performance numbers:

- MiniMax M2.5 → 0.118s time-to-first-token (TTFT), 103 tokens/sec
- GLM 5.1 → 120 tokens/sec throughput
- Kimi K2.5 → 0.643s TTFT, 69 tokens/sec
- All models → ~99.9% request success rate

The latency difference is especially noticeable: ~0.1s TTFT feels almost instant in interactive apps.

Let me know how you're evaluating LLM APIs. Are you optimizing more for latency, throughput, or cost?
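For anyone curious how numbers like these are derived, here's a minimal sketch of computing TTFT and throughput from per-token arrival timestamps of a streamed response. The function name and the timestamps are illustrative, not our actual harness:

```python
import time
from dataclasses import dataclass


@dataclass
class StreamStats:
    ttft: float            # seconds from request start to first token
    tokens_per_sec: float  # decode throughput over the streaming window


def measure_stream(token_times: list[float], start_time: float) -> StreamStats:
    """Compute TTFT and throughput from streamed-token timestamps.

    token_times: one monotonic timestamp per received token.
    start_time:  monotonic timestamp when the request was sent.
    """
    if not token_times:
        raise ValueError("no tokens received")
    ttft = token_times[0] - start_time
    elapsed = token_times[-1] - token_times[0]
    # Count tokens after the first, so prefill time doesn't inflate throughput.
    tps = (len(token_times) - 1) / elapsed if elapsed > 0 else float("inf")
    return StreamStats(ttft=ttft, tokens_per_sec=tps)


# Example with synthetic timestamps (use time.monotonic() in a real client):
stats = measure_stream([0.1, 0.2, 0.3, 0.4, 0.5], start_time=0.0)
```

One design note: excluding the first token from the throughput window is what keeps TTFT and tokens/sec independent metrics; folding prefill into the divisor would understate decode speed on short outputs.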