Also see
# Smarter Local LLMs, Lower VRAM Costs – All Without Sacrificing Quality, Thanks to Google’s New “QAT” [Quantization-Aware Training] Optimization
https://www.hardware-corner.net/smarter-local-llm-lower-vram...
> According to Google, they’ve «reduced the perplexity drop by 54% (using llama.cpp perplexity evaluation) when quantizing down to Q4_0.»
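The 54% figure presumably means the gap between the bf16 model’s perplexity and the Q4_0 model’s perplexity shrinks to roughly half of what plain post-training quantization leaves behind. For anyone unfamiliar with what “quantization-aware training” means in general (this is the generic technique, not Google’s specific recipe for Gemma): the forward pass during fine-tuning runs through a simulated low-bit quantizer, so the weights learn to tolerate the rounding error that Q4_0 will later impose, while gradients bypass the non-differentiable rounding via a straight-through estimator. A minimal PyTorch sketch of that idea (function and layer names are illustrative, not from the article):

```python
import torch

def fake_quantize(x: torch.Tensor, num_bits: int = 4) -> torch.Tensor:
    """Simulate low-bit quantization in the forward pass, keep float gradients."""
    qmax = 2 ** (num_bits - 1) - 1                       # e.g. 7 for signed 4-bit
    scale = x.detach().abs().max().clamp(min=1e-8) / qmax
    x_q = torch.clamp(torch.round(x / scale), -qmax - 1, qmax) * scale
    # Straight-through estimator: forward sees the quantized values,
    # backward treats the rounding as identity so training can proceed.
    return x + (x_q - x).detach()

# Toy usage (hypothetical layer, not from the article): a linear layer whose
# weights are fake-quantized during training so they adapt to 4-bit rounding.
layer = torch.nn.Linear(16, 16)
inp = torch.randn(4, 16)
out = inp @ fake_quantize(layer.weight).t() + layer.bias
out.sum().backward()        # gradients reach layer.weight through the STE
print(layer.weight.grad.shape)
```

After training like this, exporting to an actual Q4_0 GGUF costs much less quality than quantizing a model that never saw the rounding during training, which is what the quoted perplexity comparison is measuring.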