I'm confused by all the takes implying decode is more important than prefill.
There are an enormous number of use cases where the prompt is large and the expected output is small.
E.g. providing data for the LLM to analyze, after which it gives a simple yes/no response, or has it select a single enum value from a set.
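To put rough numbers on that, here's a back-of-the-envelope sketch of how much of a request's latency goes to prefill versus decode. The function name and the throughput figures (1000 tok/s prefill, 20 tok/s decode) are made-up assumptions for illustration, not measurements of any real hardware or model.

```python
def prefill_share(prompt_tokens, output_tokens, prefill_tok_per_s, decode_tok_per_s):
    """Fraction of total request latency spent in prefill (simple linear model)."""
    prefill_s = prompt_tokens / prefill_tok_per_s
    decode_s = output_tokens / decode_tok_per_s
    return prefill_s / (prefill_s + decode_s)


# Hypothetical throughputs, chosen only to illustrate the shape of the tradeoff.
PREFILL_TOK_PER_S = 1000.0
DECODE_TOK_PER_S = 20.0

# Large prompt, one-token answer (yes/no or a single enum value).
classify = prefill_share(8000, 1, PREFILL_TOK_PER_S, DECODE_TOK_PER_S)

# Short prompt, long open-ended reply (chat style).
chat = prefill_share(50, 500, PREFILL_TOK_PER_S, DECODE_TOK_PER_S)

print(f"classification: {classify:.1%} of latency is prefill")
print(f"chat:           {chat:.1%} of latency is prefill")
```

Under those assumed rates, the classification request is almost entirely prefill-bound and the chat request is almost entirely decode-bound. The real ratio will depend on the hardware, the model, and batching.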
This large-prompt, small-output pattern seems far more valuable in practice than the common open-ended chat-style implementations (which are lazy from a product perspective).
Obviously decode will be important for code generation or search, but that's such a small set of possible applications, and for those you'll probably always do better being on the latest models in the cloud.