Interesting, I’ve never needed 1M, or even 250k+ context. I’m usually under 100k per request.
About 80% of my code is AI-generated, with a controlled workflow using dev-chat.md and spec.md. I use Flash for code maps and auto-context, and GPT-4.5 or Opus for coding, all via API with a custom tool.
Gemini Pro and Flash have had 1M context for a long time, but even though I use Flash 3 a lot, and it’s awesome, I’ve never needed more than 200k.
For production coding, I use
- a code map strategy on a big repo. Per file: summary, when_to_use, public_types, public_functions. This is done per file and saved until the file changes. With a concurrency of 32, I can usually code-map a huge repo in minutes. (Typically Flash, cheap, fast, and with very good results)
- Then, auto context, but based on code lensing. Meaning auto context takes some globs that narrow the visibility of what the AI can see, and it uses the code map intersection to ask the AI for the proper files to put in context. (Typically Flash, cheap, relatively fast, and very good)
- Then, use a bigger model, GPT 5.4 or Opus 4.6, to do the work. At this point, context is typically between 30k and 80k max.
What I’ve found is that this process is surprisingly effective at getting a high-quality response in one shot. It keeps everything focused on what’s needed for the job.
Higher precision on the input typically leads to higher precision on the output. That’s still true with AI.
For context, 75% of my code is Rust, and the other 25% is TS/CSS for web UI.
Anyway, it’s always interesting to learn about different approaches. I’d love to understand the use case where 1M context is really useful.
Yeah this is the simpler and also effective strategy. A lot of people are building sophisticated AST RAG models. But you really just need to ask Claude to generally build a semantic index for each large-ish piece of code and re-use it when getting context.
You have to make sure the semantic summary takes up significantly less tokens than just reading the code or its just a waste of token/time.
Then have a skill that uses git version logs to perform lazy summary cache when needed.
- [deleted]
Well, out of all the workflows I have seen, this one is rather nice, might give it a try.
I imagine if the context were being commited and kept up-to-date with CI would work for others to use as well.
However, I'm a little confused on the autocontext/globs narrowing part. Do you, the developer, provide them? Or you feed the full code map to flash + your prompt so it returns the globs based on your prompt?
Also, in general, is your map of a file relatively smaller than the file itself, even for very small files?
Yeah we all converge to the same workflow, in my ai coding agent I'm working on now, I've added an "index" tool that uses tree-sitter to compress and show the AI a skeleton of a code file.
Here's the implementation for the interested: https://github.com/tontinton/maki/blob/main/maki-code-index%...
Oh, that's great.
I've always wanted to explore how to fit tree-sitter into this workflow. It's great to know that this works well too.
Thanks for sharing the code.
(Here is the AIPack runtime I built, MIT: https://github.com/aipack-ai/aipack), and here is the code for pro@coder (https://github.com/aipack-ai/packs-pro/tree/main/pro/coder) (AIPack is in Rust, and AI Packs are in md / lua)
It seems like a very good use of LLMs. You should write a blog post with detail of your process with examples for people who are not into all AI tools as much. I only use Web UI. Lots of what you are saying is beyond me, but it does sound like clever strategy.
I think you've kind of hit on the more successful point here, which is that you should be keeping things focused in a sufficiently focused area to have better success and not necessarily needing more context.
This is really interesting; ive done very high level code maps but the entire project seems wild, it works?
So, small model figures out which files to use based on the code map, and then enriches with snippets, so big model ideally gets preloaded with relevant context / snippets up front?
Where does code map live? Is it one big file?
So, I have a pro@coder/.cache/code-map/context-code-map.json.
I also have a `.tmpl-code-map.jsonl` in the same folder so all of my tasks can add to it, and then it gets merged into context-code-map.json.
I keep mtime, but I also compute a blake3 hash, so if mtime does not match, but it is just a "git restore," I do not redo the code map for that file. So it is very incremental.
Then the trick is, when sending the code map to AI, I serialize it in a nice, simple markdown format.
- path/to/file.rs - summary: ... - when to use: ... - public types: .., .., .. - public functions: .., .., ..
- ...
So the AI does not have to interpret JSON, just clean, structured markdown.
Funny, I worked on this addition to my tool for a week, planning everything, but even today, I am surprised by how well it works.
I have zero sed/grep in my workflow. Just this.
My prompt is pro@coder/coder-prompt.md, the first part is YAML for the globs, and the second part is my prompt.
There is a TUI, but all input and output are files, and the TUI is just there to run it and see the status.
Your code map compresses signal on the context side. Same principle applies on the prompt side: prompts that front-load specifics (file, error, expected behavior) resolve in 1-2 turns. Vague ones spiral into 5-6. 1M context doesn't change that — it just gives you more room for the spiral.
whenever I see post like this
i said well yeah, but its too sophiscated to be practical
Fair point, but because I spent a year building and refining my custom tool, this is now the reality for all of my AI requests.
I prompt, press run, and then I get this flow: dev setup (dev-chat or plan) code-map (incremental 0s 2m for initial) auto-context (~20s to 40s) final AI query (~30s to 2m)
For example, just now, in my Rust code (about 60k LOC), I wanted to change the data model and brainstorm with the AI to find the right design, and here is the auto-context it gave me:
- Reducing 381 context files ( 1.62 MB)
- Now 5 context files ( 27.90 KB)
- Reducing 11 knowledge files ( 30.16 KB)
- Now 3 knowledge files ( 5.62 KB)
The knowledge files are my "rust10x" best practices, and the context files are the source files.
(edited to fix formatting)
It's not sophisticated at all, he just uses a model to make some documentation before asking another model to work using the documentation
very interested in this approach and many other people are for sure. Please do a blog post.
1M context is super useful with Gemini, not so much for coding, but for data analysis.
Even there, I use AI to augment rows and build the code to put data into Json or Polars and create a quick UI to query the data.