This is really well thought out. The git-like versioning approach for memory artifacts is something I’ve been advocating for after spending way too much time debugging agent state issues.
I’ve been working on AI memory backends and context management myself and the core insight here — that context needs to be versionable and inspectable, not just a growing blob — is spot on.
Tried UltraContext in my project TruthKeeper and it clicked immediately. Being able to trace back why an agent “remembered” something wrong is a game changer for production debugging.
One thing I’d love to see: any thoughts on compression strategies for long-running agents? I’ve been experimenting with semantic compression to keep context windows manageable without losing critical information. Great work, will be following this closely.
For compression and long-running agents, may I suggest https://memtree.dev. We offer a simple API that compresses messages asynchronously for instant responses and small context leading to much higher quality generations. We're about to release a dashboard that will show you what each compressed request looked like, the token distribution between system, memory, and tool messages, along with memory retrievals, etc... Is this the type of thing that you're looking for?
Something like this needs to be open-sourced. You're going to have a hell of a time trying to get enough trust from people to run all of their prompts through your servers.
For now, I’m intentionally keeping UltraContext as unopinionated as possible, since every application has different constraints and failure modes.
The goal is to focus on the core building blocks that enable more sophisticated use cases. Compression, compaction, and offloading strategies should be straightforward to build on top of UC rather than baked in at the core layer.