Show HN: Gambit, an open-source agent harness for building reliable AI agents

Hey HN!

Wanted to show our open source agent harness called Gambit.

If you’re not familiar, agent harnesses are sort of like an operating system for an agent... they handle tool calling, planning, context window management, and don’t require as much developer orchestration.

Normally you might see an agent orchestration framework pipeline like:

compute -> compute -> compute -> LLM -> compute -> compute -> LLM

We invert this, so with an agent harness it’s more like:

LLM -> LLM -> LLM -> compute -> LLM -> LLM -> compute -> LLM
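
As a rough sketch of that inversion: in a harness-style loop the model picks each next step, and compute (a tool) only runs when the model asks for it. Everything below is hypothetical TypeScript written for illustration, not Gambit's actual API; `runHarness`, `stubModel`, and the `wordCount` tool are made-up names, and the stub stands in for a real LLM call so the example runs offline.

```typescript
// Hypothetical sketch: none of these names come from Gambit.
type ModelTurn =
  | { kind: "tool"; name: string; input: string }
  | { kind: "final"; text: string };

type Model = (transcript: string[]) => ModelTurn;
type Tool = (input: string) => string;

// Harness loop: the model chooses every step; tools run only on request.
function runHarness(
  model: Model,
  tools: Record<string, Tool>,
  userMessage: string,
  maxTurns = 8,
): string[] {
  const transcript: string[] = [`user: ${userMessage}`];
  for (let turn = 0; turn < maxTurns; turn++) {
    const step = model(transcript);
    if (step.kind === "final") {
      transcript.push(`assistant: ${step.text}`);
      return transcript;
    }
    const tool = tools[step.name];
    const output = tool ? tool(step.input) : `unknown tool: ${step.name}`;
    transcript.push(`tool ${step.name}: ${output}`);
  }
  transcript.push("assistant: stopped after reaching the turn limit");
  return transcript;
}

// Stub standing in for a real LLM: request one tool call, then answer.
const stubModel: Model = (transcript) => {
  const last = transcript[transcript.length - 1] ?? "";
  return last.startsWith("tool wordCount:")
    ? { kind: "final", text: `The text has ${last.split(": ")[1]} words.` }
    : { kind: "tool", name: "wordCount", input: "agents calling agents" };
};

const tools: Record<string, Tool> = {
  wordCount: (input) => String(input.trim().split(/\s+/).length),
};

const transcript = runHarness(stubModel, tools, "How long is this text?");
console.log(transcript.join("\n"));
```

In an orchestration pipeline the developer would hard-code the order of the `wordCount` call and the model call; here the order falls out of what the model returns.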

Essentially, you describe each agent either in a self-contained Markdown file or as a TypeScript program. Your root agent can bring in other agents as needed, and we give you a typesafe way to define the interfaces between those agents. We call these agent definitions decks.

Agents can call agents, and each agent can be designed with whatever model params make sense for your task.
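
To make the idea of typed interfaces between agents concrete, here is a minimal sketch. The `Deck<In, Out>` shape, the `modelParams` field, and the model names are all assumptions made for illustration; Gambit's real deck format may look quite different. The `run` bodies are plain functions standing in for LLM calls.

```typescript
// Hypothetical shapes for illustration; Gambit's real deck API may differ.
interface ModelParams {
  model: string;
  temperature: number;
}

interface Deck<In, Out> {
  name: string;
  modelParams: ModelParams;
  run(input: In): Out;
}

interface SummaryRequest {
  text: string;
  maxWords: number;
}

interface Summary {
  summary: string;
  wordCount: number;
}

// Child deck with its own model params; a stub replaces the LLM call.
const summarizer: Deck<SummaryRequest, Summary> = {
  name: "summarizer",
  modelParams: { model: "small-cheap-model", temperature: 0 },
  run({ text, maxWords }) {
    const words = text.trim().split(/\s+/).slice(0, maxWords);
    return { summary: words.join(" "), wordCount: words.length };
  },
};

// Root deck: calls the child deck through its typed interface.
const root: Deck<string, string> = {
  name: "root",
  modelParams: { model: "large-reasoning-model", temperature: 0.7 },
  run(userMessage) {
    const { summary, wordCount } = summarizer.run({ text: userMessage, maxWords: 4 });
    return `Summary (${wordCount} words): ${summary}`;
  },
};

const reply = root.run("agents can call other agents as needed");
console.log(reply);
```

With this kind of typing, passing the wrong shape to `summarizer.run` (say, omitting `maxWords`) fails at compile time rather than at inference time.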

Additionally, each step of the chain gets automatic evals, which we call graders. A grader is another deck type, but it’s designed to evaluate and score conversations (or individual conversation turns).
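
A sketch of what scoring at both levels could look like. This is hypothetical TypeScript, not Gambit's grader API: the `Turn`, `Grade`, and `RubricItem` types are invented for the example, and simple deterministic checks (including a naive email regex) stand in for what would presumably be an LLM-backed grader deck.

```typescript
// Hypothetical types for illustration; not Gambit's grader API.
interface Turn {
  role: "user" | "assistant";
  content: string;
}

interface Grade {
  score: number; // 0 (worst) to 1 (best)
  reasons: string[]; // names of the rubric items that failed
}

// A rubric item is a named pass/fail check on a single turn.
interface RubricItem {
  name: string;
  passes(turn: Turn): boolean;
}

const rubric: RubricItem[] = [
  { name: "non-empty reply", passes: (t) => t.content.trim().length > 0 },
  // Naive email pattern, used only as a stand-in for a real PII check.
  { name: "no email address", passes: (t) => !/[\w.+-]+@[\w-]+\.[\w.]+/.test(t.content) },
];

// Turn-level grade: fraction of rubric items the turn passes.
function gradeTurn(turn: Turn): Grade {
  const failed = rubric.filter((item) => !item.passes(turn)).map((item) => item.name);
  return { score: (rubric.length - failed.length) / rubric.length, reasons: failed };
}

// Conversation-level grade: mean of the assistant turn scores.
function gradeConversation(turns: Turn[]): Grade {
  const grades = turns.filter((t) => t.role === "assistant").map(gradeTurn);
  if (grades.length === 0) return { score: 0, reasons: ["no assistant turns"] };
  const total = grades.reduce((sum, g) => sum + g.score, 0);
  const reasons = grades.reduce<string[]>((all, g) => all.concat(g.reasons), []);
  return { score: total / grades.length, reasons };
}

const conversation: Turn[] = [
  { role: "user", content: "Who do I contact for support?" },
  { role: "assistant", content: "You can reach Jane at jane@example.com." },
  { role: "user", content: "Thanks" },
  { role: "assistant", content: "Happy to help." },
];

const grade = gradeConversation(conversation);
console.log(grade);
```

Averaging turn scores is just one possible aggregation; a stricter grader could fail the whole conversation as soon as any single turn fails a rubric item.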

We also have test agents, which you define on a deck-by-deck basis. They’re designed to mimic scenarios your agent would face and to generate synthetic data for either humans or graders to grade.
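
One way to picture a test agent, as a hedged sketch: something that plays the user side of a scenario against the agent under test and records the resulting conversation for later grading. The names here (`Scenario`, `simulate`, `echoAgent`) are hypothetical and not taken from Gambit, and the user turns are scripted rather than model-generated to keep the example self-contained.

```typescript
// Hypothetical sketch; names are illustrative, not Gambit's test-agent API.
interface Turn {
  role: "user" | "assistant";
  content: string;
}

type Agent = (history: Turn[]) => string;

interface Scenario {
  name: string;
  userTurns: string[];
}

// Plays the scripted user turns against the agent and records the conversation.
function simulate(agent: Agent, scenario: Scenario): Turn[] {
  const history: Turn[] = [];
  for (const content of scenario.userTurns) {
    history.push({ role: "user", content });
    history.push({ role: "assistant", content: agent(history) });
  }
  return history;
}

// Stub agent under test; a real one would call a model.
const echoAgent: Agent = (history) => {
  const last = history[history.length - 1];
  return `You said: ${last ? last.content : ""}`;
};

const scenarios: Scenario[] = [
  { name: "angry refund request", userTurns: ["I want my money back!", "This is unacceptable."] },
  { name: "vague question", userTurns: ["it doesn't work"] },
];

// Synthetic conversations, ready to hand to a human or a grader.
const syntheticData = scenarios.map((s) => ({ scenario: s.name, turns: simulate(echoAgent, s) }));
console.log(JSON.stringify(syntheticData, null, 2));
```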

Prior to Gambit, we had built an LLM-based video editor, and we weren’t happy with the results, which is what brought us down this path of improving inference-time LLM quality.

We know it’s missing some obvious parts, but we wanted to get this out there to see how it could help people or start conversations. We’re really happy with how it’s working with some of our early design partners, and we think it’s a way to implement a lot of interesting applications:

- Truly open source agents and assistants, where logic, code, and prompts can be easily shared with the community.

- Rubric-based grading to guarantee you (for instance) don’t leak PII accidentally.

- Spin up a usable bot in minutes and have Codex or Claude Code use our command line runner / graders to build a first version that is pretty good w/ very little human intervention.

We’ll be around if y’all have any questions or thoughts. Thanks for checking us out!

Walkthrough video: https://youtu.be/J_hQ2L_yy60

github.com | 63 points | randall | 6 hours ago | 14 comments

salesplay an hour ago

This is an interesting direction for agent frameworks. What stood out to me is the shift from simple tool orchestration to agents that can reason, call other agents, and self-manage workflows. That’s something we’ve been thinking about a lot while building SalesPlay — especially around how autonomous sales agents need clear evaluation, guardrails, and accountability to actually be useful in real GTM teams. The built-in grading/evaluation angle here feels like a practical step toward making agents less brittle and more production-ready. Curious to see how this evolves in real-world use cases.

Trufa 4 hours ago

Is this an alternative to https://mastra.ai/docs

How would it compare?

  • randall 4 hours ago

    So I look at something like Mastra (or LangChain) as agent orchestration, where you do computing tasks to line up things for an LLM to execute against.

    I look at Gambit as more of an "agent harness", meaning you're building agents that can decide what to do more than you're orchestrating pipelines.

    Basically, if we're successful, you should be able to chain agents together to accomplish things extremely simply (using Markdown). Mastra, as far as I'm aware, is focused on helping people use programming languages (TypeScript) to build pipelines and workflows.

    So yes it's an alternative, but more like an alternative approach rather than a direct competitor if that makes sense.

tomhow 4 hours ago

[under-the-rug stub]

[see https://news.ycombinator.com/item?id=45988611 for explanation]

  • franciscomello 6 hours ago

    This looks quite interesting in terms of the architecture. Seems like a fresh take on stuff like LangChain, which, at least last time I checked, sucks.

  • alberson 5 hours ago

    I’m excited to give this a spin at Agentive! Really interesting approach.

  • sofdao 5 hours ago

    this is awesome

    are things like file system baked in?

    fan of the design of the system. looks great architecturally

    • randall 5 hours ago

      omg thank you so much. We're working on the file system stuff, that's an easier lift for us than the initial work, so we wanted to start with the big stuff and work backward. Claude Code and Codex are obviously really great at that stuff, and we'd like to be able to support a lot of that out of the box.

  • pych 5 hours ago

    wow this looks cool - been meaning to dig into harness stuff, and this looks like a good starting point

    • randall 5 hours ago

      Thx! Happy to help if you need it. :)

  • randall 4 hours ago

    thx, i appreciate it, believe it or not. :)