OP here.
We recently open-sourced a small tool we built internally to tackle a question we couldn't find a good existing answer to: How do you evaluate AI coding agents on a real production codebase?
Like most teams, we had lots of opinions about which agents and models "felt" best, but no hard data. The missing piece wasn’t analysis; it was attribution. We needed to know which lines of code were written by which agent/model, without changing how engineers work.
The key insight was that Git already gives us most of what we need.
By reverse-engineering how tools like Cursor and Claude Code modify files, we attach attribution metadata directly to Git whenever an AI agent edits code. Engineers don’t have to opt in or change their workflows.
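To give a flavor of what that looks like concretely, here's a simplified sketch of one way to attach attribution to commits, using git notes. The notes ref and JSON fields below are illustrative, not our exact schema:

```python
# Illustrative sketch: record which agent/model produced a commit by
# attaching a JSON note under a dedicated notes ref. The "ai-attribution"
# ref name and the field names are made up for this example.
import json
import subprocess

def attach_attribution(commit_sha: str, agent: str, model: str, files: list[str]) -> None:
    note = json.dumps({"agent": agent, "model": model, "files": files})
    subprocess.run(
        ["git", "notes", "--ref=ai-attribution", "add", "-f", "-m", note, commit_sha],
        check=True,
    )

def read_attribution(commit_sha: str) -> dict | None:
    result = subprocess.run(
        ["git", "notes", "--ref=ai-attribution", "show", commit_sha],
        capture_output=True, text=True,
    )
    return json.loads(result.stdout) if result.returncode == 0 else None
```

Keeping the metadata in Git itself means it rides along with normal history, though notes refs do have to be pushed and fetched explicitly.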
Once that data exists, we can run fairly simple queries to answer questions like:
- merged lines per dollar by agent/model
- whether AI-generated code correlates with higher bug rates
- how different developers actually use AI in practice
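As a concrete example, the first query boils down to a group-by once the attribution data is exported as plain rows; the {agent, model, merged_lines, cost_usd} row shape here is illustrative:

```python
# Illustrative "merged lines per dollar by agent/model" aggregation over
# exported attribution rows (the row schema is made up for this example).
from collections import defaultdict

def lines_per_dollar(rows: list[dict]) -> dict[tuple[str, str], float]:
    merged = defaultdict(int)
    cost = defaultdict(float)
    for row in rows:
        key = (row["agent"], row["model"])
        merged[key] += row["merged_lines"]
        cost[key] += row["cost_usd"]
    # Skip agent/model pairs with no recorded spend to avoid dividing by zero.
    return {key: merged[key] / cost[key] for key in merged if cost[key] > 0}
```

The other questions are mostly the same shape: different group-bys and joins (e.g. against bug-tracker data) over the same attribution rows.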
An unexpected side benefit showed up in code review: once we surfaced AI attribution in pull requests, reviews got faster because reviewers could focus on AI-generated code in sensitive areas.
We've open-sourced the data capture layer and code review extension so other teams can experiment with this approach. For us, the most valuable part wasn't which agent "won," but finally having a way to measure it at all.
Happy to answer questions or hear critiques.