"To test this system, we pointed it at an ambitious goal: building a web browser from scratch."
I shared my LLM predictions last week, and one of them was that by 2029 "Someone will build a new browser using mainly AI-assisted coding and it won’t even be a surprise" https://simonwillison.net/2026/Jan/8/llm-predictions-for-202... and https://www.youtube.com/watch?v=lVDhQMiAbR8&t=3913s
This project from Cursor is the second attempt I've seen at this now! The other is this one: https://www.reddit.com/r/Anthropic/comments/1q4xfm0/over_chr...
Time to raise the bar. By 2029 someone will build a new browser using mainly AI-assisted coding and the surprise is that it was designed to be used by pelicans.
> Time to raise the bar
Let's make someone pass the one we have first; this experiment didn't seem to yield a functioning browser, so why would we raise the bar?
> why would we raise the bar?
The web needs to be more p5n friendly.
Surely a smart implementation would just find the Chromium source on GitHub, do some cosmetic rewrites, and strip out all non-essential features?
You'd be able to see it doing that by looking at the transcript. You could then tell it not to!
> The other is this one: https://www.reddit.com/r/Anthropic/comments/1q4xfm0/over_chr...
I took a 5-minute look at the layout crate here and... it doesn't look great:
1. Line-height calculation is suspicious, and the structure of the implementation suggests inline spans aren't handled remotely correctly
2. Uhm... where is the bidi? Directionality has far-reaching implications for an inline layout engine's design. This is not it.
3. It doesn't even consider itself a real engine:
I won't even begin talking about how this particular aspect that it "approximates" also has far-reaching implications for your design...

    // Estimate text width (rough approximation: 0.6 * font_size * char_count)
    // In a real implementation, this would use font metrics
    let char_count = text.chars().count() as f32;
    let avg_char_width = font_size * 0.5; // Approximate average character width
    let text_width = char_count * avg_char_width;

I could probably go on in perpetuity about the things wrong with this, even test it myself or something. But that's a waste of time I'm not undertaking.
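For contrast, here's a minimal sketch (mine, not from the project, with a made-up advance-width table standing in for real font data) of what measuring text via per-glyph advances looks like; a real engine would take these advances from the shaped glyph run rather than a per-char lookup:

    use std::collections::HashMap;

    // Toy metrics source: advance widths in font units, keyed by char.
    // A real engine reads these from the font and measures shaped glyph
    // runs (ligatures, kerning, fallback), not individual chars.
    struct FontMetrics {
        advances: HashMap<char, f32>,
        units_per_em: f32,
    }

    impl FontMetrics {
        fn advance_width(&self, ch: char, font_size: f32) -> f32 {
            // Fall back to half an em only for unknown glyphs, instead of
            // guessing the width of the entire run.
            let units = self
                .advances
                .get(&ch)
                .copied()
                .unwrap_or(self.units_per_em * 0.5);
            units / self.units_per_em * font_size
        }

        fn text_width(&self, text: &str, font_size: f32) -> f32 {
            text.chars().map(|ch| self.advance_width(ch, font_size)).sum()
        }
    }

    fn main() {
        // Illustrative advance values, not taken from any particular font.
        let metrics = FontMetrics {
            advances: HashMap::from([('i', 278.0), ('m', 833.0), (' ', 250.0)]),
            units_per_em: 1000.0,
        };
        // "im" at 16px: (278 + 833) / 1000 * 16 ≈ 17.8px, versus the flat 16px
        // the char-count heuristic above would report (2 * 0.5 * 16).
        println!("{}", metrics.text_width("im", 16.0));
    }

Even this is a simplification, since correct measurement has to happen after shaping, which is exactly why "approximating" it leaks into the rest of the design.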
Making a "browser" that renders a few particular web pages "correctly" is an order of magnitude easier than building one that also actually cares about standards.
If this is how "A Browser for the modern age." looks then I want a time machine.
I saw an AI-generated "web browser" in maybe 2k lines of Python on top of tkinter. It tried to support CSS and could probably render some test cases, but it didn't have the shape of a real web browser at all.
It reminds me of having AI write me an MUI component the other day. It implemented the "sx" prop [1] with code that handled each individual property used by the component in that particular application. It might have been correct, and the component as a whole was successful and well coded... but MUI provides a styled() function and a <Box> component, either of which could have been used to make this component handle everything "sx" is supposed to handle in as little as one line of code. I asked the agent "how would I do this using the tools that MUI provides to support sx" and had a great conversation, coming away with a complete and clear understanding of the right way to do it. But on the first try it wrote something crazy overcomplicated to handle the specific case as opposed to a general-purpose solution that was radically simple. That "web browser" was all like that.
[1] You can write something like sx={width: 4} and MUI multiplies the 4 by the application scale and outputs, say, a width: 20px style.
Sure, but getting this far would have been inconceivable just half a year ago. It will only get better as time passes.
It's impressive, but how sure are we that the code for the current open source browsers isn't part of the model's training data?
It turns out the Cursor one is stitching together a ton of open source components already.
That said, I don't really find the critique that models have browser source code in their training data particularly interesting.
If they spat out a full, working implementation in response to a single prompt then sure, I'd be suspicious they were just regurgitating their training data.
But if you watch the transcripts for these kinds of projects you'll see them make thousands of independent changes, reacting to test failures and iterating towards an implementation that matches the overall goals of the project.
The fact that Firefox and Chrome and WebKit are likely buried in the training data somewhere might help them a bit, but it still looks to me more like an independent implementation that's influenced by those and many other sources.
The goal I am currently using for long-horizon coding experiments is the implementation of a PDF rasterizer given the ISO 32000 specification document.
We're almost there; I've been working on something similar using a markdown'd version of the ISO 32000 spec.
On Jan 1, 2026:
> Given how badly my 2025 predictions aged I'm probably going to sit that one out! [1]
Seven days later you appear on the same podcast you appeared on in 2025 to share your LLM predictions for 2026.
What changed?
He changed his mind? The comment you're citing seems partly tongue-in-cheek anyway, but even if it wasn't, how is this some kind of gotcha?
Bryan got in touch and said "you're being too hard on yourself, those predictions were actually pretty good".
Great, they can call it "artificial Internet Explorer", or aIE for short.
A web browser should be easy, as the source exists. Fix all the SVG bugs in my browser tho...
There are 3.5 serious open codebases of web browsers currently. Only two are full-featured. It's not nothing, but it's very far from "the source exists, so it's easy to copy what they do".
But detailed specs exist for both HTML and JS, along with tests and an unlimited amount of test data. You can just try running a webpage or program, and there are reference implementations - all of which is much easier for agents to work with. Also, they know HTML super well from scraping the whole internet. Still impressive, though.
2029? I have no idea why you would think this is so far off. More like Q2 2026.
You're either overestimating the capabilities of current AI models or underestimating the complexity of building a web browser. There are tons of tiny edge cases and standards to comply with where implementing one standard will break 3 others if not done carefully. AI can't do that right now.
Even if AI never achieves the ability to perform at this level on its own, it is clearly going to be an enormous force multiplier, allowing highly skilled devs to tackle huge projects more or less on their own.
Skilled devs compress, not generate (expand).
https://www.youtube.com/watch?v=8kUQWuK1L4w
The "discoverer" of APL tried to express as many problems as he could with his notation. First he found that notation expands and after some more expansion he found that it began shrinking.
The same goes for Forth, which provides the means for a Sequitur-compressed [1] representation of a program.
[1] https://en.wikipedia.org/wiki/Sequitur_algorithm
Myself, I always strive to delete some code or replace it with a shorter version: first, to understand it better; second, so that when I come back there is less to read.
Not only edge cases and standards, but also tons of performance optimizations.
It's most likely both.
> There are tons of tiny edge cases and standards to comply with where implementing one standard will break 3 others if not done carefully. AI can't do that right now.
Firstly, the CI is completely broken on every commit and all the tests have failed; looking closely at the code, it is exactly what you'd expect from unmaintainable slop.
Having more lines of code is not a good measure of robust software, especially if it does not work.
Web browsers are insanely hard to get right, that’s why there are only ~3 decent implementations out there currently.
The one nice thing about web browsers is that they have a reasonably formalized specification set and a huge array of tests that can be used. This makes them a fairly unique proposition, ideally suited to AI construction.
As far as I've read in Ladybird's blog updates, the issue is less the formalised specs and more that other browsers break the specs, so websites adjust, and you need to take that non-compliance into account in your design.
You should make your own predictions, and then we can do a retrospective on who was right.
Yeah, if you let them index Chromium, I'm sure it could do it next week. It just won't be original or interesting.
That makes a lot of sense for massive-scale efforts like a browser: using coordinated agents to push toward a huge, well-defined target with existing benchmarks and tests.
My angle has been a bit different: scaling autonomous coding for individual developers, and in a much simpler way. I love CLI agents, but I found myself wasting time babysitting terminals while waiting for turns to finish. At some point it clicked: what if I could just email them?
Email sounds backward, but that’s the feature. It’s universal, async, already collaborative. The agent sends me a focused update, I reply with guidance, and it keeps working on a server somewhere, or my laptop, while I’m not glued to my desk. There’s still a human in the loop, just without micromanagement.
It’s been surprisingly joyful and productive, and it feels closer to how real organizations already work. I’ve put together a small, usable tool around this and shared it here if anyone wants to try it or kick the tires: https://news.ycombinator.com/item?id=46629191