This is, as far as I understand, self-healing ONLY if the name of a CSS class changes. Not for anything else. That seems like a very, very narrow definition of "self-healing": there are 9999 other subtle or not-so-subtle things that can change per session or per release of a page.
If you run this against, let's say, a typical e-commerce page where the navigation and all screen elements are super dynamic (user-specific data, language, etc.), this problem becomes even harder.
My running hypothesis on this is that AI is a sentient screenreader, and the last thing you should be worrying about is CSS class names, IDs, data-testid attributes, DOM traversal, and all of these things that are essentially querying the 'internal state' of a page. Classes, IDs, and data attributes aren't a public API; semantic elements and ARIA attributes are.
So, focus on WCAG compliance, following the spec as faithfully as you can. The style or presentation of something may change as part of a simple A/B test but the underlying purpose or behaviour would remain the same.
This might be the last of the weed talking but that's rich. I'm gonna have to chew on that.
And maybe even crack the WCAG docs. Wait... It's a trap...
I feel like this could work if the selectors are chosen carefully to capture semantic meaning, rather than relying on something arbitrary like a class name. The agent must have some understanding of the document to be able to perform those actions in the first place.
If it can find an ellipse tool, it's likely based on some combination of accessible role, accessible name, and inner text (perhaps the icon too, if it's multi-modal). So in theory, couldn't it capture those criteria in a JS snippet and replay it?
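Something like this rough TypeScript sketch is what I have in mind. To be clear, `UiNode`, `Descriptor`, `capture`, and `replay` are names I made up for illustration; this is a simplified in-memory tree rather than the real DOM, and I'm not claiming it's how the plugin actually does it.

```typescript
// Hypothetical stand-in for a node in the page's accessibility tree.
interface UiNode {
  role: string;      // accessible role, e.g. "button"
  name: string;      // accessible name, e.g. from aria-label
  text: string;      // visible inner text ("" for icon-only controls)
  className: string; // styling hook; deliberately never consulted below
  children: UiNode[];
}

// The semantic criteria recorded when the element is first found.
interface Descriptor {
  role: string;
  name: string;
  text: string;
}

function capture(node: UiNode): Descriptor {
  return { role: node.role, name: node.name, text: node.text };
}

// Replay: walk the tree and return the best-scoring node with a matching role.
// A name match counts more than a text match; with no match at all, give up.
function replay(root: UiNode, target: Descriptor): UiNode | undefined {
  let best: UiNode | undefined;
  let bestScore = 0;
  const stack: UiNode[] = [root];
  while (stack.length > 0) {
    const node = stack.pop() as UiNode;
    for (const child of node.children) stack.push(child);
    if (node.role !== target.role) continue;
    const score =
      (node.name === target.name ? 2 : 0) +
      (node.text !== "" && node.text === target.text ? 1 : 0);
    if (score > bestScore) {
      best = node;
      bestScore = score;
    }
  }
  return best;
}

// First visit: record the ellipse tool.
const pageBefore: UiNode = {
  role: "toolbar", name: "Tools", text: "", className: "tb", children: [
    { role: "button", name: "Rectangle", text: "", className: "btn-7f3a", children: [] },
    { role: "button", name: "Ellipse", text: "", className: "btn-91cc", children: [] },
  ],
};
const savedEllipse = capture(pageBefore.children[1]);

// Later visit: class names regenerated and buttons reordered.
const pageAfter: UiNode = {
  role: "toolbar", name: "Tools", text: "", className: "x0", children: [
    { role: "button", name: "Ellipse", text: "", className: "x2", children: [] },
    { role: "button", name: "Rectangle", text: "", className: "x1", children: [] },
  ],
};
const foundEllipse = replay(pageAfter, savedEllipse);
console.log(foundEllipse ? foundEllipse.className : "not found");
```

The point being that nothing in `replay` looks at `className` or position, so a rename or reorder doesn't matter, while a change to the role or the accessible name still would.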
That's exactly what it is doing. The workflows are pretty much JS snippets themselves, which you can see in the "code" tab (in the plugin, when you select a saved workflow).
Everyone thinks of typical e-commerce pages when it comes to "browser agent doing something", but our real use cases are far from shopping for the user. Your point still stands, though. The idea is that there may be websites that generating stable selectors/hierarchy maps won't solve, but 80% of websites (as in 80-20) are not like that, including a lot of internal dashboards/interfaces. (There will also be issues for websites with proper i18n implementations if the selectors are aria-label based.)
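To make the i18n caveat concrete, here is a tiny TypeScript sketch. The `Labelled` type, the `findByRoleAndName` helper, and the button labels are all made up for illustration and are not taken from the plugin.

```typescript
// Hypothetical flattened view of a page: just role + accessible name.
interface Labelled {
  role: string;
  name: string;
}

function findByRoleAndName(
  nodes: Labelled[],
  role: string,
  name: string
): Labelled | undefined {
  for (const n of nodes) {
    if (n.role === role && n.name === name) return n;
  }
  return undefined;
}

// Same dashboard button, served in two locales.
const englishPage: Labelled[] = [{ role: "button", name: "Export report" }];
const germanPage: Labelled[] = [{ role: "button", name: "Bericht exportieren" }];

// A selector recorded against the English page...
const hitEnglish = findByRoleAndName(englishPage, "button", "Export report");
// ...misses on the German one, because the aria-label was translated.
const hitGerman = findByRoleAndName(germanPage, "button", "Export report");

console.log(hitEnglish !== undefined, hitGerman !== undefined); // true false
```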
Self-healing CSS selectors are also only one part of the story. The other part is the cohesive interface for the agent itself to use these selectors.
> The other part is the cohesive interface for the agent itself to use these selectors
We are incubating this over at the WebMCP web standard proposal. You can see the current draft of the explainer for the declarative API here: https://github.com/webmachinelearning/webmcp/pull/26
Also, great work on the browser agent. This is the best of the DOM parsing/screenshot agents I've used, and I was really impressed with the Wordle example.