> Claude often ignores CLAUDE.md
> The more information you have in the file that's not universally applicable to the tasks you have it working on, the more likely it is that Claude will ignore your instructions in the file
CLAUDE.md files can get pretty long, and many times Claude Code just stops following a lot of the directions specified in the file.
A friend of mine tells Claude to always address him as “Mr Tinkleberry”. He says he can tell Claude is not paying attention to the instructions in CLAUDE.md when it stops calling him “Mr Tinkleberry” consistently.
That’s hilarious and a great way to test this.
What I’m surprised about is that OP didn’t mention having multiple CLAUDE.md files in each directory, specifically describing the current context / files in there. Eg if you have some database layer and want to document some critical things about that, put it in “src/persistence/CLAUDE.md” instead of the main one.
Claude pulls in those files automatically whenever it tries to read a file in that directory.
I find that to be a very effective technique to leverage CLAUDE.md files and be able to put a lot of content in them, but still keep them focused and avoid context bloat.
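To make it concrete, the layout I mean looks roughly like this (paths invented for illustration):

    CLAUDE.md                  # project-wide rules only
    src/
      persistence/
        CLAUDE.md              # schema and migration caveats, pulled in when Claude reads files here
      api/
        CLAUDE.md              # endpoint and error-handling conventions
    tests/
      CLAUDE.md                # how to run and structure tests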
Ummm… sounds like that directory should have a readme. And Claude should read readme files.
READMEs are written for people, CLAUDE.mds are written for coding assistants. I don’t write “CRITICAL (PRIORITY 0):” in READMEs.
The benefit of CLAUDE.md files is that they’re pulled in automatically, eg if Claude wants to read “tests/foo_test.py” it will automatically pull in “tests/CLAUDE.md” (if it exists).
If AI is supposed to deliver on this magical no-lift ease of use task flexibility that everyone likes to talk about I think it should be able to work with a README instead of clogging up ALL of my directories with yet another fucking config file.
Also this isn’t portable to other potential AI tools. Do I need 3+ md files in every directory?
> Do I need 3+ md files in every directory?
Don’t worry, as of about 6 weeks ago when they changed the system prompt, Claude will make sure every folder has way more than 3 .md files, seeing as it often writes 2 or more per task, so if you don’t clean them up…
Strange. I haven’t experienced this a single time and I use it almost all day everyday.
That is strange because it's been going on since sonnet 4.5 release.
Is your logic that unless something is perfect it should not be used even though it is delivering massive productivity gains?
> it is delivering massive productivity gains
[citation needed]
Every article I can find about this is citing the valuation of the S&P500 as evidence of the productivity gains, and that feels very circular
It’s not delivering on magical stuff. Getting real productivity improvements out of this requires engineering and planning and it needs to be approached as such.
One of the big mistakes I think is that all these tools are over-promising on the “magic” part of it.
It’s not. You need to really learn how to use all these tools effectively. This is not done in days or even weeks; it takes months, in the same way that becoming proficient in Emacs or vim or a programming language does.
Once you’ve done that, though, it can absolutely enhance productivity. Not 10x, but definitely in the area of 2x. Especially for projects / domains you’re uncomfortable with.
And of course the most important thing is that you need to enjoy all this stuff as well, which I happen to do. I can totally understand the resistance as it’s a shitload of stuff you need to learn, and it may not even be relevant anymore next year.
My issue is not with learning. This "tool" has an incredibly shallow learning curve. My issue is that I'm having to make way for these "tools" that everyone says vastly increases productivity but seems to just churn out tech-debt as quickly as it can write it.
It's a large leap to "requires engineering and planning" when no one, even in this thread, can seem to agree on the behavior of any of these "tools". Some comments tell anecdotes of not getting the agents to listen until the context of the whole world is laid out in these md files. Others say the only way is to keep the context tight and focused, going so far as to have written _yet more tools_ to remove and re-add code comments so they don't "poison" the context.
I am slightly straw-manning, but the tone in this thread has already shifted from a few months ago, when these "tools" were going to immediately give huge productivity gains; now you're telling me they need 1) their own special files everywhere (again, this isn't even agreed on) and 2) "engineering and planning... not done in days or weeks even".
The entire economy is propped up on this tech right now and no one can even agree on whether it's effective or how to use it properly? Not to mention the untold damage it is doing to learning outcomes.
While I believe you're probably right that getting any productivity gains from these tools requires an investment, I think calling the process "engineering" is really stretching the meaning of the word. It's really closer to ritual magic than any solid engineering practices at this point. People have guesses and practices that may or may not actually work for them (since measuring productivity increases is difficult if not impossible), and they teach others their magic formulas for controlling the demon.
Most countries don’t have a notion of a formally licensed software engineer, anyway. Arguing what is and is not engineering is not useful.
>> [..] and it may not even be relevant anymore next year.
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Yeah, I feel like on average I still spend a similar amount of time developing but drastically less time fixing obscure bugs, because once it codes the feature and I describe the bugs, it fixes them; the rest of my time is spent testing and reviewing code.
Learning how to equip a local LLM with tools it can use to extend its capabilities has been a lot of fun for me and is a great educational experience for anyone who is interested. Just another tool for the toolchest.
> “CRITICAL (PRIORITY 0):”
There's no need for this level of performative ridiculousness with AGENTS.md (Codex) directives, FYI.
Is this documented anywhere? This is the first I have ever heard of it.
Here: https://www.anthropic.com/engineering/claude-code-best-pract...
CLAUDE.md seems to be important enough to be their very first point in that document.
Naw man, it's the first point because in April Claude Code didn't really have anything else that somewhat worked.
I tried to use that effectively; I even started a new greenfield project just to make sure I tested it under ideal circumstances. And while it somewhat worked, it was always super lackluster, and it was way more effective to explicitly add the context manually via prepared md files you just reference in the prompt.
I'd tell anyone to go for skills first before littering your project with these config files everywhere
I often can't tell the difference between my Readme and Claude files to the point that I cannibalise the Claude file for the Readme.
It's the difference between instructions for a user and instructions for a developer, but in coding projects that's not much different.
We are back to color-sorted M&Ms bowls.
That's smart, but I worry that that works only partially; you'll be filling up the context window with conversation turns where the LLM consistently addresses its user as "Mr. Tinkleberry", thus reinforcing that specific behavior encoded by CLAUDE.md. I'm not convinced that this way of addressing the user implies that it keeps attention on the rest of the file.
I have a /bootstrap command that I run which instructs Claude Code to read all system and project CLAUDE.md files, skills and commands.
Helps me quickly whip it back in line.
Mind sharing it? (As long as it doesn’t involve anything private.)
Isn’t that what every new session does?
That also clears the context; a command would just append to the context.
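For anyone curious, a command like that is presumably just a markdown prompt dropped into the project's .claude/commands/ folder (this is a hypothetical sketch, not the parent's actual file), e.g. .claude/commands/bootstrap.md containing something like:

    Re-read the following before doing anything else:
    - the user-level ~/.claude/CLAUDE.md
    - the project CLAUDE.md and any CLAUDE.md in directories touched this session
    - the skills and commands defined under .claude/
    Then list, in one short bullet list, the rules you are now following.

Saved there, it should be invokable as /bootstrap; the exact paths and mechanics are worth checking against the docs.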
For whatever reason, I can't get into Claude's approach. I like how Cursor handles this, with a directory of files (even subdirectories allowed) where you can define when it should use specific documents.
We are all "context engineering" now, but Claude expects one big file to handle everything? Seems like a dead-end approach.
They have an entire feature for this: https://www.claude.com/blog/skills
CLAUDE.md should only be for persistent reminders that are useful in 100% of your sessions
Otherwise, you should use skills, especially if CLAUDE.md gets too long.
Also just as a note, Claude already supports lazy loaded separate CLAUDE.md files that you place in subdirectories. It will read those if it dips into those dirs
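As I understand it, a skill is just a folder containing a SKILL.md whose frontmatter description gets loaded up front, while the body is only read when the model decides it's relevant. A hypothetical example (the exact fields may differ, so check the docs):

    .claude/skills/writing-tests/SKILL.md

    ---
    name: writing-tests
    description: Conventions for writing and running tests in this repo. Use when adding or changing tests.
    ---
    Prefer table-driven tests. Put integration tests under tests/integration/.
    Run the suite with `make test` before claiming a task is done.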
I think their skills have the ability to dynamically pull in more data, but so far I've not tested it too much since it seems more tailored towards specific actions. I.e. converting a PDF might translate nicely to the Agent pulling in the skill doc, but I'm not sure if it will translate well to it pulling in some rust_testing_patterns.md file when it writes Rust tests.
E.g. I toyed with the idea of thinning out various CLAUDE.md files in favor of my targeted skill.md files. In doing so my hope was to have less irrelevant data in context.
However, the more I thought through this, the more I realized the Agent is doing "everything" I wanted to document each time. E.g. I wasn't sure that creating skills/writing_documentation.md and skills/writing_tests.md would actually result in less context usage, since both of those would be in memory most of the time. My CLAUDE.md is already pretty hyper-focused.
So yeah, anyway my point was that skills might have the potential to offload irrelevant context, which seems useful. Though in my case I'm not sure it would help.
This is good for the company; chances are you will eat more tokens. I liked Aider's approach: it wasn't trying to be too clever. It used the files added to the chat and asked if it figured out that something more was needed (like, say, settings in the case of a Django application).
Sadly Aider is no longer maintained...
I wonder if there are any benefits, side-effects or downsides of everyone using the same fake name for Claude to call them.
If a lot of people always put "call me Mr. Tinkleberry" in the file, will it start calling people Mr. Tinkleberry even when it loses the context, simply because so many people seem to want to be called Mr. Tinkleberry?
Then you switch to another name.
You could make a hook in Claude to re-inject CLAUDE.md. For example, make it say "Mr Tinkleberry" in every response, and have the hook re-inject the instructions whenever that's missing.
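A very rough sketch of such a hook, assuming Claude Code's Stop hooks receive a JSON payload (including a transcript path) on stdin and treat exit code 2 as "block and feed stderr back to the model"; the exact contract is worth verifying against the hooks docs:

    #!/usr/bin/env python3
    # Hypothetical Stop-hook sketch: if the last assistant turn doesn't contain the
    # canary phrase, block and tell Claude to re-read CLAUDE.md. The transcript
    # format (JSONL entries with a "type" field) is an assumption; adjust as needed.
    import json, sys, pathlib

    CANARY = "Mr Tinkleberry"

    payload = json.load(sys.stdin)
    transcript = pathlib.Path(payload.get("transcript_path", ""))

    last_assistant = ""
    if transcript.is_file():
        for line in transcript.read_text().splitlines():
            try:
                entry = json.loads(line)
            except json.JSONDecodeError:
                continue
            if entry.get("type") == "assistant":
                last_assistant = json.dumps(entry)  # crude: search the whole serialized entry

    if CANARY not in last_assistant:
        print("Canary missing: re-read CLAUDE.md and follow it.", file=sys.stderr)
        sys.exit(2)  # assumed "block" exit code
    sys.exit(0)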
I've found that Codex is much better at instruction-following like that, almost to a fault (for example, when I tell it to "always use TDD", it will try to use TDD even when just fixing already-valid-just-needing-expectation-updates tests!).
It baffles me how people can be happy working like this. "I wrap the hammer in paper so if the paper breaks I know the hammer has turned into a saw."
If you have any experience in 3D modeling, I feel it's much closer to 3D unwrapping than to software development.
You've got a bitmap atlas ("context") where you have to cram as much information as possible without losing detail, and then you need to massage both your texture and the structure of your model so that your engine doesn't go mental when trying to map your information from a 2D to a 3D space.
Likewise, both operations are rarely blemish-free and your ability resides in being able to contain the intrinsic stochastic nature of the tool.
You could think of it as art or creativity.
> It Is Difficult to Get a Man to Understand Something When His Salary Depends Upon His Not Understanding It
probably by not thinking in ridiculous analogies that don't help
I used to tell it to always start every message with a specific emoji. If the emoji wasn’t present, I knew the rules were being ignored.
But it’s not reliable enough. It can send the emoji or address you correctly while still ignoring more important rules.
Now I find that it’s best to have a short and tight rules file that references other files where necessary. And to refresh context often. The longer the context window gets, the more likely it is to forget rules and instructions.
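Something in the spirit of this minimal CLAUDE.md (hypothetical; the @ lines use Claude Code's memory-import syntax, which is worth double-checking):

    # CLAUDE.md (keep it short)
    - Start every reply with a specific emoji (the canary).
    - Run `make test` before declaring anything done.
    - For database work, read @docs/persistence.md first.
    - For API conventions, see @docs/api-style.md.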
The article explains why that's not a very good test however.
Why not? It's relevant for all tasks, and just adds 1 line
I guess I assumed that it's not highly relevant to the task, but I suppose it depends on interpretation. E.g. if someone tells the bus driver to smile while he drives, it's hopefully clear that actually driving the bus is more important than smiling.
Having experimented with similar config, I found that Claude would adhere to the instructions somewhat reliably at the beginning and end of the conversation, but was likely to ignore during the middle where the real work is being done. Recent versions also seem to be more context-aware, and tend to start rushing to wrap up as the context is nearing compaction. These behaviors seem to support my assumption, but I have no real proof.
It will also make the LLM process even more tokens, thus decreasing its accuracy.
The green M&M's trick of AI instructions.
I've used that a couple times, e.g. "Conclude your communications with "Purple fish" at the end"
Claude definitely picks and chooses when purple fish will show up
I tell it to accomplish only half of what it thinks it can, then conclude with a haiku. That seems to help, because 1) I feel like it starts shedding discipline as it starts feeling token pressure, and 2) I feel like it is more likely to complete task n - 1 than it is to complete task n. I have no idea if this is actually true or not, or if I'm hallucinating... all I can say is that this is the impression I get.
> A friend of mine tells Claude to always address him as “Mr Tinkleberry”. He says he can tell Claude is not paying attention to the instructions in CLAUDE.md when it stops calling him “Mr Tinkleberry” consistently.
this is a totally normal thing that everyone does, that no one should view as a signal of a psychotic break from reality...
is your friend in the room with us right now?
I doubt I'll ever understand the lengths AI enjoyers will go through just to avoid any amount of independent thought...
I suspect you’re misjudging the friend here. This sounds more like the famous “no brown M&Ms” clause in the Van Halen performance contract. As ridiculous as the request is, it being followed provides strong evidence that the rest of the requests, the more meaningful ones, are too.
Sounds like the friend understands quite well how LLMs actually work and has found a clever way to be signaled when it’s starting to go off the rails.
It's also a common tactic for filtering inbound email.
Mention that people may optionally include some word like 'orange' in the subject line to tell you they've come via some place like your blog or whatever it may be, and have read at least carefully enough to notice this.
Of course ironically that trick's probably trivially broken now because of use of LLMs in spam. But the point stands, it's an old trick.
Apart from the fact that not even every human would read this and add it to the subject, this would still work.
I doubt there is any spam machine out there that quickly tries to find people's personal blog before sending them viagra mail.
If you are being targeted personally, then of course all bets are off, but that would’ve been the case with or without the subject-line-trick
Could try asking for a seahorse emoji in addition…
> I suspect you’re misjudging the friend here. This sounds more like the famous “no brown M&Ms” clause in the Van Halen performance contract. As ridiculous as the request is, it being followed provides strong evidence that the rest of the requests, the more meaningful ones, are too.
I'd argue it's more like you've bought so much into the idea that this is reasonable that you're also willing to go to extreme lengths to retcon and pretend that this is sane.
Imagine two different worlds: one where the tools that engineers use have a clear and reasonable way to detect and determine whether the generative subsystem is still on the rails provided by the controller.
And another where the interface is completely devoid of any sort of basic introspection, and, because it's a problematic mess all the way down, everyone invents some asinine way that they believe provides some sort of signal as to whether or not the random noise generator has gone off the rails.
> Sounds like the friend understands quite well how LLMs actually work and has found a clever way to be signaled when it’s starting to go off the rails.
My point is that while it's a cute hack, if you step back and compare it objectively to what good engineering would look like, it's wild that so many people are just willing to accept this interface as "functional" because it means they don't have to do the thinking that's required to produce the output the AI is able to emit via the specific randomness function used.
Imagine these two worlds actually do exist, and instead of using the real interface that provides a clear bool answer to "has the generative system gone off the rails?", they *want* to be called Mr Tinkleberry.
Which world do you think this example lives in? You could convince me Mr Tinkleberry is a cute example of the latter, obviously... but it'd take effort to convince me that this reality is half reasonable, or that it's reasonable that people who want to call themselves engineers should feel proud to be a part of this one.
Before you try to strawman my argument, this isn't a gatekeeping argument. It's only a critical take on the interface options we have to understand something that might as well be magic, because that serves the snakeoil sales much better.
> > Is the magic token machine working?
> Fuck I have no idea dude, ask it to call you a funny name, if it forgets the funny name it's probably broken, and you need to reset it
Yes, I enjoy working with these people and living in this world.
It is kind of wild that not that long ago the general sentiment in software engineering (at least as observed on boards like this one) seemed to be about valuing systems that were understandable, introspectable, with tight feedback loops, within which we could compose layers of abstractions in meaningful and predictable ways (see for example the hugely popular - at the time - works of Chris Granger, Bret Victor, etc).
And now we've made a complete 180 and people are getting excited about proprietary black boxes and "vibe engineering" where you have to pretend like the computer is some amnesic schizophrenic being that you have to coerce into maybe doing your work for you, but you're never really sure whether it's working or not because who wants to read 8000 line code diffs every time you ask them to change something. And never mind if your feedback loops are multiple minutes long because you're waiting on some agent to execute some complex network+GPU bound workflow.
You don’t think people are trying very hard to understand LLMs? We recognize the value of interpretability. It is just not an easy task.
It’s not the first time in human history that our ability to create things has exceeded our capacity to understand.
> You don’t think people are trying very hard to understand LLMs? We recognize the value of interpretability. It is just not an easy task.
I think you're arguing against a position tangential to both mine and the one this directly replies to. It can be hard to use and understand something, but if you have a magic box that you can't tell is working, it doesn't belong anywhere near the systems that other humans use. The people that use the code you're about to commit to whatever repo you're generating code for all deserve better than to be part of your unethical science experiment.
> It’s not the first time in human history that our ability to create things has exceeded our capacity to understand.
I don't agree this is a correct interpretation of the current state of generative transformer based AI. But even if you wanted to try to convince me; my point would still be, this belongs in a research lab, not anywhere near prod. And that wouldn't be a controversial idea in the industry.
We used the steam engine for 100 years before we had a firm understanding of why it worked. We still don’t understand how ice skating works. We don’t have a physical understanding of semi-fluid flow in grain silos, but we’ve been using them since prehistory.
I could go on and on. The world around you is full of not well understood technology, as well as non deterministic processes. We know how to engineer around that.
> We used the steam engine for 100 years before we had a firm understanding of why it worked. We still don’t understand how ice skating works. We don’t have a physical understanding of semi-fluid flow in grain silos, but we’ve been using them since prehistory.
I don't think you and I are using the same definition for "firm understanding" or "how it works".
> I could go on and on. The world around you is full of not well understood technology, as well as non deterministic processes. We know how to engineer around that.
Again, you're sidestepping my argument so you can restate things that are technically correct but not really a point in and of themselves. I see people who want to call themselves software engineers throw code they clearly don't understand against the wall because the AI said so. There's a significant delta between knowing you can heat water to turn it into a gas whose increased pressure you can use to mechanically turn a wheel, vs. put wet liquid in jar, light fire, get magic spinny thing. If jar doesn't call you a funny name first, that's bad!
> I don't think you and I are using the same definition for "firm understanding" or "how it works".
I’m standing on firm ground here. Debate me on the details if you like.
You are constructing a strawman.
> It doesn't belong anywhere near the systems that other humans use
Really for those of us who actually work in critical systems (emergency services in my case) - of course we're not going to start patching the core applications with vibe code.
But yeah, that frankenstein reporting script that half a dozen amateur hackers made a mess of over 20 years instead of refactoring and redesigning? That's prime fodder for this stuff. NOBODY wants to clean that stuff up by hand.
> Really for those of us who actually work in critical systems (emergency services in my case) - of course we're not going to start patching the core applications with vibe code.
I used to believe that no one would seriously consider this too... but I don't believe that this is a safe assumption anymore. You might be the exception, but there are many more people who don't consider the implications of turning over said intellectual control.
> But yeah, that frankenstein reporting script that half a dozen amateur hackers made a mess of over 20 years instead of refactoring and redesigning? That's prime fodder for this stuff. NOBODY wants to clean that stuff up by hand.
It's horrible, no one currently understands it, so let the AI do it, so that still, no one will understand it, but at least this one bug will be harder to trigger.
I don't agree that harder-to-trigger bugs are better than easy-to-trigger bugs. And from my view, the argument that "it's currently broken now, and hard to fix!" isn't exactly one I find compelling for leaving it that way.
Your comment would be more useful if you could point us to some concrete tooling that’s been built out in the last ~3 years that LLM assisted coding has been around to improve interpretability.
That would be the exact opposite of my claim: it is a very hard problem.
This reads like you either have an idealized view of Real Engineering™, or used to work in a stable, extremely regulated area (e.g. civil engineering). I used to work in aerospace in the past, and we had a lot of silly Mr Tinkleberry canaries. We didn't strictly rely on them because our job was "extremely regulated" to put it mildly, but they did save us some time.
There's a ton of pretty stable engineering subfields that involve a lot more intuition than rigor. A lot of things in EE are like that. Anything novel as well. That's how steam in 19th century or aeronautics in the early 20th century felt. Or rocketry in 1950s, for that matter. There's no need to be upset with the fact that some people want to hack explosive stuff together before it becomes a predictable glacier of Real Engineering.
> There's no need to be upset with the fact that some people want to hack explosive stuff together before it becomes a predictable glacier of Real Engineering.
You misunderstand me. I'm not upset that people are playing with explosives. I'm upset that my industry is playing with explosives that all read, "front: face towards users"
And then, more upset that we're all seemingly ok with that.
The driving force of the enshittification of everything may be external, but the degradation clearly comes from engineers first. These broader industry trends only convince me it's not likely to get better anytime soon, and I don't like how everything is user-hostile.
Man I hate this kind of HN comment that makes grand sweeping statement like “that’s how it was with steam in the 19th century or rocketry in the 1950s”, because there’s no way to tell whether you’re just pulling these things out of your… to get internet points or actually have insightful parallels to make.
Could you please elaborate with concrete examples on how aeronautics in the 20th century felt like having a fictional friend in a text file for the token predictor?
We're not going to advance the discussion this way. I also hate this kind of HN comment that makes grand sweeping statement like "LLMs are like having a fictional friend in a text file for the token predictor", because there's no way to tell whether you're just pulling these things out of your... to get internet points or actually have insightful parallels to make.
Yes, during the Wright era aeronautics was absolutely dominated by tinkering, before the aerodynamics was figured out. It wouldn't pass the high standard of Real Engineering.
> Yes, during the Wright era aeronautics was absolutely dominated by tinkering, before the aerodynamics was figured out. It wouldn't pass the high standard of Real Engineering.
Remind me: did the Wright brothers start selling tickets to individuals telling them it was completely safe? Was step 2 of their research building a large passenger plane?
I originally wanted to avoid that specific flight analogy, because it felt a bit too reductive. But while we're being reductive, how about medicine too; the first smallpox vaccine was absolutely not well understood... would that origin story pass ethical review today? What do you think the pragmatics would be if the medical profession encouraged that specific kind of behavior?
> It wouldn't pass the high standard of Real Engineering.
I disagree; I think it 100% is really engineering. Engineering at its most basic is tricking physics into doing what you want. There's no more perfect example of that than heavier-than-air flight. But there's a critical difference between engineering research, and experimenting on unwitting people. I don't think users need to know how the sausage is made. That applies equally to planes, bridges, medicine, and code. But the professionals absolutely must. It's disappointing watching the industry I'm a part of willingly eschew understanding to avoid a bit of effort. Such a thing is considered malpractice in "real professions".
Ideally neither of you would wring your hands about the flavor or form of the argument, or poke fun at the gamified comment thread. But if you're going to complain about adding positively to the discussion, try to add something to it along with the complaints?
As a matter of fact, commercial passenger service started almost immediately once the tech was out of the fiction phase. The airships were large, highly experimental, barely controllable, hydrogen-filled death traps that were marketed as luxurious and safe. The first airliners also appeared with big engines and large airframes (WWI disrupted this a bit). None of that was built on solid ground. Adoption was only constrained by industrial capacity and cost. Most large aircraft were more or less experimental up until the '50s, and aviation in general was unreliable until about the '80s.
I would say that right from the start everyone was pretty well aware about the unreliability of LLM-assisted coding and nobody was experimenting on unwitting people or forcing them to adopt it.
>Engineering at its most basic is tricking physics into doing what you want.
Very well, then Mr Tinkleberry also passes the bar because it's exactly such a trick. That it irks you as a cheap hack that lacks rigor (which it does) is another matter.
> As a matter of fact, commercial passenger service started almost immediately once the tech was out of the fiction phase. The airships were large, highly experimental, barely controllable, hydrogen-filled death traps that were marketed as luxurious and safe.
And here, you've stumbled onto the exact thing I'm objecting to. I think the Hindenburg disaster was a bad thing, and software engineering shouldn't repeat those mistakes.
> Very well, then Mr Tinkleberry also passes the bar because it's exactly such a trick. That it irks you as a cheap hack that lacks rigor (which it does) is another matter.
Yes, this is what I said.
> there's a critical difference between engineering research, and experimenting on unwitting people.
I object to watching developers do, exactly that.
It feels like you’re blaming the AI engineers here, that they built it this way out of ignorance or something. Look into interpretability research. It is a hard problem!
I am blaming the developers who use AI because they're willing to sacrifice intellectual control in trade for something that I find has minimal value.
I agree it's likely to be a complex or intractable problem. But I don't enjoy watching my industry slide back down the professionalism scale. Professionals don't choose tools whose workings they can't explain. If your solution to figuring out whether your tool is still functional is inventing an amusing name and using that as the heuristic, because you have no better way to determine if it's still working correctly, that feels like it might be a problem, no?
I’m sorry you don’t like it. But this has very strong old-man-yells-at-cloud vibes. This train is moving, whether you want it to or not.
Professionals use tools that work, whether they know why it works is of little consequence. It took 100 years to explain the steam engine. That didn’t stop us from making factories and railroads.
> It took 100 years to explain the steam engine. That didn’t stop us from making factories and railroads.
You keep saying this, why do you believe it so strongly? Because I don't believe this is true. Why do you?
And then, even assuming it's completely true exactly as stated; shouldn't we have higher standards than that when dealing with things that people interact with? Boiler explosions are bad right? And we should do everything we can to prove stuff works the way we want and expect? Do you think AI, as it's currently commonly used, helps do that?
Because I’m trained as a physicist and (non-software) engineer and I know my field’s history? Here’s the first result that comes up on Google. Seems accurate from a quick skim: https://www.ageofinvention.xyz/p/age-of-invention-why-wasnt-...
And yes we should seek to understand new inventions. Which we are doing right now, in the form of interpretability research.
We should not be making Luddite calls to halt progress simply because our analytic capabilities haven’t caught up to our progress in engineering.
Can you cite a section from this very long page that might convince me no one at the time understood how turning water into steam worked to create pressure?
If this is your industry, shouldn't you have a more reputable citation, maybe something published more formally? Something expected to stand up to peer review, instead of just a page on the internet?
> We should not be making Luddite calls to halt progress simply because our analytic capabilities haven’t caught up to our progress in engineering.
You've misunderstood my argument. I'm not making a Luddite call to halt progress; I'm objecting to my industry, which should behave as one made up of professionals, willingly sacrificing intellectual control over the things it is responsible for and advocating that others do the same, especially at the expense of users, which is what I see happening.
Anything that results in sacrificing your understanding of exactly how the thing you built works is bad and should be avoided. The source, whether AI or something else, doesn't matter as much as the result.
This could be a very niche standup comedy routine, I approve.
I use agents almost all day and I do way more thinking than I used to; this is why I'm now more productive. There is little thinking required to produce output; typing requires very little thinking. The thinking is all in the planning… If the LLM output is bad in any given file I simply step in and modify it, and obviously this is much faster than typing every character.
I’m spending more time planning and my planning is more comprehensive than it used to be. I’m spending less time producing output, my output is more plentiful and of equal quality. No generated code goes into my commits without me reviewing it. Where exactly is the problem here?
The 'canary in the coal mine' approach (like the Mr. Tinkleberry trick) is silly but pragmatic. Until we have deterministic introspection for LLMs, engineers will always invent weird heuristics to detect drift. It's not elegant engineering, but it's effective survival tactics in a non-deterministic loop.