Amazon is holding a mandatory meeting about AI breaking its systems

twitter.com ・ 222 points ・ lwhsiao ・ 3 hours ago ・ 151 comments

happytoexplain ・ 2 hours ago

>Junior and mid-level engineers can no longer push AI-assisted code without a senior signing off

Review by a senior is one of the biggest "silver bullet" illusions managers suffer from. For a person (senior or otherwise) to examine code or configuration with the granularity required to verify that it even approximates the result of their own level of experience, even only in terms of security/stability/correctness, requires an amount of time approaching the time spent if they had just done it themselves.

I.e. senior review is valuable, but it does not make bad code good.

This is one major facet of probably the single biggest problem of the last couple decades in system management: The misunderstanding by management that making something idiot proof means you can now hire idiots (not intended as an insult, just using the terminology of the phrase "idiot proof").

ardeaver ・ an hour ago

When I was really early in my career, a mentor told me that code review is not about catching bugs but about spreading context (i.e., increasing the bus factor). Catching bugs is a side effect, but unless you have a lot of people review each pull request, it's basically just gambling.
The more expensive and less sexy option is to actually make testing easier (both programmatically and manually), write more tests and more levels of tests, and spend time reducing code complexity. The problem, I think, is people don't get promoted for preventing issues.
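The "gambling" framing can be put into numbers with a toy model. Assume (optimistically) that each reviewer independently catches a given bug with the same probability p; then the chance that at least one of n reviewers catches it is 1 - (1 - p)^n. A minimal Python sketch, where the function name `p_caught` and the catch rate of 0.3 are illustrative assumptions, not measurements:

```python
def p_caught(p: float, n: int) -> float:
    """Probability that at least one of n independent reviewers catches a bug.

    Toy model: every reviewer has the same, independent catch probability p.
    Real reviewers are correlated (they tend to miss the same things), so
    treat this as an optimistic upper bound rather than a prediction.
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be in [0, 1]")
    if n < 0:
        raise ValueError("n must be non-negative")
    return 1.0 - (1.0 - p) ** n


if __name__ == "__main__":
    # Assumed per-reviewer catch rate of 30%.
    for n in (1, 2, 5):
        print(f"{n} reviewer(s): {p_caught(0.3, n):.2f}")
```

Under these assumed numbers a single reviewer lets 70% of bugs through, and even five independent reviewers still miss about 17% of them.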
- bluGill ・ an hour ago
  
  > people don't get promoted for preventing issues.
  They do, but only after a company has been burned hard. They can also be promoted for their area being so much better that everyone notices.
  Still, the best way to a promotion is to write a major bug, then come in at the last moment and be the hero for fixing it.
  
  tartoran ・ an hour ago
  
  That could work, but plenty of quiet heroes weren’t promoted for fixing critical bugs.
  
  recursive ・ 21 minutes ago
  
  They fixed it too soon. You have to wait until the effect is visible on someone's dashboard somewhere.
  
  bluGill ・ 17 minutes ago
  
  You have to make sure it doesn't arrive at you before it is on the dashboard. Otherwise you're the reason it is blowing up the time-to-fix-a-bug metric. Unless you can make the problem so obscure that other smart people asked to help you can't figure it out, thus making you look bad.
- 8note ・ 21 minutes ago
  
  > The problem, I think, is people don't get promoted for preventing issues.
  Cleaning up structural issues across a couple of orgs is a senior => principal promo I've seen a couple of times.
AgentOrange1234 ・ 8 minutes ago

Seniors are going to need to hold Juniors to a high bar for understanding and explaining what they are committing. Otherwise it will become totally soul destroying to have a bunch of juniors submitting piles of nonsense and claiming they are blocked on you all the time.
marginalia_nu ・ 2 hours ago

Expert reviews are just about the only thing that makes AI-generated code viable, though doing them after the fact is a bit sketchy; to be efficient, you kinda need to keep an eye on what the model is doing as it's working.
Unchecked, AI models output code that is as buggy as it is inefficient. In smaller greenfield contexts it's not so bad, but in a large code base it performs much worse, as it doesn't have access to the bigger picture.
In my experience, you should be spending something like 5-15X the time the model takes to implement a feature on reviewing and making it fix its errors and inefficiencies. If you do that (with an expert's eye), the changes will usually have a high quality and will be correct and good.
If you do not do that due diligence, the model will produce a staggering amount of low quality code, at a rate that is probably something like 100x what a human could output in a similar timespan. Unchecked, it's like having a small army of the most eager junior devs you can find going completely fucking ape in the codebase.
- jonnycoder ・ 2 minutes ago
  
  I tend to agree. I spent a lot of time revising skills for my brownfield repo, writing better prompts to create a plan with clear requirements, writing a skill/command to decompose a plan, having a clear testing skill to write tests and validate, and finally having a code reviewer step using a different model (in my case it's Codex, since Claude did the development). My last PR was as close to perfect as I have got so far.
- locusofself ・ an hour ago
  
  If you spend 5-15x the time reviewing what the LLM is doing, are you saving any time by using it?
  
  happytoexplain ・ an hour ago
  
  No, but that's the crux of the AI problem in software. Time to write code was never the bottleneck. AI is most useful for learning, either via conversation or by seeing examples. It makes writing code faster too, but only a little after you take into account review. The cases where it shines are high-profile and exciting to managers, but not common enough to make a big difference in practice. E.g., AI can one-shot a script to get logs from a paginated API, convert it to ndjson, and save to files grouped by week, with minimal code review, but only if I'm already experienced enough to describe those requirements, and, most importantly, that's not what I'm doing every day anyway.
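For a sense of scale, the kind of script described above fits in a few dozen lines. A rough Python sketch, assuming a hypothetical cursor-paginated API wrapped as a `fetch_page(cursor)` callable that returns `(records, next_cursor)`, and records carrying an ISO-8601 `timestamp` field with an explicit offset; both are invented for illustration, and a real API's pagination scheme will differ.

```python
import json
from collections import defaultdict
from datetime import datetime
from pathlib import Path
from typing import Callable, Iterator, Optional

# Hypothetical page fetcher: takes a cursor (None for the first page) and
# returns (records, next_cursor); next_cursor is None on the last page.
FetchPage = Callable[[Optional[str]], tuple[list[dict], Optional[str]]]


def iter_records(fetch_page: FetchPage) -> Iterator[dict]:
    """Follow the cursor until the API reports no further pages."""
    cursor: Optional[str] = None
    while True:
        records, cursor = fetch_page(cursor)
        yield from records
        if cursor is None:
            break


def week_key(record: dict) -> str:
    """Group key such as '2026-W02', from the record's ISO-8601 timestamp."""
    year, week, _ = datetime.fromisoformat(record["timestamp"]).isocalendar()
    return f"{year}-W{week:02d}"


def export_ndjson_by_week(fetch_page: FetchPage, out_dir: Path) -> dict[str, int]:
    """Write one newline-delimited JSON file per ISO week; return line counts."""
    groups: dict[str, list[dict]] = defaultdict(list)
    for record in iter_records(fetch_page):
        groups[week_key(record)].append(record)
    out_dir.mkdir(parents=True, exist_ok=True)
    counts: dict[str, int] = {}
    for key, records in sorted(groups.items()):
        with open(out_dir / f"logs-{key}.ndjson", "w", encoding="utf-8") as f:
            for record in records:
                f.write(json.dumps(record) + "\n")
        counts[key] = len(records)
    return counts
```

Grouping by ISO week (rather than calendar month arithmetic) means a record from 30 December 2025 lands in `2026-W01`, which is exactly the kind of requirement you have to know to ask for.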
  
  ritlo ・ an hour ago
  
  A related Dirty Secret that's going to become clear from all this is that a very large proportion of code in the wild (yes, even in 2026—maybe not in FAANG and friends, IDK, but across all code that is written for pay in the entire economy) has limited or no automated test coverage, and is often being written with only a limited recorded spec that's usually fleshed out only to the degree needed (very partial) as a given feature is being worked on.
  What do the relatively hands-off "it can do whole features at a time" coding systems need to function without taking up a shitload of time in reviews? Great automated test coverage, and extensive specs.
  I think we're going to find there's very little time-savings to be had for most real-world software projects from heavy application of LLMs, because the time will just go into tests that wouldn't otherwise have been written, and much more detailed specs that otherwise never would have been generated. I guess the bright-side take of this is that we may end up with better-tested and better-specified software? Though so very much of the industry is used to skipping those parts, and especially the less-capable (so far as software goes) orgs that really need the help and the relative amateurs and non-software-professionals that some hope will be able to become extremely productive with these tools, that I'm not sure we'll manage to drag processes & practices to where they need to be to get the most out of LLM coding tools anyway. Especially if the benefit to companies is "you will have better tests for... about the same amount of software as you'd have written without LLMs".
  We may end up stuck at "it's very-aggressive autocomplete" as far as LLMs' useful role in them, for most projects, indefinitely.
  On the plus side for "AI" companies, low-code solutions are still big business even though they usually fail to deliver the benefits the buyer hopes for, so there's likely a good deal of money to be made selling companies LLM solutions that end up not really being all that great.
  
  _wire_ ・ 40 minutes ago
  
  > because the time will just go into tests that wouldn't otherwise have been written
  Writing tests to ensure a program is correct is the same problem as writing a correct program.
  Evaluating conformance is a different category of concern from ensuring correctness. Tests are about conformance not correctness.
  Ensuring correct programs is like cleaning in the sense that you can only push dirt around, you can't get rid of it.
  You can push uncertainty around, but you can't eliminate it.
  This is the point of Gödel's theorem. Shannon's information theory observes similar aspects for fidelity in communication.
  As Douglas Adams noted: ultimately you've got to know where your towel is.
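A small illustration of the conformance/correctness distinction (the leap-year example is chosen here for illustration; it is not from the thread): an implementation can pass a plausible-looking test suite and still be wrong on inputs nobody thought to list.

```python
def is_leap_naive(year: int) -> bool:
    # Conforms to the test cases below, but ignores the century rules.
    return year % 4 == 0


def is_leap(year: int) -> bool:
    # Gregorian rule: divisible by 4, except centuries not divisible by 400.
    return year % 4 == 0 and (year % 100 != 0 or year % 400 == 0)


# A test suite that only samples "ordinary" years.
CONFORMANCE_CASES = {2024: True, 2023: False, 2020: True, 2019: False}

for impl in (is_leap_naive, is_leap):
    assert all(impl(y) == want for y, want in CONFORMANCE_CASES.items())

# Both conform to the suite, yet they disagree on 1900.
print(is_leap_naive(1900), is_leap(1900))  # True False
```

Passing the suite shows conformance to the cases someone wrote down; it says nothing about the cases that were never written.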
  
  shimman ・ an hour ago
  
  These companies don't care about saving time or lowering operating costs, they have massive monopolies to subsidize their extremely poor engineering practices with. If the mandate is to force LLM usage or lose your job, you don't care about saving time; you care about saving your job.
  One thing I hope we'll all collectively learn from this is how grossly incompetent the elite managerial class has become. They're destroying society because they don't know what to do outside of copying each other.
  It has to end.
  
  marginalia_nu ・ an hour ago
  
  To be honest, sometimes it's still beneficial.
  For fairly straightforward changes it's probably a wash, but ironically enough it's often the trickier jobs where they can be beneficial as it will provide an ansatz that can be refined. It's also very good at tedious chores.
  
  bluGill ・ an hour ago
  
  Some, but not very much. Writing code is hard. AI will do a lot of tedious code that you procrastinate writing.
  
  hard24 ・ an hour ago
  
  Also when you are writing code yourself you are implicitly checking it whilst at the back of your mind retaining some form of the entire system as a whole.
  People seem to gloss over this... As a CEO if people don't function like this I'd be awake at night sweating.
  
  bluGill ・ 19 minutes ago
  
  Sort of. I work on a system too large for anyone to know the whole thing. Often people who don't know each other do something that breaks the other's work. (Often because of the sheer number of people involved; most individuals go years between incidents like this.)
  
  bonesss ・ 12 minutes ago
  
  That’s the reverse-centaur issue I see: humans are not great at repetitive, nuanced, similar-seeming tasks, so putting the onus on humans to retroactively approve high volumes of critical code has them managing a critical failure mode at their weakest and worst. Automated reviews should be enhancing known good-faith code; manual review of high volumes of superficially sound but subversive code is begging for issues over time.
  Which leads to the software engineering issue I’m not seeing addressed by the hype: bugs cost tens to hundreds of times their coding cost to resolve if they require internal or external communication to address. Even if everyone has been 10x’ed, the math still strongly favours not making mistakes in the first place.
  An LLM workflow that yields 10x an engineer but psychopathically lies and sabotages client-facing processes/resources once a quarter is likely an NNPP (net negative producing programmer), once opportunity and volatility costs are factored in.
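The "10x still loses" arithmetic in the comment above can be sketched with a deliberately crude model: each change costs its coding time, plus, with probability `bug_rate`, a fix costing `fix_multiplier` times that coding time. The function names and every number below are illustrative assumptions (the multiplier of 50 sits inside the "tens to hundreds" range quoted above), not measurements.

```python
def expected_cost(coding_time: float, bug_rate: float, fix_multiplier: float) -> float:
    """Expected total cost of one change: write it, plus the expected fix cost.

    Crude model: with probability bug_rate the change ships a bug whose fix
    costs fix_multiplier times the original coding time.
    """
    return coding_time * (1.0 + bug_rate * fix_multiplier)


def break_even_bug_rate(speedup: float, baseline_bug_rate: float,
                        fix_multiplier: float) -> float:
    """Bug rate at which a `speedup`-times-faster workflow stops paying off.

    Solves (1 / speedup) * (1 + b * m) == 1 + b0 * m for b.
    """
    return (speedup * (1.0 + baseline_bug_rate * fix_multiplier) - 1.0) / fix_multiplier


if __name__ == "__main__":
    m = 50.0  # assumed: a shipped bug costs 50x its coding time to resolve
    human = expected_cost(1.0, 0.02, m)      # assumed 2% bug rate, normal speed
    llm = expected_cost(1.0 / 10, 0.40, m)   # assumed 10x faster, 40% bug rate
    print(f"human: {human:.2f}, 10x workflow: {llm:.2f}")
    print(f"break-even bug rate: {break_even_bug_rate(10, 0.02, m):.2f}")
```

With these made-up inputs a tenfold speedup is wiped out once the bug rate climbs from 2% to about 38%; a larger fix multiplier pushes that threshold lower.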
- rectang ・ an hour ago
  
  > Expert reviews are just about the only thing that makes AI generated code viable
  I disagree, in the sense that an engineer who knows how to work with LLMs can produce code which only needs light review.
  * Work in small increments
  * Explicitly instruct the LLM to make minimal changes
  * Think through possible failure modes
  * Build in error-checking and validation for those failure modes
  * Write tests which exercise all paths
  This is a means to produce "viable" code using an LLM without close review. However, to your point, engineers able to execute this plan are likely to be pretty experienced, so it may not be economically viable.
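A small example of the last two bullets: name the failure modes up front, validate for each one explicitly, and write at least one test per path. The function `parse_port` is hypothetical, invented for illustration.

```python
def parse_port(raw: str) -> int:
    """Parse a TCP port number from user-supplied text.

    Failure modes considered up front: empty input, non-numeric input
    (including signs and non-ASCII digits), and values outside 1-65535.
    """
    text = raw.strip()
    if not text:
        raise ValueError("port is empty")
    if not (text.isascii() and text.isdigit()):
        raise ValueError(f"port is not a number: {raw!r}")
    port = int(text)
    if not 1 <= port <= 65535:
        raise ValueError(f"port out of range: {port}")
    return port


def test_parse_port_all_paths() -> None:
    # Happy path, including surrounding whitespace.
    assert parse_port(" 8080 ") == 8080
    # One or more cases per failure mode, so every raise is reached.
    for bad in ("", "   ", "http", "-1", "0", "65536"):
        try:
            parse_port(bad)
        except ValueError:
            continue
        raise AssertionError(f"expected ValueError for {bad!r}")


if __name__ == "__main__":
    test_parse_port_all_paths()
    print("all paths covered")
```

Each `raise` has at least one test that reaches it, which is roughly what "exercise all paths" amounts to at this scale.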
  
  marginalia_nu ・ an hour ago
  
  By the time you're working in increments small enough that it doesn't introduce significant issues, you really might as well write the code yourself.
  
  rectang ・ 41 minutes ago
  
  That's not my experience — I'm significantly faster while guiding an LLM using this methodology.
  The gains are especially notable when working in unfamiliar domains. I can glance over code and know "if this compiles and the tests succeed, it will work", even if I didn't have the knowledge to write it myself.
  
  marginalia_nu ・ 38 minutes ago
  
  That's where Gell-Mann amnesia will get you, though. As much as it trips up in the domains you're familiar with, it also trips up in unfamiliar domains. You just don't see it.
  
  rectang ・ 32 minutes ago
  
  You're not telling me anything I don't know already. Only a person who accepts that they're fallible can execute this methodology anyway, because that's the kind of mentality that it takes to think through potential failure modes.
  Yes, code produced this way will have bugs, especially of the "unknown unknown" variety — but so would the code that I would have written by hand.
  I think a bigger factor contributing to unforeseen bugs is whether the LLM's code is statistically likely to be correct:
  * Is this a domain that the LLM has trained on a lot? (i.e. lots of React code out there, not much in your home-grown DSL)
  * Is the codebase itself easy to understand, written with best practices, and adhering to popular conventions? Code which is hard for humans to understand is also hard for an LLM to understand.
  
  marginalia_nu ・ 24 minutes ago
  
  Right, I think the latter part is my concern with AI generated code. Often it isn't easy to read (or as easy to read as it could be), and the harder it is to navigate, the more code problems the AI model introduces.
  It introduces unnecessary indirection, additional abstractions, fails to re-use code. Humans do this too, but AI models can introduce this type of architectural rot much faster (because it's so fast), and humans usually notice when things start to go off the rails, whereas an AI model will just keep piling on bad code.
  
  rectang ・ 13 minutes ago
  
  I agree that under default settings, LLMs introduce way too many changes and are way too willing to refactor everything. I was only able to get the situation under control by adding this standing instruction:
  ---
  applyTo: '**'
  ---
  By default: Make the smallest possible change. Do not refactor existing code unless I explicitly ask.
  Under this, Claude Opus at least produces pretty reliable code with my methodology even under surprisingly challenging circumstances, and recent ChatGPTs weren't bad either (though I'm no longer using them). Less powerful LLMs struggle, though.
- raw_anon_1111 ・ an hour ago
  
  In my experience, inefficient code is rarely the issue outside of data-engineering-type ETL jobs. It’s mostly architectural. Inefficient code isn’t the reason your login is taking 30 seconds. Yes, I know at Amazon/AWS scale (former employee) every efficiency matters. But even at Salesforce scale, wringing out every bit of efficiency doesn’t matter.
  No one cares about handcrafted artisanal code as long as it meets both functional and non functional requirements. The minute geeks get over themselves thinking they are some type of artists, the happier they will be.
  I’ve had a job that requires coding for 30 years, and before that I was a hobbyist. I’ve worked everywhere from 60-person startups to BigTech.
  For my last two projects (consulting) and my current project, I led the project, got the requirements, designed the architecture from an empty AWS account (yes, using IaC) and delivered it. I didn’t look at a line of code. I verified the functional and non-functional requirements, wrote the hand-off documentation, etc.
  The customer is happy, my company is happy, and I bet you not a single person will ever look at a line of code I wrote. If they do get a developer to take it over, the developer will be grateful for my detailed AGENTS.md file.
  
  hard24 ・ 29 minutes ago
  
  "No one cares about handcrafted artisanal code as long as it meets both functional and non functional requirements"
  Speak for yourself. I don't hire people like you.
  
  raw_anon_1111 ・ 28 minutes ago
  
  And guess what? You probably don’t pay as much as I make now either…
  Even in late 2023 with the shit show of the current market, I had no issues having multiple offers within three weeks just by reaching out to my network and companies looking for people with my set of skills.
  
  hard24 ・ 25 minutes ago
  
  I field a small team of experts who are paid upwards of a million GBP in cold-hard cash in London. Not stock. Cash.
  You sound like a bozo, I can sniff it through my screen.
  
  YCpedohaven ・ an hour ago
  
  You are the reason software is so shitty today. Congrats code monkey.
  
  raw_anon_1111 ・ 33 minutes ago
  
  Yes, because I didn’t check to see if Claude Code used a for loop instead of a while loop? Or that it didn’t use my preferred GoF pattern or what I read in “Clean Code”?
  Guess what? I also stopped caring how registers are used and counting clock cycles in my assembly language code like it’s the 80s and I’m still programming on a 1 MHz 65C02.
- Skidaddle ・ an hour ago
  
  Just lead with “You are an expert software engineer…”, easy!
onion2k ・ 20 minutes ago

> I.e. senior review is valuable, but it does not make bad code good.
I suspect that isn't the goal.
Review by more senior people shifts accountability from the Junior to a Senior, and reframes the problem from "Oh dear, the junior broke everything because they didn't know any better" to "Ah, that Senior is underperforming because they approved code that broke everything."
js8 ・ an hour ago

> requires an amount of time approaching the time spent if they had just done it themselves
It's actually often harder to fix something sloppy than to write it from scratch. To fix it, you need to hold in your head both the original, the new solution, and calculate the difference, which can be very confusing. The original solution can also anchor your thinking to some approach to the problem, which you wouldn't have if you solve it from scratch.
- bluGill ・ an hour ago
  
  Sloppy code that has been around for a while works. It likely has support for edge cases you forgot about. Often the sloppiness is because of those edge cases.
steveBK123 ・ 2 hours ago

Right, code reviews should already have been happening with human written junior code.
If AI is a productivity boost and juniors are going to generate 10x the PRs, do you need 10x the seniors (expensive) or 1/10th the juniors (cost save).
A reminder that in many situations, pure code velocity was never the limiting factor.
Re: idiot proofing, I think this is a natural evolution: as companies get larger, they try to limit their downside & manage for the median rather than having a growth mindset in hiring/firing/performance.
jetrink ・ an hour ago

It could create the right sort of incentives though. If I'm a junior and I suddenly have to take my work to a senior every time I use AI, I'm going to be much more selective about how I use it and much more careful when I do use it. AI is dangerous because it is so frictionless and this is a way to add friction.
Maybe I don't have the correct mental model for how the typical junior engineer thinks though. I never wanted to bug senior people and make demands on their time if I could help it.
- devonbleak ・ an hour ago
  
  What you're actually going to see is seniors inundated by slop and burning out and quitting because what used to be enjoyable solving of problems has become wading through slop that took 10 minutes to generate and submit but 30+ minutes to understand and write up a critique for it.
qnleigh ・ 34 minutes ago

I seriously doubt that they think senior reviewers will meticulously hunt down and fix all the AI bugs. Even if they could, they surely don't have the time. But it offers other benefits here:
1. They can assess whether the use of AI is appropriate without looking in detail. E.g. if the AI changed 1000 lines of code to fix a minor bug, or changed code that is essential for security.
2. To discourage AI use, because of the added friction.
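Point 1 is the kind of check that does not require reading a change line by line. A sketch of it as a pre-review triage step, assuming the input is the text printed by `git diff --numstat` (tab-separated added/deleted/path lines, with `-` counts for binary files). The function name `triage`, the line threshold, and the "sensitive" path patterns are placeholders a team would choose for itself.

```python
from fnmatch import fnmatchcase

# Placeholder policy values; a real team would tune these.
MAX_CHANGED_LINES = 300
SENSITIVE_PATTERNS = ("*auth*", "*crypto*", "*/security/*", "*.sql")


def triage(numstat: str) -> list[str]:
    """Return reasons a diff deserves closer review (empty list = none found).

    `numstat` is the text printed by `git diff --numstat`: one
    "added<TAB>deleted<TAB>path" line per file, "-" counts for binaries.
    Renames are not special-cased in this sketch.
    """
    reasons: list[str] = []
    total = 0
    for line in numstat.splitlines():
        if not line.strip():
            continue
        added, deleted, path = line.split("\t", 2)
        if added != "-" and deleted != "-":
            total += int(added) + int(deleted)
        # Note: in fnmatch patterns, "*" also matches "/" characters.
        if any(fnmatchcase(path, pat) for pat in SENSITIVE_PATTERNS):
            reasons.append(f"touches sensitive path: {path}")
    if total > MAX_CHANGED_LINES:
        reasons.append(f"large diff: {total} changed lines")
    return reasons
```

A reviewer can feed it `git diff --numstat main...HEAD` (or similar) and spend their attention only where it flags something, which matches the "assess without looking in detail" idea rather than replacing an actual review.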
raw_anon_1111 ・ an hour ago

Why only AI generated code? I wouldn’t let a junior or mid level developer’s code go into production without at least verifying the known hotspots - concurrency, security, database schema, and various other non functional requirements that only bite you in production.
I’m probably not going to review a random website built by someone except for usability, requirements and security.
- happytoexplain ・ an hour ago
  
  I didn't restrict my opinion to genAI code. I'm expressing a general thought that was relevant before AI. AI is just salient in relation to it.
  I also said senior review is valuable, but I'm not 100% sure if you're implying I didn't.
bs7280 ・ an hour ago

This is also why I think we will enter a world without Jr's. The time it takes for a Senior to review the Jr's AI code is more expensive than if the Sr produced their own AI code from scratch. Factor in the lack of meetings from a Sr only team, and the productivity gains will appear to be massive.
Whether or not these productivity gains are realized is another question, but spreadsheet based decision makers are going to try.
- czscout ・ an hour ago
  
  In this scenario, how might one become a senior without first being a junior? Seniors just pop into existence?
  
  simplyluke ・ an hour ago
  
  The bet from various industry leaders appears to be that the current generation of engineers will be the last who will ever need to think about complex systems and engineering, as the AI will just get good enough to do all of that by the time they retire.
hnthrow0287345 ・ 34 minutes ago

>requires an amount of time approaching the time spent if they had just done it themselves.
I would actually say having at least 2 people on any given work item should probably be the norm at Amazon's size if you also want to churn through people as Amazon does and also want quality.
Doing code reviews is not as highly valued in terms of incentives to the employees, and it blocks them from working on things they would get more compensation for.
grvdrm ・ 2 hours ago

What a statement at the end. You are absolutely right.
I hear “x tool doesn’t really work well” and then I immediately ask: “does someone know how to use it well?” The answer “yes” is infrequent. Even a yes is often a maybe.
The problem is pervasive in my world (insurance). Number-producing features need to work in a UX and product sense but also produce the right numbers, and within range of expectations. Just checking the UX does what it’s supposed to do is one job, and checking the numbers an entirely separate task.
I don’t know many folks who do both well.
belval ・ 2 hours ago

The unwritten thing is that if you need seniors to review every single change from junior and mid-level engineers, and those engineers are mostly using Kiro to write their CRs, then what stops the senior from just writing the CRs with Kiro themselves?
yifanl ・ 2 hours ago

Senior reviews are useful, but as I understand it, Amazon has a fairly high turnover rate, so I wonder just how many seniors with deep knowledge of the codebase they could possibly have.
- tartoran ・ 41 minutes ago
  
  From treating engineers as interchangeable to high turnover, these are decisions the company made. The payback time always comes at some point.
lokar ・ an hour ago

The goal of Sr code review is not to make the code better, it's to make the author better.
- skeeter2020 ・ an hour ago
  
  Agree, but even broader: authors. I always viewed reviews as targeting Brooks's less famous findings about the optimal team size being one, and asking how we can get better at building systems too big for the individual. I think code review is about shared, consistent understanding, with catching bugs a nice side effect (or justification for the bean counters).
  
  lokar ・ an hour ago
  
  I agree, made (mostly) that point in my top level comment. Code reviews (both in the normal GitHub flow, but also small meetings, design reviews, etc) all help to tie the team together and improve quality.
mrbonner ・ an hour ago

What stops the senior from using AI to review the AI generated code the junior published?
- tartoran ・ 38 minutes ago
  
  That’s something that the junior can do. What companies want to do is put responsibility on someone who has more knowledge and skin in the game.
femiagbabiaka ・ an hour ago

the outcome of the review isn't just that the code gets shipped, it's knowledge transfer from the senior engineer to the junior engineers that then creates more senior engineers
napolux ・ 40 minutes ago

LGTM
RamblingCTO ・ an hour ago

Who said PR reviews need to solve all the things and result in proof against idiots?
So you're saying that peer reviews are a waste of time and only idiots would use/propose them?
- happytoexplain ・ an hour ago
  
  None of that, sorry if I wasn't clear.
  To partially clarify: "Idiot proof" is a broad concept that here refers specifically to abstraction layers, more or less (e.g. a UI framework is a little "idiot proof"; a WYSIWYG builder is more "idiot proof"). With AI, it's complicated, but bad leadership is over-interpreting the "idiot proof" aspects of it. It's a phrase, not an insult to users of these tools.

cobolcomesback ・ 2 hours ago

This “mandatory meeting” is just the usual weekly company-wide meeting where recent operational issues are discussed. There was a big operational issue last week, so of course this week will have more attendance and discussion.

This meeting happens literally every week, and has for years. Feels like the media is making a mountain out of a mole hill here.

davidclark ・ 2 hours ago

The article claims:
>He asked staff to attend the meeting, which is normally optional.
Is that false? It also discusses a new policy:
>Junior and mid-level engineers will now require more senior engineers to sign off any AI-assisted changes, Treadwell added.
Is that inaccurate? It is good context that this is a regularly scheduled meeting. But, regularly scheduled meetings can have newsworthy things happen at them.
- cobolcomesback ・ an hour ago
  
  It’s not false. But it’s also weaselly worded.
  Note that the article doesn’t say that he told staff they have to attend the meeting. It says he “asked” staff to attend the meeting. Which again, it’s really really normal for there to be an encouragement of “hey, since we just had an operational event, it would be good to prioritize attending this meeting where we discuss how to avoid operational events”.
  As for the second quote: senior engineers have always been required to sign off on changes from junior engineers. There’s nothing new there. And there is nothing specific to AI that was announced.
  This entire meeting and message is basically just saying “hey we’ve been getting a little sloppy at following our operational best practices, this is a reminder to be less sloppy”. It’s a massive nothingburger.
  
  8note ・ 18 minutes ago
  
  > senior engineers have always been required to sign off on changes from junior engineers.
  Definitely a team-by-team question. If it were required, it would be a crux rule that the code review isn't approved without an L6 approver.
- skeeter2020 ・ an hour ago
  
  That's not really what the headline attempts to communicate though. It specifically emphasizes "Mandatory" and "AI breaking things". Nobody was going to click on "Regularly scheduled Amazon staff meeting will include discussion on operational improvement"
CoolGuySteve ・ 2 hours ago

It didn't seem to make the news but at least in NYC the entire Amazon storefront was broken all afternoon on Friday.
Items weren't displaying prices and it was impossible to add anything to your cart. It lasted from about 2pm to 5pm.
It's especially strange because if a computer glitch brought down a large retail competitor like Walmart I probably would have seen something even though their sales volume is lower.
- malfist ・ an hour ago
  
  Over the weekend I was trying to return a pair of shoes and get a different size and I kept getting 500s trying to go to the store page for the shoes.
- kotaKat ・ an hour ago
  
  A little birdie told me someone pushed duplicate data into one of Amazon’s core NoSQL systems that runs most of e-commerce. The front end of the site broke in weird ways, but it certainly wasn’t taking orders.
otterley ・ 2 hours ago

> Feels like the media is making a mountain out of a mole hill here.
That's been their job ever since cable news was invented.
- ses1984 ・ an hour ago
  
  It’s been a bit longer than that.
  https://en.wikipedia.org/wiki/Yellow_journalism
  It probably goes back as long as they have been shouting news in the town square in Rome or before that even.
  
  otterley ・ 23 minutes ago
  
  True enough!
belval ・ 2 hours ago

I am not in that specific meeting but it made me chuckle that a weekly ops meeting will somehow get media attention. It's been an Amazon thing forever. Wait until the public learns about CoEs!
- 8note ・ 17 minutes ago
  
  I'd expect COEs to be coming up with AI code action items, though, not more thorough human checks.
cmiles8 ・ 38 minutes ago

The core message of the article is that Amazon has been having issues with AI slop causing operational reliability concerns, and that seems to be 100% accurate.
Clent ・ 29 minutes ago

Who is the media you're accusing here? This is a Twitter post. As far as I can tell, they do not work for a media company.
What is worth being pointed out is how quickly people blame "The Media" for how people use, consume and spread information on social networks.
- otterley ・ 23 minutes ago
  
  The source is not a Twitter post, it's a Financial Times article (that the poster failed to cite).
embedding-shape ・ an hour ago

> This meeting happens literally every week, and has for years. Feels like the media is making a mountain out of a mole hill here.
Are you completely missing the point of the submission? It's not about "Amazon has a mandatory weekly meeting" but about the contents of that specific meeting, about AI-assisted tooling leading to "trends of incidents", having a "large blast radius" and "best practices and safeguards are not yet fully established".
No one cares how often the meeting in general is held, or if it's mandatory or not.
- skeeter2020 ・ an hour ago
  
  >> Are you completely missing the point of the submission
  no, and that's what people are noting: the headline deliberately tries to blow this up into a big deal. When did you last see the HN post about Amazon's mandatory meeting to discuss a human-caused outage, or a post mortem? It's not because they don't happen...
niwtsol ・ an hour ago

I believe it is by group. AWS started the weekly operations meeting, and effectively every service's oncall from the last week had to attend. Then it grew massive, so they made it optional. Alexa had a similar meeting that tried to replicate what AWS did. A lot of time was spent reviewing load tests getting ready for holiday season, Prime Day, and the Super Bowl (Super Bowl ads used to cause crazy TPS spikes for Alexa). And a lot of finger pointing if there was an outage from one team. While it probably did help raise the operational bar, so much time was wasted by engineers on busywork/paperwork documenting an error or fix vs. improving the actual service.

sethops1 ・ 2 hours ago

> The response for now? Junior and mid-level engineers can no longer push AI-assisted code without a senior signing off.

So basically, kill the productivity of senior engineers, kill the ability for junior engineers to learn anything, and ensure those senior engineers hate their jobs.

Bold move, we'll see how that goes.

whateveracct ・ an hour ago

Juniors could just code things the old fashioned way. It isn't hard. And if they do find it too hard, they aren't cut out for this job.
- sdevonoes ・ an hour ago
  
  But aren’t companies enforcing AI usage? If not, wait for it.
  
  ritlo ・ 42 minutes ago
  
  Mine's tracking it complete with a leaderboard (LOL) and it's been suggested to me that it'd be in my best interest not to be too low on that list, so I suspect in the back half of the year some sterner conversations and/or pink-slips are going to be coming the way of those who've not caught on that they need to at least be sending some make-work crap to their LLMs every day, even if they immediately throw the output in the metaphorical garbage bin.
  It's basically an even-more-ridiculous version of ranking programmers by lines-of-code/week.
  What's especially comical is I've seen enormous gains in my (longish, at this point) career from learning other tools (e.g. expanding my familiarity with Unix or otherwise fairly common command line tools) and never, ever has anyone measured how much I'm using them, and never, ever has management become in any way involved in pushing them on me. It's like the CEO coming down to tell everyone they'll be making sure all the programmers are using regular expressions enough, and tracking time spent engaging with regular expressions, or they'll be counting how many breakpoints they're setting in their debuggers per week. WTF? That kind of thing should be leads' and seniors' business, to spread and encourage knowledge and appropriate tool use among themselves and with juniors, to the degree it should be anyone's business. Seems like yet another smell indicating that this whole LLM boom is built on shaky ground.
  
  bonesss ・ 2 minutes ago
  
  From a management perspective I would be highly skeptical of token leaderboards. You are incentivizing people to piss away company money with uncertain rewards.
  I mean… throw some docs into the context window, see it explode. Repeat that a few times with some multi-step workflows. Presto, hundreds of dollars in “AI” spending accomplishing nothing. In olden days we’d just burn the cash in a waste paper basket.
  
  to11mtm ・ 13 minutes ago
  
  > even if they immediately throw the output in the metaphorical garbage bin.
  Gotta be careful if you do that tho; e.g. Copilot can monitor 'accept' rate, so at bare minimum you'd have to accept the changes and then immediately back them out...
- thewhitetulip ・ an hour ago
  
  Well, not when they are mandated to use AI tools and asked for justification about their usage!
  I am speaking in general; I've never worked at Amazon.
- throw_m239339 ・ an hour ago
  
  Aren't these companies mandating the use of these tools in the first place? Juniors aren't the problem.
dragonelite ・ 2 hours ago

Accelerate a person's speed toward being burned out..
- altairprime ・ 2 hours ago
  
  ..and you lower overall engineering salary spend by rotating out seniority-paid engineers for newly-promoted AI reviewers with lower specs
almostdeadguy ・ 2 hours ago

I'm sorry, what? Junior engineers can't learn anything without using AI assistants (or is the implication that having seniors review their code makes them incapable of learning?) and senior engineers would hate their jobs reviewing more code from their teammates? What reality do people live in now?
- zdragnar ・ an hour ago
  
  I thought the implication was that juniors would continue to use AI to stay "productive" (AWS is not a rest and vest job for juniors, from what I've heard) and seniors would no longer have time to do anything but review code from juniors who just spin the AI wheel.
  There's a lot of learning opportunity in failing, but if failure just means spam the AI button with a new prompt, there's not much learning to be had.
- ritlo ・ an hour ago
  
  > senior engineer would hate their jobs reviewing more code from their teammates
  Jesus, yes. Maybe I'm an oddball but there's a limit to how much PR reviewing I could do per week and stay sane. It's not terribly high, either. I'd say like 5 hours per week max, and no more than one hour per half-workday, before my eyes glaze over and my reviews become useless.
  Reviewing code is important and is part of the job but if you're asking me to spend far more of my time on it, and across (presumably) a wider set of projects or sections of projects so I've got more context-switching to figure out WTF I'm even looking at, yes, I would hate my job by the end of day 1 of that.
  
  almostdeadguy ・ 10 minutes ago
  
  If we can't spend that much time reviewing code, what are we exactly doing with this AI stuff?
  I don't disagree, I think reviewing is laborious, I just don't see how this causes any unintended consequences that aren't effectively baked into using an AI assistant.

prakhar897 ・ 23 minutes ago

From the Amazon I know, people only care about a) not getting fired and b) promotions. For devs, the matrix looks like this:

1. Shipping: deliver tickets or be pipped.

2. Having fewer comments on their PRs: for some drastically dumb reason, having a PR thoroughly reviewed is a sign of bad quality. L7 and above use this metric to PIP folks.

3. Docs: write docs, get them reviewed to show you're high level.

Without AI, an employee is worse off in all of the above compared to folks who will cheat to get ahead.

I can't see how "requesting" folks to forgo their own self-preservation will work, especially when you've spent years pitting people against each other.

cmiles8 ・ an hour ago

The optics here are really bad for Amazon. The continuing mass departures of long tenured folks, second-rate AI products, and a string of bad outages paints a picture that current leadership is overseeing a once respected engineering train flying off the tracks.

News from the inside makes it sound like things are getting pretty bad.

sdevonoes ・ an hour ago

Reviewing AI generated code at PR time is a bottleneck. It cancels most of the benefits senior leadership thinks AI offers (delivery speed).

There’s also this implicit imbalance engineers typically don’t like: it takes me 10 min to submit a complete feature thanks to Claude… but for the human reviewing my PR in a manual way it will take them 10-20 times that.

Edit: in the end, real engineers know that what takes effort is a) knowing what to build and why, and b) verifying that what was built is correct. Currently AI doesn’t help much with either of these two points.

The in-between steps are needed, but they are a byproduct. Senior leadership doesn’t know this, though.

hard24 ・ an hour ago

Indeed. My view as a CEO is, if you are still reviewing the code yourself then what use is it that you can produce a bunch of text at a faster rate?
I'd prefer people wrote good quality code and checked it as they went along... whilst allowing room for other stuff they didn't think of to come to the front. The production process of using LLMs is entirely different, in its current state I don't see the net benefit.
E.g. if you have a very crystalised vision of what you want, why would I want an engineer to use an LLM to write it, when the LLM can't do both raw production and review? Could this change? Sure. But there's no benefit for me personally to shift toward working that way now - I'd rather it came into existence first before I expose myself to incremental risk that affects business operations. I want a comprehensive solution.
qnleigh ・ 28 minutes ago

Surely they know all this. They're worried about AI code degrading codebase quality, so they're putting on the brakes.
beardedetim ・ an hour ago

This is what I don't understand about this policy. There's no way a senior has enough spare capacity to be the gatekeeper on every PR made by AI below them. So now we are just making it so the senior people use more AI to keep up, but now they're to blame for letting it happen.
It sounds like a piss poor deal for seniors unless senior engineer now means professional code reviewer.

julienchastang ・ 10 minutes ago

> best practices and safeguards are not yet fully established

The way I am working with AI agents (Codex) these days is to have the AI generate a spec as a series of MD documents, where the AI's implementation of each document is a bite-sized chunk that can be tested and evaluated by the human before moving to the next step, and roughly matches a commit in version control. The version control history reflects the logical progression of the code. In this manner, I have a decent knowledge of the code, and one that I am more comfortable with than one-shotting.

dedoussis ・ 5 minutes ago

How do they determine whether a PR is AI-assisted and therefore requires senior review? A junior engineer could still copy-paste AI-generated code and claim it as their own.

emotiveengine ・ a minute ago

Right? If they're using some sort of tool, there's always another tool to fool the tool.

lokar ・ 2 hours ago

If this is true, it misunderstands the primary goals of code review.

Code review should not be (primarily) about catching serious errors. If there are always a lot of errors, you can’t catch most of them with review. If there are few it’s not the best use of time.

The goal is to ensure the team is in sync on design, standards, etc. To train and educate Jr engineers, to spread understanding of the system. To bring more points of view to complex and important decisions.

These goals help you reduce the number of errors going into the review process, this should be the actual goal.

burkaman ・ 2 hours ago

Source is https://www.ft.com/content/7cab4ec7-4712-4137-b602-119a44f77..., archived at https://archive.is/hLd8X

Someone1234 ・ 2 hours ago

Thanks for the links. Strangely, I cannot get past the archive.is "I am not a robot" wall. I click it, then it refreshes, I click it again, and then it asks me to find Traffic Lights, and then "I am not a robot," repeat.
Maybe I need a bot to do this for me...
- distances ・ an hour ago
  
  I don't know what changed, but in recent months it has become impossible to pass. Not a single success, whereas earlier it wasn't failing ever. Firefox on Android. Maybe I look like a bot now, for whatever reason.
- bink ・ an hour ago
  
  I've had the same problem for the last few days, just repeated CAPTCHAs.
- skeledrew ・ an hour ago
  
  Archive link took me right in; always has. Could be because I use NoScript.

Lalabadie ・ 2 hours ago

I'm not sure the sustainable solution is to treat an excess of lower-quality code output as the fixed thing to work with, and operationalize around that, but sure.

gtowey ・ 2 hours ago

It's the same as the offshoring episode of the early 2000s. There is such a massive financial incentive to somehow make the low-quality code work. And they will try to resist the reality that it's a huge net negative for as long as they can.

ritlo ・ 2 hours ago

The only way to see the kinds of speed-up companies want from these things, right now, is to do way too little review. I think we're going to see a lot of failures in a lot of sectors where companies set goals for reduced hours on various things they do, based on what they expected from LLM speed-ups, and it will have turned out the only way to hit those goals was by spending way too little time reviewing LLM output.

They're torn between "we want to fire 80% of you" and "... but if we don't give up quality/reliability, LLMs only save a little time, not a ton, so we can only fire like 5% of you max".

(It's the same in writing, these things are only a huge speed-up if it's OK for the output to be low-quality, but good output using LLMs only saves a little time versus writing entirely by-hand—so far, anyway, of course these systems are changing by the day, but this specific limitation has remained true for about four years now, without much improvement)

SoftTalker ・ an hour ago

So will it turn out that actually writing code was never the time sink in the first place?
That has always been my feeling. Once I really understand what I need to implement, the code is the easy part. Sure it takes some time, but it's not the majority. And for me, actually writing the code will often trigger some additional insight or awareness of edge cases that I hadn't considered.
- hard24 ・ an hour ago
  
  "So will it turn out that actually writing code was never the time sink in the first place?"
  Of course it wasn't! Do you think people can envision the right objects to produce all the time? Yeah.. we have a lot of Steve Jobs walking around lol.
  As you say, there's 'other stuff' that happens naturally during the production process that adds value.
hard24 ・ an hour ago

My prediction is that a Concorde-like incident is going to shatter trust and make people rethink their expectations of the capabilities of LLMs, and of what they can do at present.
Essentially, something big has to happen, stemming from LLM use, that affects the revenue/trust of a large provider of goods.
They won't go away entirely. But this idea that they can displace engineers at a high rate will.
- Terr_ ・ 17 minutes ago
  
  Assuming you mean this crash [0], it reads to me more like a confluence of bad events versus a big fundamental design flaw in the THERAC-25 mold.
  I feel the current proliferation of LLMs is going to resemble the asbestos problem: a cheap miracle thingy, overused in several places, with slow gradual regret and chronic harms/costs. Although... I suppose the "undocumented nasty surprise" aspect would depend on future adoption of local LLMs. As long as they're a cloud subscription, people are less likely to forget they're being used.
  [0] https://en.wikipedia.org/wiki/Air_France_Flight_4590

LogicFailsMe ・ an hour ago

For the good of the company's future, all code should be reviewed by L10s going forward before they are accepted. They're the only ones with enough skin in the game to know what really matters after all.

And from their sagely reviews, we shall train a large language model to ultimately replace them because the most fungible thing at Amazon is the leadership.

AlotOfReading ・ 2 hours ago

I'm not surprised by the outages, but I am surprised that they're leaning into human code review as a solution rather than a neverending succession of LLM PR reviewers.

I wonder if it's an early step towards an apprenticeship system.

monarchwadia ・ 2 hours ago

Interesting. How would it be an early step towards an apprenticeship system?
bilbo0s ・ 2 hours ago

You shouldn't be surprised.
How else would they train the LLM PR reviewers to their standards?
I've never personally been in the position, because my entire career has been in startups, but I've had many friends be in the unenviable position of training their replacements. Here's the thing though, at least they knew they were training their replacements. We could be looking at a potential future where an employee or contractor doesn't realize s/he is actually just hired to generate training data for an LLM to replace them, and then be cut.

butILoveLife ・ 41 minutes ago

Maybe it's just my one buddy who works at Amazon, but they seemed extremely slow to adopt LLMs. Big ships take a long time to turn, but this seemed hostile.

I am seeing this mindset still, with AI Agents. I imagine they will slowly realize they need to use this stuff to be competitive, but being slow to adopt AI seems like it could have been the source of this.

mhitza ・ 2 hours ago

https://xcancel.com/lukolejnik/status/2031257644724342957

throwaw12 ・ an hour ago

If seniors are going to review all GenAI-generated code, how do they keep up with the volume of changes?

So you have 2 systems of engineers: Sr- and Sr+

1. Both should write code to justify their work and impact

2. Sr- code must be reviewed by Sr+

What happens:

a. Sr+ output drops because review takes more and more of their time

b. Sr+ just blindly accepts because the volume is too high, and they also have to do their own work

c. Sr+ asks Sr- to slow down, and then Sr- can get bad reviews for their output, because on average Sr+ will produce more code

I think (b) will happen

smy20011 ・ an hour ago

An outage could cost Amazon ~millions to tens of millions. Most of the time, we want the junior to learn from the outage and fix the process. With an AI agent, we can only update the agent.md and hope it will never happen again.

Insanity ・ an hour ago

It's only going to get worse with the brain drain as a result of the layoffs. Which will increase the use of AI assisted coding and increase the number of outages related to this.

Imagine having to debug code that caused an outage when 80% is written by an LLM and you now have to start actually figuring out the codebase at 2am.. :)

letitgo12345 ・ an hour ago

Worth noting that this is when they used Amazon's own AI product, not when using Claude Code or Codex.

dragonelite ・ 2 hours ago

Expect a shitload of AI-powered code review products in the next 18 months.

daheza ・ an hour ago

Create the problem and then create the solution.
gdulli ・ an hour ago

"Why don't they just make the plane out of the black box?"
hard24 ・ an hour ago

This is incredibly circular lol...
AlexeyBrin ・ 2 hours ago

You mean like what Anthropic announced yesterday? Code Review can review your code for $15-$25 per review.
/s
So now, you can speed up using Claude Code and use Code Review to keep it in check.

mattschaller ・ an hour ago

Anyone work with Kiro before? As I understood, it was held as an INTERNAL USE ONLY tool for much longer than expected.

daheza ・ an hour ago

I used Kiro IDE and really liked it. The all-you-can-eat model of LLM usage is very tempting compared to, say, Cursor. The features in the editor are basically the same.
Haven't tried Kiro CLI.

oxqbldpxo ・ an hour ago

Not fun to work at amazon.com it seems.

skeledrew ・ 2 hours ago

> the affected tool served customers in mainland China

Thought this blurb most interesting. What's the between-lines subtext here? Are they deliberately serving something they know to be faulty to the Chinese? Or is it the case that the Chinese use it with little to no issue/complaint? Or...?

guessmyname ・ 2 hours ago

dupe: https://news.ycombinator.com/item?id=47319273 (10 hrs ago)

AlexandrB ・ 35 minutes ago

"We want you to use AI for everything!"

"No, not like that though!"

fredgrott ・ 36 minutes ago

Curious question, how many Amazon Engineers flunk basic CS?

If you know CS you know two things:

1. AI cannot judge code: whether it is noise or signal, AI cannot tell.

2. CS-wise, we use static analysis to tell good code from bad.

How much time does it take to take AI output and run the basic static analysis tools available for most programming languages?

Some juniors need firing outright

bigbuppo ・ 2 hours ago

Ugh. The Great Oops has never been closer.

MDGeist ・ an hour ago

A former colleague of mine recently took a role that has largely turned out to be "greybeard who reviews the AI slop of the junior engineers". In theory it sounds workable, but the volume of slop makes thoughtful review impossible. Seems like most orgs will just put pressure on the slop generators to do more, put pressure on the approvers, and then scapegoat the slop approvers if necessary?

dude250711 ・ 2 hours ago

I knew this would happen.

Take a perfectly productive senior developer and instead make him be responsible for output of a bunch of AI juniors with the expectation of 10x output.

frogperson ・ 2 hours ago

Makes me want to vomit. I am not spending more time reviewing code than the "author" spent creating it. I'll just leave the industry if that happens.
- hard24 ・ an hour ago
  
  I think as long as having to review code stays around, the 'artistry' of writing code isn't going away.
  Think about it - how do you increase the speed at which one can review code? Well first it must be attractive to look at - the more attractive the faster you review/understand and move through the review. Now this won't be the case everywhere - e.g. in outsourced regions the conditions will force people to operate a certain way.
  I'm not a SWE by trade; I just try to look at things from a pragmatic standpoint of how orgs actually make incremental progress faster.

ChrisArchitect ・ an hour ago

[dupe] Source: https://news.ycombinator.com/item?id=47319273

th2o34i3432897 ・ an hour ago

First Microsoft and now Amazon (eg. their RufusAI is useless compared to the old comment search!)

Has Seattle now become the code-slop capital? Or is SFO still on top?

josefritzishere ・ 2 hours ago

The excessive exuberance of AI adoption is all part of the bubble.

throw_m239339 ・ an hour ago

Yet another example of vibe coding at scale. You'll have to hire a lot of seniors out of retirement to fix that mess of gigantic proportions... and don't blame "the juniors" for that; they didn't make the decision to allow those tools in the first place.
