Hey all, Boris from the Claude Code team here.
We've been investigating these reports, and a few of the top issues we've found are:
1. Prompt cache misses when using the 1M-token context window are expensive. Since Claude Code uses a 1-hour prompt cache window for the main agent, if you leave your computer for over an hour and then continue a stale session, it's often a full cache miss. To improve this, we have shipped a few UX improvements (e.g. nudging you to /clear before continuing a long, stale session), and are investigating defaulting to 400k context instead, with an option to configure your context window up to 1M if preferred. To experiment with this now, try: CLAUDE_CODE_AUTO_COMPACT_WINDOW=400000 claude.
2. People pulling in a large number of skills, or running many agents or background automations, which sometimes happens when using a large number of plugins. This was the case for a surprisingly large number of users, and we are actively working on (a) improving the UX to make these cases more visible to users and (b) more intelligently truncating, pruning, and scheduling non-main tasks to avoid surprise token usage.
In the process, we ruled out a large number of hypotheses: adaptive thinking, other kinds of harness regressions, model and inference regressions.
We are continuing to investigate and prioritize this. The most actionable thing for people running into this is to run /feedback, and optionally post the feedback ids either here or in the Github issue. That makes it possible for us to debug specific reports.
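For anyone wanting to try the CLAUDE_CODE_AUTO_COMPACT_WINDOW suggestion from point 1 above, it's a plain environment variable, so it can be set per-invocation or exported for the whole shell session (the variable name is taken verbatim from the comment; whether it remains supported is up to the Claude Code team):

```shell
# One-off: cap the auto-compact window at 400k tokens for a single launch
CLAUDE_CODE_AUTO_COMPACT_WINDOW=400000 claude

# Or export it so every subsequent `claude` launch in this shell uses it
export CLAUDE_CODE_AUTO_COMPACT_WINDOW=400000
claude
```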
Boris, you're seeing a ton of anecdotes here and Claude has done something that has affected a bunch of their most fervent users.
Jeff Bezos famously said that if the anecdotes are contradicting the metrics, then the metrics are measuring the wrong things. I suggest you take the anecdotes here seriously and figure out where/why the metrics are wrong.
On the subject of metrics, better user-facing metrics to understand and debug usage patterns would be a great addition. I'd love an easier way to understand the average cost incurred by a specific skill, for example. (If I'm missing something obvious, let me know.)
Baking deeper analytics into CC would be helpful... similar to ccusage perhaps: https://github.com/ryoppippi/ccusage
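For the curious, ccusage reads your local Claude Code session logs and is typically run straight from npx; the subcommand names below are from memory of the tool and may differ by version, so check its README:

```shell
# Daily token/cost breakdown from local Claude Code session logs
npx ccusage@latest

# Monthly rollup and per-session views (subcommand names may vary by version)
npx ccusage@latest monthly
npx ccusage@latest session
```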
This is useful if you want to keep an eye on what claude's actually doing behind the scenes: https://github.com/simple10/agents-observe
[dead]
We are taking it seriously, and are continuing to investigate. We are not trusting the metrics.
The quantitative ux research team at Google was created for exactly this problem: a service which became popular before the right metrics existed, meaning metrics need to be derived first, then optimized. We would observe users (irl), read their logs, then generate experiments to improve the behavior as measured by logs, and return to see if the experiment improves irl experiences. There were not many of us and we are around :)
I worked with Boris in the past and in my experience, Boris cares deeply about the customer. I'd vouch that Boris really cares about the issue people are running into.
“Hello. My name is Mr. Sirob.”
https://amphetamem.es/meme/?id=the-simpsons_04_12_89×ta...
But no other user has yet come and said "I worked with ajma in the past ..." so how can we trust your judgement about Boris?
I saw this guy named Claude saying ajma is a genius!
Nice try boris
[flagged]
Anthropic can't win in this case.
They don't use Claude Code, they get accused that they don't even trust it themselves.
They use Claude Code, they get accused the code is shit because it's slop.
I think dogfooding is known to be a legitimate approach here.
The idea is that Claude Code is surprisingly buggy and unrefined for something created by the very tool and processes that are supposed to be replacing us as we speak.
The idea is that sculpted ideal code is rarely the best choice.
At the same time I'd say sloppy code (human or AI generated) is rarely the best choice. I'd say the best is in between.
And they don't use our version of CC, or our settings. They have flags for internal use only.
> Anthropic can't win in this case.
Sure they can. The solution is pretty simple and in your own post. Pick one:
* Make the product good to the point code is no longer slop and shit.
* Stop hyping the quality when it isn’t there.
* Do a hybrid approach. Use their own product but actually have competent humans in the loop to make the code good.
This is not hard. Be honest and humble and that criticism goes away. It’s no one’s fault but Anthropic’s that they hype up their product to more than it can do and use it carelessly to build itself. It’s not a no-win scenario if you’re the one causing your own obviously avoidable problems.
Google products' UX is widely acknowledged to be a steaming pile of shit, though, so I am not sure you should follow their example.
Many of the metrics they use are obviously actively user hostile.
Metrics and quantitative UX result in really bad software, making it rigid while optimizing for the wrong things.
The most obvious example is Google's multi-step login, where you have to enter your password on a separate screen after you put in your username.
I wonder what metric led to that decision, or was it a political decision to make it seem like their "old" software has some new feature.
If you mean Google website login, that step is needed because the email address is used to determine which identity provider to use. E.g. I have three different accounts that branch off from that same initial login flow.
One is my personal "gmail.com" account, and the other two go through enterprise identity providers related to my employment and their G-Suite licenses. So after I put in one of these three email addresses, I get prompted for the appropriate next step. Only one of them involves giving a password to a Google server. The other two are redirects to completely separate login systems operated by my employer.
I mean I get it logically makes sense. But it still seems like a waste of time for a small percentage of use cases.
Maybe a better approach: you put in your login, it automatically detects whether an identity provider is required, grays out the password field to signal that a password is not necessary, and redirects automatically.
Less clicking, no broken flow, and a smoother solution overall.
Thank you
[flagged]
[flagged]
HN sometimes talks about pathological customers who will never be happy. Boris is probably the single best rep in the community, possibly ever.
The way your tone and complaints come across reminds me of this. As a paying customer ($5k spend per month in my corporate job), I’d rather Anthropic keep doing what they’re doing — innovating and shipping useful stuff at blinding speed — and not index on your feedback. I think what those tradeoffs would cost far outweighs the benefit.
> Boris is probably the single best rep in the community, possibly ever.
When you say “the community”, what exactly are you referring to?
Dang man, chill.
Man, expecting the minimum from companies who are supposed to deliver a pro... there is no SLA for any of this, so you are right.
Also, why is there no SLA?
You’re not getting a worthwhile SLA on a subscription at this rate. What are you going to get? A few dollars? An SLA isn’t useful unless it actually bites for the provider and actually compensates the customer. And it costs money - how much are you willing to spend for this insurance?
because there isn't one and people still paid for it.
My clients demand one, so there is one.
Imagine if people were like your clients.
If they were, they wouldn't buy your product without an SLA. But they're not.
Because this is ultimately a beta service. The whole industry is.
Wait, where is there a 'beta' tag to something that they are charging real money for? Why is this software any different than any other software and we should completely give away our rights as a consumer to ensure what we pay for is delivered?
I think the parent is saying that one should be aware that the whole LLM industry is still in an experimental stage and far from mature. What you want isn’t what’s being offered. I agree that there should be higher standards, but what we currently have is an arms race. The consequence is to factor that into the value proposition and maybe not rely too much on it.
SLAs should be standard for any paid service, especially on the enterprise side, but also on the consumer side. Being immature as a company does not excuse a lack of service delivery.
Not every customer, even a paying customer, demands reliability at a particular level. Market segmentation tends to address those situations: pay more, get more.
> pay more, get more
Users on the $200 plan are complaining, and that's already the max subscription level. I don't think a $200 subscription should make you feel like you are getting an unfair advantage. Like restricting claude -p to API billing... after I paid so much? Moderate use should not trigger that. I am not running it in batch mode on a million inputs.
'I don't want to hold companies to account for failing to deliver services, therefore I think everyone else should live by my permissive "standards".'
They can be held to account when they fail to deliver what they promise! But what is promised for delivery is what's in the Terms of Service (i.e. the agreement). Nothing more. If it's not in there, you can't hold them to account for it.
Yes, that's the problem.
It's too easy for companies to fail to provide their service as long as they never promise to provide their service.
> It's too easy for companies to fail to provide their service as long as they never promise to provide their service.
I don't even know what this means. You can't make anyone work for free, nor dictate the terms of what kind of work someone will do without their consent. I assume you are not pro-slavery.
I'll make a very simple example.
The service at mcdonald's is providing food for money.
When their ice cream machine is broken, they fail to provide part of their service.
I'm not saying anything about "making" them do anything. I'm just calling out their failure and saying it's a bad thing.
You didn't merely call out their failure. You said it was "too easy," implying something more, like they owe you something. It's a pretty entitled point of view.
I don't think it's "entitled" to want companies to put some effort into avoiding those failures.
If the government did something, we could think of it as similar to passing inspection.
The other way to look at things is that the market isn't varied and competitive enough to punish the companies that fail this way.
They don't have to "owe me" anything for me to desire a different balance. My desire is fine.
"[W]ant[ing] companies to put some effort into avoiding ... failures" is not the same as "hold[ing] them to account". The former is "this sucks and I don't like it." The latter is "punish them or force them to do what I want!"--i.e., some sort of legal remedy.
If you can point to a consumer targeted service that provides and keeps their SLAs, I’ll be impressed.
What right as a consumer do you have that is pertinent here, other than to have the vendor adhere to the terms of the agreement you have with them?
Anthropic has many customers despite the fact that they have occasional problems. They’re not suing Anthropic because Anthropic isn’t promising in its agreement something they can’t deliver.
I think you’re reading into the agreement something that isn’t there, and that’s the cause of your confusion.
I am not reading into an agreement; I am saying there is no agreement to be found that ensures service delivery, with the associated liability that would come with any SLA. Also, where is the Anthropic SLA for Enterprise?
Does it exist?
Just because people pay for things doesn't mean they know or understand what they are paying for. Nor is there the legal precedent to actually understand where the rub lies or how that impacts business.
> Just because people pay for things doesn't mean they know or understand what they are paying for.
I believe, respectfully, that’s precisely what is happening in this thread because you keep complaining about the absence of an SLA that was never in the agreement, as though it is—or is supposed to be—there, and therefore the existence of some “rights” that would flow from that.
There are no SLAs, in any agreement; that's the problem.
We're back to square one: https://news.ycombinator.com/item?id=47741877
It's incredible that Boris is here on HN being open and sharing an issue they don't fully understand yet, and offering a possible workaround. CTFO.
Thank you Boris.
I am sorry you feel this way, but the reality of the situation is there is zero reason to trust anything Anthropic or Boris says. They have no legal liability or obligation to tell the truth, besides brand risk, which for people like you is apparently mitigated by a single person showing up to post, and that's it.
You should work at these companies and understand they have well-intentioned employees; otherwise they'd rarely pass the cultural interviews plus background checks plus backchanneling. Have a bit more faith in the employees.
> Have a bit more faith in the employees
Have you been asleep for a decade?
lol it is _way_ too easy for people to talk like this behind a computer screen.
What truth do you believe you are not being told, exactly?
Dude is on Hacker News on a Sunday. Half the GDP of the world is competing with him. What metrics would you like to see?
An enforceable SLA with the services that Anthropic offers rather than putting an employee to respond to things on Sunday.
>> rather than putting an employee to respond to things on Sunday.
Maybe, just maybe, they didn’t put him here; rather, he’s just a normal guy who reads HN, who is passionate about his role, and is here on his own time.
Maybe... maybe... maybe... none of this builds trust when there is something that does build trust: putting revenue on the line and opening yourself to legal liability. Otherwise everything is empty and meaningless; it's just PR, and nothing more.
You can get a SLA and ZDR by choosing one of the Claude partners (eg on Bedrock)
https://platform.claude.com/docs/en/build-with-claude/api-an...
Then you should offer to pay them for one. I’m sure they’d love to hear from you, and they could probably deliver one to you for the right price. But it will be a high price.
They don't offer ZDR [0] for files, even if you have a BAA or are dealing with HIPAA data, no matter how much you pay them. Trust me, we have tried.
- [deleted]
I’m really confused. We were talking about SLAs, not other product features. Are you moving the goalposts?
There isn't an SLA, nor are there any protections around file uploads to their services. Two bad things can be true at the same time.
Did you talk to them about purchasing an SLA? If so, what did they say?
I feel like you aren't really understanding what a Service-level Agreement actually is in practice. It's not a piece of paper with a specific number of nines and an associated price tag. They can be and often are very complicated documents that take multiple rounds of redlining to arrive at something both parties agree to.
If zero data-retention was non-negotiable for the customer, it's totally possible that the negotiations ended there.
I'm not sure what you're trying to accomplish or unearth beyond what's already been said, which certainly suffices for me.
- [deleted]
As both an attorney and SRE, I understand what an SLA is. And you can absolutely get an SLA when you buy cloud services from many vendors, including AWS. Some vendors provide it at all price points; others include it at higher service tiers, without complex negotiations needed at all. And, yes, if it’s not on the menu, you may need to negotiate one. But you can’t conclusively say “they don’t offer one” unless you’ve actually gone to the company and asked.
https://aws.amazon.com/legal/service-level-agreements/
https://trailhead.salesforce.com/content/learn/modules/slack...
https://support.atlassian.com/subscriptions-and-billing/docs...
Before you casually accuse someone of not knowing what they’re talking about, first make sure you’re on firm ground yourself.
It seems like you could save a lot of time and confusion by talking about the SLA that you pay for from Anthropic instead of establishing your bona fides by posting links to various unrelated companies’ SLA pages.
Like how was your experience negotiating your SLA with Anthropic? What ballpark are you paying for the SLA with Anthropic that you have in place? How many 9s does your Anthropic SLA cover? Obviously you haven’t posted a half dozen times in this thread about how Anthropic by nature of existing offers SLAs without any knowledge of that, so some simple stuff about your SLA with Anthropic would be helpful.
I make no unqualified claims as to whether Anthropic offers an SLA. I never did. But I do know that it's unreasonable to claim they don't when you didn't even take the steps to conclusively determine it for yourself.
As I said: "I’m sure they’d love to hear from you, and they could probably deliver one to you for the right price. But it will be a high price."
Oh, well in that case, if posting URLs counts as proof of… something, there doesn’t appear to be any SLA page anywhere in their sitemap. https://www.anthropic.com/sitemap.xml
Maybe it is just common for enterprise SaaS businesses to offer SLAs without having a page about it though. Something like that could possibly be unjustifiably burdensome as well because it’s not like they could just type “make a page about how we offer SLAs” and have it magically appear
Not everything a business might be willing to do is listed on their public website.
That’s a good point. Having an SLA page is an indicator that a business offers SLAs, not having an SLA page is also an indicator that they offer SLAs, just secretly. If you think about it all of the people constantly complaining about uptime and saying stuff like “I would pay money for an SLA from Anthropic if I could” probably means that they are killing it with all those secret SLAs.
I mean obviously they have to offer them, because they exist, as otherwise you’d have to believe something crazy like “they don’t currently offer them” for reasons “that they haven’t disclosed”
Again, many companies will do things they don’t ordinarily offer for the right price. I’ve seen it happen myself (on both the buyer and seller side) on many occasions.
It goes to the extent of the company itself! Very few businesses publicize that they’re for sale or put their company’s purchase price on their website. But acquisitions happen all the time.
Anyway, I don’t appreciate your sarcasm coupled with what seems to be willful ignorance about how the world works, so I won’t be participating in this discussion with you anymore.
I don’t get it. If you wanted to convince everybody about a vast universe of secret business and your expertise in it, why would you start with telling people that weren’t able to get an SLA from Anthropic that Anthropic offers SLAs? And then admit that you don’t actually know and then double down?
Like if I wanted to convince people that In’N’Out has a secret menu (they do) I wouldn’t start by saying “They have the ingredients to make onion rings, therefore they sell onion rings” (they do not). They offer burgers with lettuce instead of a bun (“protein style”) though. That’s a fact that you can verify by going there or calling them and asking about it. I didn’t rely on my assumptions based on other fast food restaurants, I relied on my knowledge of the topic!
Edit: It seems like bad faith to admit that you’re using “probably” interchangeably with “I don’t know” and then editing in “for a billion dollars” several posts into a conversation.
I guess enjoy posting about entirely unrelated conversations in other threads though. (otterley’s post about my having previously had a short amicable exchange with dang in a different thread was deleted, but I’ll leave this part up. I think digging through people’s post histories to find unrelated grievances is icky, for lack of a better word, and wildly unhelpful for any type of discussion)
Even with the “for a billion dollars” addition, admitting “I don’t know” and “probably” are interchangeable doesn’t really change anything from a logical standpoint. Nobody argued against you not knowing, so I don’t understand the purpose of the repetition.
> why would you start with telling people that weren’t able to get an SLA
That hasn’t been established. There’s no evidence that they went to Anthropic and tried to negotiate one.
> that Anthropic offers SLAs
I didn’t. I said “they probably will for the right price.” There are two modifiers in that statement. And the price is unspecified. Their first offer could be a billion dollars. Too expensive? Negotiate down.
I would invite you to notice your interlocutor's assumptions, especially as revealed in his prior comment. Look at how he misunderstands the situation:
> If you wanted to convince everybody about a vast universe of secret business and your expertise in it...
> Like if I wanted to convince people that In’N’Out has a secret menu...
You are discussing business. He is understanding you to be attempting to "mog" him, because he cannot adopt a perspective wherein the conversation represents anything other than a vacuous social challenge or "brodown."
In short, you're wasting your time.
I am so old :(
I looked up “mogging” and I’d think “my assumptions about stuff are valid because I’m a lawyer and don’t know what you do” would count more as mogging than “that doesn’t quite sound right, this is a conversation about something specific and not your general cleverness” but I’ve got a Benny Hill archive to get through
Those are not assumptions on your interlocutor's part. You've embarrassed yourself quite badly, I'm afraid. I know you don't understand how, but that doesn't change the fact of it.
> You've embarrassed yourself quite badly, I'm afraid.
:( you are right. This isn’t the first time I’ve lost an argument because hours into a discussion somebody introduced “what if a billion dollars” or “magic amulet” or “ブルマの母” (Bulma’s mother), etc.
A billion dollars is just an example. I could have said a million. When someone says "a high price" that's unspecified, you can use your imagination to hazard a guess at what that might be. Such a figure might seem unreasonable or unrealistic to you, but deals are done between companies under terms most individuals wouldn't come close to considering.
The only reason I mentioned being an attorney was because someone in the thread above accused me of not understanding SLAs. I don't ordinarily bring it up unless we're talking about law or contracts and I feel the need to defend myself or correct misunderstandings. I don't try to use it to browbeat anyone into submission, although I do believe that respect for others' lived experiences and education is relatively uncommon here on HN.
I also don't care for my words to be misconstrued to mean something I didn't say. I rarely speak in absolutes because I've learned over time that there are very few absolutes in the world. Thus, I include qualifying language in nearly everything I write. So when someone accuses me of making claims of certainty that I didn't make, I can get pretty defensive about that.
You know, I had come away with the impression of you as someone able to take embarrassment with good grace, to "walk it off" without either crumpling under the weight of unhandled insecurities, or letting your ego insist on turning it into an escalation dominance contest. There is always something to learn from the experience of making a fool of oneself (1), and you struck me as someone very well prepared and equipped to do so. It's a rare capacity to encounter.
Disappointing me shouldn't make much nevermind to you; you don't know me from Adam. But think of the people in your life who care for you and vice versa, or of the kind of folks you would like to be there. Wouldn't you rather behave so they may regard you in the way I just described?
It's hard to acknowledge a situation like this one, especially in its moment, especially when you're young. Being able to do hard things, well and gracefully, is another skill we do very well to cultivate. You were putting in some good practice, and the other gentleman (esq.) has offered some good advice in consequence.
It'd be a shame to blow it here at the very last moment, don't you think?
(1) Ever shit yourself in public, right there in front of God and everybody? I did, about ten years ago - there was a time in this town before the health code had teeth, when eating at the wrong place or on the wrong day could be like taking your life in your hands. Let me tell you, after that day - complete with an hour cleaning yourself up in a Subway restaurant toilet, followed by the train ride home - discovering you have inadvertently said something a little dumb on the Internet falls easily into something much more like the perspective it is due.
It's just a world you've never seen. Don't take it too personally.
I appreciate your kindness. While I’ve got you, did you know that the Benny Hill show started in 1955 and a good chunk of what aired from then to 1969 was lost? There are a lot of fans that don’t even realize that what is sometimes labeled as season 1 is season 15! Crazy stuff!
I had not known that! In a similar vein, there exists an Alice in Wonderland-themed Muppet Show episode, starring Brooke Shields, which has had to be left out of home video releases due to so far unresolvable music licensing issues. Not quite totally lost, but somewhat hard to find!
I’ll check that out! If I find a good link for it I’ll post it as a reply here.
- [deleted]
Boring corporate AI will surely come, but hey, let's enjoy the wild west while it lasts. I am grateful to see Boris come here to address problems people face. I'm 100% sure nobody is making him - he has one of the coolest jobs in the world.
>he has one of the coolest jobs in the world.
So that means we just eject any critical thinking when it comes to companies, especially where there is no liability or obligation for them (Boris or Anthropic) to be honest.
Other than 'trust'.
Don’t like Anthropic? Use a competing service. At this point the sheer volume of your commentary is not particularly complimentary to your own critical thinking skills. It’s not your job to correct the internet or to convince randoms of the rightness of your position. Of all the things in the world to be pissed at so insistently, this seems to be a pretty minor one.
But the default 1M context window just rolled out a few weeks ago. If refreshing old sessions on 1M context windows is the problem, it's completely aligned with what Boris is saying.
So Anthropic is trying to save money on infrastructure, we all get it. However, it's not ok to degrade the performance your users have paid for. Last week the issue was that you reduced the default "effort" level, now the prompt cache is shortened. Several users experience far more restrictive usage limits lately.
There is only so much you can do through "UX improvements" or some smart routing on the backend. Your flagship product is actively getting worse, and if users need to fiddle with hidden settings and keep track of GitHub issues every week they will start voting with their money.
Dear sir, please think of the shareholders, they need a fair exit.
For context, my company gives each developer a decent monthly allowance for Claude and if push comes to shove, we are allowed to fallback to using AWS Bedrock hosted Anthropic models.
When you pay for a Claude subscription, what exactly were you promised?
> they will start voting with their money.
And go where? Sooner or later the party is going to be over and Claude and its competitors are going to have to start charging enough to actually be profitable when the VC money dries up.
> When you pay for a Claude subscription, what exactly were you promised?
I was promised 5x or 20x the amount of resources that the free tier would offer. I implicitly expected the same quality too, not some watered-down version of the product they allowed me to sample before committing to a subscription.
Sooner or later Anthropic will run out of VC money, yes. That's their problem, not mine. When I took an Uber while it was subsidized by venture capital, the driver did not drop me halfway to my destination because they were having cash flow issues.
So how do you know that the free tier hasn’t been reduced by 5x?
It’s exhausting enough to deal with services that change around on an annual/semi-annual basis with pricing and expectations.
Now the expectation is that we should tolerate goalposts being shuffled around on a weekly/daily basis with the added requirement of digging into bug tickets because there’s no attempt at transparency? The tech is cool but this is absolutely insane.
If you’re an individual developer paying $100-200/mo for a service that keeps changing, there is a LOT of reason to keep an eye on other products.
I’m not saying that there isn’t a reason to keep an eye on other products. I’m saying that every other product in the space has the same unit economics and will eventually need to charge enough to be profitable - and to continue training and hardware expansion.
Honestly, a developer paying $200 a month is a nothingburger, and anyone using their service to the fullest is losing them money.
For context, the company I work for gives each consultant a $2000 a month allowance and I think there are probably around 500-700 people with that allowance. I’m sure everyone doesn’t use it all.
If they have limited hardware resources, where do you think they are going to focus?
Classic VC pump playbook - run it uneconomically until everyone is addicted, then 5x prices once you have enough critical mass. See 2010s "Millennial Lifestyle Subsidy"..
It seems pretty transparent that they are heavily resource constrained, (training run for Claude 5.x, higher usage / growth than anticipated). I don’t disagree that their long play is monopolistic pricing, but what we’re observing seems better explained by the fact they have a very tight compute budget they are trying to optimize over to put as much as they can into next gen experiments / training to make sure they stay competitive over the next 6-months / year.
You know, Anthropic was once supposed to be a public benefit org!
Where did they say the prompt cache is shortened?
From 1h to 5 minutes; it was in the news recently.
But not as a solid fact?
The HN thread in question is here (and had that info edited out of the title)
Why did this become an issue seemingly overnight when 1M context has been available for a while, and I assume prompt caching behavior hasn't changed?
EDIT: prompt caching behavior -did- change! 1hr -> 5min on March 6th. I'm not sure how starting a fresh session fixes it, as it's just rebuilding everything. Why even make this available?
It feels like the rules changed and the attitude from Anth is "aw I'm sorry you didn't know that you're supposed to do that." The whole point of CC is to let it run unattended; why would you build around the behavior of watching it like a hawk to prevent the cache from expiring?
> 1hr -> 5min on March 6th
This is not accurate. The main agent typically uses a 1h cache (except for API customers, which can enable 1h but it is not on by default because it costs more). Sub-agents typically use a 5m cache.
https://github.com/anthropics/claude-code/issues/46829#issue... - Have you checked with your colleague? (and his AI, of course)
Doesn't what's said at the link approximately agree? The 5m bug was said to be isolated to use of overage (API billing).
Then my original question stands: why did this become an issue seemingly overnight if nothing changed?
So if I run a test suite or compile my rust program in a sub agent I’m going to get cache misses? Boo.
Sub agents don't have much context and don't stay around for long, so misses in that case are trivial.
As of yesterday subagents were often getting the entire session copied to them. Happened to me when 2 turns with Claude spawned a subagent, caused 2 compactions, and burned 15% of my 5-hour limit (Max 5x).
How long they stay around after the cache miss is irrelevant if I am burning all the prior tokens again. Also, how much context they have depends entirely on the task and your workflow. If you have a subagent implement a feature and use the compile + test loop to ensure it is implemented correctly before a supervisor agent reviews what was implemented vs asked, then yes, subagents do have a lot of context.
... so how do API users enable 1hr caching? I haven't found a setting anywhere.
would like to know this too ;D
there is env.ENABLE_PROMPT_CACHING_1H_BEDROCK - but that, as the name says, is only "when using Bedrock"
for the raw API the docs are also clear -> "ttl": "1h" https://platform.claude.com/docs/en/build-with-claude/prompt...
but how to make claude-code send that when paying by API-key? or when using a custom ANTHROPIC_BASE_URL? (requests will contain cache_control, but no ttl!)
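For reference, here is a rough sketch of what a raw Messages API body with a 1-hour cache TTL looks like, per the prompt-caching docs linked above. The model id and prompt text are placeholders, and this says nothing about how to make claude-code itself send the `ttl` field — that question stands.

```python
import json

# Sketch of a raw Messages API body requesting a 1-hour cache TTL,
# per the prompt-caching docs. Model id and system text are
# illustrative placeholders, not what claude-code actually sends.
body = {
    "model": "claude-opus-4-5",  # placeholder model id
    "max_tokens": 1024,
    "system": [
        {
            "type": "text",
            "text": "You are a coding assistant. <...large shared prefix...>",
            # "ephemeral" with an explicit ttl is what the docs describe;
            # omitting "ttl" falls back to the default 5-minute lifetime.
            "cache_control": {"type": "ephemeral", "ttl": "1h"},
        }
    ],
    "messages": [{"role": "user", "content": "continue the session"}],
}

payload = json.dumps(body)
```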
The /clear nudge isn't a solution though. Compacting or clearing just means rebuilding context until Claude is actually productive again; the cost comes either way. I get that 1M context windows cost more than the flat per-token price reflects, because attention scales with context length, but the answer to that is honest pricing or not offering it, not annoying UX nudges. What's actually indefensible is that Claude is already pushing users to shrink context via, I presume, the system prompt. At maybe 25% fill:
"This seems like a good opportunity to wrap it up and continue in a fresh context window."
"Want to continue in a fresh context window? We got a lot of work done and this next step seems to deserve a fresh start!"
If there's a cost problem, fix the pricing or the architecture. But please stop the model and UI from badgering users into smaller context windows at every opportunity. That is not a solution, it's service degradation dressed as a tooltip.
The cost issues they're seeing (at least from what they've stated) are from users, not internal. Basically, it takes either $5 or $6.25 (depending on 5m or 1h TTL) to re-ingest a 1M-token conversation into cache for Opus 4.6. That's obviously a very high cost, and users are unhappy with it.
I think 400k as a default seems about right from my experience, but just having the ability to control it would be nice. For the record, even just making a tool call at 1M tokens costs 50 cents (which could be amortized if multiple calls are made in a round), so imo costs are just too high at long context lengths for them to be the default.
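Back-of-envelope using the figures in the comments above (treating the ~$5/MTok input price and a 0.1x cached-read rate as assumptions, not official pricing), showing why a warm 1M-token context is cheap to reuse but expensive to rebuild:

```python
# Assumed rates, taken from the figures quoted in this thread:
MTOK = 1_000_000
base_input_per_mtok = 5.00    # assumed uncached input price, $/MTok
cached_read_per_mtok = 0.50   # assumed 0.1x rate for cache hits

context_tokens = 1 * MTOK

cold_rebuild = context_tokens / MTOK * base_input_per_mtok   # full cache miss
warm_call = context_tokens / MTOK * cached_read_per_mtok     # cache hit

print(f"cache miss: ${cold_rebuild:.2f}")  # $5.00
print(f"cache hit:  ${warm_call:.2f}")     # $0.50 -- the "50 cents per tool call"
```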
currently "clear makes it worse" https://github.com/anthropics/claude-code/issues/47098 + https://github.com/anthropics/claude-code/issues/47107
launching with `CLAUDE_CODE_DISABLE_GIT_INSTRUCTIONS=1 claude "Hello"` till those are fixed seems to be the way
For me the worst regression was definitely the system prompt telling Claude to analyze every file it reads to check whether it's malware. That correlates with me also seeing quotas exhausted early and acknowledgments of "not malware" at almost every step.
It is a horrible error of judgement to insert such a complex request into such a basic operation. It is also an error of judgement to let Claude decide whether it wants to improve the code or not at all.
It is so bad that I stopped working on my current project and went to try other models. So far Qwen is quite promising.
I don't think that's accurate. The malware prompt has been around since Sonnet 3.7. We carefully evaled it for each new model release and found no regression to intelligence, alongside improved scores for cyber risk. That said, we have removed the prompt for Opus 4.6 since the model no longer needs it.
I started seeing "not a malware, continuing" in almost every reply since around 2 weeks ago. Maybe you just reintroduced it with some regression? Opus 4.6
That's weird. Would you mind running /feedback and sharing the id here next time you see this? I'd love to debug
Sure, I really appreciate you looking at this.
a6edd0d1-a9ed-4545-b237-cff00f5be090 / https://github.com/anthropics/claude-code/issues/47027
I'm happy to provide any other info that can be useful (as long as i'm not sharing any information about the code or tools we use into a public github issue).
Thanks for the report! This was fixed in v2.1.92.
Please:
1. Upgrade to the latest: claude update (seems like you did this already)
2. Start a new conversation (resuming an old convo may trigger this bug again in that convo)
This is bloody great Boris. Thank you.
- [deleted]
Thank you! Looking
I’ve seen this a couple of times recently. Including right after compact. I’ll /feedback it next time I see it
Same. Will run it too when I next get it.
I've been using CC a decent amount the past few weeks and have never seen this malware stanza...?
1. I've never seen this. Is there a config option to unhide it if it's happening? Is this in Claude Code? Does it have to be set to verbose or something?
2. Can we pay more/do more rigorous KYC to disable it if it's active?
This warning is not enabled for modern models. No action needed. I'm digging into the report above as soon as they're able to /feedback.
> Since Claude Code uses a 1 hour prompt cache window for the main agent, if you leave your computer for over an hour then continue a stale session, it's often a full cache miss. To improve this, we have shipped a few UX improvements (eg. to nudge you to /clear before continuing a long stale session), and are investigating defaulting to 400k context instead
I don’t understand this. I frequently take long breaks, and I never want to clear or even compact, because I don’t want to lose the conversations I’ve had and the context. Clearing causes other issues: I have to restate everything at times, and it misses things. Updating the memory helps, but I wish there was a better solution than a time-bound cache.
Makes me wish that shortly before the server-side expiration, we could save the cache on the client-side, indefinitely.
But my understanding is that we're talking about ~60GB of data per session, so it sounds unrealistic to do...
Where are you getting 60GB from? It shouldn’t be that large.
But yes, would love to save context/cache such that it can be played back/referred to if needed.
/compact is a little black box that I just have to trust that is keeping the important bits.
The KV cache consists of activation vectors for every attention head at every layer of the model for every token, so it gets quite large. ChatGPT also estimates 60-100GB for full token context of an Opus-sized model:
https://chatgpt.com/share/69dc5030-268c-83e8-92c2-6cef962dc5...
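A rough sizing formula makes the scale plausible: KV cache bytes = 2 (K and V) x layers x KV heads x head dim x bytes-per-value x tokens. All dimensions below are hypothetical — Opus's architecture is not public — chosen only to show how a 1M-token cache reaches tens of GB:

```python
# Rough KV-cache sizing. Every model dimension here is a guess for
# illustration; Opus's real architecture is not public.
def kv_cache_bytes(n_layers, n_kv_heads, head_dim, n_tokens, dtype_bytes=2):
    # 2x for keys and values, one entry per layer per KV head per token
    return 2 * n_layers * n_kv_heads * head_dim * dtype_bytes * n_tokens

# e.g. 60 layers, 2 KV heads (aggressive GQA), head_dim 128, bf16:
size = kv_cache_bytes(60, 2, 128, 1_000_000)
print(size / 1e9)  # ~61 GB for a full 1M-token context
```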
That is actually nuts.... I'm trying to understand the true costs of AI, wonder how I plug this in!
There are ways to quantize or compress KV cache down.
I wanted this as well. Even asked about it at an openai talk. Basically a way to get the KV cache to the client (they can encrypt it if they care about me REing it, make a compressed latent if they don't wanna egress 20GB, whatever, I'm fine with a black box) so that I can load it later and avoid these cache misses.
I think the primary reason they cannot do this is that they change the memory and communication layouts in their serving stack rather aggressively. And naturally keeping the KV cache portable across all such layouts is a very difficult task. So you'd have to version the cache down to a specific deployment, and invalidate it the moment anything even small changes. So giving the user a handle to the cache sort of prevents you from making large changes to memory layout. Which is I suppose not that enticing. Also, client side KV caches are only meaningful in today's 1M contexts. Few y back it wasn't necessary, since just recomputing would be better for everybody.
To be clear, I don't mean they send it along with every request. Rather, they do their current TTL cache, and then when I'm at the end of a session, I request it in one shot and then close the session. And it doesn't have to come to the literal client, they can egress it to a storage service that we pay for, whatever. But ya the compat problem makes it all a non starter.
I don't want a nudge. I want a clear RED WARNING with "You've gone away from your computer a bit too long and chatted too much at the coffee machine. You're better off starting a new context!"
I don’t want a scary red message chastising me for not being responsive enough!
I often leave CC hanging (or even suspended) and use /resume a lot. I’m okay with that having some negative effect on my token limits.
Product design is hard. They can’t please us all. I don’t envy the team considering these trade offs.
Is it that hard though? This kinda smacks of no research on users prior to rolling stuff out.
Ack, it is currently blue but we can make it red
I think after the TTL expires the session should be auto-compacted, and the user should be given a choice: continue with the compacted version, or be hit with the full read cost of continuing with their large but expired context. At the moment users are blind to what is going on.
Why is nobody even asking why that should be an issue? No other text editor shits the bed that way. The whole point of the computer is that it patiently waits for my input.
let me put it this way: not your RAM, not your cache, not waiting patiently for your input.
Good thing they're not charging for it, then.
Good thing they didn't silently, quietly change cache from 1 hour to 5 minutes, right?
forget the warning, just compact like someone suggested in the ticket. Who would opt for a massive cache miss?
Hey Boris - why is the best way to get support making a Hacker News or X post, and hoping you reply? Why does Anthropic Enterprise Support never respond to inquiries?
I mean if we're building an unrelated wishlist... Can 20x max users get auto mode already? Or can the enterprise plans get something equivalent to 20x max?
Given I'm running two max accounts to get the usage I want, can we get a 25x and 40x tier? :-)
It’s called /extra-usage and they really want you to use it.
OpenAI (Codex) keeps on resetting the usage limits each time they fuck up...
I have yet to see Anthropic doing the same. Sorry but this whole thing seems to be quite on purpose.
Can you clearly state what they messed up?
Suddenly burning up the quota ~4x faster than usual is not a mess up in your opinion?
It is not inherently their fault though because usage is controlled both by the user and the harness behavior. So I was asking specifically what about the harness was messed up, can you provide that info?
It's all there, including the specific version regression, unearthed bugs, workarounds: https://github.com/anthropics/claude-code/issues/45756
[flagged]
[flagged]
LOL, funny how you're so happy to dismiss dozens of reports with hard data, and confirmed by the Claude Code team member.
Issue with the confirmation: https://github.com/anthropics/claude-code/issues/45756
Looks like you have an axe to grind and facts be damned? :D
Not parent but I can guess from watching mostly from the sidelines.
They introduced a 1M-context model semi-transparently without realizing the effects it would have, then refused to "make it right" to the customer, which is a trait most people expect from a business when they spend money on it, especially in the US, and especially when the money spent is often in the thousands of dollars.
Unless Anthropic has some secret sauce, I refuse to believe that their models perform anywhere near the same on >300k context sizes as they do on 100k. People don't realize it, but even a small drop in success rate becomes very noticeable if you're used to having near 100%, i.e. 99% -> 95% is more noticeable than 55% -> 50%.
I got my first Claude sub last month (it expires in 4 days) and I've used it on some biggish projects with opencode. It went from compacting after 5-10 questions to just expanding the context window. I personally notice it deteriorating somewhere between 200-300k tokens, and I either fork a previous context or start a new one after that, because at that size even compacting seems to generate subpar summaries. It currently no longer works with opencode, so I can't attest to how well it worked the past week or so.
If the 1M model introduction is at fault for this mass user perception that the models are getting worse, then it's anthropics fault for introducing confusion into the ecosystem. Even if there was zero problems introduced and the 1M model was perfect, if your response when the users complain is to blame it on the user, then don't expect the user will be happy. Nobody wants to hear "you're holding it wrong", but it seems that anthropic is trying to be apple of LLMs in all the wrong ways as well.
I still love Claude and nothing but a ton of respect for Boris and the team building such a phenomenal product.
That said, I feel that things started to feel a bit off usage-wise after the introduction of 1M context.
I'd personally be happy to disable it and go back to auto-compacting because that seems to have been the happy medium.
Especially since Codex faced the same issue but the team decided to explicitly default to only ~200k context to avoid surprises and degradation for users.
[flagged]
Different users do seem to be encountering problems or not based on their behavior, but for a rapidly-evolving tool with new and unclear footguns, I wouldn't characterize that as user error.
For example, I don't pull in tons of third-party skills, preferring to have a small list of ones I write and update myself, but it's not at all obvious to me that pulling in a big list of third-party skills (like I know a lot of people do with superpowers, gstack, etc...) would cause quota or cache miss issues, and if that's causing problems, I'd call that more of a UX footgun than user error. Same with the 1M context window being a heavily-touted feature that's apparently not something you want to actually take advantage of...
My colleagues and I have faced the same issues over the last month or so.
With a new version of Claude Code pretty much every day, constant changes to their usage rules (2x outside of peak hours, temporarily 2x for a few weeks, ...), hidden usage decisions (past 256k it looks like your usage consumes your limits faster) and model degradation (Opus 4.6 is now worse than Opus 4.5, as many reported), I fail to see how it can be a user error.
The only user error I see here is still trusting Anthropic to be on the good side tbh.
If you need to hear it from someone else: https://www.youtube.com/watch?v=stZr6U_7S90
> past 256k it looks like your usage consumes your limits faster
This is false. My guess is what is happening is #1 above, where restarting a stale session causes a 256k cache miss.
That said, I hear the frustration. We are actively working on improving rate limit predictability and visibility into token usage.
just like everybody else I and my colleagues at work have seen major regressions in terms of available usage over the past month, seemingly unrelated to caching/resuming. On an enterprise sub doing the same work I personally went from being able to have several sessions running concurrently without hitting limits, to only having one session at a time and hitting my 5h every day twice a day in 3-4 hours tops (and due to the apparent lower intelligence I have been at the terminal watching what opus is doing like a hawk, so it's not a I went for coffee I have to hit the cache). The first day I ever hit my 5h this year was the day everybody reported it (I think it was the Monday you introduced the 2x promotion after hours? not sure, like 3 weeks ago?)
To avoid 1M issues, this week I have also intentionally used the 256k context model, disabled adaptive thinking, and done the same "plans in multiple short steps with /clear in-between" to minimize context usage, and yet nothing helps. It just feels like ~2x to ~3x fewer tokens than before, and a lot less smart than in February.
Nowadays every time I complete a plan I spend several sessions afterwards saying things like "we have done plan X, the changes are uncommitted, can you take a look at what we did" and every time it finds things that were missed or outright (bad) shortcuts/deviations from plan despite my settings.json having a clear "if in doubt ask the user, don't just take the easy way out". As a random data point, just today opus halfway through a session told me to make a change to code inside a pod then rollout restart it to use said change, and when called out on it it of course said that I was right and of course that wouldn't work...
It is understandable that given your incredible growth you are between a rock and a hard place and have to tweak limits, compute does not grow on trees, but the consistent "you are holding it wrong" messaging is not helpful. I am wondering if realistically your only option is to move everybody to metered, with clear token usage displayed, and maybe have pro/max 5/max 20 just be a "your first $x of tokens is 50/75% off". Allow folks to tweak the thinking budget, and change the system prompt to remove things like "try the easy solution first" which anecdotally has been introduced in the past while, and allow users to verify on prompt if the prompt would cause the whole context to be sent or if cache is available.
Why did it suddenly become an issue, despite prompt caching behavior being unchanged?
PEBKAC: Problem Exists Between Keyboard And Chair
Yes same here. I use CC almost constantly every day for months across personal and work max/team accounts, as well as directly via API on google vertex. I have hardly ever noticed an issue (aside from occasional outages/capacity issues, for which I switch to API billing on Vertex). If anything it works better than ever.
You know that people are not using the same resources? It's like 9 out of 10 computers get borked and you have the 1 that seems okay and you essentially say "My computer works fine, therefore all computers work fine." Come on dude.
Money money money money
Would it be possible to increase the cache duration if misses are a frequent source of problems?
Maybe use a heartbeat to detect live sessions and cache them longer than sessions the user has already closed. And only do it for long sessions where a cache miss would be very expensive.
Yes, we're trying a couple of experiments along these lines. Good intuition.
I suspect 1M token context is questionable value because of the secondary effect of burning quota vs getting work done.
I think the model select that let me choose 1M made sense because I could decide if I was working on large documents and compacting more often was more effective.
Boris,
Even if Anthropic is working in good faith to lower infrastructure costs, developers need more than 5 minutes to notice that CC completed a task, review its changes and ask it to merge. Only developers who do not review code changes can live with such a TTL...
Consider making this value configurable as the ideal TTL value is different for each person. If people are willing to pay more for 30 minutes TTL than 5 minutes, they should be able to.
- [deleted]
> Since Claude Code uses a 1 hour prompt cache window for the main agent
this seems a bit awkward vs the 5 hour session windows.
if i get rate limited once, I'll get rate limited immediately again on the same chat when the rate limit ends?
any chance we can get some form of deferred cache, so anything on a rate-limited account gets put aside until the rate limit ends?
As another data point, I pay for Pro for a personal account, and use no skills, do nothing fancy, use the default settings, and am out of tokens, with one terminal, after an hour. This is typically working on a < 5,000 line code base, sometimes in C, sometimes in Go. Not doing incredibly complicated things.
Ah, so cache usage impacts rate limits. There goes the ”other harnesses aren’t utilizing the cache as efficiently” argument.
Claude Code is the most prompt cache-efficient harness, I think. The issue is more that the larger the context window, the higher the cost of a cache miss.
I do wonder if it's fair to expect users to absorb cache miss costs when using Claude Code, given how opaque these are.
Politely, no.
- I wrote an extension in Pi to warm my cache with a heartbeat.
- I wrote another to block submission after the cache expired (heartbeats disabled or run out)
- I wrote a third to hard limit my context window.
- I wrote a fourth to handle cache control placement before forking context for fan out.
- my initial prompt was 1000 tokens, improving cache efficiency.
Anthropic is STOMPING on the diversity of use cases of their universal tool, see you when you recover.
That might be, but the argument was that poor cache utilization was costing Anthropic too much money in other harnesses. If cache is considered in rate limits, it doesn’t matter from a cost perspective, you’ll just hit your rate limits faster in other harnesses that don’t try to cache optimize.
There were two issues with some other 3p harnesses:
1. Poor cache utilization. I put up a few PRs to fix these in OpenClaw, but the problem is their users update to new versions very slowly, so the vast majority of requests continued to use cache inefficiently.
2. Spiky traffic. A number of these harnesses use un-jittered cron, straining services due to weird traffic shape. Same problem -- it's patched, but users upgrade slowly.
We tried to fix these, but in the end, it's not something we can directly influence on users' behalf, and there will likely be more similar issues in the future. If people want to use these they are welcome to, but subscriptions clients need to be more efficient than that.
How much jitter would you prefer, how many seconds / minutes out? I have some morning tasks that run while I'm asleep via claude -p, and it sounds like I'm slightly contributing to your spikes (presumably hourly and on quarter hours).
There's prior art from Claude's own scheduled tasks' jitter: https://code.claude.com/docs/en/scheduled-tasks#jitter
> Recurring tasks fire up to 10% of their period late, capped at 15 minutes. An hourly job might fire anywhere from :00 to :06.
> One-shot tasks scheduled for the top or bottom of the hour fire up to 90 seconds early.
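The quoted policy can be mirrored for your own `claude -p` cron jobs. A minimal sketch, using the exact numbers from the docs above (10% of the period, capped at 15 minutes); the function name is mine, not anything Claude Code provides:

```python
import random

# Jitter a recurring job the way the scheduled-tasks docs describe:
# fire up to 10% of the period late, capped at 15 minutes.
def jittered_delay(period_seconds: float) -> float:
    cap = 15 * 60  # 15-minute cap from the docs
    return random.uniform(0, min(0.10 * period_seconds, cap))

# An hourly job would start anywhere from :00 to :06:
delay = jittered_delay(3600)
assert 0 <= delay <= 360
```

In a crontab the same idea is just a random sleep before the command, e.g. `sleep $((RANDOM % 360)) && claude -p "..."`.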
If you give doll a list of things you want to see from third-party harnesses, a compliance checklist, it will make sure the one it is building follows it to the letter.
I’m sorry but when you wake up in the morning with 12% of your session used, saying “it’s the cache” is not an appropriate answer.
And I’m using Claude on a small module in my project, the automations that read more to take up more context are a scam.
it seems if context can't be held for over an hour, it should warn you with a countdown or such; i already enabled the token verbosity thing to see what token level i'm at, but i often leave things sitting rather than complete, so that i'm tying things up to start something new in the morning rather than starting on a new thing. so i just resumed a session that was near-complete, and now it's gone and reloaded all that session in? but i hadn't detached it. i kind of thought /summary itself had to read the whole token flow, but that the token context was held locally for some reason..
Hi Boris,
Long-term Claude Code user here. This is the first time I've had to set up a hook to Codex to review Claude's output.
It's hallucinating like never before.
It's missing key concepts/instructions in context like never before.
It's writing bad code that will "pass tests" much more often. Before, it used to be critical and write good code; now it will try to hack the tests and bypass instructions to get a green pass.
Am I so out of touch?
No! It’s the children who are wrong!
you are prompting it wrong
Have you considered poking the cache?
When a user walks away during the business day but CC is sitting open, you can refresh that cache up to 10x before it costs the same as a full miss. Realistically it would be <8x in a working day.
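The "~10x before it costs the same as a full miss" figure follows from the cached-read rate being roughly a tenth of the uncached input rate. The 0.1x multiplier below is an assumption based on published cache pricing, not a statement about Claude Code's internal accounting:

```python
# A keep-alive request re-reads the cached prefix at the cached-read
# rate; once the cache expires, the whole prefix must be re-ingested
# at (at least) the base input rate. Rates are normalized, not real $.
base_rate = 1.0          # uncached input, normalized
cached_read_rate = 0.1   # assumed cached-read multiplier

breakeven_refreshes = base_rate / cached_read_rate
print(breakeven_refreshes)  # 10.0 pings ~= one full re-ingest
```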
One thing I didn't see anywhere here, except your mention about pulling in large number of skills, is that the token consumption is significantly higher for users with many agents, skills, and MCPs installed, and many are mere ghosts. The 5m TTL from #46829 compounds the effect: in my case, I found ~20k tokens of ghost context I hadn't intentionally opened. Each idle period after 5m wastes that as a full cache miss.
Boris, would you please confirm on-record: is the current cache TTL for the main agent context 1h or 5m? Issue #46829 was closed as "not planned".
Hi, thanks for Claude Code. I was wondering though if you'd considering adding a mode to make text green and characters come down from the top of the screen individually, like in The Matrix?
I’ve seen the /clear command prompt and I found the verbiage to be a bit unclear. I think it would help to clarify that the cache has expired and to provide an understandable metric on the impact, i.e. “X% of your 5-hour window” for Pro/Max users and details on token use for API users. A pop-up that requires explicit acknowledgment might also help, although that could be more of an annoyance to enterprise users.
One pattern I use frequently is using one high level design and implementation agent that I’ll use for multiple sessions and delegate implementation to lower level agents.
In this case it’d be helpful to have one of two options:
1. Claude CLI could auto-compact the conversation history before cache expiration. For example, if I’m beyond X minutes or Y prompts in a conversation and I’ve been inactive past a threshold, it could auto-compact close to the expiration and provide that as an option on resume.
2. I could configure cache expiration proactively, and Anthropic could use S3 or a similar slow-load mechanism to offload the cache for a longer period, possibly 24-72h.
I can appreciate that longer KV cache expiration would complicate capacity management and make inference traffic less fungible but I wouldn’t mind waiting seconds to minutes for it to load from a slower store to resume without quota hits.
you should check with people working on Claude Code, cache has been updated to 5min ... https://github.com/anthropics/claude-code/issues/46829#issue...
So yeah, 1M window that expires every 5min .... not good
Could we get an option to use Opus with a smaller context window? I noticed that results get much worse way earlier than when you reach 1M tokens, and I would love to have a setting so that I could force a compaction at eg 300k tokens.
You probably just missed it in his post, but:
"To experiment with this now, try: CLAUDE_CODE_AUTO_COMPACT_WINDOW=400000 claude."
Maybe try changing the 4 to a 3 and see if that works for you?
Thank you, will definitely try that!
You've created quite a conundrum.
The only people who are going to run into issues are superpower users who are running this excessively beyond any reasonable measure.
Most people are going to be quite happy with your service. But at the same time, and this is just a human nature thing people are 10 times more likely we complain about an issue than to compliment something working well.
I don't know how to fix this, but I strongly suspect this isn't really a technical issue. It's more of a customer support one.
> defaulting to 400k context instead, with an option to configure your context window to up to 1M if preferred
This seems really useful!
I'm surprised that "Opus 4.6" (200K) and "Opus 4.6 1M" are the only Opus options in the desktop app, whereas in the CLI/TUI app you don't seem to even get that distinction.
I bet that for a lot of folks something like 400k, 600k or 800k would work as better defaults, based on whatever task they want to work on.
Boris, wasnt this the same thing ~2 weeks ago? Is it the same cache misses as before? What's the expected time till solved? Seems like its taking a while
Does this 60min TTL also apply to Claude Code web?
I have regularly sessions open for multiple days.
Is that a pattern that is not advised?
Thank you for your responses, especially on a Sunday. They give us some insights and at least a couple temporary workarounds to use, while the issues are being addressed :) much appreciated
Hello Boris! How do I increase the 1 hour prompt cache window for the main agent? I would love to be able to set that to, say, 4 hours. That gives me enough time to work on something, go teach a class, grab a snack, and come back and pick up where I left off.
Another CC team member confirmed it's 5 minutes now, not 1 hour.
See the links in https://news.ycombinator.com/item?id=47747209
Resizing the context window seems like a very good idea to me. I noticed a decline of productivity when the 1M context window was released and I'd like to bring it back to 200k, because it was totally fine for the things I was working on.
shouldn't compaction be interactive with the user as to what context will continue to be the most relevant in the future? what if the harness allowed for a turn to clarify the user's expected future direction of the conversation and did the consolidation based upon the additional info?
there definitely seems to be a benefit to pruning the context and keeping the signal to noise high wrt what is still to be discussed.
/loop message ping every 4 minutes
keeps the cache warm while the CC REPL is not active.
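A minimal sketch of that "/loop ping" idea: decide when a keep-alive is due from the TTL and the last request time, and fire a no-op message shortly before expiry. Nothing here is a real Claude Code API; the TTL and margin are assumptions, and the actual ping would be whatever trivial message your harness can send:

```python
# Keep-alive scheduling for a prompt cache with a known TTL.
# Hypothetical helper names; not part of any real harness.
def next_ping_due(last_request_ts: float, ttl_s: float = 300.0,
                  margin_s: float = 60.0) -> float:
    """Timestamp at which a keep-alive should fire (TTL minus a margin)."""
    return last_request_ts + ttl_s - margin_s

def should_ping(last_request_ts: float, now: float) -> bool:
    return now >= next_ping_due(last_request_ts)

# With a 5-minute TTL and a 60s safety margin: a session idle for
# 4 minutes is due for a ping; one idle for 3 minutes is not.
assert should_ping(0.0, 240.0)
assert not should_ping(0.0, 180.0)
```

Note the trade-off discussed elsewhere in the thread: each ping still bills a cached read of the whole prefix, so warming only pays off if the session actually resumes.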
> To improve this, we have shipped a few UX improvements (eg. to nudge you to /clear before continuing a long stale session)
Is this really an improvement? Shouldn't this be something you investigate before introducing 1M context?
What is a long stale session?
If that's not how Claude Code is intended to be used, it might as well auto-quit after a period of time. If it is an acceptable use case, users shouldn't have to change their behavior.
> People pulling in a large number of skills, or running many agents or background automations, which sometimes happens when using a large number of plugins.
If this was an issue, there should have been a cap on it before the feature was released, only increased once you were sure it was fine. What is "a large number"? How do we know what to do?
It feels like "AI" has improved speed but is in fact just cutting corners.
Where can I learn about concepts like prompt cache misses? I don't have a mental model of how that interacts with my context of 1M or 400k tokens. I can cargo-cult the instructions of course, but help us understand if you can, so we can intelligently adapt our behavior. Thanks.
The docs are a good place to start: https://platform.claude.com/docs/en/build-with-claude/prompt...
Thanks. Just noting that those docs say the cache duration is 5 min and not 1 hour as stated in sibling comment:
> By default, the cache has a 5-minute lifetime. The cache is refreshed for no additional cost each time the cached content is used. > > If you find that 5 minutes is too short, Anthropic also offers a 1-hour cache duration at additional cost.
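For building that mental model, one way to think about it: each request bills the cached prefix at a cheap read rate and anything new at the write rate; when the TTL lapses, the cached prefix is zero and the whole context is re-billed. The multipliers below are illustrative (roughly mirroring published cache pricing), not exact rates:

```python
# Toy cost model for prompt caching (normalized rates, not real $):
# a request pays read_rate for the cached prefix and write_rate for
# new tokens. After a TTL expiry the cached prefix is 0.
def request_cost(context_tokens: int, cached_prefix: int,
                 read_rate: float = 0.1, write_rate: float = 1.25) -> float:
    new_tokens = context_tokens - cached_prefix
    return cached_prefix * read_rate + new_tokens * write_rate

warm = request_cost(400_000, cached_prefix=399_000)  # normal turn, small delta
cold = request_cost(400_000, cached_prefix=0)        # TTL expired, full miss

print(warm, cold)  # the miss is roughly 12x the warm turn at 400k context
```

This is also why the 400k vs 1M default matters: the cost of one expired-cache turn scales linearly with however much context you were carrying.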
Apparently Anthropic downgraded the cache TTL to 5 min without telling anyone. My biggest issue with the recent Claude Code problems is the lack of transparency, although it looks like even Boris doesn't know about this one: https://news.ycombinator.com/item?id=47736476
And why does /clear help things? Doesn't that wipe out the history of that session? Jeez.
[dead]
Claude Code cache is not 1 hour. There is a "Closed as not planned" issue on GitHub confirming that it was moved to 5 minutes in March: https://github.com/anthropics/claude-code/issues/46829. I started seeing the massive degradation exactly on the 23rd of March, so after a few days I unsubscribed because it was completely unusable, with a ~5h session being depleted in as little as 15-20 minutes.
Looks like the cache change to 5 minutes was so secretive that even the CC team doesn't know about it.
Or someone just vibe coded "Hey, Claude, make them burn allowances quicker" and merged without telling anyone.
Both are plausible to me.
Why are you all of a sudden running into so many issues like this? Could it be that all of Anthropic's employees have completely unlimited and unbounded accounts, which means you don't get a feel for how changes will affect the customers?
The number of people using Claude Code has grown very quickly, which means:
- More configurations and environments we need to test
- Given an edge/corner case, it is more likely a significant number of users run into it
- As the ecosystem has grown, more people use skills and plugins, and we need to offer better tools and automation to ensure these are efficient
We do actually dogfood rate limits, so I think it's some combination of the above.
I think the suspicion regarding skills and plugins is fair and logical. And it is absolutely the case that some use significantly more tokens.
With that said, on my 5x plan I could have multiple sessions working and the limit was far away. Around when you introduced more tokens during off-peak hours and fewer tokens during working US hours, even with a single session and no plugins at all (I uninstalled OMC), I run into limits very often.
I have not performed any rigorous tests, but it feels like I have about 25% of what I used to have, or less. This is all without using teams of agents, or ralph loops, or anything like that. Just /plan and execute in a single session. I have restored the "/clear context before executing plan" option to try and mitigate things. I will also try the 400k context since, in my experience, the 1M tokens have not made Opus 4.6 noticeably smarter for my small webapp use-case.
Best of luck to you!
ps: whenever you introduce a change, please make it optional AND ask the user about it first. Don't just yank things suddenly (like the "/clear context and apply plan" option), as I spent hours trying to figure out how I'd broken it before I saw your note on how to re-enable it.
With the quality trends this issue of too many users will fix itself soon.
How do y’all test?
Because it’s completely vibe coded? And the codebase goes through massive churn, which means things that were stable get rewritten possibly with bugs.
You can get Claude Code to write tests too...
Writing tests is easy. Writing useful tests is not so easy.
I have a feature request: I built an MCP server, but now it has over 60 tools. Most sessions I really don't need most of them. I suppose I could split this into several servers. But it would maybe be nice to give the user more power here: let me choose the tools that should be loaded, or let me build servers that group tools together which can be loaded. Not sure if that makes sense …
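One workaround today is the split you already suggested: register the same server binary several times with a flag that selects a tool group, and enable only the groups you need. A sketch of a `.mcp.json` (the server names and the `--group` flag are hypothetical; your server would have to implement the filtering itself):

```json
{
  "mcpServers": {
    "mytools-db": {
      "command": "node",
      "args": ["server.js", "--group", "db"]
    },
    "mytools-files": {
      "command": "node",
      "args": ["server.js", "--group", "files"]
    }
  }
}
```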
Have you tried asking Mythos for a fix?
From looking at the raw requests, that can't be right?
It's all "cache_control": { "type": "ephemeral" } -- there is no "ttl" anywhere.
// edit: cc_version=2.1.104.f27
How can we turn off 1M context? I don't find it has ever helped.
He mentioned this in his original comment:
"CLAUDE_CODE_AUTO_COMPACT_WINDOW=400000"
There's also CLAUDE_CODE_DISABLE_1M_CONTEXT, and I'm really not clear on what the difference is or why to pick one over the other. But I guess one disables the 1M-capable models entirely, while the other keeps those models but sets the limit lower?
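If it helps, both are plain environment variables, so you can set them per launch or persist one in the `env` block of your Claude Code `settings.json` (assuming your version supports that block; the 400000 value is the one suggested upthread):

```json
{
  "env": {
    "CLAUDE_CODE_AUTO_COMPACT_WINDOW": "400000"
  }
}
```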
There's an issue someone raised showing that prompt caches are only 5 minutes.
The reply seems to be: oh huh, interesting. Maybe that's a good thing since people sometimes one-shot? That doesn't feel like the messaging I want to be reading, and the way it conflicts with the message here that cache is 1 hour is confusing.
https://news.ycombinator.com/item?id=47741755
Is there any status information on whether the cache is being used? It sure looks like the person analyzing the 5m issue had to work extremely hard to get any kind of data. The iteration loop of people getting better at this stuff would go much, much better if this weren't such a black box, if we had the data to see and understand: is the cache helping?
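One data point that does exist: the Messages API reports cache activity per response in its `usage` block, so if you can capture raw responses you can see hits vs. misses directly (field names from the API docs; the numbers here are made up):

```json
{
  "usage": {
    "input_tokens": 42,
    "cache_creation_input_tokens": 118000,
    "cache_read_input_tokens": 0,
    "output_tokens": 310
  }
}
```

A large `cache_creation_input_tokens` with zero `cache_read_input_tokens` on a resumed session is what a full cache miss looks like.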
Aren't they saying that it's 5 minutes for things like subagents (which wouldn't benefit from a longer TTL)?
Pulling in all the skills and agents in the world, even when unused, is a big hit. I deleted all of mine, added them back as needed, and there was an improvement.
Running Claude Cowork in the background will also consume tokens, and it might not be the most efficient use of them.
Last, but not least, turning off 1M token context by default is helpful.
People, just switch to MiniMax and ditch CC completely. It's not worth it any more.
Number 2 makes me chuckle, honestly. Too many people going down the 10x rabbit holes on YouTube. Next up, a framework that 100xs your workflow. You know it's good because it comes with 300 agents, 20 MCP servers, and 1200 skills.
Can you explain why Opus 4.6 suddenly becomes dumb as a sack of potatoes, even if context is barely filled?
Can you explain why Opus 4.6 will be coming up with stupid solutions only to arrive at a good one when you mention it is trying to defraud you?
I have a feeling the model is playing dumb on purpose to make the user spend more money.
This wasn't the case weeks ago, when it was actually working decently.
Eh you say that every time and yet it keeps happening.
Boris, is the KV cache TTL now reduced to 5 minutes from 1 hour?
I think this may be the biggest concern for people building tools on the API: https://github.com/anthropics/claude-code/issues/46829
I would argue that KV caching is a net gain for Anthropic, and a well-maintained cache is the biggest thing that can generate induced demand and a thriving third-party ecosystem. https://safebots.ai/papers/KV.pdf
Wait what? If I get told to come back in three hours because I'm using the product too much, I get penalized when I resume?
What's the right way to work on a huge project, then? I've just been saying "Please continue" -- does that blow through the quota?
This comment seems unnecessarily hostile.
Why?
It seems just fine to me. This is what Anthropic needs to do if they want to survive. I'm always looking out for someone to integrate an actually good harness to a good model. Once that happens, I'm jumping ship if Anthropic keeps playing these tricks.
It's almost unusable for me now. A simple prompt to merge 3 sub-100-line files with simple node code, on Sonnet 4.6, uses up 20% of my 5 hour quota, on a new/fresh session.
To be fair, my comment was a bit harsher before the update. The way they handle the development, communication and how they treat customers isn't fine. I've seen some angry people post and comment in manners which truly deserved the label hostile.
The whole product with the infrastructure and Claude Code's code appear to be vibe coded.
If they can't manage the infrastructure, then perhaps they should offer customers the ability to host it themselves.
They appear to take issues seriously mostly when they become posts on hacker news and when articles are published online by major news sites. Customer support is mostly a bot. I don't even know how to reach some actual humans to get support.
I'm sorry if you and others are offended. They've had these issues for several weeks now. I haven't seen any real improvements during this time. I see more features and more bugs.
There have been several releases made over the last few days without any changelogs. The quotas are still as opaque as they've been. This company has some extremely shady business practices.
Do you really want HN to be like the Stepford Wives? Dang is already doing too good a job of closing in on that; no need to encourage them more.
The original poster edited his comment after my response; it was far more hostile before. So I assume my input worked.
The hostility is all Anthropic.
I wish people would pay more attention to:
* Anthropic is in some way trying to run a business (not a charity) and at least (eventually?) make money and not subsidize usage forever
* "What a steal/good deal" the $100-$200/mo plans are compared to if they had to pay for raw API usage
and less on "how dare you reserve the right to tweak the generous usage patterns you open-ended-ly gave us, we are owed something!"
As an (ex) paying customer, I expect some consistency. I used to be satisfied with the value I got, until the limits changed overnight and I'd get a tenth of my previous usage.
If Anthropic is allowed to alter the deal whenever, then I'd expect to be able to get my money back, pro-rata, no questions asked.
yes, $200/mo is a serious subscription, we are owed something, and I won't feel ashamed for saying that
especially when you are told that using the subagent for code review (`claude -p`) is now billed via the API on top of the $200 sub
All those apply to OpenAI+Codex too, but they're far more generous with limits than Anthropic, and with granting fresh limits to apologize when they fuck up.