Hey all, Boris from the Claude Code team here.
We've been investigating these reports, and a few of the top issues we've found are:
1. Prompt cache misses when using the 1M-token context window are expensive. Since Claude Code uses a 1-hour prompt cache window for the main agent, if you leave your computer for over an hour and then continue a stale session, it's often a full cache miss. To improve this, we have shipped a few UX improvements (e.g. nudging you to /clear before continuing a long, stale session), and are investigating defaulting to 400k context instead, with an option to configure your context window up to 1M if preferred. To experiment with this now, try: CLAUDE_CODE_AUTO_COMPACT_WINDOW=400000 claude.
2. People pulling in a large number of skills, or running many agents or background automations, which sometimes happens when using a large number of plugins. This was the case for a surprisingly large number of users, and we are actively working on (a) improving the UX to make these cases more visible to users and (b) more intelligently truncating, pruning, and scheduling non-main tasks to avoid surprise token usage.
In the process, we ruled out a large number of hypotheses: adaptive thinking, other kinds of harness regressions, model and inference regressions.
We are continuing to investigate and prioritize this. The most actionable thing for people running into this is to run /feedback, and optionally post the feedback ids either here or in the Github issue. That makes it possible for us to debug specific reports.
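For anyone wanting to try the CLAUDE_CODE_AUTO_COMPACT_WINDOW suggestion from point 1 above, it's a plain environment variable, so it can be set per-invocation or exported for the whole shell session (the variable name is taken verbatim from the comment; whether it remains supported is up to the Claude Code team):

```shell
# One-off: cap the auto-compact window at 400k tokens for a single launch
CLAUDE_CODE_AUTO_COMPACT_WINDOW=400000 claude

# Or export it so every subsequent `claude` launch in this shell uses it
export CLAUDE_CODE_AUTO_COMPACT_WINDOW=400000
claude
```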
Boris, you're seeing a ton of anecdotes here and Claude has done something that has affected a bunch of their most fervent users.
Jeff Bezos famously said that if the anecdotes are contradicting the metrics, then the metrics are measuring the wrong things. I suggest you take the anecdotes here seriously and figure out where/why the metrics are wrong.
On the subject of metrics, better user-facing metrics to understand and debug usage patterns would be a great addition. I'd love an easier way to understand the average cost incurred by a specific skill, for example. (If I'm missing something obvious, let me know.)
Baking deeper analytics into CC would be helpful... similar to ccusage perhaps: https://github.com/ryoppippi/ccusage
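For the curious, ccusage reads your local Claude Code session logs and is typically run straight from npx; the subcommand names below are from memory of the tool and may differ by version, so check its README:

```shell
# Daily token/cost breakdown from local Claude Code session logs
npx ccusage@latest

# Monthly rollup and per-session views (subcommand names may vary by version)
npx ccusage@latest monthly
npx ccusage@latest session
```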
This is useful if you want to keep an eye on what claude's actually doing behind the scenes: https://github.com/simple10/agents-observe
[dead]
We are taking it seriously, and are continuing to investigate. We are not trusting the metrics.
The quantitative ux research team at Google was created for exactly this problem: a service which became popular before the right metrics existed, meaning metrics need to be derived first, then optimized. We would observe users (irl), read their logs, then generate experiments to improve the behavior as measured by logs, and return to see if the experiment improves irl experiences. There were not many of us and we are around :)
I worked with Boris in the past and in my experience, Boris cares deeply about the customer. I'd vouch that Boris really cares about the issue people are running into.
“Hello. My name is Mr. Sirob.”
https://amphetamem.es/meme/?id=the-simpsons_04_12_89×ta...
But no other user has yet come and said "I worked with ajma in the past ..." so how can we trust your judgement about Boris?
I saw this guy named Claude saying ajma is a genius!
Nice try boris
[flagged]
Anthropic can't win in this case.
They don't use Claude Code, they get accused that they don't even trust it themselves.
They use Claude Code, they get accused the code is shit because it's slop.
I think dogfooding is known to be a legitimate approach here.
The idea is that Claude Code is surprisingly buggy and unrefined for something created by the very tool and processes that are supposed to be replacing us as we speak.
The idea is that sculpted ideal code is rarely the best choice.
At the same time I'd say sloppy code (human or AI generated) is rarely the best choice. I'd say the best is in between.
And they don't use our version of CC, or our settings. They have flags for internal use only.
> Anthropic can't win in this case.
Sure they can. The solution is pretty simple and in your own post. Pick one:
* Make the product good to the point code is no longer slop and shit.
* Stop hyping the quality when it isn’t there.
* Do a hybrid approach. Use their own product but actually have competent humans in the loop to make the code good.
This is not hard. Be honest and humble and that criticism goes away. It’s no one’s fault but Anthropic’s that they hype up their product to more than it can do and use it carelessly to build itself. It’s not a no-win scenario if you’re the one causing your own obviously avoidable problems.
Google products' UX is widely acknowledged to be a steaming pile of shit, though, so I am not sure you should follow their example.
Many of the metrics they use are obviously actively user hostile.
Metrics and quantitative UX result in really bad software, making it rigid while optimizing for the wrong things.
The most obvious example is Google's multi-step login, where you have to enter your password on a separate screen after you put in your username.
I wonder what metric led to that decision, or was it a political decision to make it seem like their "old" software has some new feature.
If you mean Google website login, that step is needed because the email address is used to determine which identity provider to use. E.g. I have three different accounts that branch off from that same initial login flow.
One is my personal "gmail.com" account, and the other two go through enterprise identity providers related to my employment and their G-Suite licenses. So after I put in one of these three email addresses, I get prompted for the appropriate next step. Only one of them involves giving a password to a Google server. The other two are redirects to completely separate login systems operated by my employer.
I mean I get it logically makes sense. But it still seems like a waste of time for a small percentage of use cases.
Maybe a better approach: you put in your login, it automatically detects whether an identity provider is required, grays out the password field to signal that a password is not necessary, and redirects automatically.
Less clicking, no broken flow, and a smoother solution overall.
Thank you
[flagged]
[flagged]
HN sometimes talks about pathological customers who will never be happy. Boris is probably the single best rep in the community, possibly ever.
The way your tone and complaints come across reminds me of this. As a paying customer ($5k spend per month in my corporate job), I’d rather Anthropic keep doing what they’re doing — innovating and shipping useful stuff at blinding speed — and not index on your feedback. I think what those tradeoffs would cost far outweighs the benefit.
> Boris is probably the single best rep in the community, possibly ever.
When you say “the community”, what exactly are you referring to?
Dang man, chill.
Man, expecting the minimum from companies who are supposed to deliver a pro... there is no SLA for any of this, so you are right.
Also, why is there no SLA?
You’re not getting a worthwhile SLA on a subscription at this rate. What are you going to get? A few dollars? An SLA isn’t useful unless it actually bites for the provider and actually compensates the customer. And it costs money - how much are you willing to spend for this insurance?
because there isn't one and people still paid for it.
My clients demand one, so there is one.
Imagine if people were like your clients.
If they were, they wouldn't buy your product without an SLA. But they're not.
Because this is ultimately a beta service. The whole industry is.
Wait, where is there a 'beta' tag to something that they are charging real money for? Why is this software any different than any other software and we should completely give away our rights as a consumer to ensure what we pay for is delivered?
I think the parent is saying that one should be aware that the whole LLM industry is still in an experimental stage and far from mature. What you want isn’t what’s being offered. I agree that there should be higher standards, but what we currently have is an arms race. The consequence is to factor that into the value proposition and maybe not rely too much on it.
SLAs should be standard for any paid service, especially on the enterprise side, but also on the consumer side. Being immature as a company does not excuse a lack of service delivery.
Not every customer, even a paying customer, demands reliability at a particular level. Market segmentation tends to address those situations: pay more, get more.
> pay more, get more
Users on the $200 plan are complaining, and that's already the max subscription level. I don't think a $200 subscription should make you feel like you are getting an unfair advantage. Like restricting claude -p to API billing... after I paid so much? Moderate use should not trigger that. I am not running it in batch mode on a million inputs.
'I don't want to hold companies to account for failing to deliver services, therefore I think everyone else should live by my permissive "standards".'
They can be held to account when they fail to deliver what they promise! But what is promised for delivery is what's in the Terms of Service (i.e. the agreement). Nothing more. If it's not in there, you can't hold them to account for it.
Yes, that's the problem.
It's too easy for companies to fail to provide their service as long as they never promise to provide their service.
> It's too easy for companies to fail to provide their service as long as they never promise to provide their service.
I don't even know what this means. You can't make anyone work for free, nor dictate the terms of what kind of work someone will do without their consent. I assume you are not pro-slavery.
I'll make a very simple example.
The service at mcdonald's is providing food for money.
When their ice cream machine is broken, they fail to provide part of their service.
I'm not saying anything about "making" them do anything. I'm just calling out their failure and saying it's a bad thing.
You didn't merely call out their failure. You said it was "too easy," implying something more, like they owe you something. It's a pretty entitled point of view.
I don't think it's "entitled" to want companies to put some effort into avoiding those failures.
If the government did something, we could think of it as similar to passing inspection.
The other way to look at things is that the market isn't varied and competitive enough to punish the companies that fail this way.
They don't have to "owe me" anything for me to desire a different balance. My desire is fine.
"[W]ant[ing] companies to put some effort into avoiding ... failures" is not the same as "hold[ing] them to account". The former is "this sucks and I don't like it." The latter is "punish them or force them to do what I want!"--i.e., some sort of legal remedy.
If you can point to a consumer targeted service that provides and keeps their SLAs, I’ll be impressed.
What right as a consumer do you have that is pertinent here, other than to have the vendor adhere to the terms of the agreement you have with them?
Anthropic has many customers despite the fact that they have occasional problems. They’re not suing Anthropic because Anthropic isn’t promising in its agreement something they can’t deliver.
I think you’re reading into the agreement something that isn’t there, and that’s the cause of your confusion.
I am not reading into an agreement; I am saying there is no agreement to be found that ensures service delivery, with the associated liability that would come with any SLA. Also, where is the Anthropic SLA for Enterprise?
Does it exist?
Just because people pay for things doesn't mean they know or understand what they are paying for. Nor is there the legal precedent to actually understand where the rub lies or how that impacts business.
> Just because people pay for things doesn't mean they know or understand what they are paying for.
I believe, respectfully, that’s precisely what is happening in this thread because you keep complaining about the absence of an SLA that was never in the agreement, as though it is—or is supposed to be—there, and therefore the existence of some “rights” that would flow from that.
There are no SLAs, in any agreement; that's the problem.
We're back to square one: https://news.ycombinator.com/item?id=47741877
It's incredible that Boris is here on HN being open and sharing an issue they don't fully understand yet, and offering a possible workaround. CTFO.
Thank you Boris.
I am sorry you feel this way, but the reality of the situation is there is zero reason to trust anything Anthropic or Boris says. They have no legal liability or obligation to tell the truth, besides brand risk, which for people like you is apparently mitigated by a single person showing up to post, and that's it.
You should work at these companies and understand they have well-intentioned employees; otherwise they'd rarely pass the cultural interviews plus background checks plus backchanneling. Have a bit more faith in the employees.
> Have a bit more faith in the employees
Have you been asleep for a decade?
lol it is _way_ too easy for people to talk like this behind a computer screen.
What truth do you believe you are not being told, exactly?
Dude is on Hacker News on a Sunday. Half the GDP of the world is competing with him. What metrics would you like to see?
An enforceable SLA with the services that Anthropic offers rather than putting an employee to respond to things on Sunday.
>> rather than putting an employee to respond to things on Sunday.
Maybe, just maybe, they didn’t put him here; rather, he’s just a normal guy who reads HN, who is passionate about his role, and is here on his own time.
Maybe... maybe... maybe... none of this builds trust when there is something that does build trust: putting revenue on the line and opening yourself to legal liability. Otherwise everything is empty and meaningless; it's just PR, and nothing more.
You can get a SLA and ZDR by choosing one of the Claude partners (eg on Bedrock)
https://platform.claude.com/docs/en/build-with-claude/api-an...
Then you should offer to pay them for one. I’m sure they’d love to hear from you, and they could probably deliver one to you for the right price. But it will be a high price.
They don't offer ZDR [0] for files, even if you have a BAA or are dealing with HIPAA data, no matter how much you pay them. Trust me, we have tried.
- [deleted]
I’m really confused. We were talking about SLAs, not other product features. Are you moving the goalposts?
There isn't an SLA, nor are there any protections around file uploads to their services. Two bad things can be true at the same time.
Did you talk to them about purchasing an SLA? If so, what did they say?
I feel like you aren't really understanding what a Service-level Agreement actually is in practice. It's not a piece of paper with a specific number of nines and an associated price tag. They can be and often are very complicated documents that take multiple rounds of redlining to arrive at something both parties agree to.
If zero data-retention was non-negotiable for the customer, it's totally possible that the negotiations ended there.
I'm not sure what you're trying to accomplish or unearth beyond what's already been said, which certainly suffices for me.
- [deleted]
As both an attorney and SRE, I understand what an SLA is. And you can absolutely get an SLA when you buy cloud services from many vendors, including AWS. Some vendors provide it at all price points; others include it at higher service tiers, without complex negotiations needed at all. And, yes, if it’s not on the menu, you may need to negotiate one. But you can’t conclusively say “they don’t offer one” unless you’ve actually gone to the company and asked.
https://aws.amazon.com/legal/service-level-agreements/
https://trailhead.salesforce.com/content/learn/modules/slack...
https://support.atlassian.com/subscriptions-and-billing/docs...
Before you casually accuse someone of not knowing what they’re talking about, first make sure you’re on firm ground yourself.
It seems like you could save a lot of time and confusion by talking about the SLA that you pay for from Anthropic instead of establishing your bona fides by posting links to various unrelated companies’ SLA pages.
Like how was your experience negotiating your SLA with Anthropic? What ballpark are you paying for the SLA with Anthropic that you have in place? How many 9s does your Anthropic SLA cover? Obviously you haven’t posted a half dozen times in this thread about how Anthropic by nature of existing offers SLAs without any knowledge of that, so some simple stuff about your SLA with Anthropic would be helpful.
I make no unqualified claims as to whether Anthropic offers an SLA. I never did. But I do know that it's unreasonable to claim they don't when you didn't even take the steps to conclusively determine it for yourself.
As I said: "I’m sure they’d love to hear from you, and they could probably deliver one to you for the right price. But it will be a high price."
Oh, well in that case, if posting URLs counts as proof of… something, there doesn’t appear to be any SLA page anywhere in their sitemap. https://www.anthropic.com/sitemap.xml
Maybe it is just common for enterprise SaaS businesses to offer SLAs without having a page about it though. Something like that could possibly be unjustifiably burdensome as well because it’s not like they could just type “make a page about how we offer SLAs” and have it magically appear
Not everything a business might be willing to do is listed on their public website.
That’s a good point. Having an SLA page is an indicator that a business offers SLAs, not having an SLA page is also an indicator that they offer SLAs, just secretly. If you think about it all of the people constantly complaining about uptime and saying stuff like “I would pay money for an SLA from Anthropic if I could” probably means that they are killing it with all those secret SLAs.
I mean obviously they have to offer them, because they exist, as otherwise you’d have to believe something crazy like “they don’t currently offer them” for reasons “that they haven’t disclosed”
Again, many companies will do things they don’t ordinarily offer for the right price. I’ve seen it happen myself (on both the buyer and seller side) on many occasions.
It goes to the extent of the company itself! Very few businesses publicize that they’re for sale or put their company’s purchase price on their website. But acquisitions happen all the time.
Anyway, I don’t appreciate your sarcasm coupled with what seems to be willful ignorance about how the world works, so I won’t be participating in this discussion with you anymore.
I don’t get it. If you wanted to convince everybody about a vast universe of secret business and your expertise in it, why would you start with telling people that weren’t able to get an SLA from Anthropic that Anthropic offers SLAs? And then admit that you don’t actually know and then double down?
Like if I wanted to convince people that In’N’Out has a secret menu (they do) I wouldn’t start by saying “They have the ingredients to make onion rings, therefore they sell onion rings” (they do not). They offer burgers with lettuce instead of a bun (“protein style”) though. That’s a fact that you can verify by going there or calling them and asking about it. I didn’t rely on my assumptions based on other fast food restaurants, I relied on my knowledge of the topic!
Edit: It seems like bad faith to admit that you’re using “probably” interchangeably with “I don’t know” and then editing in “for a billion dollars” several posts into a conversation.
I guess enjoy posting about entirely unrelated conversations in other threads though. (otterley’s post about my having previously had a short amicable exchange with dang in a different thread was deleted, but I’ll leave this part up. I think digging through people’s post histories to find unrelated grievances is icky, for lack of a better word, and wildly unhelpful for any type of discussion)
Even with the “for a billion dollars” addition, admitting “I don’t know” and “probably” are interchangeable doesn’t really change anything from a logical standpoint. Nobody argued against you not knowing, so I don’t understand the purpose of the repetition.
> why would you start with telling people that weren’t able to get an SLA
That hasn’t been established. There’s no evidence that they went to Anthropic and tried to negotiate one.
> that Anthropic offers SLAs
I didn’t. I said “they probably will for the right price.” There are two modifiers in that statement. And the price is unspecified. Their first offer could be a billion dollars. Too expensive? Negotiate down.
I would invite you to notice your interlocutor's assumptions, especially as revealed in his prior comment. Look at how he misunderstands the situation:
> If you wanted to convince everybody about a vast universe of secret business and your expertise in it...
> Like if I wanted to convince people that In’N’Out has a secret menu...
You are discussing business. He is understanding you to be attempting to "mog" him, because he cannot adopt a perspective wherein the conversation represents anything other than a vacuous social challenge or "brodown."
In short, you're wasting your time.
I am so old :(
I looked up “mogging” and I’d think “my assumptions about stuff are valid because I’m a lawyer and don’t know what you do” would count more as mogging than “that doesn’t quite sound right, this is a conversation about something specific and not your general cleverness” but I’ve got a Benny Hill archive to get through
Those are not assumptions on your interlocutor's part. You've embarrassed yourself quite badly, I'm afraid. I know you don't understand how, but that doesn't change the fact of it.
> You've embarrassed yourself quite badly, I'm afraid.
:( you are right. This isn’t the first time I’ve lost an argument because hours into a discussion somebody introduced “what if a billion dollars” or “magic amulet” or “ブルマの母” (Bulma’s mother), etc.
A billion dollars is just an example. I could have said a million. When someone says "a high price" that's unspecified, you can use your imagination to hazard a guess at what that might be. Such a figure might seem unreasonable or unrealistic to you, but deals are done between companies under terms most individuals wouldn't come close to considering.
The only reason I mentioned being an attorney was because someone in the thread above accused me of not understanding SLAs. I don't ordinarily bring it up unless we're talking about law or contracts and I feel the need to defend myself or correct misunderstandings. I don't try to use it to browbeat anyone into submission, although I do believe that respect for others' lived experiences and education is relatively uncommon here on HN.
I also don't care for my words to be misconstrued to mean something I didn't say. I rarely speak in absolutes because I've learned over time that there are very few absolutes in the world. Thus, I include qualifying language in nearly everything I write. So when someone accuses me of making claims of certainty that I didn't make, I can get pretty defensive about that.
You know, I had come away with the impression of you as someone able to take embarrassment with good grace, to "walk it off" without either crumpling under the weight of unhandled insecurities, or letting your ego insist on turning it into an escalation dominance contest. There is always something to learn from the experience of making a fool of oneself (1), and you struck me as someone very well prepared and equipped to do so. It's a rare capacity to encounter.
Disappointing me shouldn't make much nevermind to you; you don't know me from Adam. But think of the people in your life who care for you and vice versa, or of the kind of folks you would like to be there. Wouldn't you rather behave so they may regard you in the way I just described?
It's hard to acknowledge a situation like this one, especially in its moment, especially when you're young. Being able to do hard things, well and gracefully, is another skill we do very well to cultivate. You were putting in some good practice, and the other gentleman (esq.) has offered some good advice in consequence.
It'd be a shame to blow it here at the very last moment, don't you think?
(1) Ever shit yourself in public, right there in front of God and everybody? I did, about ten years ago - there was a time in this town before the health code had teeth, when eating at the wrong place or on the wrong day could be like taking your life in your hands. Let me tell you, after that day - complete with an hour cleaning yourself up in a Subway restaurant toilet, followed by the train ride home - discovering you have inadvertently said something a little dumb on the Internet falls easily into something much more like the perspective it is due.
It's just a world you've never seen. Don't take it too personally.
I appreciate your kindness. While I’ve got you, did you know that the Benny Hill show started in 1955 and a good chunk of what aired from then to 1969 was lost? There are a lot of fans that don’t even realize that what is sometimes labeled as season 1 is season 15! Crazy stuff!
I had not known that! In a similar vein, there exists an Alice in Wonderland-themed Muppet Show episode, starring Brooke Shields, which has had to be left out of home video releases due to so far unresolvable music licensing issues. Not quite totally lost, but somewhat hard to find!
I’ll check that out! If I find a good link for it I’ll post it as a reply here.
- [deleted]
Boring corporate AI will surely come, but hey, let's enjoy the wild west while it lasts. I am grateful to see Boris come here to address problems people face. I'm 100% sure nobody is making him - he has one of the coolest jobs in the world.
>he has one of the coolest jobs in the world.
So that means we just eject any critical thinking when it comes to companies, especially where there is no liability or obligation for them (Boris or Anthropic) to be honest.
Other than 'trust'.
Don’t like Anthropic? Use a competing service. At this point the sheer volume of your commentary is not particularly complimentary to your own critical thinking skills. It’s not your job to correct the internet or to convince randoms of the rightness of your position. Of all the things in the world to be pissed at so insistently, this seems to be a pretty minor one.
But the default 1M context window just rolled out a few weeks ago. If refreshing old sessions on 1M context windows is the problem, it's completely aligned with what Boris is saying.
So Anthropic is trying to save money on infrastructure, we all get it. However, it's not ok to degrade the performance your users have paid for. Last week the issue was that you reduced the default "effort" level, now the prompt cache is shortened. Several users experience far more restrictive usage limits lately.
There is only so much you can do through "UX improvements" or some smart routing on the backend. Your flagship product is actively getting worse, and if users need to fiddle with hidden settings and keep track of GitHub issues every week they will start voting with their money.
Dear sir, please think of the shareholders, they need a fair exit.
For context, my company gives each developer a decent monthly allowance for Claude and if push comes to shove, we are allowed to fallback to using AWS Bedrock hosted Anthropic models.
When you pay for a Claude subscription, what exactly were you promised?
> they will start voting with their money.
And go where? Sooner or later the party is going to be over and Claude and its competitors are going to have to start charging enough to actually be profitable when the VC money dries up.
> When you pay for a Claude subscription, what exactly were you promised?
I was promised 5x or 20x the amount of resources that the free tier would offer. I implicitly expected the same quality too, not some watered-down version of the product they allowed me to sample before committing to a subscription.
Sooner or later Anthropic will run out of VC money, yes. That's their problem, not mine. When I took an Uber while it was subsidized by venture capital, the driver did not drop me halfway to my destination because they were having cash flow issues.
So how do you know that the free tier hasn’t been reduced by 5x?
It’s exhausting enough to deal with services that change around on an annual/semi-annual basis with pricing and expectations.
Now the expectation is that we should tolerate goalposts being shuffled around on a weekly/daily basis with the added requirement of digging into bug tickets because there’s no attempt at transparency? The tech is cool but this is absolutely insane.
If you’re an individual developer paying $100-200/mo for a service that keeps changing, there is a LOT of reason to keep an eye on other products.
I’m not saying that there isn’t a reason to keep an eye on other products. I’m saying that every other product in the space has the same unit economics and will eventually need to charge enough to be profitable - and to continue training and hardware expansion.
Honestly, a developer paying $200 a month is a nothingburger, and anyone using their service to the fullest is losing them money.
For context, the company I work for gives each consultant a $2000 a month allowance and I think there are probably around 500-700 people with that allowance. I’m sure everyone doesn’t use it all.
If they have limited hardware resources, where do you think they are going to focus?
Classic VC pump playbook - run it uneconomically until everyone is addicted, then 5x prices once you have enough critical mass. See 2010s "Millennial Lifestyle Subsidy"..
It seems pretty transparent that they are heavily resource constrained, (training run for Claude 5.x, higher usage / growth than anticipated). I don’t disagree that their long play is monopolistic pricing, but what we’re observing seems better explained by the fact they have a very tight compute budget they are trying to optimize over to put as much as they can into next gen experiments / training to make sure they stay competitive over the next 6-months / year.
You know, Anthropic was once supposed to be a public benefit org!
Where did they say the prompt cache is shortened?
From 1h to 5 minutes; it was in the news recently.
But not as a solid fact?
The HN thread in question is here (and had that info edited out of the title)
Why did this become an issue seemingly overnight when 1M context has been available for a while, and I assume prompt caching behavior hasn't changed?
EDIT: prompt caching behavior -did- change! 1hr -> 5min on March 6th. I'm not sure how starting a fresh session fixes it, as it's just rebuilding everything. Why even make this available?
It feels like the rules changed and the attitude from Anth is "aw I'm sorry you didn't know that you're supposed to do that." The whole point of CC is to let it run unattended; why would you build around the behavior of watching it like a hawk to prevent the cache from expiring?
> 1hr -> 5min on March 6th
This is not accurate. The main agent typically uses a 1h cache (except for API customers, which can enable 1h but it is not on by default because it costs more). Sub-agents typically use a 5m cache.
https://github.com/anthropics/claude-code/issues/46829#issue... - Have you checked with your colleague? (and his AI, of course)
Doesn't what's said at the link approximately agree? The 5m bug was said to be isolated to use of overage (API billing).
Then my original question stands: why did this become an issue seemingly overnight if nothing changed?
So if I run a test suite or compile my rust program in a sub agent I’m going to get cache misses? Boo.
Sub agents don't have much context and don't stay around for long, so misses in that case are trivial.
As of yesterday subagents were often getting the entire session copied to them. Happened to me when 2 turns with Claude spawned a subagent, caused 2 compactions, and burned 15% of my 5-hour limit (Max 5x).
How long they stay around after the cache miss is irrelevant if I am burning all the prior tokens again. Also, how much context they have depends entirely on the task and your workflow. If you have a subagent implement a feature and use the compile + test loop to ensure it is implemented correctly before a supervisor agent reviews what was implemented vs asked, then yes, subagents do have a lot of context.
... so how do API users enable 1hr caching? I haven't found a setting anywhere.
would like to know this too ;D
there is env.ENABLE_PROMPT_CACHING_1H_BEDROCK - but that, as the name says, is only "when using Bedrock"
for the raw API the docs are also clear -> "ttl": "1h" https://platform.claude.com/docs/en/build-with-claude/prompt...
but how to make claude-code send that when paying by API-key? or when using a custom ANTHROPIC_BASE_URL? (requests will contain cache_control, but no ttl!)
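For reference, here is a rough sketch of what a raw Messages API body with a 1-hour cache TTL looks like, per the prompt-caching docs linked above. The model id and prompt text are placeholders, and this says nothing about how to make claude-code itself send the `ttl` field — that question stands.

```python
import json

# Sketch of a raw Messages API body requesting a 1-hour cache TTL,
# per the prompt-caching docs. Model id and system text are
# illustrative placeholders, not what claude-code actually sends.
body = {
    "model": "claude-opus-4-5",  # placeholder model id
    "max_tokens": 1024,
    "system": [
        {
            "type": "text",
            "text": "You are a coding assistant. <...large shared prefix...>",
            # "ephemeral" with an explicit ttl is what the docs describe;
            # omitting "ttl" falls back to the default 5-minute lifetime.
            "cache_control": {"type": "ephemeral", "ttl": "1h"},
        }
    ],
    "messages": [{"role": "user", "content": "continue the session"}],
}

payload = json.dumps(body)
```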
The /clear nudge isn't a solution though. Compacting or clearing just means rebuilding context until Claude is actually productive again; the cost comes either way. I get that 1M context windows cost more than the flat per-token price reflects, because attention scales with context length, but the answer to that is honest pricing or not offering it, not annoying UX nudges. What's actually indefensible is that Claude is already pushing users to shrink context via, I presume, the system prompt. At maybe 25% fill:
"This seems like a good opportunity to wrap it up and continue in a fresh context window."
"Want to continue in a fresh context window? We got a lot of work done and this next step seems to deserve a fresh start!"
If there's a cost problem, fix the pricing or the architecture. But please stop the model and UI from badgering users into smaller context windows at every opportunity. That is not a solution, it's service degradation dressed as a tooltip.
The cost issues they're seeing (at least from what they've stated) are from users, not internal. Basically, it takes either $5 or $6.25 (depending on 5m or 1h TTL) to re-ingest a 1M-token conversation into cache for Opus 4.6. That's obviously a very high cost, and users are unhappy with it.
I think 400k as a default seems about right from my experience, but just having the ability to control it would be nice. For the record, even just making a tool call at 1M tokens costs 50 cents (which could be amortized if multiple calls are made in a round), so imo costs are just too high at long context lengths for them to be the default.
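Back-of-envelope using the figures in the comments above (treating the ~$5/MTok input price and a 0.1x cached-read rate as assumptions, not official pricing), showing why a warm 1M-token context is cheap to reuse but expensive to rebuild:

```python
# Assumed rates, taken from the figures quoted in this thread:
MTOK = 1_000_000
base_input_per_mtok = 5.00    # assumed uncached input price, $/MTok
cached_read_per_mtok = 0.50   # assumed 0.1x rate for cache hits

context_tokens = 1 * MTOK

cold_rebuild = context_tokens / MTOK * base_input_per_mtok   # full cache miss
warm_call = context_tokens / MTOK * cached_read_per_mtok     # cache hit

print(f"cache miss: ${cold_rebuild:.2f}")  # $5.00
print(f"cache hit:  ${warm_call:.2f}")     # $0.50 -- the "50 cents per tool call"
```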
currently "clear makes it worse" https://github.com/anthropics/claude-code/issues/47098 + https://github.com/anthropics/claude-code/issues/47107
launching with `CLAUDE_CODE_DISABLE_GIT_INSTRUCTIONS=1 claude "Hello"` till those are fixed seems to be the way
For me the worst regression was definitely the system prompt telling Claude to analyze every file it reads to check whether it's malware. That correlates with me also seeing quotas exhausted early and acknowledgments of "not malware" at almost every step.
It is a horrible error of judgement to insert such a complex request into such a basic operation. It is also an error of judgement to let Claude decide whether it wants to improve the code or not at all.
It is so bad that I stopped working on my current project and went to try other models. So far Qwen is quite promising.
I don't think that's accurate. The malware prompt has been around since Sonnet 3.7. We carefully evaled it for each new model release and found no regression to intelligence, alongside improved scores for cyber risk. That said, we have removed the prompt for Opus 4.6 since the model no longer needs it.
I started seeing "not a malware, continuing" in almost every reply since around 2 weeks ago. Maybe you just reintroduced it with some regression? Opus 4.6
That's weird. Would you mind running /feedback and sharing the id here next time you see this? I'd love to debug
Sure, I really appreciate you looking at this.
a6edd0d1-a9ed-4545-b237-cff00f5be090 / https://github.com/anthropics/claude-code/issues/47027
I'm happy to provide any other info that can be useful (as long as i'm not sharing any information about the code or tools we use into a public github issue).
Thanks for the report! This was fixed in v2.1.92.
Please:
1. Upgrade to the latest: claude update (seems like you did this already)
2. Start a new conversation (resuming an old convo may trigger this bug again in that convo)
This is bloody great Boris. Thank you.
- [deleted]
Thank you! Looking
I’ve seen this a couple of times recently. Including right after compact. I’ll /feedback it next time I see it
Same. Will run it too when I next get it.
I've been using CC a decent amount the past few weeks and have never seen this malware stanza...?
1. I've never seen this. Is there a config option to unhide it if it's happening? Is this in Claude Code? Does it have to be set to verbose or something?
2. Can we pay more/do more rigorous KYC to disable it if it's active?
This warning is not enabled for modern models. No action needed. I'm digging into the report above as soon as they're able to /feedback.
> Since Claude Code uses a 1 hour prompt cache window for the main agent, if you leave your computer for over an hour then continue a stale session, it's often a full cache miss. To improve this, we have shipped a few UX improvements (eg. to nudge you to /clear before continuing a long stale session), and are investigating defaulting to 400k context instead
I don’t understand this. I frequently take long breaks, and I never want to clear or even compact, because I don’t want to lose the conversations I’ve had and the context. Clearing causes other issues: I have to restate everything at times, and it misses things. Updating the memory helps, but I wish there was a better solution than a time-bound cache.
Makes me wish that shortly before the server-side expiration, we could save the cache on the client-side, indefinitely.
But my understanding is that we're talking about ~60GB of data per session, so it sounds unrealistic to do...
Where are you getting 60GB from? It shouldn’t be that large.
But yes, would love to save context/cache such that it can be played back/referred to if needed.
/compact is a little black box that I just have to trust that is keeping the important bits.
The KV cache consists of activation vectors for every attention head at every layer of the model for every token, so it gets quite large. ChatGPT also estimates 60-100GB for full token context of an Opus-sized model:
https://chatgpt.com/share/69dc5030-268c-83e8-92c2-6cef962dc5...
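A rough sizing formula makes the scale plausible: KV cache bytes = 2 (K and V) x layers x KV heads x head dim x bytes-per-value x tokens. All dimensions below are hypothetical — Opus's architecture is not public — chosen only to show how a 1M-token cache reaches tens of GB:

```python
# Rough KV-cache sizing. Every model dimension here is a guess for
# illustration; Opus's real architecture is not public.
def kv_cache_bytes(n_layers, n_kv_heads, head_dim, n_tokens, dtype_bytes=2):
    # 2x for keys and values, one entry per layer per KV head per token
    return 2 * n_layers * n_kv_heads * head_dim * dtype_bytes * n_tokens

# e.g. 60 layers, 2 KV heads (aggressive GQA), head_dim 128, bf16:
size = kv_cache_bytes(60, 2, 128, 1_000_000)
print(size / 1e9)  # ~61 GB for a full 1M-token context
```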
That is actually nuts.... I'm trying to understand the true costs of AI, wonder how I plug this in!
There are ways to quantize or compress KV cache down.
I wanted this as well. Even asked about it at an openai talk. Basically a way to get the KV cache to the client (they can encrypt it if they care about me REing it, make a compressed latent if they don't wanna egress 20GB, whatever, I'm fine with a black box) so that I can load it later and avoid these cache misses.
I think the primary reason they cannot do this is that they change the memory and communication layouts in their serving stack rather aggressively. And naturally keeping the KV cache portable across all such layouts is a very difficult task. So you'd have to version the cache down to a specific deployment, and invalidate it the moment anything even small changes. So giving the user a handle to the cache sort of prevents you from making large changes to memory layout. Which is I suppose not that enticing. Also, client side KV caches are only meaningful in today's 1M contexts. Few y back it wasn't necessary, since just recomputing would be better for everybody.
To be clear, I don't mean they send it along with every request. Rather, they do their current TTL cache, and then when I'm at the end of a session, I request it in one shot and then close the session. And it doesn't have to come to the literal client, they can egress it to a storage service that we pay for, whatever. But ya the compat problem makes it all a non starter.
I don't want a nudge. I want a clear RED WARNING with "You've gone away from your computer a bit too long and chatted too much at the coffee machine. You're better off starting a new context!"
I don’t want a scary red message chastising me for not being responsive enough!
I often leave CC hanging (or even suspended) and use /resume a lot. I’m okay with that having some negative effect on my token limits.
Product design is hard. They can’t please us all. I don’t envy the team considering these trade offs.
Is it that hard though? This kinda smacks of no research on users prior to rolling stuff out.
Ack, it is currently blue but we can make it red
I think after the TTL expires the session should be auto-compacted, and the user should be given a choice: continue with the compacted version, or be hit with the full read cost of continuing with their large but expired context. At the moment users are blind to what is going on.
Why is nobody even asking why that should be an issue? No other text editor shits the bed that way. The whole point of the computer is that it patiently waits for my input.
let me put it this way: not your RAM, not your cache, not waiting patiently for your input.
Good thing they're not charging for it, then.
Good thing they didn't silently, quietly change cache from 1 hour to 5 minutes, right?
forget the warning, just compact like someone suggested in the ticket. Who would opt for a massive cache miss?
Hey Boris - why is the best way to get support making a Hacker News or X post, and hoping you reply? Why does Anthropic Enterprise Support never respond to inquiries?
I mean if we're building an unrelated wishlist... Can 20x max users get auto mode already? Or can the enterprise plans get something equivalent to 20x max?
Given I'm running two max accounts to get the usage I want, can we get a 25x and 40x tier? :-)
It’s called /extra-usage and they really want you to use it.
OpenAI (Codex) keeps on resetting the usage limits each time they fuck up...
I have yet to see Anthropic doing the same. Sorry but this whole thing seems to be quite on purpose.
Can you clearly state what they messed up?
Suddenly burning up the quota ~4x faster than usual is not a mess up in your opinion?
It is not inherently their fault though because usage is controlled both by the user and the harness behavior. So I was asking specifically what about the harness was messed up, can you provide that info?
It's all there, including the specific version regression, unearthed bugs, workarounds: https://github.com/anthropics/claude-code/issues/45756
[flagged]
[flagged]
LOL, funny how you're so happy to dismiss dozens of reports with hard data, and confirmed by the Claude Code team member.
Issue with the confirmation: https://github.com/anthropics/claude-code/issues/45756
Looks like you have an axe to grind and facts be damned? :D
Not parent but I can guess from watching mostly from the sidelines.
They introduced a 1M-context model semi-transparently without realizing the effects it would have, then refused to "make it right" to the customer, which is a trait most people expect from a business when they spend money on it, especially in the US, and especially when the money spent is often in the thousands of dollars.
Unless Anthropic has some secret sauce, I refuse to believe that their models perform anywhere near the same on >300k context sizes as they do on 100k. People don't realize it, but even a small drop in success rate becomes very noticeable if you're used to having near 100%, i.e. 99% -> 95% is more noticeable than 55% -> 50%.
I got my first Claude sub last month (it expires in 4 days) and I've used it on some biggish projects with opencode. It went from compacting after 5-10 questions to just expanding the context window. I personally notice it deteriorating somewhere between 200-300k tokens, and I either fork a previous context or start a new one after that, because at that size even compacting seems to generate subpar summaries. It currently no longer works with opencode, so I can't attest to how well it worked the past week or so.
If the 1M model introduction is at fault for this mass user perception that the models are getting worse, then it's anthropics fault for introducing confusion into the ecosystem. Even if there was zero problems introduced and the 1M model was perfect, if your response when the users complain is to blame it on the user, then don't expect the user will be happy. Nobody wants to hear "you're holding it wrong", but it seems that anthropic is trying to be apple of LLMs in all the wrong ways as well.
I still love Claude and nothing but a ton of respect for Boris and the team building such a phenomenal product.
That said, I feel that things started to feel a bit off usage-wise after the introduction of 1M context.
I'd personally be happy to disable it and go back to auto-compacting because that seems to have been the happy medium.
Especially since Codex faced the same issue but the team decided to explicitly default to only ~200k context to avoid surprises and degradation for users.
[flagged]
Different users do seem to be encountering problems or not based on their behavior, but for a rapidly-evolving tool with new and unclear footguns, I wouldn't characterize that as user error.
For example, I don't pull in tons of third-party skills, preferring to have a small list of ones I write and update myself, but it's not at all obvious to me that pulling in a big list of third-party skills (like I know a lot of people do with superpowers, gstack, etc...) would cause quota or cache miss issues, and if that's causing problems, I'd call that more of a UX footgun than user error. Same with the 1M context window being a heavily-touted feature that's apparently not something you want to actually take advantage of...
My colleagues and I have faced the same issues over the last month or so.
With a new version of Claude Code pretty much every day, constant changes to their usage rules (2x outside of peak hours, temporarily 2x for a few weeks, ...), hidden usage decisions (past 256k it looks like your usage consumes your limits faster) and model degradation (Opus 4.6 is now worse than Opus 4.5, as many reported), I fail to see how it can be a user error.
The only user error I see here is still trusting Anthropic to be on the good side tbh.
If you need to hear it from someone else: https://www.youtube.com/watch?v=stZr6U_7S90
> past 256k it looks like your usage consumes your limits faster
This is false. My guess is what is happening is #1 above, where restarting a stale session causes a 256k cache miss.
That said, I hear the frustration. We are actively working on improving rate limit predictability and visibility into token usage.
just like everybody else I and my colleagues at work have seen major regressions in terms of available usage over the past month, seemingly unrelated to caching/resuming. On an enterprise sub doing the same work I personally went from being able to have several sessions running concurrently without hitting limits, to only having one session at a time and hitting my 5h every day twice a day in 3-4 hours tops (and due to the apparent lower intelligence I have been at the terminal watching what opus is doing like a hawk, so it's not a I went for coffee I have to hit the cache). The first day I ever hit my 5h this year was the day everybody reported it (I think it was the Monday you introduced the 2x promotion after hours? not sure, like 3 weeks ago?)
To avoid 1M issues, this week I have also intentionally used the 256k context model, disabled adaptive thinking, and done the same "plans in multiple short steps with /clear in-between" to minimize context usage, and yet nothing helps. It just feels like ~2x to ~3x fewer tokens than before, and a lot less smart than in February.
Nowadays every time I complete a plan I spend several sessions afterwards saying things like "we have done plan X, the changes are uncommitted, can you take a look at what we did" and every time it finds things that were missed or outright (bad) shortcuts/deviations from plan despite my settings.json having a clear "if in doubt ask the user, don't just take the easy way out". As a random data point, just today opus halfway through a session told me to make a change to code inside a pod then rollout restart it to use said change, and when called out on it it of course said that I was right and of course that wouldn't work...
It is understandable that given your incredible growth you are between a rock and a hard place and have to tweak limits, compute does not grow on trees, but the consistent "you are holding it wrong" messaging is not helpful. I am wondering if realistically your only option is to move everybody to metered, with clear token usage displayed, and maybe have pro/max 5/max 20 just be a "your first $x of tokens is 50/75% off". Allow folks to tweak the thinking budget, and change the system prompt to remove things like "try the easy solution first" which anecdotally has been introduced in the past while, and allow users to verify on prompt if the prompt would cause the whole context to be sent or if cache is available.
Why did it suddenly become an issue, despite prompt caching behavior being unchanged?
PEBKAC: Problem Exists Between Keyboard And Chair
Yes same here. I use CC almost constantly every day for months across personal and work max/team accounts, as well as directly via API on google vertex. I have hardly ever noticed an issue (aside from occasional outages/capacity issues, for which I switch to API billing on Vertex). If anything it works better than ever.
You know that people are not using the same resources? It's like 9 out of 10 computers get borked and you have the 1 that seems okay and you essentially say "My computer works fine, therefore all computers work fine." Come on dude.
Money money money money
Would it be possible to increase the cache duration if misses are a frequent source of problems?
Maybe use a heartbeat to detect live sessions and cache them longer than sessions the user has already closed. And only do it for long sessions where a cache miss would be very expensive.
Yes, we're trying a couple of experiments along these lines. Good intuition.
I suspect 1M token context is questionable value because of the secondary effect of burning quota vs getting work done.
I think the model select that let me choose 1M made sense because I could decide if I was working on large documents and compacting more often was more effective.
Boris,
Even if Anthropic is working in good faith to lower infrastructure costs, developers need more than 5 minutes to notice that CC completed a task, review its changes and ask it to merge. Only developers who do not review code changes can live with such a TTL...
Consider making this value configurable as the ideal TTL value is different for each person. If people are willing to pay more for 30 minutes TTL than 5 minutes, they should be able to.
- [deleted]
> Since Claude Code uses a 1 hour prompt cache window for the main agent
this seems a bit awkward vs the 5 hour session windows.
if i get rate limited once, I'll get rate limited immediately again on the same chat when the rate limit ends?
any chance we can get some form of deferred cache, so anything on a rate-limited account gets put aside until the rate limit ends?
As another data point, I pay for Pro for a personal account, and use no skills, do nothing fancy, use the default settings, and am out of tokens, with one terminal, after an hour. This is typically working on a < 5,000 line code base, sometimes in C, sometimes in Go. Not doing incredibly complicated things.
Ah, so cache usage impacts rate limits. There goes the ”other harnesses aren’t utilizing the cache as efficiently” argument.
Claude Code is the most prompt cache-efficient harness, I think. The issue is more that the larger the context window, the higher the cost of a cache miss.
I do wonder if it's fair to expect users to absorb cache miss costs when using Claude Code, given how opaque these are.
Politely, no.
- I wrote an extension in Pi to warm my cache with a heartbeat.
- I wrote another to block submission after the cache expired (heartbeats disabled or run out)
- I wrote a third to hard limit my context window.
- I wrote a fourth to handle cache control placement before forking context for fan out.
- my initial prompt was 1000 tokens, improving cache efficiency.
Anthropic is STOMPING on the diversity of use cases of their universal tool, see you when you recover.
That might be, but the argument was that poor cache utilization was costing Anthropic too much money in other harnesses. If cache is considered in rate limits, it doesn’t matter from a cost perspective, you’ll just hit your rate limits faster in other harnesses that don’t try to cache optimize.
There were two issues with some other 3p harnesses:
1. Poor cache utilization. I put up a few PRs to fix these in OpenClaw, but the problem is their users update to new versions very slowly, so the vast majority of requests continued to use cache inefficiently.
2. Spiky traffic. A number of these harnesses use un-jittered cron, straining services due to weird traffic shape. Same problem -- it's patched, but users upgrade slowly.
We tried to fix these, but in the end, it's not something we can directly influence on users' behalf, and there will likely be more similar issues in the future. If people want to use these they are welcome to, but subscriptions clients need to be more efficient than that.
How much jitter would you prefer, how many seconds / minutes out? I have some morning tasks that run while I'm asleep via claude -p, and it sounds like I'm slightly contributing to your spikes (presumably hourly and on quarter hours).
There's prior art from Claude's own scheduled tasks' jitter: https://code.claude.com/docs/en/scheduled-tasks#jitter
> Recurring tasks fire up to 10% of their period late, capped at 15 minutes. An hourly job might fire anywhere from :00 to :06.
> One-shot tasks scheduled for the top or bottom of the hour fire up to 90 seconds early.
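The quoted policy can be mirrored for your own `claude -p` cron jobs. A minimal sketch, using the exact numbers from the docs above (10% of the period, capped at 15 minutes); the function name is mine, not anything Claude Code provides:

```python
import random

# Jitter a recurring job the way the scheduled-tasks docs describe:
# fire up to 10% of the period late, capped at 15 minutes.
def jittered_delay(period_seconds: float) -> float:
    cap = 15 * 60  # 15-minute cap from the docs
    return random.uniform(0, min(0.10 * period_seconds, cap))

# An hourly job would start anywhere from :00 to :06:
delay = jittered_delay(3600)
assert 0 <= delay <= 360
```

In a crontab the same idea is just a random sleep before the command, e.g. `sleep $((RANDOM % 360)) && claude -p "..."`.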
If you give doll a list of things you want to see from third-party harnesses, a compliance checklist, it will make sure the one it is building follows it to the letter.
I’m sorry but when you wake up in the morning with 12% of your session used, saying “it’s the cache” is not an appropriate answer.
And I’m using Claude on a small module in my project, the automations that read more to take up more context are a scam.
it seems if context can't be held for over an hour, it should warn you with a countdown or such; i already enabled the token verbosity thing to see what token level i'm at, but i often leave things sitting rather than complete, so that i'm tying things up to start something new in the morning rather than starting on a new thing. so i just resumed a session that was near-complete, and now it's gone and reloaded all that session in? but i hadn't detached it. i kind of thought /summary itself had to read the whole token flow, but that the token context was held locally for some reason..
Hi Boris,
Long-term Claude Code user here. This is the first time I've had to set up a hook to Codex to review Claude's output.
It's hallucinating like never before.
It's missing key concepts/instructions in context like never before.
It's writing bad code that will "pass tests" much more often. Before, it used to be critical and write good code; now it will try to hack the tests and bypass instructions to get a green pass.
Am I so out of touch?
No! It’s the children who are wrong!
you are prompting it wrong
Have you considered poking the cache?
When a user walks away during the business day but CC is sitting open, you can refresh that cache up to 10x before it costs the same as a full miss. Realistically it would be <8x in a working day.
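The "~10x before it costs the same as a full miss" figure follows from the cached-read rate being roughly a tenth of the uncached input rate. The 0.1x multiplier below is an assumption based on published cache pricing, not a statement about Claude Code's internal accounting:

```python
# A keep-alive request re-reads the cached prefix at the cached-read
# rate; once the cache expires, the whole prefix must be re-ingested
# at (at least) the base input rate. Rates are normalized, not real $.
base_rate = 1.0          # uncached input, normalized
cached_read_rate = 0.1   # assumed cached-read multiplier

breakeven_refreshes = base_rate / cached_read_rate
print(breakeven_refreshes)  # 10.0 pings ~= one full re-ingest
```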
One thing I didn't see anywhere here, except your mention about pulling in large number of skills, is that the token consumption is significantly higher for users with many agents, skills, and MCPs installed, and many are mere ghosts. The 5m TTL from #46829 compounds the effect: in my case, I found ~20k tokens of ghost context I hadn't intentionally opened. Each idle period after 5m wastes that as a full cache miss.
Boris, would you please confirm on-record: is the current cache TTL for the main agent context 1h or 5m? Issue #46829 was closed as "not planned".
Hi, thanks for Claude Code. I was wondering though if you'd considering adding a mode to make text green and characters come down from the top of the screen individually, like in The Matrix?
I’ve seen the /clear command prompt and I found the verbiage to be a bit unclear. I think it would help to clarify that the cache has expired and to provide an understandable metric on the impact, i.e. “X% of your 5-hour window” for Pro/Max users and details on token use for API users. A pop-up that requires explicit acknowledgment might also help, although that could be more of an annoyance to enterprise users.
One pattern I use frequently is using one high level design and implementation agent that I’ll use for multiple sessions and delegate implementation to lower level agents.
In this case it’d be helpful to have one of two options:
1. Claude CLI could auto-compact the conversation history before cache expiration. For example, if I’m beyond X minutes or Y prompts in a conversation and I’ve been inactive past a threshold, it could auto-compact close to the expiration and provide that as an option on resume.
2. I could configure cache expiration proactively, and Anthropic could use S3 or a similar slow-load mechanism to offload the cache for a longer period, possibly 24-72h.
I can appreciate that longer KV cache expiration would complicate capacity management and make inference traffic less fungible but I wouldn’t mind waiting seconds to minutes for it to load from a slower store to resume without quota hits.
you should check with people working on Claude Code, cache has been updated to 5min ... https://github.com/anthropics/claude-code/issues/46829#issue...
So yeah, 1M window that expires every 5min .... not good
Could we get an option to use Opus with a smaller context window? I noticed that results get much worse way earlier than when you reach 1M tokens, and I would love to have a setting so that I could force a compaction at eg 300k tokens.
You probably just missed it in his post, but:
"To experiment with this now, try: CLAUDE_CODE_AUTO_COMPACT_WINDOW=400000 claude."
Maybe try changing the 4 to a 3 and see if that works for you?
Thank you, will definitely try that!
You've created quite a conundrum.
The only people who are going to run into issues are superpower users who are running this excessively beyond any reasonable measure.
Most people are going to be quite happy with your service. But at the same time, and this is just a human nature thing people are 10 times more likely we complain about an issue than to compliment something working well.
I don't know how to fix this, but I strongly suspect this isn't really a technical issue. It's more of a customer support one.
> defaulting to 400k context instead, with an option to configure your context window to up to 1M if preferred
This seems really useful!
I'm surprised that "Opus 4.6" (200K) and "Opus 4.6 1M" are the only Opus options in the desktop app, whereas in the CLI/TUI app you don't seem to even get that distinction.
I bet that for a lot of folks something like 400k, 600k or 800k would work as better defaults, based on whatever task they want to work on.
Boris, wasnt this the same thing ~2 weeks ago? Is it the same cache misses as before? What's the expected time till solved? Seems like its taking a while
Does this 60min TTL also apply to Claude Code web?
I have regularly sessions open for multiple days.
Is that a pattern that is not advised?
Thank you for your responses, especially on a Sunday. They give us some insights and at least a couple temporary workarounds to use, while the issues are being addressed :) much appreciated
Hello Boris! How do I increase the 1 hour prompt cache window for the main agent? I would love to be able to set that to, say, 4 hours. That gives me enough time to work on something, go teach a class, grab a snack, and come back and pick up where I left off.
Another CC team member confirmed it's 5 minutes now, not 1 hour.
See the links in https://news.ycombinator.com/item?id=47747209
Resizing the context window seems like a very good idea to me. I noticed a decline of productivity when the 1M context window was released and I'd like to bring it back to 200k, because it was totally fine for the things I was working on.
shouldn't compaction be interactive with the user as to what context will continue to be the most relevant in the future? what if the harness allowed for a turn to clarify the user's expected future direction of the conversation and did the consolidation based upon the additional info?
there definitely seems to be a benefit to pruning the context and keeping the signal to noise high wrt what is still to be discussed.
/loop message ping every 4 minutes
keeps the cache warm while the CC REPL is not active.
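A minimal sketch of that "/loop ping" idea: decide when a keep-alive is due from the TTL and the last request time, and fire a no-op message shortly before expiry. Nothing here is a real Claude Code API; the TTL and margin are assumptions, and the actual ping would be whatever trivial message your harness can send:

```python
# Keep-alive scheduling for a prompt cache with a known TTL.
# Hypothetical helper names; not part of any real harness.
def next_ping_due(last_request_ts: float, ttl_s: float = 300.0,
                  margin_s: float = 60.0) -> float:
    """Timestamp at which a keep-alive should fire (TTL minus a margin)."""
    return last_request_ts + ttl_s - margin_s

def should_ping(last_request_ts: float, now: float) -> bool:
    return now >= next_ping_due(last_request_ts)

# With a 5-minute TTL and a 60s safety margin: a session idle for
# 4 minutes is due for a ping; one idle for 3 minutes is not.
assert should_ping(0.0, 240.0)
assert not should_ping(0.0, 180.0)
```

Note the trade-off discussed elsewhere in the thread: each ping still bills a cached read of the whole prefix, so warming only pays off if the session actually resumes.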
> To improve this, we have shipped a few UX improvements (eg. to nudge you to /clear before continuing a long stale session)
Is this really an improvement? Shouldn't this be something you investigate before introducing 1M context?
What is a long stale session?
If that's not how Claude Code is intended to be used, it might as well auto-quit after a period of time. If it is an acceptable use case, users shouldn't have to change their behavior.
> People pulling in a large number of skills, or running many agents or background automations, which sometimes happens when using a large number of plugins.
If this was an issue, there should have been a cap on it before the feature was released, only increased once you were sure it was fine. What is "a large number"? How do we know what to do?
It feels like "AI" has improved speed but is in fact just cutting corners.
Where can I learn about concepts like prompt cache misses? I don't have a mental model of how that interacts with my context of 1M or 400k tokens. I can cargo-cult the instructions of course, but help us understand if you can, so we can intelligently adapt our behavior. Thanks.
The docs are a good place to start: https://platform.claude.com/docs/en/build-with-claude/prompt...
Thanks. Just noting that those docs say the cache duration is 5 min and not 1 hour as stated in sibling comment:
> By default, the cache has a 5-minute lifetime. The cache is refreshed for no additional cost each time the cached content is used. > > If you find that 5 minutes is too short, Anthropic also offers a 1-hour cache duration at additional cost.
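For building that mental model, one way to think about it: each request bills the cached prefix at a cheap read rate and anything new at the write rate; when the TTL lapses, the cached prefix is zero and the whole context is re-billed. The multipliers below are illustrative (roughly mirroring published cache pricing), not exact rates:

```python
# Toy cost model for prompt caching (normalized rates, not real $):
# a request pays read_rate for the cached prefix and write_rate for
# new tokens. After a TTL expiry the cached prefix is 0.
def request_cost(context_tokens: int, cached_prefix: int,
                 read_rate: float = 0.1, write_rate: float = 1.25) -> float:
    new_tokens = context_tokens - cached_prefix
    return cached_prefix * read_rate + new_tokens * write_rate

warm = request_cost(400_000, cached_prefix=399_000)  # normal turn, small delta
cold = request_cost(400_000, cached_prefix=0)        # TTL expired, full miss

print(warm, cold)  # the miss is roughly 12x the warm turn at 400k context
```

This is also why the 400k vs 1M default matters: the cost of one expired-cache turn scales linearly with however much context you were carrying.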
Apparently Anthropic downgraded the cache TTL to 5 min without telling anyone. My biggest issue with the recent Claude Code problems is the lack of transparency, although it looks like even Boris doesn't know about this one: https://news.ycombinator.com/item?id=47736476
And why does /clear help things? Doesn't that wipe out the history of that session? Jeez.
[dead]
Claude Code cache is not 1 hour. There is a "Closed as not planned" issue on GitHub confirming that it was moved to 5 minutes in March: https://github.com/anthropics/claude-code/issues/46829. I started seeing the massive degradation exactly on the 23rd of March, so after a few days I unsubscribed because it was completely unusable, with a ~5h session being depleted in as little as 15-20 minutes.
Looks like the cache change to 5 minutes was so secretive that even the CC team doesn't know about it.
Or someone just vibe coded "Hey, Claude, make them burn allowances quicker" and merged without telling anyone.
Both are plausible to me.
Why are you all of a sudden running into so many issues like this? Could it be that all of Anthropic's employees have completely unlimited and unbounded accounts, which means you don't get a feel for how changes will affect the customers?
The number of people using Claude Code has grown very quickly, which means:
- More configurations and environments we need to test
- Given an edge/corner case, it is more likely a significant number of users run into it
- As the ecosystem has grown, more people use skills and plugins, and we need to offer better tools and automation to ensure these are efficient
We do actually dogfood rate limits, so I think it's some combination of the above.
I think the suspicion regarding skills and plugins is fair and logical. And it is absolutely the case that some use significantly more tokens.
With that said, on my 5x plan I could have multiple sessions working and the limit was far away. Around when you introduced more tokens during off-peak hours and fewer tokens during working US hours, even with a single session and no plugins at all (I uninstalled OMC), I run into limits very often.
I have not performed any rigorous tests, but it feels like I have about 25% of what I used to have, or less. This is all without using teams of agents, or ralph loops, or anything like that. Just /plan and execute in a single session. I have restored the "/clear context before executing plan" option to try and mitigate things. I will also try the 400k context since, in my experience, the 1M tokens have not made Opus 4.6 noticeably smarter for my small webapp use-case.
Best of luck to you!
ps: whenever you introduce a change, please make it optional AND ask the user about it first. Don't just yank things suddenly (like the "/clear context and apply plan" option), as I spent hours trying to figure out how I'd broken it before I saw your note on how to re-enable it.
With the quality trends this issue of too many users will fix itself soon.
How do y’all test?
Because it’s completely vibe coded? And the codebase goes through massive churn, which means things that were stable get rewritten possibly with bugs.
You can get Claude Code to write tests too...
Writing tests is easy. Writing useful tests is not so easy.
I have a feature request: I built an MCP server, but now it has over 60 tools. Most sessions I really don't need most of them. I suppose I could split this into several servers. But it would maybe be nice to give the user more power here: let me choose the tools that should be loaded, or let me build servers that group tools together which can be loaded. Not sure if that makes sense …
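One workaround today is the split you already suggested: register the same server binary several times with a flag that selects a tool group, and enable only the groups you need. A sketch of a `.mcp.json` (the server names and the `--group` flag are hypothetical; your server would have to implement the filtering itself):

```json
{
  "mcpServers": {
    "mytools-db": {
      "command": "node",
      "args": ["server.js", "--group", "db"]
    },
    "mytools-files": {
      "command": "node",
      "args": ["server.js", "--group", "files"]
    }
  }
}
```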
Have you tried asking Mythos for a fix?
From looking at the raw requests, that can't be right?
It's all "cache_control": { "type": "ephemeral" } -- there is no "ttl" anywhere.
// edit: cc_version=2.1.104.f27
How can we turn off 1M context? I don't find it has ever helped.
He mentioned this in his original comment:
"CLAUDE_CODE_AUTO_COMPACT_WINDOW=400000"
There's also CLAUDE_CODE_DISABLE_1M_CONTEXT, and I'm really not clear on what the difference is or why to pick one over the other. But I guess one disables the 1M-capable models entirely, while the other keeps those models but sets the limit lower?
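If it helps, both are plain environment variables, so you can set them per launch or persist one in the `env` block of your Claude Code `settings.json` (assuming your version supports that block; the 400000 value is the one suggested upthread):

```json
{
  "env": {
    "CLAUDE_CODE_AUTO_COMPACT_WINDOW": "400000"
  }
}
```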
There's an issue someone raised showing that prompt caches are only 5 minutes.
The reply seems to be: oh huh, interesting. Maybe that's a good thing since people sometimes one-shot? That doesn't feel like the messaging I want to be reading, and the way it conflicts with the message here that cache is 1 hour is confusing.
https://news.ycombinator.com/item?id=47741755
Is there any status information on whether the cache is being used? It sure looks like the person analyzing the 5m issue had to work extremely hard to get any kind of data. The iteration loop of people getting better at this stuff would go much, much better if this weren't such a black box, if we had the data to see and understand: is the cache helping?
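One data point that does exist: the Messages API reports cache activity per response in its `usage` block, so if you can capture raw responses you can see hits vs. misses directly (field names from the API docs; the numbers here are made up):

```json
{
  "usage": {
    "input_tokens": 42,
    "cache_creation_input_tokens": 118000,
    "cache_read_input_tokens": 0,
    "output_tokens": 310
  }
}
```

A large `cache_creation_input_tokens` with zero `cache_read_input_tokens` on a resumed session is what a full cache miss looks like.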
Aren't they saying that it's 5 minutes for things like subagents (which wouldn't benefit from a longer TTL)?
Pulling in all the skills and agents in the world, even when unused, is a big hit. I deleted all of mine, added them back as needed, and there was an improvement.
Running Claude Cowork in the background will also consume tokens, and it might not be the most efficient use of them.
Last, but not least, turning off 1M token context by default is helpful.
People, just switch to MiniMax and ditch CC completely. It's not worth it any more.
Number 2 makes me chuckle, honestly. Too many people going down the 10x rabbit holes on YouTube. Next up, a framework that 100xs your workflow. You know it's good because it comes with 300 agents, 20 MCP servers, and 1200 skills.
Can you explain why Opus 4.6 suddenly becomes dumb as a sack of potatoes, even if context is barely filled?
Can you explain why Opus 4.6 will be coming up with stupid solutions only to arrive at a good one when you mention it is trying to defraud you?
I have a feeling the model is playing dumb on purpose to make the user spend more money.
This wasn't the case weeks ago, when it was actually working decently.
Eh you say that every time and yet it keeps happening.
Boris, is the KV cache TTL now reduced to 5 minutes from 1 hour?
I think this may be the biggest concern for people building tools on the API: https://github.com/anthropics/claude-code/issues/46829
I would argue that KV caching is a net gain for Anthropic, and a well-maintained cache is the biggest thing that can generate induced demand and a thriving third-party ecosystem. https://safebots.ai/papers/KV.pdf
Wait what? If I get told to come back in three hours because I'm using the product too much, I get penalized when I resume?
What's the right way to work on a huge project, then? I've just been saying "Please continue" -- does that blow through the quota?
This comment seems unnecessarily hostile.
Why?
It seems just fine to me. This is what Anthropic needs to do if they want to survive. I'm always looking out for someone to integrate an actually good harness to a good model. Once that happens, I'm jumping ship if Anthropic keeps playing these tricks.
It's almost unusable for me now. A simple prompt to merge 3 sub-100-line files with simple node code, on Sonnet 4.6, uses up 20% of my 5 hour quota, on a new/fresh session.
To be fair, my comment was a bit harsher before the update. The way they handle the development, communication and how they treat customers isn't fine. I've seen some angry people post and comment in manners which truly deserved the label hostile.
The whole product with the infrastructure and Claude Code's code appear to be vibe coded.
If they can't manage the infrastructure, then perhaps they should offer customers the ability to host it themselves.
They appear to take issues seriously mostly when they become posts on hacker news and when articles are published online by major news sites. Customer support is mostly a bot. I don't even know how to reach some actual humans to get support.
I'm sorry if you and others are offended. They've had these issues for several weeks now. I haven't seen any real improvements during this time. I see more features and more bugs.
There have been several releases made over the last few days without any changelogs. The quotas are still as opaque as they've been. This company has some extremely shady business practices.
Do you really want HN to be like the Stepford Wives? Dang is already doing too good a job of closing in on that; no need to encourage them more.
The original poster edited his comment after my response; it was far more hostile before. So I assume my input worked.
The hostility is all Anthropic.
I wish people would pay more attention to:
* Anthropic is in some way trying to run a business (not a charity) and at least (eventually?) make money and not subsidize usage forever
* "What a steal/good deal" the $100-$200/mo plans are compared to if they had to pay for raw API usage
and less on "how dare you reserve the right to tweak the generous usage patterns you open-ended-ly gave us, we are owed something!"
As an (ex) paying customer, I expect some consistency. I used to be satisfied with the value I got, until the limits changed overnight and I'd get a tenth of my previous usage.
If Anthropic is allowed to alter the deal whenever, then I'd expect to be able to get my money back, pro-rata, no questions asked.
yes, $200/mo is a serious subscription, we are owed something, and I won't feel ashamed for saying that
especially when you are told that using the subagent for code review (`claude -p`) is now billed via the API on top of the $200 sub
All those apply to OpenAI+Codex too, but they're far more generous with limits than Anthropic, and with granting fresh limits to apologize when they fuck up.