For about three weeks I thought it was me. I'd ask Claude to do something I'd asked it to do a hundred times before, and it would come back with a half-baked answer, or a lazy one-shot, or “you should check your IAM policy” when I had specifically asked it to write the IAM policy. I reloaded the session. Re-read CLAUDE.md. Tightened up the prompt. Added more context. Pulled context out. Switched from Opus to Sonnet to Haiku and back to Opus. Same flavor of dumb every time.

Turns out it wasn't me. It was them.

In March, Anthropic quietly dropped the reasoning effort on Claude Code from high to medium. No banner, no changelog item, no “hey heads up the model is doing less thinking now.” Just a knob got turned. Then in late March they shipped a cache bug that made the model lose track of its own context across calls. Then in April they tuned it to be less verbose, which apparently meant less thorough, which on a coding assistant translates pretty directly to “less correct.”

Three regressions in five weeks. Stacked on top of each other like a bad change calendar. And the public-facing response was, for a while, basically “some of these changes were for your benefit.” Reader, they were not for my benefit. I am the customer. I had been told I was paying for the model that thinks hard about my problem. I was not getting that model. Nobody told me.

For calibration: if I shipped three silent regressions to a production service in five weeks at any of my SRE jobs, I would be sitting in a conference room explaining myself with the lights on.

To Anthropic's credit — and I do mean this — they eventually published a real postmortem on April 23rd. Three separate root causes, real timeline, no blame-shifting, no marketing language about “an opportunity to improve.” Just: here is what we changed, here is what broke, here is how we found it, here is what we are doing differently. As far as engineering postmortems from a vendor go, it was one of the better ones I have read this year. I save them. I have a folder.

So I will give them that. But I want to talk about everything that happened before April 23rd, because that is the part that should make every single person paying for an AI tool nervous.

You can't see the changes. There is no version pin. There is no “I'd like to stay on the model from last Tuesday, please.” When the vendor turns a knob, your tool changes underneath you, and the only way you find out is by feel. You start typing a thing you've typed a hundred times and the response comes back wrong, and you sit there for a minute trying to figure out if you've forgotten how to write a prompt.

In SRE we have a name for this. It's called a silent regression. The service is technically up. The error rate looks fine. The dashboards are green. But the thing it's doing is no longer the thing it was doing, and the only customer-visible signal is “it kinda sucks now.” Silent regressions are the worst class of incident, because by the time the SLO actually trips, the damage has been done for weeks. You can't roll back what you don't know broke.

I have monitoring on every important thing in my infrastructure. I have a daily Terraform drift check. I have a credential expiry notifier. I have a synthetic that hits the chatbot Lambda from three regions and screams if it gets a 5xx. The one piece of my stack with no monitoring, no version pin, no rollback path, no SLA I can read, and no signal when it changes — is the AI itself. The thing I'm supposedly building everything else around.
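
The synthetic, for what it's worth, is not clever. Stripped of the alarm wiring it is roughly the sketch below; the endpoint is a placeholder and the function name is just my convention, not anything AWS hands you.

```python
# Runs as a scheduled Lambda in each of three regions. A raised exception
# fails the invocation, and that failure is what the alarm watches for.
import urllib.error
import urllib.request

CHATBOT_URL = "https://chat.example.com/healthcheck"  # placeholder endpoint

def handler(event, context):
    try:
        with urllib.request.urlopen(CHATBOT_URL, timeout=10) as resp:
            status = resp.status
    except urllib.error.HTTPError as err:
        status = err.code  # urlopen raises on 4xx/5xx; keep the code anyway
    if status >= 500:
        raise RuntimeError(f"chatbot synthetic got a {status}")
    return {"status": status}
```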

The lesson I'm taking from all of this is the unsatisfying one. You cannot trust the vendor's silent state. You have to build your own ground truth. I'm now keeping a folder of “known good” prompts with their expected outputs, run weekly against whatever Claude version is live, with a diff against the last run. If today's Claude can't clear the bar yesterday's Claude set, I want to know about it before the blog post starts coming out half-baked at 7:01am. It's regression testing for the AI.

Which sounds insane to type and is insane to type, but here we are.
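
The harness itself is not much more than a loop and a diff. Here is a sketch, assuming the official anthropic Python SDK; the folder layout, the file naming, and the model string are my own conventions, and you would swap in whatever model you actually run.

```python
#!/usr/bin/env python3
"""Weekly prompt-regression check: run a folder of known-good prompts
against whatever Claude is live today and diff against the last saved run."""

import difflib
from datetime import date
from pathlib import Path

import anthropic  # pip install anthropic; reads ANTHROPIC_API_KEY from the env

PROMPTS_DIR = Path("prompts")   # one .txt file per known-good prompt
RUNS_DIR = Path("runs")         # saved as runs/<prompt-name>/<date>.txt
MODEL = "claude-sonnet-4-5"     # swap for whatever you actually run

client = anthropic.Anthropic()


def run_prompt(prompt_text: str) -> str:
    """Send one prompt to the live model and return its text response."""
    resp = client.messages.create(
        model=MODEL,
        max_tokens=2048,
        temperature=0,  # cut down on run-to-run noise
        messages=[{"role": "user", "content": prompt_text}],
    )
    return "".join(block.text for block in resp.content if block.type == "text")


def latest_run(prompt_name: str) -> Path | None:
    """Most recent saved output for this prompt, if there is one."""
    runs = sorted((RUNS_DIR / prompt_name).glob("*.txt"))
    return runs[-1] if runs else None


for prompt_file in sorted(PROMPTS_DIR.glob("*.txt")):
    name = prompt_file.stem
    today_output = run_prompt(prompt_file.read_text())

    previous = latest_run(name)
    if previous is not None and previous.read_text() != today_output:
        diff = difflib.unified_diff(
            previous.read_text().splitlines(),
            today_output.splitlines(),
            fromfile=str(previous),
            tofile=f"{name} ({date.today()})",
            lineterm="",
        )
        print(f"--- DRIFT: {name} ---")
        print("\n".join(diff))

    out_dir = RUNS_DIR / name
    out_dir.mkdir(parents=True, exist_ok=True)
    (out_dir / f"{date.today()}.txt").write_text(today_output)
```

Even with the temperature pinned, the outputs are not strictly deterministic, so I treat a diff as a signal to go read the two responses side by side rather than a hard pass/fail. The point is not exact matching. The point is that when the model underneath me changes, the drift shows up in my terminal instead of in the blog post.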

The good news: when your AI suddenly feels stupid, it is statistically more likely that the AI got stupid than that you did. Trust the feeling.

The bad news: the only person who is going to verify that for you is you.