I was three weeks into a chatbot rollout for a client. Friday morning, mid-deploy, my Lambda started returning 400s. The API Gateway was happy. CloudWatch showed the invocations going out. Bedrock was returning, with admirable consistency, an error that said the model I was calling did not exist.
I checked the model ID. Same one I’d been using for three months. I checked the region. Same one. I checked the IAM policy. Same one. Then I opened my email and saw the AWS notice that I had not, in fact, read when it arrived eight days earlier. The model my chatbot was built around — Claude 3.5 Sonnet on Bedrock — had hit the end of its deprecation window. It was now gone from on-demand invocation. I could still reach it through legacy paths for another month. I had not configured any legacy paths.
The Bedrock console had been showing a “Deprecated” badge on that model for a month. I had not noticed because I do not browse the Bedrock console for fun. The console is a place I go to confirm something is working. It is not a place I go to see what is being taken away.
The email was polite. The timeline was textbook. The recommended replacement was Amazon Nova Lite. AWS did everything a vendor should do to give a customer a runway. The customer — me — had simply not used the runway.
So. Switch to Nova Lite. Same product family. Same Bedrock API surface. Same calling pattern. How different could it be.
Reader, it was different.
Cascade one: inference profile IDs. Newer Bedrock models don’t accept a raw model ID like anthropic.claude-3-5-sonnet-20240620-v1:0. They want an inference profile ID — something like us.amazon.nova-lite-v1:0. The us. prefix is mandatory. It indicates cross-region inference, which is the only way to invoke the newer models from most regions, because AWS routes the underlying calls across regions for capacity reasons. This is not documented on the Bedrock landing page. It is in a docs page titled “Cross-region inference,” two clicks deep. The error message you get without it is “Model not found,” which is not a clue. It is a wall.
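At the call site, the fix is one parameter. A minimal sketch with boto3's bedrock-runtime client; the region is a placeholder and the request body is the subject of cascade three:

```python
import json
import boto3

bedrock = boto3.client("bedrock-runtime", region_name="us-east-1")

# Old call site: the raw foundation-model ID, invoked on demand.
#   modelId="anthropic.claude-3-5-sonnet-20240620-v1:0"
# New call site: the cross-region inference profile ID. Same API, same
# parameter, different identifier. Without the "us." prefix, Bedrock
# reports the model as not found.
response = bedrock.invoke_model(
    modelId="us.amazon.nova-lite-v1:0",
    body=json.dumps({
        "schemaVersion": "messages-v1",
        "messages": [{"role": "user", "content": [{"text": "ping"}]}],
        "inferenceConfig": {"maxTokens": 64},
    }),
)
print(json.loads(response["body"].read()))
```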
Cascade two: IAM. When you call an inference profile, Bedrock fans out under the hood to one of several foundation models in several regions. Your IAM policy needs to allow invoking the profile AND invoking any of the underlying foundation models. Two ARNs, two statements: one for the profile, one for the foundation models with a wildcard region. My first deploy after fixing the model ID still got AccessDeniedException. I spent forty minutes thinking my IAM policy was correct because the profile ARN was right there in the resource list. It was. It just wasn’t enough.
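The policy that finally worked is sketched below. The account ID, role name, and calling region are placeholders; the two-statement structure is the point:

```python
import json
import boto3

policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            # Statement 1: the inference profile itself, in the calling region.
            "Effect": "Allow",
            "Action": "bedrock:InvokeModel",
            "Resource": "arn:aws:bedrock:us-east-1:123456789012:inference-profile/us.amazon.nova-lite-v1:0",
        },
        {
            # Statement 2: the underlying foundation model, region wildcarded,
            # because the profile routes the call wherever there is capacity.
            # Without this, the profile ARN alone still gets AccessDeniedException.
            # (Add bedrock:InvokeModelWithResponseStream if the handler streams.)
            "Effect": "Allow",
            "Action": "bedrock:InvokeModel",
            "Resource": "arn:aws:bedrock:*::foundation-model/amazon.nova-lite-v1:0",
        },
    ],
}

boto3.client("iam").put_role_policy(
    RoleName="chatbot-lambda-role",  # placeholder role name
    PolicyName="bedrock-nova-lite",
    PolicyDocument=json.dumps(policy),
)
```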
Cascade three: request and response format. Nova doesn’t use the same message shape as Claude on Bedrock. The content blocks are structured differently. The system prompt takes a different shape. The inference parameters live under a different key. The response nests the generated text somewhere else, so the parsing changes too. The handler I had written, the one that worked for 3.5 Sonnet without me thinking about it, had to be rewritten. Not a sweeping rewrite. A surgical one. But “just swap the model ID” was, in retrospect, a fantasy I had told myself at the start of the morning.
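Here is roughly what the surgery looked like. The field names below are a sketch of the two native request formats as I had to handle them, not a reference, so treat the exact keys as approximate:

```python
def claude_body(system: str, user_text: str) -> dict:
    # The shape the old handler assumed: Anthropic's native Bedrock format.
    return {
        "anthropic_version": "bedrock-2023-05-31",
        "max_tokens": 512,
        "system": system,
        "messages": [
            {"role": "user", "content": [{"type": "text", "text": user_text}]}
        ],
    }


def nova_body(system: str, user_text: str) -> dict:
    # The shape Nova expects: system becomes a list of blocks, content blocks
    # drop the "type" key, and the generation knobs move under inferenceConfig.
    return {
        "schemaVersion": "messages-v1",
        "system": [{"text": system}],
        "messages": [
            {"role": "user", "content": [{"text": user_text}]}
        ],
        "inferenceConfig": {"maxTokens": 512},
    }


def extract_text(model_family: str, payload: dict) -> str:
    # The responses differ too: Claude returns the reply at content[0],
    # Nova nests it under output.message.content[0].
    if model_family == "claude":
        return payload["content"][0]["text"]
    return payload["output"]["message"]["content"][0]["text"]
```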
Cascade four: prompting. Nova Lite has different tendencies than Claude 3.5. It’s terser. It interprets instructions slightly more literally. The few-shot examples I had tuned to Claude’s voice produced shorter, blunter answers from Nova. That’s not a bug. That’s a different model. But it meant another half-hour of prompt tuning to get the chatbot’s tone back to where it was on Thursday.
One model deprecation. Four cascading infrastructure changes: inference profile, IAM policy, request/response handler, prompt tuning. The blast radius of “we are sunsetting this model” extends through everything that touches the model.
Here is the SRE framing I should have been using from the start. AI model deprecation is now its own class of vendor-controlled infrastructure churn. Like a managed database engine EOL. Like a TLS protocol version retirement. Like a Python runtime hitting end-of-life on Lambda. It has a vendor, a clock, an email notification you will probably not read in time, and a migration cost that scales with how tightly your code is coupled to the deprecated thing.
We have known how to handle this class of churn for decades. The pattern is boring and it works. Put an abstraction layer between your code and the vendor-controlled thing. Make the model ID a config value, not a hardcoded string. Make the request/response format something a wrapper normalizes, not something your handler knows about. Pin known-good versions in your IaC. Maintain a runbook: “if model X is deprecated, change config to Y, run regression suite Z.” Have a regression suite at all.
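A sketch of what that pattern looks like in this case. The registry and environment variable names are made up, and the Claude entry from the earlier snippet slots in the same way; the point is that the handler only ever calls ask(), and the model is chosen by configuration, not by code:

```python
import json
import os
import boto3

# Map a model family to a (build_request, extract_text) pair. Swapping models
# means adding an entry and flipping two environment variables, not rewriting
# the handler.
FAMILIES = {
    "nova": (
        lambda system, text: {
            "schemaVersion": "messages-v1",
            "system": [{"text": system}],
            "messages": [{"role": "user", "content": [{"text": text}]}],
            "inferenceConfig": {"maxTokens": 512},
        },
        lambda payload: payload["output"]["message"]["content"][0]["text"],
    ),
    # "claude": (claude_body, lambda p: p["content"][0]["text"]),
}

MODEL_ID = os.environ.get("BEDROCK_MODEL_ID", "us.amazon.nova-lite-v1:0")
MODEL_FAMILY = os.environ.get("BEDROCK_MODEL_FAMILY", "nova")
_bedrock = boto3.client("bedrock-runtime")


def ask(system: str, user_text: str) -> str:
    # Everything vendor-shaped lives behind this function.
    build, extract = FAMILIES[MODEL_FAMILY]
    response = _bedrock.invoke_model(
        modelId=MODEL_ID,
        body=json.dumps(build(system, user_text)),
    )
    return extract(json.loads(response["body"].read()))
```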
I had none of this on the chatbot. I had a Lambda that called Bedrock with a hardcoded model ID, inline, no wrapper, no fallback, no test harness. The handler assumed the response shape would never change. The IAM policy named the model directly. I had built the chatbot the way I build things when I am working alone on a deadline: as fast as possible, with all the corners cut that you cut when nobody is going to deprecate anything on you next quarter. Then somebody deprecated something on me next quarter.
The client doesn’t know any of this happened. From their side, the chatbot was working on Thursday and is working again now. They didn’t see the email. They didn’t see the four cascading changes. They didn’t see the forty minutes I spent on IAM. That’s actually the deliverable: the chatbot keeps working, regardless of what AWS does on a Friday. Part of what I sell as Three Moons Network — and I am only now putting words to it — is the work of absorbing this kind of churn before it reaches the client. The chatbot keeps working. That’s the product. Everything underneath, the model swaps and the IAM updates and the regression tests, is the part the client never has to see and never has to think about.
AWS is not the villain in this story. They sent the email. They posted the timeline. They gave me a year of warning. The villain, to the extent there is one, is the assumption I had been carrying without examining it — that a model invocation in my Lambda was a stable interface. It was not. It was always a vendor-controlled dependency with a sunset clock. I just hadn’t priced that in.
The Lambda now has an abstraction layer over the Bedrock client. The model ID is a config value. The request and response normalization happens in a wrapper. The IAM policy is parameterized. There is a regression test that hits the model with five known inputs and confirms the outputs are within an expected envelope. If Nova Lite gets deprecated next year — which is plausible; it is not even Nova’s flagship — I will change three config values, run the regression suite, and re-deploy. Hopefully.
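The regression test is the least glamorous piece and the one I should have had all along. A sketch, assuming the wrapper above lives in a module; the module path, prompts, and envelope checks here are placeholders, not the real suite:

```python
# Hypothetical module path; the real one is wherever the wrapper lives.
from chatbot.bedrock_wrapper import ask

# Five known inputs and a term a sane answer must mention. Placeholders;
# the real cases come from the chatbot's actual traffic.
CASES = [
    ("What are your support hours?", ["hours"]),
    ("How do I reset my password?", ["password"]),
    ("Do you offer refunds?", ["refund"]),
    ("Where can I download my invoice?", ["invoice"]),
    ("How do I contact a human?", ["contact"]),
]


def test_known_inputs_stay_in_envelope():
    for prompt, required_terms in CASES:
        answer = ask("You are the helpdesk assistant.", prompt)
        # "Within an expected envelope" here means: non-empty, not absurdly
        # long, and still mentioning the thing the answer is about.
        assert answer.strip()
        assert len(answer) < 2000
        assert any(term in answer.lower() for term in required_terms)
```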
Hopefully is not a strategy. It is what I have until the next deprecation proves me wrong, at which point I will add the next layer of abstraction, and the layer after that, and the architecture will keep getting more boring, which in this business is the same thing as getting more correct.