AWSPRODUCTIONARCHITECTURE

The six pillars of production GenAI, in practice

28 May 2026 · 5 min read · Daniyal Malik

When AWS published the Well-Architected Generative AI Lens, my first reaction was: finally, the unglamorous stuff has a name. The six pillars are exactly the things that separate a demo from a system you can put in front of customers. Here's what each one means in practice — the version I actually use on builds.

1. Operational excellence

Can you deploy a prompt change the way you deploy code? That means versioned prompts, a rollback path, and monitoring KPIs that go beyond uptime — answer quality, refusal rate, latency. At OO7 AI, a bad prompt is a production incident, so it ships through the same gates as any code change.

2. Security

IAM around the model, data protection in transit and at rest, and guardrails on both input and output. The mental model that keeps you safe: the model is an untrusted text processor pointed at your data. Treat it like one.

3. Reliability

The pillar people skip. Service quotas, retries, fallbacks across providers — and, uniquely for GenAI, handling the model being confidently wrong. Reliability isn't "the API responded." It's "the answer was trustworthy, or the system knew it wasn't."

4. Performance efficiency

Picking the right model for the job rather than the biggest model for everything; provisioned vs on-demand throughput; prompt caching; explicit latency budgets. A live voice agent and a nightly batch job have opposite performance profiles and should not be architected the same way.

5. Cost optimisation

Token-based FinOps. Caching, right-sized models, and knowing your cost-per-request before you scale. The line between a viable product and a money pit is often one routing decision and a healthy cache-hit rate.

6. Sustainability

Right-sized models and serverless footprints. The smallest model that passes your evals is also the greenest — efficiency and sustainability point in the same direction, which is rare and worth exploiting.

Why this matters

Most AI projects die not because the model couldn't do the task, but because nobody architected for these six things — and the demo never survived contact with production load, real cost, or a single wrong answer in front of a customer. The Lens is a checklist for not dying that way.

If you want a read on where your AI workload sits against these six, that's exactly what a scoping call is for.

← all field notes Ask my AI twin about this →