RAG or fine-tuning — how I actually decide
20 May 2026 · 4 min read · Daniyal Malik
"Should we fine-tune a model?" is the second question almost every team asks me. The answer is usually "not yet — and probably not at all." Here's how I actually decide.
The default is RAG, and here's why
Fine-tuning teaches a model how to behave. RAG gives it what to know. Most business problems are knowledge problems, not behaviour problems — you want the model to answer from your documents, your policies, your data, and to show its work. That's retrieval, not training.
RAG also wins on the things that decide whether a system survives production:
- Freshness — update a document and the answer updates. No retraining cycle.
- Attribution — you can cite the source. At Preamble, every quote line cites the exact source page. You cannot do that with a weight buried in a fine-tune.
- Governance — delete a document and the model can no longer use it. Try un-learning a fine-tune.
- Cost — no training runs, no GPU bill, no MLOps pipeline to keep alive.
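To make freshness, attribution and governance concrete, here is a minimal sketch, not a production design. It assumes a toy in-memory index where plain word overlap stands in for real vector search. The `Chunk`, `TinyIndex` and `cite` names and the document IDs are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Chunk:
    doc_id: str  # hypothetical document identifier
    page: int    # page the text came from, kept so answers can cite it
    text: str


class TinyIndex:
    """Toy in-memory index. Word overlap stands in for real vector search."""

    def __init__(self) -> None:
        self._chunks: list[Chunk] = []

    def add(self, chunk: Chunk) -> None:
        self._chunks.append(chunk)

    def delete_document(self, doc_id: str) -> None:
        # Governance: once removed from the index, the text cannot be retrieved.
        self._chunks = [c for c in self._chunks if c.doc_id != doc_id]

    def search(self, query: str, k: int = 2) -> list[Chunk]:
        query_words = set(query.lower().split())
        scored = [
            (len(query_words & set(c.text.lower().split())), c)
            for c in self._chunks
        ]
        hits = [(score, c) for score, c in scored if score > 0]
        hits.sort(key=lambda pair: pair[0], reverse=True)
        return [c for _, c in hits[:k]]


def cite(chunks: list[Chunk]) -> list[str]:
    # Attribution: every retrieved passage carries its source and page.
    return [f"{c.doc_id}, p. {c.page}" for c in chunks]


index = TinyIndex()
index.add(Chunk("refund-policy-v1", 3, "refunds are accepted within 14 days"))
index.add(Chunk("shipping-guide", 1, "orders ship within 2 business days"))

before = cite(index.search("how many days for refunds"))

# Freshness: swap the document and the very next query sees the new text.
index.delete_document("refund-policy-v1")
index.add(Chunk("refund-policy-v2", 3, "refunds are accepted within 30 days"))

after = cite(index.search("how many days for refunds"))

print(before)
print(after)
```

The model never appears in this sketch, and that is the point: what it may know, and what it can cite, is decided entirely by what sits in the index at query time.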
When fine-tuning actually earns its place
Reach for it when the problem is behavioural, not informational:
- A format or voice you can't prompt reliably at high volume, where prompt tokens get expensive.
- A narrow, stable task — classification, extraction, routing — where a small fine-tuned model beats a frontier model on latency and cost.
- Scale economics — when a smaller fine-tuned model plus RAG is cheaper than a big model doing everything in-context.
Note the "plus RAG." It is rarely fine-tuning instead of retrieval. Far more often it's fine-tuning for the behaviour and retrieval for the knowledge.
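The scale-economics case comes down to arithmetic, so a rough sketch helps. Every figure below is a hypothetical placeholder (token counts, prices per million tokens, and a fixed monthly cost standing in for amortised training, evals and hosting). Substitute your own numbers before drawing any conclusion.

```python
def monthly_cost(requests: int, tokens_per_request: int,
                 price_per_million_tokens: float,
                 fixed_monthly: float = 0.0) -> float:
    """Token spend for one month plus any fixed cost."""
    token_cost = requests * tokens_per_request * price_per_million_tokens / 1_000_000
    return fixed_monthly + token_cost


# Hypothetical: a big model that needs long format instructions in every prompt.
BIG = dict(tokens_per_request=3_000, price_per_million_tokens=5.00)

# Hypothetical: a small fine-tuned model that only needs the retrieved context,
# but carries a fixed monthly cost for training, evals and hosting.
SMALL = dict(tokens_per_request=1_500, price_per_million_tokens=0.50,
             fixed_monthly=2_000.0)


def break_even_requests() -> float:
    """Monthly volume above which the small fine-tuned model is cheaper."""
    big_per_request = monthly_cost(1, **BIG)
    small_per_request = monthly_cost(1, **{**SMALL, "fixed_monthly": 0.0})
    return SMALL["fixed_monthly"] / (big_per_request - small_per_request)


break_even = break_even_requests()

for volume in (50_000, 500_000):
    big = monthly_cost(volume, **BIG)
    small = monthly_cost(volume, **SMALL)
    print(f"{volume:>7} requests/month: big={big:,.2f} small+RAG={small:,.2f}")

print(f"break-even at roughly {break_even:,.0f} requests/month")
```

With these made-up numbers the fine-tuned model loses at low volume and wins at high volume. Below the break-even point, the fixed cost of owning a fine-tune outweighs the per-token savings.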
The order I actually go in
1. Prompt + retrieval first. Get a grounded RAG pipeline working and measured. Most "we need fine-tuning" requests disappear right here.
2. Optimise retrieval. Chunking, reranking, hybrid search, a real eval set. Most quality problems are retrieval problems wearing a model costume.
3. Only then consider fine-tuning, for the specific behaviour prompting couldn't reach, with an eval set that proves it helped.
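"Measured" in the steps above can start very small. Here is a minimal sketch of a retrieval eval that uses recall@k, one common metric among several. The eval cases, chunk IDs and the two canned retrievers are hypothetical stand-ins for a real pipeline before and after a reranking change.

```python
from typing import Callable

# (question, ids of the chunks that actually answer it)
EvalCase = tuple[str, set[str]]


def recall_at_k(retrieved: list[str], relevant: set[str], k: int) -> float:
    """Fraction of the relevant chunks that appear in the top k results."""
    if not relevant:
        raise ValueError("each eval case needs at least one relevant chunk")
    return len(set(retrieved[:k]) & relevant) / len(relevant)


def mean_recall_at_k(retriever: Callable[[str], list[str]],
                     cases: list[EvalCase], k: int) -> float:
    scores = [recall_at_k(retriever(question), relevant, k)
              for question, relevant in cases]
    return sum(scores) / len(scores)


# Hypothetical eval set. A real one would be built from real user questions.
CASES: list[EvalCase] = [
    ("What is the refund window?", {"policy-p3"}),
    ("How fast do orders ship?", {"shipping-p1"}),
    ("Who approves discounts?", {"pricing-p7", "pricing-p8"}),
]

# Canned rankings standing in for two versions of the retrieval pipeline.
BASELINE = {
    "What is the refund window?": ["faq-p2", "policy-p3", "policy-p1"],
    "How fast do orders ship?": ["shipping-p1", "faq-p4", "faq-p5"],
    "Who approves discounts?": ["faq-p9", "pricing-p7", "faq-p1"],
}
RERANKED = {
    "What is the refund window?": ["policy-p3", "faq-p2", "policy-p1"],
    "How fast do orders ship?": ["shipping-p1", "faq-p4", "faq-p5"],
    "Who approves discounts?": ["pricing-p7", "pricing-p8", "faq-p9"],
}

baseline_score = mean_recall_at_k(BASELINE.__getitem__, CASES, k=2)
reranked_score = mean_recall_at_k(RERANKED.__getitem__, CASES, k=2)

print(f"baseline recall@2: {baseline_score:.2f}")
print(f"reranked recall@2: {reranked_score:.2f}")
```

The same harness carries over to step three. If a fine-tune is ever on the table, it has to beat the best retrieval-only score on this eval set, not just feel better in a demo.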
The trap
Teams want to fine-tune because it feels like "real AI." It's the expensive, slow, hard-to-govern option, and it usually solves a problem that better retrieval would solve for a fraction of the cost. Fine-tuning is a scalpel. Most teams reach for it like a hammer.
If you're weighing this for a real build, I'm happy to pressure-test it with you.