Journal | Field Notes on Data & AI Delivery

01 AI delivery notes

RAG in Production: Chunking, Retrieval Quality, and the Problems the Demo Hides

A RAG assistant becomes a product the moment someone relies on it during real work. From that point on, the engineering around the model matters more than the demo.

Context window limits, suboptimal chunking, retrieval evaluation, hybrid search, observability, and cost control: the real engineering behind a RAG system that works past the demo.

RAG | Retrieval | Evaluation | LLM Ops

March 15, 2026 | 7 min read


What matters

Index like a real pipeline

Stable IDs, source versions, and diff-based ingestion matter before prompt tuning does.
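
As a rough illustration of diff-based ingestion, the sketch below compares content hashes stored in the index against a fresh crawl and plans only the necessary writes. The names `chunk_id`, `content_hash`, and `plan_ingestion` are hypothetical, as is the ID scheme (source ID plus chunk position). A real pipeline would likely also record a source version alongside each hash.

```python
import hashlib


def chunk_id(source_id: str, position: int) -> str:
    """Stable ID (assumed scheme): source and chunk position, never content."""
    return f"{source_id}#{position}"


def content_hash(text: str) -> str:
    """Fingerprint of the chunk text, used to detect changes between runs."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def plan_ingestion(indexed: dict[str, str], incoming: dict[str, str]) -> dict[str, list[str]]:
    """Diff stored hashes (chunk ID -> hash) against incoming chunks (chunk ID -> text)."""
    incoming_hashes = {cid: content_hash(text) for cid, text in incoming.items()}
    return {
        # New chunks: embed and insert.
        "add": sorted(cid for cid in incoming_hashes if cid not in indexed),
        # Changed chunks: re-embed under the same ID.
        "update": sorted(
            cid for cid, h in incoming_hashes.items()
            if cid in indexed and indexed[cid] != h
        ),
        # Chunks that disappeared from the source: remove from the index.
        "delete": sorted(cid for cid in indexed if cid not in incoming_hashes),
    }
```

Unchanged chunks appear in none of the three lists, so a re-run over an untouched corpus costs no embedding calls.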

Score retrieval separately

If the right evidence is not showing up, the model never had a fair chance.
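
One way to score retrieval on its own is to label which chunks are relevant for a set of test queries and measure the ranked results before any generation happens. The sketch below assumes chunk IDs as plain strings; `recall_at_k` and `reciprocal_rank` are illustrative helper names, not part of any particular library.

```python
def recall_at_k(retrieved: list[str], relevant: set[str], k: int) -> float:
    """Fraction of the labeled relevant chunks that appear in the top k results."""
    if not relevant:
        return 0.0
    hits = len(set(retrieved[:k]) & relevant)
    return hits / len(relevant)


def reciprocal_rank(retrieved: list[str], relevant: set[str]) -> float:
    """1 / rank of the first relevant chunk, or 0.0 if none was retrieved."""
    for rank, cid in enumerate(retrieved, start=1):
        if cid in relevant:
            return 1.0 / rank
    return 0.0
```

Averaging these over a labeled query set gives a number that moves when chunking or search changes, independent of the prompt or model.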

Plan for refusal and tracing

A trustworthy assistant cites, abstains, and leaves behind a debuggable trail.
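
A minimal sketch of that behavior, assuming each retrieved chunk carries a similarity score in the range 0 to 1: the assistant answers only when some evidence clears a threshold, cites the chunks it used, and returns a record that can be logged as a trace. `Hit`, `decide`, and the `min_score` default of 0.5 are hypothetical choices for illustration.

```python
from dataclasses import asdict, dataclass


@dataclass
class Hit:
    chunk_id: str
    score: float  # assumed: similarity in [0, 1], higher is better


def decide(query: str, hits: list[Hit], min_score: float = 0.5) -> dict:
    """Choose between answering and abstaining, and return a loggable trace."""
    evidence = [h for h in hits if h.score >= min_score]
    return {
        "query": query,
        # Abstain when no retrieved chunk is strong enough to support an answer.
        "action": "answer" if evidence else "abstain",
        # Only chunks that cleared the threshold may be cited.
        "citations": [h.chunk_id for h in evidence],
        # Keep everything that was retrieved so a bad answer can be debugged later.
        "retrieved": [asdict(h) for h in hits],
    }
```

The returned dictionary is plain data, so it can be serialized with `json.dumps` and stored next to the final answer.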

Working rule: Trace every answer

First fix: Ingestion quality

Failure to avoid: Confident guesswork

Field notes from shipping data and AI systems

RAG in Production: Chunking, Retrieval Quality, and the Problems the Demo Hides

Building a Data Platform People Actually Trust