Retrieval-augmented generation—RAG—feels like a solved problem on paper. You chunk your documents, embed them, store them in a vector database, retrieve relevant chunks when a user asks a question, feed them to the LLM, and done.
We've now built RAG systems for three clients in three different domains. Along the way, we've learned that the basic idea is simple, but getting it to work reliably for real users takes attention to a lot of subtle details.
Here are some of the lessons we've learned.
Chunking strategy matters more than you think
If you just split your documents every 500 tokens regardless of their structure, you'll get mediocre results. We've found that it's worth spending time on a chunking strategy that respects the natural structure of your documents.
For example, if you're working with technical documentation, you probably want to keep each section together instead of splitting it in the middle. If you're working with a knowledge base of articles, keep the article as one chunk if it's not too long, or split by logical subsections.
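As a concrete sketch of structure-aware chunking (the function name and limits are our own, not from any framework): split on headings first so sections stay intact, and only fall back to paragraph-level splitting when a section is too long.

```python
import re

def chunk_by_sections(doc: str, max_chars: int = 2000) -> list[str]:
    """Split a Markdown-style document on headings, keeping each section
    intact; only sections longer than max_chars are split further on
    paragraph boundaries."""
    # Split just before every heading line (e.g. "## Installation").
    sections = re.split(r"\n(?=#{1,6} )", doc)
    chunks = []
    for section in sections:
        section = section.strip()
        if not section:
            continue
        if len(section) <= max_chars:
            chunks.append(section)
            continue
        # Oversized section: fall back to paragraph-level splitting.
        current = ""
        for para in section.split("\n\n"):
            if current and len(current) + len(para) + 2 > max_chars:
                chunks.append(current)
                current = para
            else:
                current = current + "\n\n" + para if current else para
        if current:
            chunks.append(current)
    return chunks
```

The same idea carries over to other formats: whatever marks a logical boundary in your documents (headings, article breaks, HTML tags) should be the primary split point, with size limits as a fallback rather than the rule.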
One trick that's worked well for us: embed smaller chunks than you might think you need, so retrieval stays precise, then expand each retrieved chunk with its surrounding text when building the final prompt. This gives the model more context around the retrieved chunk, which helps it understand what it's looking at.
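One way to implement that decoupling of retrieval granularity from prompt context (a minimal sketch; the function and its arguments are our own names): keep each chunk's position within its parent document, then join the retrieved chunk with its neighbors when assembling the prompt.

```python
def expand_with_neighbors(chunks: list[str], hit_index: int, window: int = 1) -> str:
    """Given the index of a retrieved chunk within its parent document,
    return it joined with up to `window` neighboring chunks on each side."""
    start = max(0, hit_index - window)
    end = min(len(chunks), hit_index + window + 1)
    return "\n\n".join(chunks[start:end])
```

The only bookkeeping this requires is storing, alongside each embedding, which document the chunk came from and its index within that document.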
Embedding model choice makes a difference
It's tempting to just use whatever embedding model comes with your framework, but different models work better for different types of content.
We've had good results with recent open embedding models that are optimized for retrieval. They're not much larger than the framework defaults, and the retrieval quality is noticeably better.
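The only reliable way to compare models on your content is to measure. A small sketch of averaging recall@k over a hand-labeled query set (the data structures here are assumptions, not any framework's API):

```python
def recall_at_k(ranked_ids: list[str], relevant_ids: set[str], k: int = 5) -> float:
    """Fraction of the relevant documents that appear in the top-k results."""
    if not relevant_ids:
        return 0.0
    hits = sum(1 for doc_id in ranked_ids[:k] if doc_id in relevant_ids)
    return hits / len(relevant_ids)

def average_recall(model_rankings: dict[str, list[str]],
                   labels: dict[str, set[str]], k: int = 5) -> float:
    """Average recall@k across labeled queries for one embedding model."""
    scores = [recall_at_k(model_rankings[q], labels[q], k) for q in labels]
    return sum(scores) / len(scores)
```

Even a few dozen labeled queries are enough to separate a model that suits your content from one that doesn't.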
And don't forget: you can fine-tune embeddings on your own data if you have the right training data. Even a small amount of fine-tuning on your specific domain can improve results a lot.
It's not just about similarity
Basic RAG just retrieves the most similar chunks to the query. But that doesn't always give you what you need. Sometimes you need to think about more than just semantic similarity.
For example, do you need to prioritize more recent information? Should documentation from the product team get higher priority than internal meeting notes? Do you need to make sure you're getting information from multiple sources instead of just one very similar chunk?
Adding a bit of custom ranking after retrieval can go a long way.
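To make that concrete, here is one sketch of a post-retrieval re-ranker combining all three signals mentioned above: the source names, weights, and half-life are illustrative assumptions, not recommendations.

```python
from datetime import datetime, timezone

# Assumed per-source weights: official docs outrank meeting notes.
SOURCE_WEIGHTS = {"product_docs": 1.0, "knowledge_base": 0.9, "meeting_notes": 0.6}

def rerank(hits: list[dict], half_life_days: float = 180.0) -> list[dict]:
    """Re-score retrieved hits: similarity * source weight * recency decay.
    Each hit is assumed to carry 'score', 'source', and 'updated_at' keys."""
    now = datetime.now(timezone.utc)
    def adjusted(hit: dict) -> float:
        age_days = max((now - hit["updated_at"]).days, 0)
        recency = 0.5 ** (age_days / half_life_days)  # halves every half_life_days
        weight = SOURCE_WEIGHTS.get(hit["source"], 0.8)
        return hit["score"] * weight * (0.5 + 0.5 * recency)
    return sorted(hits, key=adjusted, reverse=True)
```

With weights like these, a slightly less similar chunk from official documentation can outrank a near-duplicate from stale meeting notes, which is usually what users actually want.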
Hybrid search helps
We started every project with pure vector search, and every project ended up with hybrid search: keyword matching alongside the vectors. It doesn't add that much complexity, and it catches cases where vector search misses the exact term the user is asking about.
This is particularly helpful when users are searching for specific product names, error codes, or other specific terms where exact matching matters.
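One simple way to combine the two result lists, assuming you already have a vector ranking and a keyword ranking, is reciprocal rank fusion; a sketch:

```python
def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse several ranked result lists (e.g. vector search and keyword
    search) into one, scoring each document by sum of 1 / (k + rank)."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)
```

The appeal of rank fusion over score blending is that it needs no calibration: vector similarities and keyword scores live on different scales, but ranks are directly comparable.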
Always show your sources
When a user gets an answer from your RAG system, they need to see where the information came from. This builds trust, and it lets them go read the original source if they want more details.
It also helps you debug the system. If the answer is wrong, you can look at the retrieved sources and see if the problem was bad retrieval or bad summarization.
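The mechanics are simple: number the retrieved chunks in the prompt and carry the source list through to the UI. A sketch under those assumptions (the keys and prompt wording are ours):

```python
def build_prompt(question: str, hits: list[dict]) -> tuple[str, list[str]]:
    """Number each retrieved chunk so the model can cite it as [1], [2], ...
    and return the source list to display alongside the answer.
    Each hit is assumed to carry 'text' and 'source' keys."""
    context_lines = []
    sources = []
    for i, hit in enumerate(hits, start=1):
        context_lines.append(f"[{i}] {hit['text']}")
        sources.append(f"[{i}] {hit['source']}")
    prompt = (
        "Answer using only the numbered context below, and cite the "
        "numbers of the passages you used.\n\n"
        + "\n\n".join(context_lines)
        + f"\n\nQuestion: {question}"
    )
    return prompt, sources
```

When an answer goes wrong, those numbered sources tell you immediately whether the right chunks were retrieved at all, or whether the model mangled good inputs.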
Start simple, iterate
There are a lot of fancy RAG techniques out there: hierarchical indexing, multi-query retrieval, reranking, recursive retrieval, graph RAG...
Our advice: start with the simplest thing that could possibly work. Get that working, measure where the problems are, then add more complexity only where you need it.
Most of the time, you don't need the fanciest approach. Good chunking, good embeddings, and hybrid search will get you 90% of the way there for most use cases.
Wrapping up
RAG is a powerful pattern for building knowledge-based AI systems, and it's definitely here to stay. But it's not a magic button you can press and get perfect results every time.
The good news is that most of the improvements come from paying attention to the specifics of your data and your use case, not from adding the latest complex technique.
Understand your documents, test with real user queries, and iterate. That's still the best advice we can give.