Once you have a RAG-based system in place, you need methods of evaluating it, much as you would evaluate a machine learning model against various criteria.

Some metrics for evaluation are:

  1. Search precision
  2. Contextual relevance
  3. Response accuracy
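As an illustration of the first metric, search precision can be measured as precision@k over the retrieved document IDs. The function name and data below are hypothetical, a minimal sketch rather than any framework's actual API.

```python
def precision_at_k(retrieved_ids, relevant_ids, k):
    """Fraction of the top-k retrieved documents that are relevant."""
    top_k = retrieved_ids[:k]
    if not top_k:
        return 0.0
    hits = sum(1 for doc_id in top_k if doc_id in relevant_ids)
    return hits / len(top_k)

# Hypothetical retrieval run: 2 of the top 3 documents are relevant.
score = precision_at_k(["d1", "d7", "d3", "d9"], {"d1", "d3"}, k=3)
print(score)  # 2/3 ≈ 0.67
```

Contextual relevance and response accuracy are usually judged by an LLM or human rater rather than computed this directly.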

<aside> 💡

During the generation phase, the LLM may overlook the retrieved context and hallucinate, i.e. fabricate information. Such responses are not grounded in the source material.


</aside>
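One crude (hypothetical) way to flag ungrounded responses is to measure how much of the answer's vocabulary actually appears in the retrieved context. Real evaluation frameworks use LLM-based judgments for this; the token-overlap sketch below only conveys the idea.

```python
def groundedness_score(answer: str, context: str) -> float:
    """Share of answer tokens that also occur in the retrieved context.

    A crude lexical proxy: a low score suggests the answer may be
    hallucinated rather than grounded in the context.
    """
    answer_tokens = set(answer.lower().split())
    context_tokens = set(context.lower().split())
    if not answer_tokens:
        return 0.0
    return len(answer_tokens & context_tokens) / len(answer_tokens)

context = "the eiffel tower is 330 metres tall"
print(groundedness_score("the tower is 330 metres tall", context))  # 1.0
print(groundedness_score("it was built on mars", context))          # 0.0
```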

Solution

As with any evaluation, there will be metrics. Understanding which metric falls short tells us which phase of the RAG pipeline, retrieval or generation, needs to change.

Frameworks

Ragas, an open-source framework for evaluating RAG pipelines with metrics such as faithfulness, answer relevancy, and context precision.