Ragas takes a question, an answer, the retrieved context, and optionally a "reference answer" called the ground truth.
Then it analyzes them for answer relevancy, faithfulness to the context, context precision (how much junk was retrieved along with the good stuff), and context recall (whether the context contains ALL the facts that are in the ground truth; in other words, is what was retrieved a superset of the truth? It is OK to have junk).
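To make the data shape concrete, here is a toy, stdlib-only sketch of the four-field record and of the "superset of the truth" idea behind context recall. Note this is NOT how Ragas computes it (Ragas uses an LLM to attribute each ground-truth statement to the context); the token-overlap check below is a crude stand-in just to illustrate the concept, and the sample texts are made up.

```python
import re

# One evaluation record: the four fields Ragas works with.
record = {
    "question": "When was the first Super Bowl played?",
    "answer": "The first Super Bowl was played on January 15, 1967.",
    "contexts": [
        "The first AFL-NFL World Championship Game, later known as "
        "Super Bowl I, was played on January 15, 1967.",
        "Green Bay defeated Kansas City 35-10.",  # junk: fine for recall
    ],
    "ground_truth": "It was played on January 15, 1967.",
}

def _tokens(text: str) -> set:
    """Lowercase alphanumeric tokens, ignoring punctuation."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def naive_context_recall(rec: dict) -> float:
    """Fraction of ground-truth sentences whose content words all appear
    in the retrieved contexts. A crude stand-in for the LLM-based
    statement attribution Ragas actually performs."""
    context_words = _tokens(" ".join(rec["contexts"]))
    sentences = [s.strip() for s in rec["ground_truth"].split(".") if s.strip()]
    supported = 0
    for s in sentences:
        words = [w for w in _tokens(s) if len(w) > 3]  # skip stopword-ish tokens
        if words and all(w in context_words for w in words):
            supported += 1
    return supported / len(sentences) if sentences else 0.0
```

Here the ground truth is fully covered by the first context chunk, so the toy recall is 1.0 even though the second chunk is junk; precision is the metric that would penalize that junk.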
The article "Evaluating RAG Applications with RAGAs" by Leonie Monigatti is a really good introduction.
The second set of references is at docs.ragas.io, the official docs, especially the explanation of their metrics and how each one is computed.
The doc site also has installation instructions and a getting-started guide.
Ragas uses an LLM to compute these metrics.
So the official docs have a section on how to customize the LLM it uses.
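The customization is roughly: wrap your LLM of choice and pass it to evaluate(). A hedged sketch based on the ragas docs around the time of writing (~v0.1); the API moves quickly, so names like LangchainLLMWrapper and the llm= keyword should be checked against docs.ragas.io for your version, and the model name "gpt-4" is just an example.

```python
# Sample rows in the column layout ragas expects (note: "contexts" is a
# list of chunks per question). The texts here are made-up examples.
eval_rows = {
    "question": ["When was the first Super Bowl played?"],
    "answer": ["It was played on January 15, 1967."],
    "contexts": [["Super Bowl I was played on January 15, 1967."]],
    "ground_truth": ["January 15, 1967."],
}

def run_eval(rows: dict):
    """Run the four core Ragas metrics with a customized judge LLM.
    Imports are deferred so this file parses without ragas installed;
    the call itself needs network access and an OPENAI_API_KEY."""
    from datasets import Dataset
    from langchain_openai import ChatOpenAI
    from ragas import evaluate
    from ragas.llms import LangchainLLMWrapper
    from ragas.metrics import (
        answer_relevancy,
        context_precision,
        context_recall,
        faithfulness,
    )

    judge = LangchainLLMWrapper(ChatOpenAI(model="gpt-4"))
    result = evaluate(
        Dataset.from_dict(rows),
        metrics=[faithfulness, answer_relevancy, context_precision, context_recall],
        llm=judge,  # each metric makes its own round trips to this LLM
    )
    return result  # dict-like scores, one entry per metric
```

Each metric issues its own prompts to the judge LLM, which is also why the round-trip count (noted below) is hard to pin down.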
satya - 4/21/2024, 9:11:45 PM
It is not clear, however, how many round trips it makes to the LLM.
As you vary the embeddings and chunking methodologies and measure the resulting metrics, is this only of value at "development" time?
Or can it be used in production, for example to analyze each response?
It definitely does not have the wherewithal to judge the relevancy or appropriateness of the question itself! That may need to be another module!