Ragas and Evaluation Modules

What are good prompt examples to answer questions solely based on provided context in a RAG application?

Search for: What are good prompt examples to answer questions solely based on provided context in a RAG application?

One RAG prompt to rule them all: article by Dhar Rawal, Medium

  1. Covers a series of instructions to the LLM on how to structure the answer using only the provided context
  2. Step-by-step instructions
  3. Instructions for adjusting the response to arrive at a final response
  4. Structure of the prompt
  5. An example (a rough context-only prompt sketch follows after this list)
  6. Intermediate response
  7. How to manually test how the prompting works
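A minimal sketch of that kind of context-only prompt; the wording and step instructions here are my own assumptions, not the article's exact prompt:

```python
# Sketch of a context-only RAG prompt (assumed wording, not Dhar Rawal's exact prompt).
# Uses LangChain's PromptTemplate, but a plain f-string works just as well.
from langchain.prompts import PromptTemplate

rag_prompt = PromptTemplate.from_template(
    """Answer the question using ONLY the context below.

Instructions:
1. Pull out the facts in the context that are relevant to the question.
2. Draft an intermediate answer from those facts alone.
3. If the context does not contain the answer, reply: "I don't know based on the provided context."
4. Rewrite the intermediate answer into a short final answer. Do not add outside knowledge.

Context:
{context}

Question:
{question}

Final answer:"""
)

# Manual test of how the prompting works:
print(rag_prompt.format(
    context="Ragas is an open-source library for evaluating RAG pipelines.",
    question="What is Ragas?",
))
```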

  1. Context recall: How good is the retrieval, given the question?
  2. Answer relevancy: How relevant is the LLM's answer to the question?
  3. Faithfulness: How faithful is the answer to the retrieved context?
  1. Focus on retrieval
  2. Manually test how good the retrieval system is
  3. Find out which "parameters" affect retrieval
  1. Using GPT 3.5 to answer
  2. Using GPT 4 for ground truth
  1. ResponseSchema
  2. StructuredOutputParser
  3. How to use them (see the parser sketch after this list)
  1. Not a very good video! :(
  2. Important stuff is covered too quickly
  3. It is not clear whether this is a QA/optimization-time utility or something you put in prod
  4. Probably not in prod
  5. To tune the retrieval params, start with human intervention on the data set, then use that tuning to decide what goes into development and production
  6. In other words, it could be a way to decide which chunking and retrieval strategy works best for a given data set.
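For the ResponseSchema / StructuredOutputParser items above, a minimal sketch of how they are typically wired together in LangChain; the field names ("answer", "evidence") are my own assumptions for illustration:

```python
# Sketch of LangChain's ResponseSchema + StructuredOutputParser.
# The schema fields ("answer", "evidence") are assumed, not from the video.
from langchain.output_parsers import ResponseSchema, StructuredOutputParser

schemas = [
    ResponseSchema(name="answer", description="The answer, based only on the provided context"),
    ResponseSchema(name="evidence", description="The context sentences the answer relies on"),
]
parser = StructuredOutputParser.from_response_schemas(schemas)

# Paste these instructions into the RAG prompt so the LLM replies as JSON.
format_instructions = parser.get_format_instructions()
print(format_instructions)

# Later, turn the raw LLM text into a dict: {"answer": ..., "evidence": ...}
# parsed = parser.parse(llm_output_text)
```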

LangChain and Ragas sample code and examples article

Search for: LangChain and Ragas sample code and examples article

LangChain, Ragas, LangSmith

Evaluate RAG with ragas: article

Ragas docs for LangChain at docs.ragas.io

Ragas documentation

Here is how you customize the LLMs for Ragas: yes, it needs an LLM (a rough sketch follows below)
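A hedged sketch of what that customization can look like; evaluate() accepting LangChain models via llm= and embeddings= matches recent Ragas docs, but the exact argument names vary between versions, so verify against docs.ragas.io:

```python
# Sketch only: pointing Ragas at your own judge LLM and embeddings.
# The llm= / embeddings= arguments follow recent Ragas versions; older
# releases differ, so check docs.ragas.io for the version you installed.
from datasets import Dataset
from langchain_openai import ChatOpenAI, OpenAIEmbeddings
from ragas import evaluate
from ragas.metrics import faithfulness, answer_relevancy

tiny = Dataset.from_dict({
    "question": ["What is Ragas?"],
    "answer": ["Ragas is a library for evaluating RAG pipelines."],
    "contexts": [["Ragas is an open-source RAG evaluation library."]],
})

scores = evaluate(
    tiny,
    metrics=[faithfulness, answer_relevancy],
    llm=ChatOpenAI(model="gpt-4"),    # judge LLM (per the notes: GPT-4 as the stronger model)
    embeddings=OpenAIEmbeddings(),    # used by answer relevancy
)
print(scores)
```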

Getting started with ragas: Ragas docs

  1. Ragas is just a Python library
  2. Installed via pip install
  3. Well integrated with LangChain
  4. It takes a question, an answer, the retrieved context, and optionally a "reference answer" called the ground truth
  5. Then it analyzes them for answer relevancy, faithfulness to the retrieved context, precision (how much junk was brought along with the good stuff), and whether the context has ALL the answers that are in the ground truth (meaning: is what is retrieved a superset of the truth? It is OK to have junk). A getting-started sketch follows after these notes.
  6. The article "Evaluating RAG Applications with RAGAs" by Leonie Monigatti is a really good introduction
  7. The second set of references is docs.ragas.io, the official docs, especially the explanation of their metrics and how they compute them
  8. The doc site also covers installation and a getting-started guide
  9. Ragas does use an LLM to compute these metrics
  10. So the official docs have a section on how to customize the LLMs for it
  1. How many round trips does it make to the LLM?
  2. As you vary the embeddings and other chunking methodologies and measure their metrics, is this only of value during "development" time?
  3. Can this be used in production, for example to analyze the responses?
  4. It definitely does not have the wherewithal to decide on the relevancy or appropriateness of the question itself! That may need to be another module!
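A minimal getting-started sketch of what the notes above describe, assuming pip install ragas (plus datasets); the column names ("question", "answer", "contexts", "ground_truth") follow the Ragas docs but have changed between versions, so double-check docs.ragas.io:

```python
# Getting-started sketch: score one question/answer/context triple plus a
# ground truth with the four Ragas metrics mentioned above.
from datasets import Dataset
from ragas import evaluate
from ragas.metrics import (
    faithfulness,        # is the answer grounded in the retrieved context?
    answer_relevancy,    # does the answer actually address the question?
    context_precision,   # how much junk came along with the good stuff?
    context_recall,      # does the context cover everything in the ground truth?
)

data = {
    "question": ["What does Ragas evaluate?"],
    "answer": ["Ragas evaluates both retrieval and generation in a RAG pipeline."],
    "contexts": [[
        "Ragas is a library for evaluating Retrieval Augmented Generation pipelines.",
        "It scores faithfulness, answer relevancy, context precision and context recall.",
    ]],
    "ground_truth": ["Ragas evaluates retrieval quality and answer quality in RAG pipelines."],
}

results = evaluate(
    Dataset.from_dict(data),
    metrics=[faithfulness, answer_relevancy, context_precision, context_recall],
)
print(results)  # a dict-like score per metric, e.g. {'faithfulness': ..., ...}
```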