Ragas and Evaluation Modules

What are good prompt examples to answer questions solely based on provided context in a RAG application?

Search for: What are good prompt examples to answer questions solely based on provided context in a RAG application?

One RAG prompt to rule them all: article by Dhar Rawal, Medium

  1. Covers a series of instructions to the LLM on how to structure the answer using only the provided context
  2. Step-by-step instructions
  3. Instructions for adjusting the response to arrive at a final response
  4. Structure of the prompt
  5. An example (a rough context-only prompt sketch follows after this list)
  6. Intermediate response
  7. How to manually test how the prompting works
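A minimal sketch of that kind of context-only prompt; the wording and step instructions here are my own assumptions, not the article's exact prompt:

```python
# Sketch of a context-only RAG prompt (assumed wording, not Dhar Rawal's exact prompt).
# Uses LangChain's PromptTemplate, but a plain f-string works just as well.
from langchain.prompts import PromptTemplate

rag_prompt = PromptTemplate.from_template(
    """Answer the question using ONLY the context below.

Instructions:
1. Pull out the facts in the context that are relevant to the question.
2. Draft an intermediate answer from those facts alone.
3. If the context does not contain the answer, reply: "I don't know based on the provided context."
4. Rewrite the intermediate answer into a short final answer. Do not add outside knowledge.

Context:
{context}

Question:
{question}

Final answer:"""
)

# Manual test of how the prompting works:
print(rag_prompt.format(
    context="Ragas is an open-source library for evaluating RAG pipelines.",
    question="What is Ragas?",
))
```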

  1. Context recall: How good is the retrieval, given the question?
  2. Answer relevancy: How relevant is the LLM's answer to the question?
  3. Faithfulness: How faithful is the answer to the retrieved context?
  1. Focus on retrieval
  2. Manually test how good the retrieval system is
  3. Find out which "parameters" affect retrieval
  1. Using GPT 3.5 to answer
  2. Using GPT 4 for ground truth
  1. ResponseSchema
  2. StructuredOutputParser
  3. How to use them (see the parser sketch after this list)
  1. Not a very good video! :(
  2. Important stuff is covered too quickly
  3. It is not clear whether this is a QA/optimization-time utility or something you put in prod
  4. Probably not in prod
  5. To tune the retrieval params, start with human intervention on the data set, then use that tuning to decide what goes into development and production
  6. In other words, it could be a way to decide which chunking and retrieval strategy works best for a given data set.
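For the ResponseSchema / StructuredOutputParser items above, a minimal sketch of how they are typically wired together in LangChain; the field names ("answer", "evidence") are my own assumptions for illustration:

```python
# Sketch of LangChain's ResponseSchema + StructuredOutputParser.
# The schema fields ("answer", "evidence") are assumed, not from the video.
from langchain.output_parsers import ResponseSchema, StructuredOutputParser

schemas = [
    ResponseSchema(name="answer", description="The answer, based only on the provided context"),
    ResponseSchema(name="evidence", description="The context sentences the answer relies on"),
]
parser = StructuredOutputParser.from_response_schemas(schemas)

# Paste these instructions into the RAG prompt so the LLM replies as JSON.
format_instructions = parser.get_format_instructions()
print(format_instructions)

# Later, turn the raw LLM text into a dict: {"answer": ..., "evidence": ...}
# parsed = parser.parse(llm_output_text)
```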

LangChain and Ragas sample code and examples article

Search for: LangChain and Ragas sample code and examples article

LangChain, Ragas, LangSmith

Evaluate RAG with ragas: article

Ragas docs for LangChain at docs.ragas.io

Ragas documentation

Here is how you customize the LLMs for Ragas: yes, it needs an LLM (a rough sketch follows below)
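A hedged sketch of what that customization can look like; evaluate() accepting LangChain models via llm= and embeddings= matches recent Ragas docs, but the exact argument names vary between versions, so verify against docs.ragas.io:

```python
# Sketch only: pointing Ragas at your own judge LLM and embeddings.
# The llm= / embeddings= arguments follow recent Ragas versions; older
# releases differ, so check docs.ragas.io for the version you installed.
from datasets import Dataset
from langchain_openai import ChatOpenAI, OpenAIEmbeddings
from ragas import evaluate
from ragas.metrics import faithfulness, answer_relevancy

tiny = Dataset.from_dict({
    "question": ["What is Ragas?"],
    "answer": ["Ragas is a library for evaluating RAG pipelines."],
    "contexts": [["Ragas is an open-source RAG evaluation library."]],
})

scores = evaluate(
    tiny,
    metrics=[faithfulness, answer_relevancy],
    llm=ChatOpenAI(model="gpt-4"),    # judge LLM (per the notes: GPT-4 as the stronger model)
    embeddings=OpenAIEmbeddings(),    # used by answer relevancy
)
print(scores)
```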

Getting started with ragas: Ragas docs

  1. Ragas is just a Python library
  2. Installed via pip install
  3. Well integrated with LangChain
  4. It takes a question, an answer, the retrieved context, and optionally a "reference answer" called the ground truth
  5. Then it analyzes them for answer relevancy, faithfulness to the retrieved context, precision (how much junk was brought along with the good stuff), and whether the context has ALL the answers that are in the ground truth (meaning: is what is retrieved a superset of the truth? It is OK to have junk). A getting-started sketch follows after these notes.
  6. The article "Evaluating RAG Applications with RAGAs" by Leonie Monigatti is a really good introduction
  7. The second set of references is docs.ragas.io, the official docs, especially the explanation of their metrics and how they compute them
  8. The doc site also covers installation and a getting-started guide
  9. Ragas does use an LLM to compute these metrics
  10. So the official docs have a section on how to customize the LLMs for it
  1. How many round trips does it make to the LLM?
  2. As you vary the embeddings and other chunking methodologies and measure their metrics, is this only of value during "development" time?
  3. Can this be used in production, for example to analyze the responses?
  4. It definitely does not have the wherewithal to decide on the relevancy or appropriateness of the question itself! That may need to be another module!
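A minimal getting-started sketch of what the notes above describe, assuming pip install ragas (plus datasets); the column names ("question", "answer", "contexts", "ground_truth") follow the Ragas docs but have changed between versions, so double-check docs.ragas.io:

```python
# Getting-started sketch: score one question/answer/context triple plus a
# ground truth with the four Ragas metrics mentioned above.
from datasets import Dataset
from ragas import evaluate
from ragas.metrics import (
    faithfulness,        # is the answer grounded in the retrieved context?
    answer_relevancy,    # does the answer actually address the question?
    context_precision,   # how much junk came along with the good stuff?
    context_recall,      # does the context cover everything in the ground truth?
)

data = {
    "question": ["What does Ragas evaluate?"],
    "answer": ["Ragas evaluates both retrieval and generation in a RAG pipeline."],
    "contexts": [[
        "Ragas is a library for evaluating Retrieval Augmented Generation pipelines.",
        "It scores faithfulness, answer relevancy, context precision and context recall.",
    ]],
    "ground_truth": ["Ragas evaluates retrieval quality and answer quality in RAG pipelines."],
}

results = evaluate(
    Dataset.from_dict(data),
    metrics=[faithfulness, answer_relevancy, context_precision, context_recall],
)
print(results)  # a dict-like score per metric, e.g. {'faithfulness': ..., ...}
```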