www.anyscale.com/blog/a-comprehensive-guide-for-building-rag-based-llm-applications-part-1
Top Highlights
This means that if our chunk contains more than 512 sub-word tokens (4 chars ≈ 1 token), the embedding model wouldn't account for the excess tokens anyway.
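A minimal sketch of that constraint, using the 4-chars-per-token heuristic from the highlight (the function name and threshold handling are illustrative, not the post's code):

```python
# Rough check that a chunk fits an embedding model's 512-token window,
# using the ~4 characters per sub-word token heuristic (illustrative only).
EMBEDDING_TOKEN_LIMIT = 512
CHARS_PER_TOKEN = 4

def fits_embedding_window(chunk: str) -> bool:
    approx_tokens = len(chunk) / CHARS_PER_TOKEN
    return approx_tokens <= EMBEDDING_TOKEN_LIMIT

chunk = "some long section of documentation text ..."
if not fits_embedding_window(chunk):
    print("Chunk likely exceeds 512 tokens; the excess would be truncated.")
```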
extract the text in between them
We can then compute the distance between all of the chunk embeddings and our query embedding to determine the top-k chunks.
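A minimal NumPy sketch of that nearest-neighbor step (in practice a vector store would handle this; the function name and array shapes are assumptions). The `num_chunks=5` default mirrors the parameter highlighted below:

```python
import numpy as np

def top_k_chunks(query_embedding, chunk_embeddings, chunks, num_chunks=5):
    """Return the num_chunks chunks whose embeddings are closest to the
    query by cosine similarity (i.e., smallest cosine distance)."""
    q = query_embedding / np.linalg.norm(query_embedding)
    c = chunk_embeddings / np.linalg.norm(chunk_embeddings, axis=1, keepdims=True)
    similarities = c @ q                          # one cosine similarity per chunk
    top_idx = np.argsort(-similarities)[:num_chunks]
    return [chunks[i] for i in top_idx]
```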
And then we could use an LLM on these top-k chunks to determine the <k chunks to use as the context to answer our query. We could also use reranking.
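A hypothetical sketch of that LLM-based filtering step; `score_relevance` stands in for a real LLM call (e.g. a chat prompt that returns a numeric rating) and is not from the post:

```python
# Ask an LLM to score each retrieved chunk's relevance to the query,
# then keep only the best few (<k) as context for answering.

def score_relevance(query: str, chunk: str) -> float:
    raise NotImplementedError("replace with an actual LLM call")

def rerank_with_llm(query, candidate_chunks, keep=3):
    scored = [(score_relevance(query, c), c) for c in candidate_chunks]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [c for _, c in scored[:keep]]
```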
combine embeddings with traditional information retrieval methods such as keyword matching
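One common way to combine the two signals is a weighted blend of semantic and lexical scores. This sketch assumes the rank_bm25 package and precomputed embedding similarities; the blend weight and normalization are illustrative:

```python
import numpy as np
from rank_bm25 import BM25Okapi  # pip install rank-bm25

def hybrid_scores(query, chunks, semantic_scores, alpha=0.7):
    """Blend cosine similarities with BM25 keyword-matching scores.
    `semantic_scores` is assumed to hold one similarity per chunk."""
    bm25 = BM25Okapi([c.lower().split() for c in chunks])
    keyword = np.asarray(bm25.get_scores(query.lower().split()))
    semantic = np.asarray(semantic_scores)

    # Min-max normalize each signal so the weights are comparable.
    def norm(x):
        span = x.max() - x.min()
        return (x - x.min()) / span if span > 0 else np.zeros_like(x)

    return alpha * norm(semantic) + (1 - alpha) * norm(keyword)
```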
a list of dictionaries
If we were to use these large sections, then we'd be inserting a lot of noisy/unwanted context, and because all LLMs have a maximum context length, we wouldn't be able to fit much other relevant information.
split the text within each section into smaller chunks.
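A sketch of that splitting step with LangChain's RecursiveCharacterTextSplitter; the chunk size, overlap, separators, and section shape below are assumptions, not necessarily the post's exact settings:

```python
from langchain.text_splitter import RecursiveCharacterTextSplitter

# Split each section's text into smaller, overlapping chunks.
splitter = RecursiveCharacterTextSplitter(
    separators=["\n\n", "\n", " ", ""],
    chunk_size=300,
    chunk_overlap=50,
)

# Assumed shape: each section is a dict with the text and its source URL.
sections = [{"text": "…", "source": "https://docs.example.com/page"}]
chunks = []
for section in sections:
    for text in splitter.split_text(section["text"]):
        chunks.append({"text": text, "source": section["source"]})
```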
index (store)
(text, source, embedding) triplets for each embedded chunk
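A minimal in-memory stand-in for that index (in the actual pipeline these triplets would live in a vector store such as Postgres with pgvector; this structure is illustrative):

```python
# Store one (text, source, embedding) triplet per embedded chunk.
index = []

def add_to_index(text, source, embedding):
    index.append({"text": text, "source": source, "embedding": embedding})
```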
top-k most relevant chunks
cosine distance
we can just as easily embed and index any new data and be able to retrieve it to answer questions.
num_chunks=5
Component-wise evaluation
And for end-to-end evaluation, we can assess the quality of the entire system (given the data sources, what is the quality of the response).
isolation
evaluator LLM
we need a dataset of questions and the source where the answer comes from. We can use this dataset to ask our different evaluators to provide an answer and then rate their answer (e.g. a score between 1 and 5).
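A hypothetical sketch of that evaluation loop; `generate_answer` and `evaluator_score` stand in for the candidate system and the evaluator LLM, and the row shape is an assumption:

```python
# For each labeled question, have a candidate system answer it, then ask
# an evaluator LLM to rate the answer (1-5) against the known source.

def evaluate(dataset, generate_answer, evaluator_score):
    scores = []
    for row in dataset:  # assumed shape: {"question": ..., "source": ...}
        answer = generate_answer(row["question"])
        scores.append(evaluator_score(row["question"], row["source"], answer))
    return sum(scores) / len(scores)  # mean quality score
```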
To address this cold start problem, we could use an LLM to look at our text chunks and generate questions that the specific chunk would answer.
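A sketch of that cold-start bootstrapping with the OpenAI client; the model name and prompt are illustrative, not the post's:

```python
# Ask an LLM to write one question that a given chunk answers, yielding
# (question, source) pairs for the evaluation dataset.
from openai import OpenAI

client = OpenAI()

def synthesize_question(chunk_text: str) -> str:
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[
            {"role": "system",
             "content": "Write one question that the given text answers."},
            {"role": "user", "content": chunk_text},
        ],
    )
    return response.choices[0].message.content
```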