RAG for Dummies

This content originally appeared on DEV Community and was authored by NgetichB

Retrieval-Augmented Generation (RAG) is a machine learning technique that enhances the capabilities of Large Language Models (LLMs) so that they provide more accurate and up-to-date responses.

How RAG works:

(i) The user asks the Large Language Model (LLM) a question.

(ii) Retrieval: the RAG system uses the question to search an external knowledge base for relevant information. This step relies on three techniques: chunking, embedding, and a vector database. Chunking breaks the information in the knowledge base into smaller pieces for efficient searching. Embedding converts each chunk into a numerical representation (a vector) that captures its meaning. Finally, the system searches a vector database to find the chunks most similar to the question.

(iii) Augmentation: the most relevant information from the retrieval step is added to the original question to form an ‘augmented prompt’.

(iv) Generation: the LLM receives the augmented prompt and uses the original question and the retrieved context to generate a more comprehensive and accurate response.
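
Steps (i) to (iv) above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a production pipeline: the word-count `embed` function stands in for a learned embedding model, the in-memory `index` list stands in for a vector database, and the `documents` and all function names are hypothetical. No LLM is called; the sketch stops at the augmented prompt.

```python
import math
import re
from collections import Counter


def chunk(text, max_words=12):
    """Chunking: split text into pieces of at most max_words words."""
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


def embed(text):
    """Toy embedding: a sparse word-count vector.

    Assumption: a real system would call a learned embedding model here.
    """
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(question, index, top_k=2):
    """Retrieval: return the top_k chunks most similar to the question."""
    q = embed(question)
    ranked = sorted(index, key=lambda item: cosine(q, item[1]), reverse=True)
    return [text for text, _ in ranked[:top_k]]


def augment(question, contexts):
    """Augmentation: combine retrieved context with the original question."""
    context_block = "\n".join(f"- {c}" for c in contexts)
    return f"Context:\n{context_block}\n\nQuestion: {question}\nAnswer:"


# Hypothetical knowledge base, for illustration only.
documents = [
    "BM25 is a sparse retrieval function that ranks documents by term overlap with the query.",
    "The Eiffel Tower is in Paris and was completed in 1889.",
]

# The in-memory list of (chunk, vector) pairs stands in for a vector database.
index = [(c, embed(c)) for doc in documents for c in chunk(doc)]

question = "When was the Eiffel Tower completed?"
prompt = augment(question, retrieve(question, index, top_k=1))
print(prompt)  # this string is what would be sent to the LLM in step (iv)
```

Swapping `embed` for a real embedding model and `index` for a real vector database changes the quality of the matches, but the flow of chunk, embed, search, and augment stays the same.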

Models used in RAG

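(i) Retrieval Models: these act like a detective that gathers relevant documents from the external knowledge base before the LLM generates an answer. There are two types of retriever models: sparse retrievers, which match on keywords (for example BM25 and TF-IDF), and dense retrievers, which compare learned embeddings (for example DPR, Dense Passage Retrieval). Frameworks such as LlamaIndex and Haystack package these retrievers for use in RAG pipelines.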

(ii) Language Models (LLMs): the generation component takes the user's original prompt and the retrieved information and uses its learned knowledge to create a coherent, natural-language response. Examples are Transformer-based models such as GPT-2, GPT-3, BART (Bidirectional and Auto-Regressive Transformers), and Flan-T5.
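
To make the sparse retrievers mentioned above more concrete, here is a rough sketch of BM25 scoring in plain Python. It uses the common Okapi BM25 formula with a non-negative IDF variant; the `k1=1.5` and `b=0.75` values are conventional defaults chosen here as assumptions, and the function names and sample documents are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score each document against the query with Okapi BM25."""
    if not docs:
        return []
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(t) for t in tokenized) / n_docs
    # Document frequency: number of documents containing each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))

    scores = []
    for tokens in tokenized:
        tf = Counter(tokens)
        score = 0.0
        for term in set(tokenize(query)):
            if term not in tf:
                continue
            # Rare terms get a higher weight than common ones.
            idf = math.log((n_docs - df[term] + 0.5) / (df[term] + 0.5) + 1)
            # Term frequency saturates and is normalized by document length.
            norm = tf[term] + k1 * (1 - b + b * len(tokens) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


# Hypothetical documents, for illustration only.
docs = [
    "RAG retrieves documents before the LLM answers.",
    "Chunking splits documents into smaller pieces.",
    "The weather is sunny today.",
]
scores = bm25_scores("how does chunking work", docs)
best = scores.index(max(scores))
print(docs[best])
```

Unlike a dense retriever, this scoring only rewards exact word overlap, so a query that paraphrases a document without sharing its terms would score zero.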

RAG is applied in medical AI, chatbots, chat engines, and legal assistance. It bridges the gap between the static information an LLM learned during training and dynamic, current knowledge, which reduces ambiguity and increases precision, transparency, and accuracy.


NgetichB | Sciencx (2025-09-13T21:04:33+00:00) RAG for Dummies. Retrieved from https://www.scien.cx/2025/09/13/rag-for-dummies/
