Building a “Poor Man’s Agentic RAG” with Python: A Step-by-Step Guide

This content originally appeared on Level Up Coding - Medium and was authored by Qazi Murtaza Ahmed

Retrieval Augmented Generation (RAG) has emerged as a powerful technique for enhancing the capabilities of language models. By combining the strengths of large language models (LLMs) with external knowledge sources, RAG systems can access and process real-world information, leading to more informative, relevant, and grounded responses. This is especially crucial for tasks requiring up-to-date information or domain-specific knowledge that the LLM was not trained on.

But plain RAG is not quite what all the hype suggests: it retrieves and generates, but it doesn't act. What if we want our RAG to be more proactive and do things based on the information it retrieves? That's where the concept of an "Agentic RAG" comes in. A true Agentic RAG would involve planning, tool use, and environmental interaction. Building a full-fledged agentic system is complex, but we can create a "poor man's" version that captures some of these key elements in a simplified way. This guide will walk you through the process of building such a system in Python.

For a concise and accurate description of Agentic RAG, read this article: How I finally got agentic RAG to work right

This POC was inspired by Cole Medin, who built the original version using OpenAI; I have simply attempted to take his approach and run it against a local LLM, hence the Poor Man's Agentic RAG. This blog will make much more sense if you have watched his two videos first: video 1 and video 2

I wanted to do this POC to see how far ahead or behind the local LLM scene is compared to pay-per-token services; I bet many of us want to play with LLMs without worrying about token usage.

We start by defining the model and dependencies:

import os
from dataclasses import dataclass
from typing import List

from supabase import create_client, Client
from pydantic_ai import Agent, RunContext, settings
from pydantic_ai.models.ollama import OllamaModel  # import path assumes the pydantic-ai release current at the time of writing

supabase: Client = create_client(
    os.getenv("SUPABASE_URL"),
    os.getenv("SUPABASE_SERVICE_KEY")
)

model = OllamaModel(
    model_name=os.getenv('OLLAMA_MODEL_NAME'),
    base_url=os.getenv('OLLAMA_BASE_URL')
)

@dataclass
class PydanticAIDeps:
    supabase: Client
    model: OllamaModel
    system_prompt: str

pydantic_ai_expert = Agent(
    model,
    deps_type=PydanticAIDeps,
    retries=2
)
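All of the os.getenv calls above assume the environment is already populated; one common way to do that is python-dotenv (this is my assumption, not part of the original setup, and the example values below are placeholders):

from dotenv import load_dotenv

# hypothetical .env contents (placeholders, adjust to your setup):
#   SUPABASE_URL=https://your-project.supabase.co
#   SUPABASE_SERVICE_KEY=your-service-role-key
#   OLLAMA_MODEL_NAME=qwen2.5:14b
#   OLLAMA_BASE_URL=http://localhost:11434/v1
#   EMBEDDING_API_URL=http://localhost:11434/api/embeddings
#   EMBEDDING_MODEL=nomic-embed-text
load_dotenv()  # reads .env from the working directory into os.environ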

Next, we need a neat mechanism to log our tool usage; if, like me, you don't have a paid LLM observability service, this is something you can incorporate.

import logging

class ToolUsageTracker:
    def __init__(self):
        self.tools_used = []
        self.required_tools = [
            'retrieve_relevant_documentation',
            'list_documentation_pages',
            'get_page_content'
        ]

    def track_tool(self, tool_name: str):
        self.tools_used.append(tool_name)
        logging.debug(f"Tool used: {tool_name} ✅")

    def get_missing_tools(self) -> List[str]:
        return [tool for tool in self.required_tools if tool not in self.tools_used]

tool_tracker = ToolUsageTracker()

We call this inside each tool, which then logs whether the tool was used. You might not think much of it yet, but it helps us trace how the LLM actually executes the tools.

# call inside each tool function, e.g.
tool_tracker.track_tool("retrieve_relevant_documentation")

# In the console:
# 🔍 2025-01-29 14:29:00,997 - DEBUG - Tool used: retrieve_relevant_documentation ✅
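For these DEBUG lines to actually reach the console, Python's logging needs to be configured once at startup; a minimal sketch (the exact format string, including the 🔍 prefix, is my guess at reproducing the output above):

import logging

logging.basicConfig(
    level=logging.DEBUG,  # show the DEBUG-level tool-usage messages
    format="🔍 %(asctime)s - %(levelname)s - %(message)s"
)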

Now that we have our first debugging tool, let's move on to defining prompts.

def update_system_prompt(attempt: int) -> str:
    base_prompt = """
    You are an expert at Pydantic AI - a Python AI agent framework that you have access to all the documentation for,
    including examples, an API reference, and other resources to help you build Pydantic AI agents; use it to answer the user's questions.

    Your only job is to assist with this, and you don't answer other questions besides describing what you are able to do.

    Don't ask the user before taking an action, just do it. Always make sure you look at the documentation with the provided tools before answering the user's question, unless you already have.

    When you first look at the documentation, always start with RAG.
    Then also always check the list of available documentation pages and retrieve the content of page(s) if it'll help.

    Always let the user know when you didn't find the answer in the documentation or the right URL - be honest.
    To ensure accuracy and efficiency, always consult the Pydantic AI documentation and follow this structured approach:

    1. **Retrieve Relevant Documents**: Use `retrieve_relevant_documentation` to get the most relevant documentation chunks based on the user's query.
    2. **Analyze Documentation Pages**: Use `list_documentation_pages` to identify all available documentation pages that could help answer the question.
    3. **Generate Content Summary**: Use `get_page_content` to gather and summarize the full content of any relevant documentation pages identified.
    """

    if attempt > 0:
        unused_tools = tool_tracker.get_missing_tools()
        additional_instructions = f"""
        IMPORTANT: In your previous {attempt} attempt{'s' if attempt > 1 else ''}, you failed to use all required tools or provide a complete answer.
        It is CRUCIAL that you use ALL tools in the specified order before answering.
        Failure to do so will result in an incorrect response.
        In your previous attempt, you didn't use these tools: {', '.join(unused_tools)}. Please ensure you use ALL tools in the correct order before providing an answer.
        """
        base_prompt += additional_instructions

    return base_prompt

I moved away from a static prompt definition and wanted it to be more dynamic. On the first attempt, this function returns the initial prompt, but if it receives an `attempt` parameter greater than 0, it does some magic: it appends to the original system prompt a note about what went wrong, highlighting the attempt number and listing the tools that were not used, enforcing the use of the tools.
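A quick way to see the retry behaviour in isolation (a hypothetical check, simulating a first attempt that only called one of the three required tools):

# simulate a failed first attempt where only one required tool ran
tool_tracker.tools_used = ["retrieve_relevant_documentation"]
print(update_system_prompt(attempt=1))
# the printed prompt now ends with the IMPORTANT block naming
# list_documentation_pages and get_page_content as the unused tools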

We now have a solid foundation to build upon: our debugging tool and our dynamic prompting technique. Next, we need something to create embeddings, and our local LLM is here to rescue us.

import aiohttp

async def get_embedding(text: str) -> List[float]:
    """Get embedding vector from the Ollama API."""
    url = os.getenv('EMBEDDING_API_URL')
    payload = {
        "model": os.getenv('EMBEDDING_MODEL', 'nomic-embed-text'),
        "prompt": text
    }
    try:
        async with aiohttp.ClientSession() as session:
            async with session.post(url, json=payload) as response:
                if response.status == 200:
                    result = await response.json()
                    return result['embedding']
                else:
                    logging.error(f"Error getting embedding: HTTP {response.status}")
                    return [0] * 768  # nomic-embed-text produces 768-dimensional vectors
    except Exception as e:
        logging.error(f"Error getting embedding: {e}")
        return [0] * 768  # zero-vector fallback, same 768 dimensions
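A quick smoke test (assuming Ollama is running locally, the nomic-embed-text model has been pulled, and EMBEDDING_API_URL points at Ollama's /api/embeddings endpoint):

import asyncio

async def _check_embedding():
    vec = await get_embedding("hello world")
    print(len(vec), vec[:3])  # expect 768 dimensions for nomic-embed-text

asyncio.run(_check_embedding())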

We use almost the same Pydantic AI tool definitions as in Cole Medin's videos. I will copy them here for posterity's sake.

@pydantic_ai_expert.tool
async def retrieve_relevant_documentation(ctx: RunContext[PydanticAIDeps], user_query: str) -> str:
    """
    Retrieve relevant documentation chunks based on the query with RAG.

    Args:
        ctx: The context including the Supabase client and Ollama client
        user_query: The user's question or query

    Returns:
        A formatted string containing the top 5 most relevant documentation chunks
    """
    tool_tracker.track_tool("retrieve_relevant_documentation")
    try:
        query_embedding = await get_embedding(user_query)
        result = ctx.deps.supabase.rpc(
            'match_site_pages',
            {
                'query_embedding': query_embedding,
                'match_count': 5,
                'filter': {'source': 'pydantic_ai_docs'}
            }
        ).execute()

        if not result.data:
            return "No relevant documentation found."

        formatted_chunks = [
            f"# {doc['title']}\n\n{doc['content']}"
            for doc in result.data
        ]
        return "\n\n---\n\n".join(formatted_chunks)

    except Exception as e:
        print(f"Error retrieving documentation: {e}")
        return f"Error retrieving documentation: {str(e)}"

@pydantic_ai_expert.tool
async def list_documentation_pages(ctx: RunContext[PydanticAIDeps]) -> List[str]:
    """
    Retrieve a list of all available Pydantic AI documentation pages.

    Returns:
        List[str]: List of unique URLs for all documentation pages
    """
    tool_tracker.track_tool("list_documentation_pages")
    try:
        result = ctx.deps.supabase.from_('site_pages') \
            .select('url') \
            .eq('metadata->>source', 'pydantic_ai_docs') \
            .execute()

        if not result.data:
            return []

        urls = sorted(set(doc['url'] for doc in result.data))
        return urls

    except Exception as e:
        print(f"Error retrieving documentation pages: {e}")
        return []

@pydantic_ai_expert.tool
async def get_page_content(ctx: RunContext[PydanticAIDeps], url: str) -> str:
    """
    Retrieve the full content of a specific documentation page by combining all its chunks.

    Args:
        ctx: The context including the Supabase client
        url: The URL of the page to retrieve

    Returns:
        str: The complete page content with all chunks combined in order
    """
    tool_tracker.track_tool("get_page_content")
    try:
        result = ctx.deps.supabase.from_('site_pages') \
            .select('title, content, chunk_number') \
            .eq('url', url) \
            .eq('metadata->>source', 'pydantic_ai_docs') \
            .order('chunk_number') \
            .execute()

        if not result.data:
            return f"No content found for URL: {url}"

        page_title = result.data[0]['title'].split(' - ')[0]  # Get the main title
        formatted_content = [f"# {page_title}\n"]

        for chunk in result.data:
            formatted_content.append(chunk['content'])

        return "\n\n".join(formatted_content)

    except Exception as e:
        return f"Error retrieving page content: {str(e)}"

Lastly, the brains of the operation:

async def main():
    # original_question = "what LLM models are supported by PydanticAI?"
    original_question = "get me the Weather agent code"

    for attempt in range(3):
        logging.info(f"Attempt {attempt + 1}")
        tool_tracker.tools_used = []

        updated_system_prompt = update_system_prompt(attempt)
        logging.info(f"system_prompt to be executed: {updated_system_prompt}")
        deps = PydanticAIDeps(
            supabase=supabase,
            model=model,
            system_prompt=updated_system_prompt
        )

        pydantic_ai_expert.model_settings = settings.ModelSettings(
            temperature=0.0,
            parallel_tool_calls=False
        )

        response = await pydantic_ai_expert.run(user_prompt=original_question, deps=deps)
        logging.info(f"before validation answer: {response.data}")

        unused_tools = tool_tracker.get_missing_tools()
        logging.info(f"Tools used in attempt {attempt + 1}: {tool_tracker.tools_used}")
        logging.info(f"Unused tools: {unused_tools}")
        logging.info(f"pydantic_ai_expert response: {response.data}")

        validation_result = await validate_answer_against_query(
            user_query=original_question,
            generated_answer=str(response.data),
        )

        if validation_result and len(unused_tools) < 1:
            logging.info(f"Answer validated and all tools used on attempt {attempt + 1}")
            break
        elif validation_result and len(unused_tools) > 0:
            logging.warning("Answer validated but not all tools were used. Retrying.")
        else:
            logging.warning(f"Validation failed on attempt {attempt + 1}")
    else:
        logging.error("Failed to get a valid answer using all tools after 3 attempts")
        response.data = "Unable to generate a valid answer using all required tools after multiple attempts. Please try rephrasing your question."

    logging.info(f"Final Answer: {response.data}")

This is where I diverged in approach: I encapsulate the agent execution in a loop and ask a second agent to validate whether the answer actually addresses the question.

Below, you can find the validation function:

async def validate_answer_against_query(user_query: str, generated_answer: str) -> bool:
    """
    Validate if the generated answer matches the intent of the user's query using Pydantic AI.

    Args:
        user_query: The original user query
        generated_answer: The answer generated by the system

    Returns:
        bool: True if the answer matches the query intent, False otherwise
    """
    try:
        validation_system_prompt = """
        You are an expert at validating responses to ensure they accurately address the original query.

        Your primary objective is to verify:
        1. Accuracy: Does the answer correctly address the question?
        2. Completeness: Is the answer thorough and relevant?
        3. Directness: Does the answer directly respond to the user's request?

        Answer with "Yes" if all three criteria are met, otherwise "No".

        Examples of responses:
        - Yes: "The answer properly addresses the query."
        - No: "The answer does not relate to the question."

        Only return "Yes" or "No" - no additional explanation.
        """
        validation_prompt = f"Does this answer properly address the following question? Answer with 'Yes' or 'No'.\n\nQuestion: {user_query}\nAnswer: {generated_answer}"
        validation_agent = Agent(model=model, system_prompt=validation_system_prompt)
        validation_agent.model_settings = settings.ModelSettings(
            temperature=0.0,
            parallel_tool_calls=False,  # fixed typo: was parallel_tool_call
        )
        result = await validation_agent.run(user_prompt=validation_prompt)
        logging.info(f"validation result: {result}")
        return str(result.data).lower().startswith('yes')

    except Exception as e:
        logging.error(f"Error validating answer: {e}")
        return False

AND... the results:

  • it worked
  • it failed 99% of the time

This exploration of a "Poor Man's Agentic RAG" system, built with Python and local LLMs, offers a glimpse into both the potential and the current limitations of the approach. While the system successfully integrates tool usage, dynamic prompting, and an iterative refinement process, it also reveals the challenges of achieving consistent and reliable performance with local LLMs.

Tool usage, while a crucial component, proved inconsistent, sometimes deviating from the prescribed order or omitting calls altogether. This erratic behaviour, alongside the occasional successes, highlights the gap between local LLM capabilities and the more robust performance of paid hosted services (as seen in Cole Medin's videos, where it works flawlessly, or does it?). Despite these inconsistencies, the project provides valuable insights into the complexities of building agentic RAG systems.

The dynamic prompt engineering, incorporating feedback on tool usage, demonstrates a promising direction for improving LLM behaviour. While imperfect, this "poor man's" approach is a valuable stepping stone for future development.

I will explore alternative validation strategies. Leveraging advancements in local LLM technology will be crucial for realizing the full potential of agentic RAG without relying on external, token-based services.

To make powerful AI tools more accessible, continued innovation in local LLM development, particularly in tool calling and consistent execution, is needed.

In my next attempt, I will try replacing the local LLM with AWS Bedrock serverless options for chat completion, tool calling, and embeddings; let's see if we can build a budget-friendly serverless Agentic RAG.

Models used for testing:

  • llama3.2:3b
  • llama3.1:latest
  • qwen2.5:14b

Embedding model:

  • nomic-embed-text

If you have had better luck, please share how I could improve this implementation in the comments.


Building a "Poor Man's Agentic RAG" with Python: A Step-by-Step Guide was originally published in Level Up Coding on Medium, where people are continuing the conversation by highlighting and responding to this story.


This content originally appeared on Level Up Coding - Medium and was authored by Qazi Murtaza Ahmed


Print Share Comment Cite Upload Translate Updates
APA

Qazi Murtaza Ahmed | Sciencx (2025-01-31T14:00:22+00:00) Building a “Poor Man’s Agentic RAG” with Python: A Step-by-Step Guide. Retrieved from https://www.scien.cx/2025/01/31/building-a-poor-mans-agentic-rag-with-python-a-step-by-step-guide/

MLA
" » Building a “Poor Man’s Agentic RAG” with Python: A Step-by-Step Guide." Qazi Murtaza Ahmed | Sciencx - Friday January 31, 2025, https://www.scien.cx/2025/01/31/building-a-poor-mans-agentic-rag-with-python-a-step-by-step-guide/
HARVARD
Qazi Murtaza Ahmed | Sciencx Friday January 31, 2025 » Building a “Poor Man’s Agentic RAG” with Python: A Step-by-Step Guide., viewed ,<https://www.scien.cx/2025/01/31/building-a-poor-mans-agentic-rag-with-python-a-step-by-step-guide/>
VANCOUVER
Qazi Murtaza Ahmed | Sciencx - » Building a “Poor Man’s Agentic RAG” with Python: A Step-by-Step Guide. [Internet]. [Accessed ]. Available from: https://www.scien.cx/2025/01/31/building-a-poor-mans-agentic-rag-with-python-a-step-by-step-guide/
CHICAGO
" » Building a “Poor Man’s Agentic RAG” with Python: A Step-by-Step Guide." Qazi Murtaza Ahmed | Sciencx - Accessed . https://www.scien.cx/2025/01/31/building-a-poor-mans-agentic-rag-with-python-a-step-by-step-guide/
IEEE
" » Building a “Poor Man’s Agentic RAG” with Python: A Step-by-Step Guide." Qazi Murtaza Ahmed | Sciencx [Online]. Available: https://www.scien.cx/2025/01/31/building-a-poor-mans-agentic-rag-with-python-a-step-by-step-guide/. [Accessed: ]
rf:citation
» Building a “Poor Man’s Agentic RAG” with Python: A Step-by-Step Guide | Qazi Murtaza Ahmed | Sciencx | https://www.scien.cx/2025/01/31/building-a-poor-mans-agentic-rag-with-python-a-step-by-step-guide/ |

Please log in to upload a file.




There are no updates yet.
Click the Upload button above to add an update.

You must be logged in to translate posts. Please log in or register.