This content originally appeared on Level Up Coding - Medium and was authored by Ali.Mz
Overview of Creating a Data Querying Agent using LangGraph and CopilotKit

1. Introduction
In today’s data-driven landscape, organizations accumulate vast amounts of structured data daily. However, accessing and interpreting this data often requires proficiency in Structured Query Language (SQL), which creates a barrier for non-technical stakeholders. Business professionals, marketers, and decision-makers frequently depend on data analysts to extract insights, which leads to delays in the decision-making process.
In this tutorial, we’ll explore how to build an end-to-end production-ready ReAct Agent that enables us to interact with relational databases using everyday language. Using LangGraph for managing the agent’s reasoning process and CopilotKit for creating an intuitive frontend interface, we’ll learn how to translate natural language questions into SQL queries and present the results through user-friendly visualizations.
The entire codebase can be found for free in this GitHub repo: https://github.com/al-mz/insight-copilot
2. Architecture Overview
Before we dive into the Natural Language Data Querying Agent, let’s review the architecture to understand the essential components of the system that facilitate interaction between users and structured data. Here are the key elements:
Frontend Layer
- Framework: Built using Next.js and React to ensure a responsive and dynamic user interface.
- AI Integration: Integrates CopilotKit to embed AI capabilities directly into the frontend, enabling conversational interactions.
- User Interface: Features a modern dashboard with interactive charts and an AI-powered sidebar for real-time data visualization and querying.
Backend Layer
- API Management: Utilizes FastAPI to handle HTTP requests, providing a robust and efficient backend service.
- AI Agent System: Employs LangGraph to manage the AI agent’s reasoning process, translating natural language into SQL queries and performing data analysis.
- Database: Leverages SQLite for simple data storage, offering a lightweight and easy-to-manage solution for fast prototyping. SQLite can be replaced with a cloud-based database solution during the deployment phase.
Key Components
- UI Components: Includes dashboards, charts, and interactive elements that present data in an accessible format. You can take this template and change it to whatever UI/UX you desire for your use case.
- CopilotKit Integration: Handles AI interactions and manages the chat interface, facilitating natural language communication.
- API Routes: Serve as the communication bridge between the frontend and backend, ensuring smooth data flow.
- LangGraph Agent: Processes user queries, generates corresponding SQL commands, and interprets results to provide meaningful insights.
- Database: Stores and retrieves data efficiently, supporting the agent’s query requirements.
Data Flow
- The user inputs a query in natural language through the frontend interface.
- The frontend sends this request to the backend via defined API routes.
- The LangGraph agent interprets the natural language, references the database schema (i.e., table structures, column names, and relationships), formulates an appropriate SQL query, and executes it against the SQLite database. If execution fails due to syntax errors, missing fields, or other issues, the agent’s retry and fallback logic analyzes the error, adjusts the query, and reattempts execution, ensuring more robust data retrieval.
- The SQLite database processes the refined query and returns the raw data to the backend.
- The LangGraph agent takes the raw data and compiles it into a natural language response, summarizing the results in a way that aligns with the user’s original query intent.
- The backend sends the generated response back to the frontend, where it is presented to the user as a natural language output, effectively closing the query loop.
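To make this round trip concrete, here is a minimal, self-contained sketch of the loop. The `translate_to_sql` stand-in and the toy `customer` table are illustrative only, not part of the repo; a real agent would call an LLM with the schema in context:

```python
import sqlite3

def translate_to_sql(question: str) -> str:
    """Stand-in for the agent's NL-to-SQL step (illustrative lookup only)."""
    canned = {
        "how many customers do we have?": "SELECT COUNT(*) FROM customer;",
    }
    return canned[question.lower()]

def answer(question: str, conn: sqlite3.Connection) -> str:
    sql = translate_to_sql(question)     # agent formulates the query
    rows = conn.execute(sql).fetchall()  # database returns raw data
    return f"Result: {rows[0][0]}"       # agent summarizes for the user

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO customer (name) VALUES (?)", [("Ada",), ("Lin",)])
print(answer("How many customers do we have?", conn))  # Result: 2
```

The real pipeline replaces the canned lookup with the LangGraph agent and the in-memory table with the application database, but the shape of the loop is the same.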

3. The Power of LangGraph
LangGraph is a flexible framework designed to build stateful, modular agents that can reason, act, and adapt dynamically.
In this project, I used LangGraph to implement a ReAct (Reasoning and Acting) agent. This agent processes natural language queries, formulates SQL commands based on the database schema, and interprets the results to provide meaningful responses. In case of failure, the agent is able to analyze the error and come up with a new plan of action while keeping the user updated on the progress (more on that in the CopilotKit discussion).
For instance, if a generated SQL query fails, the agent can analyze the error, adjust the query accordingly, and retry.
Additionally, LangGraph’s stateful architecture allows the agent to maintain context across interactions, which enables it to handle follow-up questions and refine its responses based on previous exchanges. The ReAct pattern involves the agent iteratively reasoning about a query, deciding on an action (such as what tables should be called or generating a SQL query), executing that action, and observing the outcome.
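As a rough illustration of this reason-act-observe cycle, the following stdlib-only sketch retries failed SQL with an adjusted candidate. The `query_plan` iterator stands in for the LLM’s reasoning step and is purely hypothetical, not the repo’s implementation:

```python
import sqlite3

def react_step(query_plan, conn, max_attempts=3):
    """Minimal reason-act-observe loop: try a candidate query, observe the
    error on failure, and retry with the next candidate the 'reasoner' yields."""
    last_error = None
    for _, sql in zip(range(max_attempts), query_plan):
        try:
            return conn.execute(sql).fetchall()  # act, observe success
        except sqlite3.Error as e:
            last_error = e                       # observe failure, reason again
    raise RuntimeError(f"Gave up after {max_attempts} attempts: {last_error}")

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE payment (amount REAL)")
conn.execute("INSERT INTO payment VALUES (4.5), (2.25)")

# First candidate references a wrong column; the "agent" retries with a fix.
candidates = iter(["SELECT total FROM payment", "SELECT SUM(amount) FROM payment"])
print(react_step(candidates, conn))  # [(6.75,)]
```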
Agent Configuration
In LangGraph, agent configuration can be managed through an encapsulated Configuration class, allowing flexible and dynamic adjustments to the agent's behaviour. This class defines key parameters such as the system prompt, the language model to be used, and other customizable settings. For instance, we can set system_prompt in the Configuration class to set the context for the agent's interactions, or model to specify the language model (e.g., "openai/gpt-4o").
"""Define the configurable parameters for the agent."""
from __future__ import annotations

from dataclasses import dataclass, field, fields
from typing import Annotated

from langchain_core.runnables import ensure_config
from langgraph.config import get_config

from app.agent import prompts


@dataclass(kw_only=True)
class Configuration:
    """The configuration for the agent."""

    system_prompt: str = field(
        default=prompts.SYSTEM_PROMPT,
        metadata={
            "description": "The system prompt to use for the agent's interactions. "
            "This prompt sets the context and behavior for the agent."
        },
    )

    model: Annotated[str, {"__template_metadata__": {"kind": "llm"}}] = field(
        default="openai/gpt-4o",
        metadata={
            "description": "The name of the language model to use for the agent's main interactions. "
            "Should be in the form: provider/model-name."
        },
    )

    max_search_results: int = field(
        default=10,
        metadata={"description": "The maximum number of search results to return for each search query."},
    )

    @classmethod
    def from_context(cls) -> Configuration:
        """Create a Configuration instance from a RunnableConfig object."""
        try:
            config = get_config()
        except RuntimeError:
            config = None
        config = ensure_config(config)
        configurable = config.get("configurable") or {}
        _fields = {f.name for f in fields(cls) if f.init}
        return cls(**{k: v for k, v in configurable.items() if k in _fields})
Agent State Management
The agent’s state is managed using LangGraph’s state management capabilities. The agent’s state is defined through a series of dataclasses that encapsulate the necessary information for processing queries:
from dataclasses import dataclass, field
from typing import Annotated, Dict, List, Optional, Sequence

from langchain_core.messages import AnyMessage
from langgraph.graph.message import add_messages
from langgraph.managed import IsLastStep, RemainingSteps


@dataclass
class InputState:
    """Defines the input state for the agent, representing a narrower interface to the outside world."""

    messages: Annotated[Sequence[AnyMessage], add_messages] = field(default_factory=list)


@dataclass
class AgentState(InputState):
    remaining_steps: RemainingSteps = 25
    is_last_step: IsLastStep = field(default=False)
    progress: Optional[str] = None

    def items(self):
        return self.__dict__.items()

    def __getitem__(self, key):
        return getattr(self, key)

    def get(self, key, default=None):
        return getattr(self, key, default)


@dataclass
class SQLAgentState(AgentState):
    """Extended state for SQL agent with query tracking."""

    last_query: Optional[str] = None
    query_attempts: int = 0
    schema: Optional[Dict[str, List[str]]] = None
These classes define the structure of the agent’s state, including the conversation history (messages), control flags (remaining_steps, is_last_step), and SQL-specific information (last_query, query_attempts, schema). This structured approach enables the agent to maintain context across interactions and manage its operations effectively.
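The mapping-style helpers (items, __getitem__, get) matter because framework code is often written against plain dicts; they let the dataclass be consumed by such code unchanged. A small stand-alone illustration (the DemoState class is a hypothetical stand-in with no LangGraph dependency):

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class DemoState:
    """Illustrative stand-in showing why the mapping-style helpers are useful."""
    messages: list = field(default_factory=list)
    last_query: Optional[str] = None
    query_attempts: int = 0

    def items(self):
        return self.__dict__.items()

    def __getitem__(self, key):
        return getattr(self, key)

    def get(self, key, default=None):
        return getattr(self, key, default)

state = DemoState(last_query="SELECT 1", query_attempts=2)
# Code written against dicts works unchanged on the dataclass:
print(state["last_query"])                    # SELECT 1
print(state.get("missing", "-"))              # -
print(dict(state.items())["query_attempts"])  # 2
```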
Agent Tools
Tools are modular functions that an LLM can invoke to perform specific tasks beyond its native capabilities. With these tools, the AI agent can interact with external systems, such as databases or APIs, allowing for dynamic and context-aware operations. By defining tools, developers can extend the agent’s functionality in a controlled and reusable manner.
For the SQL querying agent, two primary tools are defined: get_schema and run_query.
- get_schema: This tool retrieves the database schema, providing the agent with a structured overview of the available tables and their columns. By understanding the schema, the agent can formulate accurate SQL queries that align with the database structure.
- run_query: This tool executes SQL queries against the database and returns the results. It includes retry logic to handle transient errors, ensuring robustness in query execution.
import json
import sqlite3
from pathlib import Path
from typing import Annotated, Any, Callable, Dict, List

import pandas as pd
from copilotkit.langgraph import copilotkit_emit_state
from langchain_core.runnables import RunnableConfig
from langchain_core.tools import InjectedToolCallId, tool
from langgraph.prebuilt import InjectedState
from tenacity import retry, stop_after_attempt, wait_exponential


class SQLiteDatabase:
    def __init__(self, db_path: Path):
        self.db_path = db_path

    @retry(stop=stop_after_attempt(3), wait=wait_exponential(multiplier=1, min=4, max=10))
    def execute_query(self, query: str) -> pd.DataFrame:
        """Execute a SQL query with retry logic."""
        try:
            with sqlite3.connect(self.db_path) as conn:
                return pd.read_sql_query(query, conn)
        except sqlite3.Error as e:
            raise Exception(f"Database error: {str(e)}")
        except Exception as e:
            raise Exception(f"Unexpected error: {str(e)}")

    def get_schema(self) -> Dict[str, List[str]]:
        """Get the database schema."""
        schema = {}
        with sqlite3.connect(self.db_path) as conn:
            cursor = conn.cursor()
            cursor.execute("SELECT name FROM sqlite_master WHERE type='table';")
            tables = cursor.fetchall()
            for table in tables:
                table_name = table[0]
                cursor.execute(f"PRAGMA table_info({table_name});")
                columns = cursor.fetchall()
                schema[table_name] = [col[1] for col in columns]
        return schema


db = SQLiteDatabase(Path("data/database.db"))  # adjust to your SQLite file's location


@tool(description="Get the database schema", return_direct=False)
async def get_schema(
    tool_call_id: Annotated[str, InjectedToolCallId],
    state: Annotated[Any, InjectedState],
) -> str:
    """Get the database schema."""
    schema = db.get_schema()
    return json.dumps(schema, indent=2)


@tool(description="Run a query on the database", return_direct=True)
async def run_query(
    tool_call_id: Annotated[str, InjectedToolCallId],
    state: Annotated[Any, InjectedState],
    config: RunnableConfig,
    query: str,
) -> str:
    """Run a SQL query on the database with retry logic."""
    await copilotkit_emit_state(config, {"progress": "Running query..."})
    try:
        result = db.execute_query(query)
        return result.to_json(orient="records")
    except Exception as e:
        return f"Error executing query: {str(e)}"


TOOLS: List[Callable[..., Any]] = [get_schema, run_query]
Notice how the copilotkit_emit_state() function is called within tools like run_query() to provide progress updates, such as "Running query...". This will be picked up by the chatbot UI to inform the user of the current status of the agent.
4. Frontend Implementation
I used Next.js and React to craft the frontend of the Natural Language Data Querying Agent. This standard setup provides a common foundation for building interactive web applications. At its heart is CopilotKit, an open-source framework that greatly simplifies adding AI agents to React applications.
CopilotKit Integration for AI-Powered UI
CopilotKit offers a suite of React components and hooks designed to embed AI functionalities into your application. We can use <CopilotChat /> and <CopilotSidebar /> to introduce compelling conversational UIs that let users interact with the AI agent directly within the app.
One of CopilotKit’s standout features (that I like very much) is its ability to maintain real-time synchronization between the AI agent’s internal state and the user interface. This becomes important when we want to keep users informed about the agent’s current activities, such as performing a web search or executing a SQL command. The immediate feedback on the agent’s status improves transparency into the system’s operation and builds user trust and engagement.
Moreover, CopilotKit supports human-in-the-loop (HITL) workflows by allowing the agent to pause and request user input at critical decision points. For instance, if the agent faces an ambiguous request or requires confirmation before proceeding with a significant action, it can prompt the user for guidance. This approach ensures that the AI system follows the user's intent and adapts to complex scenarios.
5. Practical Examples: Interacting with Insight-Copilot
In this section, we’ll test the interactive analytics dashboard powered by an agentic backend, using the Sakila sample database as our data source.
Dataset Overview: The Sakila Sample Database
The Sakila sample database is a well-structured schema designed to represent a fictional DVD rental store. It includes tables such as film, film_category, actor, customer, rental, payment, and inventory, among others. This dataset is widely used for educational purposes, and I downloaded it from Kaggle in SQLite format. Below you can see a few metrics I extracted from the dataset and displayed in the dashboard. To support the dashboard’s visualizations, I used FastAPI to develop specific API endpoints. These endpoints handle data aggregation and processing to serve the necessary information for each chart component. For instance, to display total sales by region, an endpoint aggregates payment data grouped by customer location.
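As a sketch of the kind of aggregation such an endpoint would run, here is a self-contained sqlite3 example. Note the single region column is a deliberate simplification of Sakila's actual address → city → country chain, and the miniature tables are illustrative, not the real dataset:

```python
import sqlite3

# Miniature stand-ins for Sakila's payment and customer tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, region TEXT);
    CREATE TABLE payment (payment_id INTEGER PRIMARY KEY,
                          customer_id INTEGER, amount REAL);
    INSERT INTO customer VALUES (1, 'EMEA'), (2, 'APAC');
    INSERT INTO payment VALUES (1, 1, 4.5), (2, 1, 2.25), (3, 2, 3.0);
""")

# The aggregation a "total sales by region" endpoint would serve.
rows = conn.execute("""
    SELECT c.region, SUM(p.amount) AS total_sales
    FROM payment p JOIN customer c ON p.customer_id = c.customer_id
    GROUP BY c.region ORDER BY total_sales DESC
""").fetchall()
print(rows)  # [('EMEA', 6.75), ('APAC', 3.0)]
```

In the actual app, a FastAPI route would run a query of this shape against the Sakila file and return the rows as JSON for the chart component.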

Natural Language Query Examples
Let’s test out our agent using a few examples from simple to difficult:
- What is the total revenue generated so far?
This is a simple yet effective question validating that the pipeline is working. Here is a screenshot of our first conversation with the AI agent.

We can also test the pipeline and gain clear visibility into exactly what our agent was doing, step by step, through LangGraph’s handy Studio. Studio is a specialized visual interface that enables visualization, interaction, and debugging of agentic systems. For the studio to work correctly, LANGSMITH_API_KEY needs to be set. We can start the LangGraph API server locally using this command:
langgraph dev --allow-blocking
Using LangGraph Studio, we can visualize and debug the internal reasoning processes of AI agents. For example, when a user asks a question like “What is the total revenue generated so far?”, we can trace each step the agent takes to arrive at an answer.
Initially, the agent invokes the get_schema tool to retrieve the database schema, gaining insight into available tables, columns, and their relationships. This understanding helps the agent to create a relevant SQL query.
Next, the agent formulates the SQL command and utilizes the run_query tool to execute it against the database. After the raw data is obtained, the agent processes the results and generates a coherent response to present to the user.

- What was the total sales amount on July 1st, 2005?

- Which film has recorded the highest revenue to date?

7. Future Work
Looking ahead, we can refine the capabilities of this application and deploy it to the cloud to enable scalability and accessibility. The codebase includes built-in cloud deployment capabilities using Docker containerization and AWS services, including Amazon Elastic Container Registry (ECR) for secure storage of Docker images and Amazon Elastic Container Service (ECS) for orchestrating container deployment.
8. Conclusion
In this tutorial, we reviewed how to build a natural language data querying agent to make data analysis more accessible to non-technical users. We explored the integration of open-source tools like LangGraph and CopilotKit and demonstrated how to translate user-friendly prompts into actionable SQL queries. However, we have to keep in mind the current limitations and areas for improvement in such systems.
One significant remaining challenge is ensuring the accuracy of AI-generated SQL commands. Even with advanced models, there is still a risk of generating syntactically correct but semantically incorrect queries. Here are a few things I learned to implement to make the AI agent more robust:
- Add error-handling logic so the AI agent can understand the reason behind a failure.
- Implement fallback strategies to mitigate these issues.
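Combining both points, here is a minimal sketch of error-informed retry with a graceful fallback. The fix_query corrector is hypothetical; a real agent would feed the error message back to the LLM to produce the corrected query:

```python
import sqlite3

def fix_query(sql: str, error: str) -> str:
    """Hypothetical corrector: stands in for an LLM that sees the error text."""
    return sql.replace("revenue", "amount")  # illustrative fix only

def robust_query(conn, sql, max_attempts=2):
    for _ in range(max_attempts):
        try:
            return {"ok": True, "rows": conn.execute(sql).fetchall()}
        except sqlite3.Error as e:
            sql = fix_query(sql, str(e))  # error handling: learn why it failed
    # Fallback: degrade gracefully instead of surfacing a stack trace.
    return {"ok": False, "message": "Could not answer from the database."}

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE payment (amount REAL)")
conn.execute("INSERT INTO payment VALUES (4.5), (2.25)")
print(robust_query(conn, "SELECT SUM(revenue) FROM payment"))
```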
Additionally, prompt engineering remains a vital aspect of optimizing agent performance.
🙌 Enjoyed the Read?
- Clap for this post to let me know you liked it.
- Follow me to stay updated on future tutorials and insights into AI, data analytics, and more.
- Share it with friends or colleagues who might benefit from building their own AI-powered data querying agents.
For those interested in exploring this project further, the codebase is available for experimentation and customization: https://github.com/al-mz/insight-copilot
Resources for Further Learning:
- LangGraph Documentation
- CopilotKit Documentation
- Prompt Engineering Best Practices
- Building AI Agents: Lessons Learned by Patrick Dougherty
How to Build a Natural Language Data Querying Agent with A Production-Ready Co-Pilot was originally published in Level Up Coding on Medium, where people are continuing the conversation by highlighting and responding to this story.
Ali.Mz | Sciencx (2025-05-20T12:04:32+00:00) How to Build a Natural Language Data Querying Agent with A Production-Ready Co-Pilot. Retrieved from https://www.scien.cx/2025/05/20/how-to-build-a-natural-language-data-querying-agent-with-a-production-ready-co-pilot/