I Built A Q&A Bot For Websites With LangChain


This content originally appeared on Level Up Coding - Medium and was authored by Irtiza Hafiz

Vector Stores, Similarity Searches, ChatGPT LLMs, Automatic Web Scraping

Recently, I found myself regularly going to travel news websites to hunt deals. For example, I want to find out if there are Delta Airlines flash sales or Hilton Hotel point sales.

To make this process easier, I decided to build a simple Q&A bot with LangChain and an OpenAI LLM.

With only a few lines of code, I was able to build a chatbot that works well. I can only imagine how much better it will get after a few days of tuning.

In this tutorial, I’ll show you how I built it. Here’s the basic idea:

  1. Grab content from a webpage using a loader.
  2. Store the content in a searchable vector store.
  3. Search for relevant documents based on a user’s question.
  4. Use OpenAI’s language model to answer the question.

Let’s break it down step-by-step and see how it all comes together!

If you don’t care about the explanation and want to peek at the code, check out my GitHub!

Step 1: Extracting Webpage Content

The first step is to extract content from a webpage. For this, we use the WebBaseLoader from LangChain.

Environment Setup

import os
from dotenv import load_dotenv

load_dotenv()  # read key=value pairs from .env into the process environment

  • dotenv: Loads environment variables from a .env file, including the OpenAI API key.
  • os: Provides access to system environment variables.
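
A missing key usually only shows up later as an authentication error from the API. As a minimal sketch, assuming you want to fail early instead, a small helper can check the variable right after load_dotenv() runs. The name require_env is hypothetical and not part of the original code.

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable, or raise a clear error.

    Hypothetical helper, not part of dotenv or LangChain.
    """
    value = os.getenv(name)
    if not value:
        raise RuntimeError(
            f"{name} is not set; add it to your .env file or shell environment."
        )
    return value


# Typical use, right after load_dotenv():
# api_key = require_env("OPENAI_API_KEY")
```

Calling it once at startup keeps the error message close to the actual cause.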

Extracting the Webpage

from langchain_community.document_loaders import WebBaseLoader
from langchain_core.documents import Document

page_url = "https://frequentmiler.com/ihg-giving-opportunity-to-buy-status-it-could-be-a-good-deal/"
loader = WebBaseLoader(web_paths=[page_url])
documents = []
# lazy_load() yields Document objects one at a time
for doc in loader.lazy_load():
    documents.append(doc)

  • WebBaseLoader: Loads content from the specified URL.
  • documents: Stores the extracted webpage content as a list of Document objects.
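
To get a feel for what "loading a webpage" involves, here is a rough standard-library sketch that reduces an HTML string to its visible text. It is only an illustration of the idea, not how WebBaseLoader is implemented, and the names TextExtractor, html_to_text, and sample_html are made up for this example.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect visible text, skipping the contents of <script> and <style>."""

    SKIP = {"script", "style"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        # Keep non-empty text that is not inside a skipped tag
        if text and self._skip_depth == 0:
            self.chunks.append(text)


def html_to_text(html: str) -> str:
    """Return the visible text of an HTML string, one text node per line."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.chunks)


sample_html = (
    "<html><head><style>p{color:red}</style></head>"
    "<body><h1>Deal</h1><p>Buy status today.</p>"
    "<script>track()</script></body></html>"
)
print(html_to_text(sample_html))
```

The loader saves you from writing this yourself and also wraps the text in a Document with metadata such as the source URL.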

Now we have our documents ready for the next step. Since WebBaseLoader yields one Document per URL, the list holds a single entry here.

Step 2: Storing and Retrieving Relevant Documents

To provide context for generating answers, we need to store the extracted documents in a vector store and retrieve the most relevant ones based on a user’s query.

Setting Up OpenAI

import openai
openai.api_key = os.getenv("OPENAI_API_KEY")

  • OpenAI API Key: Retrieved from environment variables to authenticate requests.

Retrieving Relevant Documents

from langchain_core.vectorstores import InMemoryVectorStore
from langchain_openai import OpenAIEmbeddings

def get_k_relevant_documents(documents, question, k=3):
    # Embed every document and index it in an in-memory vector store.
    print(f"Storing {len(documents)} documents into the vector store.")
    vector_store = InMemoryVectorStore.from_documents(documents, OpenAIEmbeddings())
    # Embed the question and return the k closest documents.
    print("Getting relevant documents from in-memory vector store.")
    relevant_docs = vector_store.similarity_search(question, k=k)
    print(f"Retrieved similar documents: {len(relevant_docs)}")
    return relevant_docs

  • InMemoryVectorStore: Temporarily stores the documents and their embeddings in your computer’s memory (RAM). These embeddings, created by OpenAIEmbeddings, allow the system to perform similarity searches. However, the data is lost when the program stops. For persistent storage, you’d need something like FAISS, whose index can be saved to disk, or a database-backed store.
  • Similarity Search: Finds the top k documents most relevant to the user’s question.
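
To make "similarity search" less abstract, here is a toy sketch in plain Python. It ranks a handful of made-up 3-dimensional vectors by cosine similarity, which is a common choice of metric for embeddings. The functions cosine_similarity and top_k and the sample vectors are illustrative assumptions, not the vector store's internals; real embeddings come from the embedding model and have far more dimensions.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vec, doc_vecs, k=3):
    """Return the indices of the k vectors most similar to query_vec, best first."""
    scored = [(cosine_similarity(query_vec, v), i) for i, v in enumerate(doc_vecs)]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [i for _, i in scored[:k]]


# Toy "embeddings" for four documents (made-up numbers)
doc_vecs = [
    [1.0, 0.0, 0.0],
    [0.9, 0.1, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
]
query_vec = [1.0, 0.2, 0.0]

print(top_k(query_vec, doc_vecs, k=2))  # → [1, 0]
```

Document 1 wins because it points in almost the same direction as the query, even though document 0 has the larger first component.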

Step 3: Generating Answers with Context

With the relevant documents in hand, we can use OpenAI’s language model to generate answers.

Querying the Language Model

from langchain_openai import ChatOpenAI
from langchain_core.messages import HumanMessage, SystemMessage
from langchain_core.output_parsers import StrOutputParser

def get_answer_from_llm(documents, question):
    relevant_docs = get_k_relevant_documents(documents, question)
    model = ChatOpenAI(model="gpt-4o-mini")
    # Join the retrieved texts into one context string.
    context_from_docs = "\n\n".join([doc.page_content for doc in relevant_docs])
    messages = [
        SystemMessage(
            content=f"Use the following context to answer my question: {context_from_docs}"
        ),
        HumanMessage(content=question),
    ]
    # Pipe the model output through a parser that returns a plain string.
    parser = StrOutputParser()
    chain = model | parser
    return chain.invoke(messages)

  • ChatOpenAI: A wrapper around OpenAI’s chat models for conversational interactions.
  • Context Creation: Combines the content of relevant documents to provide context for answering the question.
  • Message Pipeline: Sends system and user messages to the model and processes the output with a parser.
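
One thing to watch with this approach is prompt size: a long page can make the context very large. The sketch below shows one possible way to cap the context by character count before building the messages. The Doc class is a hypothetical stand-in for a LangChain Document, and build_context, build_prompt, and the max_chars budget are assumptions for illustration, not part of the original code.

```python
from dataclasses import dataclass


@dataclass
class Doc:
    """Hypothetical stand-in for a LangChain Document (only page_content is used)."""
    page_content: str


def build_context(docs, max_chars=4000):
    """Join document texts with blank lines, stopping before max_chars is exceeded."""
    parts, used = [], 0
    for doc in docs:
        text = doc.page_content.strip()
        sep = 2 if parts else 0  # length of the "\n\n" separator
        if used + sep + len(text) > max_chars:
            break
        parts.append(text)
        used += sep + len(text)
    return "\n\n".join(parts)


def build_prompt(docs, question, max_chars=4000):
    """Return (role, text) pairs mirroring the system and human messages."""
    context = build_context(docs, max_chars)
    system = f"Use the following context to answer my question: {context}"
    return [("system", system), ("human", question)]


docs = [Doc("aaa"), Doc("bbb"), Doc("ccccc")]
print(build_context(docs, max_chars=8))
```

A character budget is a crude proxy for tokens, but it is enough to keep a quick prototype from sending an entire page when only part of it is needed.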

Step 4: Putting It All Together

Finally, we integrate the pieces into a cohesive system.

from get_relevant_documents import get_answer_from_llm

answer = get_answer_from_llm(
    documents=documents,
    question="How much would buying diamond status cost me?",
)
print(answer)

Running the script will:

  1. Fetch content from the specified webpage.
  2. Search for relevant information in the extracted content.
  3. Generate and print the answer, such as: “Buying diamond status would cost approximately $X.”

Closing Thoughts

Okay folks, that’s all for today.

Again, if you want the full code, check out my GitHub.

If you have read this far, thank you for your time. I hope you found it valuable.

If you want to stay connected, here are a few ways you can do so: follow me on Medium or check out my website.


Irtiza Hafiz | Sciencx (2025-01-15T16:49:58+00:00) I Built A Q&A Bot For Websites With LangChain. Retrieved from https://www.scien.cx/2025/01/15/i-built-a-qa-bot-for-websites-with-langchain/