Build a Document QA generator with Langchain and Streamlit


In this post, I will explain LangChain's with_structured_output API, which helps return the model's response in a predefined output schema.

By the end of this blog, we will have built a tool that generates multiple choice questions, along with answers, from an uploaded PDF. I find it easy to host the solution on Streamlit: it offers one-click deployment on its Community Cloud, and it is free.

The demo app can be found here

Pre-requisites

  • Groq API key to access LLM models - you can get one from here

  • Streamlit account to host the application

  • Python experience

Tech Stack

  • LangChain - used to integrate with LLM models

  • fitz (PyMuPDF) - used to read the PDF file contents

Project Setup

  • Create a Python virtual environment using venv with the command below; read more about venv here

$ python -m venv .venv

  • Activate the virtual environment with the command below.

$ source .venv/bin/activate

  • Create a file requirements.txt with the packages below. Note that fitz is provided by the pymupdf package.
streamlit
streamlit-feedback
langchain
langchain-community
langchain-groq
pymupdf
  • Run the pip install command to install the required packages.

pip install -r requirements.txt

Build Streamlit app

  • Create a file main.py and set up a basic Streamlit app with the content below.
import streamlit as st
st.set_page_config(
    page_title="Document Questionnaire generator",
    page_icon="📚",
    layout="centered",
    initial_sidebar_state="auto",
)
st.title('Document Questionnaire generator')
st.caption("🚀 A Streamlit chatbot powered by Groq")
st.write('Generate multiple choice questionnaire from a pdf file uploaded')
  • Save the file and run the command below to start the Streamlit app.
streamlit run main.py

[Screenshot: the initial Streamlit app]

  • Now add Streamlit's file upload control to enable the PDF upload feature.
file_bytes = st.file_uploader("Upload a PDF file", type=["pdf"])
  • Read the uploaded file and store the text content, which we will feed to the LLM.

import fitz  # provided by the pymupdf package

# Wait until a file has been uploaded
if file_bytes is None:
    st.stop()

# Read the PDF
pages = []
pdf_document = fitz.open(stream=file_bytes.read(), filetype="pdf")
for page_num in range(len(pdf_document)):
    page = pdf_document.load_page(page_num)
    pages.append(page)

TEXT_CONTENT = " ".join(page.get_text() for page in pages)
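
As an optional sanity check (not part of the original app), you can preview the extracted text in an expander before sending it to the model:

# Optional: preview the first part of the extracted text
with st.expander("Preview extracted text"):
    st.write(TEXT_CONTENT[:1000])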

Build LLM Model to generate question

  • Now that we have the text content from the PDF file, we can build the LLM call that generates questions and answers from a prompt.

  • Add the prompt below to generate questions and answers; we can also let the user choose the number of questions to generate.

# Number of questions to generate (exposed as a Streamlit input)
NUM_QUESTIONS = st.number_input("Number of questions", min_value=1, max_value=10, value=5)

# System prompt to generate questions
SYSTEM_PROMPT = """Generate {questions_count} multiple choice questions from the following text:
'''
{context}
'''
"""

SYSTEM_PROMPT_FRMTD = SYSTEM_PROMPT.format(context=TEXT_CONTENT, questions_count=NUM_QUESTIONS)
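
Plain str.format works here. If you prefer to stay within LangChain's prompt tooling, an equivalent sketch using ChatPromptTemplate would be:

from langchain_core.prompts import ChatPromptTemplate

# Equivalent prompt built with LangChain's template class
prompt_template = ChatPromptTemplate.from_messages([
    ("system", SYSTEM_PROMPT),
])
messages = prompt_template.format_messages(context=TEXT_CONTENT, questions_count=NUM_QUESTIONS)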
  • Instantiate the LLM model from Groq.
from langchain_groq import ChatGroq

# Create a model
llm = ChatGroq(model="llama-3.1-70b-versatile", temperature=0, max_retries=2)
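
ChatGroq reads the API key from the GROQ_API_KEY environment variable. On Streamlit Community Cloud you can keep it in the app's secrets and export it before constructing the model; the secret name below is an assumption:

import os

# Assumes a secret named GROQ_API_KEY was added in the Streamlit app settings
os.environ["GROQ_API_KEY"] = st.secrets["GROQ_API_KEY"]

Note that model availability on Groq changes over time, so check the Groq console for current model names.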
  • Since we need questions paired with answers, we will use the with_structured_output API from LangChain. It returns the results in a predefined structure so they can be used for building the app.

  • Define the structure of the output model

from typing import Annotated, Optional
from typing_extensions import TypedDict

# TypedDict schema describing the structure we want back from the model
class MCQuestion(TypedDict):
    """Multiple choice question."""
    text: Annotated[str, ..., "The text of the question"]
    options: Annotated[list[str], ..., "The options for the question"]
    answer: Annotated[Optional[int], None, "The index of the correct option"]

class MultipleQuestion(TypedDict):
    """List of multiple choice questions."""
    questions: Annotated[list[MCQuestion], ..., "List of multiple choice questions"]
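
with_structured_output also accepts a Pydantic model instead of a TypedDict; in that case the call returns a validated model instance rather than a plain dict. A minimal sketch, with illustrative class names:

from typing import Optional
from pydantic import BaseModel, Field

class MCQuestionModel(BaseModel):
    """Multiple choice question."""
    text: str = Field(description="The text of the question")
    options: list[str] = Field(description="The options for the question")
    answer: Optional[int] = Field(default=None, description="The index of the correct option")

class MultipleQuestionModel(BaseModel):
    """List of multiple choice questions."""
    questions: list[MCQuestionModel]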
  • Then invoke the LLM model with the text content from the PDF.
from langchain_core.messages import SystemMessage

# Cache the result so reruns don't trigger the model to generate questions again
@st.cache_data
def call_llm(content: str) -> dict:
    """Call the model to generate questions."""
    structured_llm = llm.with_structured_output(MultipleQuestion)
    return structured_llm.invoke([SystemMessage(content=content)])

response = call_llm(SYSTEM_PROMPT_FRMTD)
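
With a TypedDict schema, call_llm returns a plain dict shaped like MultipleQuestion. A hypothetical result might look like this:

# Hypothetical shape of the response (values are illustrative)
# {
#     "questions": [
#         {"text": "What is ...?",
#          "options": ["Option A", "Option B", "Option C", "Option D"],
#          "answer": 2}
#     ]
# }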
  • Store the questions in Streamlit session state: Streamlit reruns the entire script on every interaction, so we would otherwise lose the data.
st.session_state.questions = response.get("questions")
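
Because the script reruns on every interaction, you may want to guard the generation so the model is only called once per session; a minimal sketch:

# Generate questions only once; later reruns reuse the stored copy
if "questions" not in st.session_state:
    response = call_llm(SYSTEM_PROMPT_FRMTD)
    st.session_state.questions = response.get("questions")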
  • Once the questions are received from the LLM, they can be presented to the user as multiple choice questions. We can stop the user from progressing until the current question is answered correctly, and once all questions are answered we show a congratulations message.
for question in st.session_state.questions:
    answer_selected = st.radio(
        label=question.get("text"),
        options=question.get("options"),
        index=None)

    if not answer_selected:
        st.text('Please select an answer')
        st.stop()

    # Guard against a missing answer index before comparing
    answer_index = question.get('answer')
    if answer_index is not None and question.get('options')[answer_index] == answer_selected:
        st.text(f"✅ Correct! The answer is {answer_selected}")
    else:
        st.text('❌ Incorrect answer, please try again')
        st.stop()

st.balloons()
st.success('Congratulations! You have completed the questionnaire', icon="✔")
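
Optionally (this is not in the original app), you could let the user start over by discarding the stored questions and rerunning the script:

# Optional: restart the quiz by clearing the stored questions
if st.button("Start over"):
    st.session_state.pop("questions", None)
    st.rerun()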

Once you have tested that it works, you can push the code changes to GitHub.

In Streamlit Community Cloud, you can deploy the app by selecting the GitHub repository and the entry-point file. Read more about Streamlit here.

Thanks for reading, and I hope you enjoyed the article. You can find the code used in this article on GitHub.

GitHub: sathish39893 / streamlit-apps - streamlit apps built using generative ai llms






