If you're intrigued by the advancements making learning and teaching easier, you've landed in the right place. In this comprehensive guide, we'll walk through one way to generate questions for various branches of learning using LangChain, and show you how it can be a game-changer in your academic life and career. Let's make tutoring easy using LangChain, OpenAI and ChatGPT.
What is LangChain?
Before we dive head first into generating quick mock test papers with LangChain, let's take a step back and understand what it is.
LangChain is an open-source framework that aims to address the limitations of large language models (LLMs) like GPT-3 and GPT-4. It allows AI developers to combine LLMs with external data.
Why use LangChain?
As you may know, GPT models are trained on large amounts of data, but only up to a cutoff date. For example, GPT-4 was trained on data up to September 2021, which can be a significant limitation. Although the model has a great breadth of knowledge, keeping up with current events or connecting to custom data requires something beyond the model itself.
This is where LangChain comes into the picture.
LangChain allows your LLM to refer to your custom data, or even search engines, when coming up with answers. This gives your GPT model access to up-to-date information from documents, websites, and more.
What is the aim?
We aim to generate mock questions for any given subject, along with any specifications we would like to add to give a better understanding of the types of questions we require the LLM to provide us. We also upload sample test papers so the LLM has a reference as to how it should generate the questions.
How does it help?
There are two sides to every coin. In this case, this will help both students of all levels and teaching professionals.
As a teacher or professor, you could simply punch in the details of the subject for which you want to generate mock questions, OR as a student, you could prepare better by practising answering questions, letting the LLM with LangChain do its magic!
Flow Diagram
The following diagram gives an overall idea of how our use case is achieved.
We upload documents (PDFs, TXTs) to our program, and give some test specifications that will be required to generate questions. The documents along with OpenAI’s API key are sent across to the test generator, where questions based on the given specifications and the reference documents are generated.
Generate mock test using LangChain, OpenAI and ChatGPT
There are a few LangChain components that will be used to accomplish what we have set out to do for this use case.
Models (LLM Wrappers)
Prompts
Chains
Tools and Agents
Document Loaders
Let's dive deep into each of the components along with a short code snippet to achieve our goal.
Environment
The first step is to set up a Python environment. The requirements.txt will look something like this:
langchain
openai
streamlit
Running the command below will install all the packages specified in the requirements.txt file.
pip install -r requirements.txt
Store your OpenAI API key in your environment file so it can be loaded at the time of execution.
OpenAI's API keys can be found on its official website.
# Load environment variables from the .env file
from dotenv import load_dotenv

load_dotenv()
Your .env file will look something like this
OPENAI_API_KEY=your API key
Now that our environment is ready, let's dive in!
Import packages
The following are the packages that will have to be imported to execute the code.
# The easy document loader for Text
from langchain.document_loaders import TextLoader
# Implementation of splitting text that looks at characters
from langchain.text_splitter import RecursiveCharacterTextSplitter
# generic, low and high level interfaces for creating temporary files and directories
import tempfile
# Schema to represent a prompt for an LLM
from langchain.prompts import PromptTemplate
# Wrapper around OpenAI Chat large language models
from langchain.chat_models import ChatOpenAI
# Load summarizing chain
from langchain.chains.summarize import load_summarize_chain
# Tool that takes in function or coroutine directly
from langchain.agents import Tool
# ZeroShotAgent - Agent for the MRKL chain. AgentExecutor - Consists of an agent using tools
from langchain.agents import ZeroShotAgent, AgentExecutor
# Chain to run queries against LLMs
from langchain import LLMChain
# StreamLit
import streamlit as st
Document Loaders
LangChain provides a way to efficiently retrieve information from multiple types of files like PDFs, Texts, etc. This is an example code snippet to extract data from multiple text files.
text_data = []
for file_path in file_paths:
    text_data.extend(TextLoader(file_path).load())
text_content = '\n'.join(doc.page_content for doc in text_data)
text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=400)
docs = text_splitter.create_documents([text_content])
Here we use the RecursiveCharacterTextSplitter to split the combined content of all the uploaded documents into chunks of at most 1,000 characters, with an overlap of 400 characters between consecutive chunks.
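To get a feel for what chunking with overlap means, here is a plain-Python sketch. Note this is only an illustration of the sliding-window idea, not LangChain's actual splitter, which recursively splits on separators like paragraphs and sentences; the tiny chunk size of 10 and overlap of 4 are just for demonstration.

```python
def chunk_text(text: str, chunk_size: int, overlap: int) -> list[str]:
    # Step forward by (chunk_size - overlap) so consecutive
    # chunks share `overlap` characters of context.
    step = chunk_size - overlap
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]

chunks = chunk_text("abcdefghijklmnop", chunk_size=10, overlap=4)
# Each chunk is at most 10 characters, and adjacent chunks share 4.
```

The overlap matters because a fact split across a chunk boundary would otherwise be lost to both chunks.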
Prompts
LangChain provides a fantastic component called PromptTemplate. It allows you to change prompts dynamically according to the user's inputs.
map_prompt = """
You are a helpful AI bot that aids a user in generating questions. Uploaded documents indicate the subject, difficulty level, and question format.
Your goal is to generate {question_type} from the uploaded documents.
{test_specification}
YOUR RESPONSE:
"""
map_prompt_template = PromptTemplate(template=map_prompt, input_variables=["question_type", "test_specification"])
Question type and test specifications can be modified in different ways to fit your requirement.
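Under the hood, a PromptTemplate fills its named placeholders much like Python's own str.format. A quick plain-Python sketch of that substitution (no LangChain required), using example values for the two variables:

```python
map_prompt = (
    "You are a helpful AI bot that aids a user in generating questions.\n"
    "Your goal is to generate {question_type} from the uploaded documents.\n"
    "{test_specification}\n"
    "YOUR RESPONSE:"
)

# Fill the placeholders with the user's inputs (example values here).
filled = map_prompt.format(
    question_type="a quiz",
    test_specification="Focus on short-answer questions only.",
)
```

With the real PromptTemplate, `input_variables` declares exactly these placeholder names, so LangChain can validate them before the prompt reaches the model.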
Chains
LangChain's chain is an effective functionality that lets you combine your PromptTemplates and LLMs like so:
llm = ChatOpenAI(temperature=0, openai_api_key=openai_api_key)
llm_chain = LLMChain(llm=llm, prompt=map_prompt_template)
summarizer = load_summarize_chain(llm=llm, chain_type="map_reduce", verbose=True)
Here, the openai_api_key is OpenAI's API key which can be obtained from the OpenAI website.
The llm_chain is a simple chain to run queries against the LLM, and the summariser loads a summarising chain. This chain will be helpful for summarising the contents of the multiple files used to generate questions.
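The "map_reduce" chain type summarises each document chunk independently (the map step) and then summarises those partial summaries into one final summary (the reduce step). A rough plain-Python sketch of that control flow, with a stand-in for the LLM call:

```python
def summarize(text: str) -> str:
    # Stand-in for an LLM call; the real chain sends `text` to the model.
    return text[:40]

def map_reduce_summary(chunks: list[str]) -> str:
    # Map step: summarise each chunk on its own.
    partial = [summarize(chunk) for chunk in chunks]
    # Reduce step: summarise the combined partial summaries.
    return summarize("\n".join(partial))
```

This structure is what lets the chain handle documents far larger than the model's context window: no single LLM call ever sees more than one chunk (or one batch of partial summaries).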
Tools and Agents
A tool is designed to execute a particular task. Tools can range from ones that can do mathematics, to database lookups, to ones that connect to other services like Gmail.
Here we will be creating a tool to summarise a set of documents as input.
description = "useful for when you need to generate questions similar in format to the uploaded document content."
tools = [Tool(name="Uploaded documents",
func=summarizer.run,
description=description)]
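Conceptually, a Tool is just a named callable plus a description the agent reads when deciding which tool to use. A minimal stand-in to illustrate the idea (plain Python, not LangChain's actual class):

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class SimpleTool:
    name: str          # how the agent refers to the tool
    func: Callable[[str], str]   # the work the tool performs
    description: str   # text the agent reads to decide when to use it

    def run(self, query: str) -> str:
        return self.func(query)

summarize_tool = SimpleTool(
    name="Uploaded documents",
    func=lambda q: f"summary of: {q}",  # placeholder for summarizer.run
    description="useful for generating questions in the uploaded documents' format",
)
```

The description is not decoration: the agent's LLM uses it at each step to choose between tools, so a vague description leads to a tool that is never (or always) picked.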
Now, we initialise our agent.
An agent is an autonomous AI component that takes an input and completes the required tasks sequentially until it reaches an answer.
In this use case, it uses a combination of functionalities such as tools that we saw above along with our LLM + prompt chain, effectively creating a proper AI app.
Let's look at the code to get a better understanding of how we can use agents.
agent = ZeroShotAgent(llm_chain=llm_chain,
tools=tools,
verbose=True)
agent_test_generator = AgentExecutor.from_agent_and_tools(
agent=agent,
tools=tools,
verbose=True)
# Execute the agent - an example to test
test_generator_prompt = 'Generate 20 flashcards on subject Physics with short answers for a quiz'
agent_test_generator.run({"input_documents": docs,
"question_type": "quiz",
"test_specification":
test_generator_prompt})
This effectively creates a chatbot: an agent that chains together the prompt, goes through the summary of the loaded documents, and generates questions accordingly.
Using Streamlit
And lastly, we come to render our program in the browser. For that, we will be making use of the streamlit package.
We will use the streamlit form to fill in some of the test details, create a prompt from it to pass to the agent, and the response will be displayed on the browser.
We define a simple function that generates a prompt from the form.
def get_test_generator_prompt(test_type: str, subject: str, num_of_questions: str, marks: str, test_description: str = None):
    if test_type == "Quiz":
        test_generator_prompt = f'Generate {num_of_questions} flashcards on subject {subject} with short answers for a quiz'
    else:
        test_generator_prompt = f'Generate sample question paper on subject {subject} containing {num_of_questions} number of questions for a total of {marks} marks'
    if test_description:
        test_generator_prompt += f'\n{test_description}'
    return test_generator_prompt
We create different prompt statements for each test type.
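Calling the helper with each test type shows the two prompt shapes it produces. The function is reproduced here (without type hints, for brevity) so the snippet runs standalone:

```python
def get_test_generator_prompt(test_type, subject, num_of_questions, marks, test_description=None):
    # Same logic as above: one prompt shape per test type.
    if test_type == "Quiz":
        prompt = f'Generate {num_of_questions} flashcards on subject {subject} with short answers for a quiz'
    else:
        prompt = (f'Generate sample question paper on subject {subject} '
                  f'containing {num_of_questions} number of questions for a total of {marks} marks')
    if test_description:
        prompt += f'\n{test_description}'
    return prompt

print(get_test_generator_prompt("Quiz", "Physics", 10, 50))
# Generate 10 flashcards on subject Physics with short answers for a quiz
```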
We then move forward to our streamlit code.
In the above code, we are reading the OpenAI API key from the environment file, and we are using direct file paths to upload our documents.
While using the streamlit package, we can make an optional modification where we can prompt the user to enter the API key and upload documents through the browser itself!
openai_api_key = st.sidebar.text_input(
label="#### Your OpenAI API key 👇",
placeholder="Paste your openAI API key, sk-",
type="password")
file_paths = st.sidebar.file_uploader("Upload your files", type=['txt', "pdf"], accept_multiple_files=True)
Let’s move on with the test specification form. We use streamlit’s form to take in the test details.
st.title('🦜 Generate your mock test paper')
with st.form(key='my_form', clear_on_submit=True):
    test_type = st.radio("Test type", options=["Test paper", "Quiz"])
    test_subject = st.text_input("Subject")
    num_of_questions = st.slider("Number of questions", min_value=1, max_value=50)
    marks = st.slider("Total marks", min_value=10, max_value=100)
    test_description = st.text_area(label="Test Description",
                                    placeholder="Do you have any extra specifications you would like to add?")
    # Every form must have a submit button.
    submit_button = st.form_submit_button("Submit")

if submit_button:
    with st.spinner('Generating questions...'):
        prompt = get_test_generator_prompt(test_type, test_subject,
                                           num_of_questions, marks, test_description)
        output = agent_test_generator.run({"input_documents": docs,
                                           "question_type": test_type,
                                           "test_specification": prompt})
        st.success('GENERATED QUESTIONS:', icon="✅")
        st.success(output)
As already mentioned above, we take test type (mock question paper or quiz), subject, number of questions, total marks, and an optional test description.
On hitting the submit button, a prompt with the specifications should be generated which will be passed to the agent along with the split docs as inputs.
Once the agent completes its execution, the response is rendered on the browser.
Extended example
This is an example of a test generator built with LangChain agents: the inputs we give and the response that is generated, rendered in the browser using Streamlit.
As we can see in the below image, we pass our OpenAI API key and upload the necessary reference question papers. Along with that, we add some test specifications on which we want to generate questions.
For this example, let’s say we want to generate a short Quiz on the subject “Data structures in C”. It is completely optional for us to add other requirements too.
For the specifications given, we generate a prompt that instructs the LLM to generate a mock test paper with 10 questions with a total of 50 marks with some additional details that should be considered.
LangChain’s agent uses OpenAI’s LLM, feeding it the summarised information of the uploaded documents and the generated prompt to give back a mock question paper.
Variations
What you saw here is just a fragment of what LangChain can do. For example:
We have used document loaders to get a summary of the documents. But you can explore other components like Embeddings and Vector stores to save and query the documents.
We have used only one OpenAI model in this use case, but LangChain works with many more models and can perform many more tasks, such as image generation with DALL-E.
LangChain agents can even act as an all-purpose AI assistant. They can execute many types of tasks, based on the various helpful tools we have discussed, and many more besides.
There are various LangChain toolkits, which have very specific functionalities like reading a CSV file and querying from it or connecting to an SQL database and querying from a myriad of rows.
Take some time to explore the interesting landscape of AI that LangChain provides, and soon you will be able to put your ideas into action!
Conclusion
We hope this blog gave you a sneak peek at these new AI tools. In the booming world of AI, learning LangChain along with OpenAI's APIs can be a valuable skill set for a programmer, opening up possibilities in the latest developments.
Start experimenting with LangChain and create some very nifty AI projects!