Data Science How-To's

How to Build Chatbots With LangChain

This is a guest post from Dido Grigorov, a deep learning engineer and Python programmer with 17 years of experience in the field.

Chatbots have evolved far beyond simple question-and-answer tools. With the power of large language models (LLMs), they can understand the context of conversations and generate human-like responses, making them invaluable for customer support applications and other types of virtual assistance. 

LangChain, an open-source framework, streamlines the process of building these conversational chatbots by providing tools for seamless model integration, context management, and prompt engineering.

In this blog post, we’ll explore how LangChain works and how chatbots interact with LLMs. We’ll also guide you step by step through building a context-aware chatbot that delivers accurate, relevant responses using LangChain and GPT-3.

What are chatbots in the realm of LLMs?

Chatbots in the field of LLMs are cutting-edge software applications that simulate human-like conversations with users through text or voice interfaces. These chatbots exploit the advanced capabilities of LLMs, which are neural networks trained on huge amounts of text data, allowing them to produce human-like responses to a wide range of input prompts.

One key advantage is that LLM-based chatbots can take a conversation’s context into account when generating a response. This means they can maintain coherence across several exchanges and can process complex queries to produce outputs that are in line with the users’ intentions. Additionally, these chatbots assess the emotional tone of a user’s input and adjust their responses to match the user’s sentiments.

Chatbots are highly adaptable and personalized. They learn from how users interact with them, improving their responses by adjusting them to individual preferences and needs.

What is LangChain?

LangChain is an open-source framework developed for creating apps that use large language models (LLMs). It comes with tools and abstractions to better personalize the information produced by these models while maintaining accuracy and relevance.

One common term you will see when reading about LLMs is “prompt chains”. A prompt chain is a sequence of prompts or instructions used in the context of artificial intelligence and machine learning, with the purpose of guiding the AI model through a multi-step process to generate more accurate, detailed, or refined outputs. This method can be employed for various tasks, such as writing, problem-solving, or generating code.

One of the strongest features of the framework is that developers can create new prompt chains with LangChain. They can even modify existing prompt templates without needing to train the model again when using new datasets.

How does LangChain work?

LangChain is a framework designed to simplify the development of applications that utilize language models. It offers a suite of tools that help developers efficiently build and manage applications that involve natural language processing (NLP) and Large Language Models. By defining the steps needed to achieve the desired outcome (this might be a chatbot, task automation, virtual assistant, customer support, and even more), developers can adapt language models flexibly to specific business contexts using LangChain. 

Here’s a high-level overview of how LangChain works.

Model integration

LangChain supports various language models, including those from OpenAI, Hugging Face, Cohere, Anyscale, Azure Models, Databricks, Ollama, Llama, GPT4All, Spacy, Pinecone, AWS Bedrock, and MistralAI, among others. Developers can easily switch between different models or use multiple models in one application. They can also build custom model integrations, which lets them take advantage of capabilities tailored to their specific applications.
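For illustration, here is a minimal sketch of what switching providers can look like with the LangChain 0.0.208-era API used later in this tutorial; the Hugging Face model name is just an example, and `HuggingFaceHub` expects a `HUGGINGFACEHUB_API_TOKEN` environment variable:

from langchain import OpenAI
from langchain.llms import HuggingFaceHub

# Two interchangeable LLM wrappers – the rest of the chain code stays the same
llm_openai = OpenAI(model_name="gpt-3.5-turbo-instruct", temperature=0)
llm_hf = HuggingFaceHub(repo_id="google/flan-t5-large")  # example model, chosen for illustration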

Chains

The core concept of LangChain is chains, which bring together different AI components for context-aware responses. A chain represents a set of automated actions between a user prompt and the final model output. There are two types of chains provided by LangChain:

  • Sequential chains: These chains enable the output of one model or function to be used as the input for another. This is particularly helpful for multi-step processes whose steps depend on each other (a minimal example is sketched after this list).
  • Parallel chains: These allow multiple tasks to run simultaneously, with their outputs merged at the end. This makes them perfect for tasks that can be divided into completely independent subtasks.
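Here is a minimal sketch of a sequential chain using the same LangChain API as the rest of this tutorial; the prompts and the two-step summarize-then-bullet workflow are purely illustrative:

from langchain import OpenAI, PromptTemplate, LLMChain
from langchain.chains import SimpleSequentialChain

llm = OpenAI(temperature=0)

# Step 1: summarize a piece of text
summary_chain = LLMChain(
    llm=llm,
    prompt=PromptTemplate(
        input_variables=["text"],
        template="Summarize the following text in one paragraph:\n\n{text}",
    ),
)

# Step 2: turn the summary into three bullet points
bullets_chain = LLMChain(
    llm=llm,
    prompt=PromptTemplate(
        input_variables=["summary"],
        template="Rewrite this summary as three short bullet points:\n\n{summary}",
    ),
)

# The output of the first chain is automatically fed as the input of the second
overall_chain = SimpleSequentialChain(chains=[summary_chain, bullets_chain])
print(overall_chain.run("<article text goes here>"))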

Memory

LangChain facilitates the storage and retrieval of information across various interactions. This is essential wherever context needs to persist, such as with chatbots or interactive agents. Two types of memory are provided:

  • Short-term memory – Helps keep track of recent sessions (see the sketch after this list).
  • Long-term memory – Allows retention of information from previous sessions, enhancing the system’s recall of past chats and user preferences.
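As a quick illustration of short-term memory, the sketch below wires a ConversationBufferMemory into LangChain’s built-in ConversationChain; the example exchange is made up:

from langchain import OpenAI
from langchain.chains import ConversationChain
from langchain.memory import ConversationBufferMemory

llm = OpenAI(temperature=0)
conversation = ConversationChain(llm=llm, memory=ConversationBufferMemory())

conversation.predict(input="Hi, my name is Dido.")
# The second turn can refer back to the first because the buffer stores the history
print(conversation.predict(input="What is my name?"))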

Tools and utilities

LangChain provides many tools, but the most used ones are Prompt Engineering utilities, Data Loaders, and Evaluators. When it comes to prompt engineering, LangChain contains utilities for developing good prompts, which are very important for getting the best responses from language models.

If you want to load files such as CSVs, PDFs, or other formats, Data Loaders help you load and pre-process different types of data, making them usable in model interactions.
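For example, loading a CSV file or a PDF takes only a couple of lines. This is a sketch: the file names are placeholders, and PyPDFLoader additionally needs the pypdf package installed:

from langchain.document_loaders import CSVLoader, PyPDFLoader

# Load a CSV and a PDF into LangChain Document objects
csv_docs = CSVLoader(file_path="support_articles.csv").load()
pdf_docs = PyPDFLoader("product_manual.pdf").load()
print(len(csv_docs), len(pdf_docs))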

Evaluation is an essential part of working with machine learning models and large language models. That’s why LangChain provides Evaluators – tools for testing language models and chains so that the generated results meet the required criteria (a minimal heuristic check is sketched after the lists below). Those criteria might include:

Datasets criteria:

  • Manually curated examples: Start with high-quality, diverse inputs.
  • Historical logs: Use real user data and feedback.
  • Synthetic data: Generate examples based on initial data.

Types of evaluations:

  • Human: Manual scoring and feedback.
  • Heuristic: Rule-based functions, both reference-free and reference-based.
  • LLM-as-judge: LLMs score outputs based on encoded criteria.
  • Pairwise: Compare two outputs to pick the better one.

Application evaluations:

  • Unit tests: Quick, heuristic-based checks.
  • Regression testing: Measure performance changes over time.
  • Back-testing: Re-run production data on new versions.
  • Online evaluation: Evaluate in real-time, often for guardrails and classifications.
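The following is a minimal sketch of a heuristic, reference-free check in plain Python; the test cases, keywords, and the generate_answer helper are hypothetical placeholders for whatever chain or chatbot you are evaluating:

test_cases = [
    {"query": "How do I reset my password?", "must_mention": ["reset", "password"]},
    {"query": "What are your support hours?", "must_mention": ["hours"]},
]

def heuristic_eval(answer: str, must_mention: list) -> bool:
    """Reference-free rule: the answer must mention every required keyword."""
    return all(keyword.lower() in answer.lower() for keyword in must_mention)

for case in test_cases:
    answer = generate_answer(case["query"])  # hypothetical call to your chain/chatbot
    verdict = "PASS" if heuristic_eval(answer, case["must_mention"]) else "FAIL"
    print(case["query"], "->", verdict)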

Agents

LangChain agents are essentially autonomous entities that leverage LLMs to interact with users, perform tasks, and make decisions based on natural language inputs.

Action-driven agents use language models to decide on the optimal actions for predefined tasks. Interactive applications such as chatbots, on the other hand, make use of agents that also take user input and stored memory into account when responding to queries.

How do chatbots work with LLMs?

LLMs underlying chatbots use Natural Language Understanding (NLU) and Natural Language Generation (NLG), which are made possible through pre-training of models on vast textual data.

Natural Language Understanding (NLU)

  • Context awareness: LLMs can understand subtlety and allusion in a conversation, and they can keep track of the conversation from one turn to the next. This makes it possible for chatbots to generate logical and contextually appropriate responses to users.
  • Intent recognition: These models can understand the user’s intent from their queries, whether the language is very specific or quite general. They can discern what the user wants to achieve and determine the best way to help them reach that goal.
  • Sentiment analysis: Chatbots can determine the emotion of the user through the tone of the language used and adapt to the user’s emotional state, which increases user engagement.

Natural Language Generation (NLG)

  • Response generation: When LLMs are asked questions, the responses they provide are correct both grammatically and in context. This is because the responses produced by these models mimic human communication, thanks to the models being trained on vast amounts of natural language text.
  • Creativity and flexibility: Beyond simple answers, LLM-based chatbots can tell a story, write a poem, or provide a detailed description of a specific technical issue, and can therefore be considered very flexible in the material they provide.

Personalization and adaptability

  • Learning from interactions: Chatbots personalize the interaction because they can learn from users’ behavior as well as their choices. The chatbot is, in effect, constantly learning, which makes it more effective and precise in answering questions.
  • Adaptation to different domains: The LLMs can be tuned to particular areas or specialties that allow the chatbots to perform as subject matter experts in customer relations, technical support, or the healthcare domain.

LLMs are capable of understanding and generating text in multiple languages, making them suitable for applications in diverse linguistic contexts.

Building your own chatbot with LangChain in five steps

This project aims to build a chatbot that leverages GPT-3 to search for answers within documents. First, we scrape content from online articles, split them into small chunks, compute their embeddings, and store them in Deep Lake. Then, we use a user query to retrieve the most relevant chunks from Deep Lake, which are incorporated into a prompt for generating the final answer with the LLM.

It’s important to note that using LLMs carries a risk of generating hallucinations or false information. While this may be unacceptable for many customer support scenarios, the chatbot can still be valuable for assisting operators in drafting answers that they can verify before sending to users.

Next, we’ll explore how to manage conversations with GPT-3 and provide examples to demonstrate the effectiveness of this workflow.

Step 1: Project creation, prerequisites, and required library installation

First, create your PyCharm project for the chatbot. Open PyCharm and click “New Project”. Then give your project a name.

Create a PyCharm project

Once the project is set up, generate your `OPENAI_API_KEY` on the OpenAI API platform website once you are logged in (or sign up there first if you don’t have an account). To do that, go to the “API Keys” section in the left navigation menu and click the “+Create new secret key” button. Don’t forget to copy your key.

After that, get your `ACTIVELOOP_TOKEN` by signing up on the Activeloop website. Once logged in, just click the “Create API Token” button and you’ll be taken to the token creation page. Copy this token as well.

Once you have both the token and the key, open your configuration settings in PyCharm by clicking the three-dots button next to the run and debug buttons, and choose “Edit”. You should see the following window:

Run/debug configurations in PyCharm

Now locate the “Environment variables” field and find the icon on its right side. Click it – you’ll see the following window:

Environment variables in PyCharm

Now click the + button and start adding your environment variables, being careful with their names. They should be exactly as mentioned above: `OPENAI_API_KEY` and `ACTIVELOOP_TOKEN`. When ready, just click OK in the first window and then “Apply” and “OK” in the second one.

This is a big advantage of PyCharm, and one I very much love, because it handles the environment variables for us automatically without requiring additional calls to load them, allowing us to focus on the creative part of the code.
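If you want to double-check that the variables are visible to your interpreter, an optional sanity check like this will do:

import os

# Confirm the environment variables set in the run configuration are visible
print("OPENAI_API_KEY set:", "OPENAI_API_KEY" in os.environ)
print("ACTIVELOOP_TOKEN set:", "ACTIVELOOP_TOKEN" in os.environ)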

Note: Activeloop is a technology company that focuses on developing data infrastructure and tools for machine learning and artificial intelligence. The company aims to streamline the process of managing, storing, and processing large-scale datasets, particularly for deep learning and other AI applications.

Deep Lake is Activeloop’s flagship product. It provides efficient data storage, management, and access capabilities, optimized for the large-scale datasets often used in AI.

Install the required libraries

We’ll use the `SeleniumURLLoader` class from LangChain, which relies on the `unstructured` and `selenium` Python libraries. Install these using pip. It is recommended to install the latest versions, although the code has been specifically tested with version 0.7.7.

To do that use the following command in your PyCharm terminal:

pip install unstructured selenium
Install Selenium

Now we need to install langchain, deeplake, and openai. To do that, just use this command in your terminal (the same window you used for Selenium) and wait a bit until everything is successfully installed:

pip install langchain==0.0.208 deeplake openai==0.27.8 psutil tiktoken

To make sure all the libraries are properly installed, add the following import lines needed for our chatbot app and click the Run button:

from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.vectorstores import DeepLake
from langchain.text_splitter import CharacterTextSplitter
from langchain import OpenAI
from langchain.document_loaders import SeleniumURLLoader
from langchain import PromptTemplate

Another way to install your libraries is through the PyCharm settings. Open them and go to Project -> Project Interpreter. Then locate the + button, search for your package, and hit the “Install Package” button. Once ready, close that window, and in the next one click “Apply” and then “OK”.

Python interpreter in PyCharm

Step 2: Splitting content into chunks and computing their embeddings

As previously mentioned, our chatbot will “communicate” with content from online articles. That’s why I picked Digitaltrends.com as my source of data and selected five articles to start. All of them are organized into a Python list and assigned to a variable called “articles”.

articles = ['https://www.digitaltrends.com/computing/claude-sonnet-vs-gpt-4o-comparison/',
           'https://www.digitaltrends.com/computing/apple-intelligence-proves-that-macbooks-need-something-more/',
           'https://www.digitaltrends.com/computing/how-to-use-openai-chatgpt-text-generation-chatbot/',
           'https://www.digitaltrends.com/computing/character-ai-how-to-use/',
           'https://www.digitaltrends.com/computing/how-to-upload-pdf-to-chatgpt/']

We load the documents from the provided URLs and split them into chunks using the `CharacterTextSplitter` with a chunk size of 1000 and no overlap:

# Use Selenium to load the documents
loader = SeleniumURLLoader(urls=articles)
docs_not_splitted = loader.load()

# Split the documents into smaller chunks
text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)
docs = text_splitter.split_documents(docs_not_splitted)

If you run the code up to this point, you should receive the following output if everything works well:

[Document(page_content="techcrunch\n\ntechcrunch\n\nWe, TechCrunch, are part of the Yahoo family of brandsThe sites and apps that we own and operate, including Yahoo and AOL, and our digital advertising service, Yahoo Advertising.Yahoo family of brands.\n\n    When you use our sites and apps, we use \n\nCookiesCookies (including similar technologies such as web storage) allow the operators of websites and apps to store and read information from your device. Learn more in our cookie policy.cookies to:\n\nprovide our sites and apps to you\n\nauthenticate users, apply security measures, and prevent spam and abuse, and\n\nmeasure your use of our sites and apps\n\n    If you click '", metadata={'source': ……………]

Next, we generate the embeddings using OpenAIEmbeddings and save them in a DeepLake vector store hosted in the cloud. Ideally, in a production environment, we could upload an entire website or course lesson to a DeepLake dataset, enabling searches across thousands or even millions of documents. 

By leveraging a serverless Deep Lake dataset in the cloud, applications from various locations can seamlessly access a centralized dataset without the necessity of setting up a vector store on a dedicated machine.

Why do we need embeddings and documents in chunks?

When building chatbots with Langchain, embeddings and chunking documents are essential for several reasons that relate to the efficiency, accuracy, and performance of the chatbot.

Embeddings are vector representations of text (words, sentences, paragraphs, or documents) that capture semantic meaning. They encapsulate the context and meaning of words in a numerical form. This allows the chatbot to understand and generate responses that are contextually appropriate by capturing nuances, synonyms, and relationships between words.

Thanks to the embeddings, the chatbot can also quickly identify and retrieve the most relevant responses or information from a knowledge base, because they allow matching user queries with the most semantically relevant chunks of information, even if the wording differs.

Chunking, on the other hand, involves dividing large documents into smaller, manageable pieces, or chunks. Smaller chunks are faster to process and analyze compared to large, monolithic documents, which results in quicker response times from the chatbot.

Document chunking also helps with the relevance of the output, because when a user asks a question, the answer is often contained in only a specific part of a document. Chunking allows the system to pinpoint and retrieve just the relevant sections, so the chatbot can provide more precise and accurate answers.
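To see what “semantically relevant even if the wording differs” means in practice, here is a small sketch that embeds two differently worded texts and measures their cosine similarity; the example sentences are made up, and NumPy is used here only for the arithmetic:

import numpy as np
from langchain.embeddings.openai import OpenAIEmbeddings

embeddings = OpenAIEmbeddings()

query_vec = embeddings.embed_query("How do I check disk usage in Linux?")
chunk_vec = embeddings.embed_query("Run df -h to see how much free space each partition has.")

# Cosine similarity: values closer to 1 mean the texts are closer in meaning,
# even though they share almost no words
similarity = np.dot(query_vec, chunk_vec) / (np.linalg.norm(query_vec) * np.linalg.norm(chunk_vec))
print(similarity)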

Now let’s get back to our application and update the following code by including your Activeloop organization ID. Keep in mind that, by default, your organization ID is the same as your username.

# TODO: use your organization id here. (by default, org id is your username)
my_activeloop_org_id = "didogrigorov"
my_activeloop_dataset_name = "jetbrains_article_dataset"
dataset_path = f"hub://{my_activeloop_org_id}/{my_activeloop_dataset_name}"

# Create the OpenAI embeddings object used by the vector store
embeddings = OpenAIEmbeddings()
db = DeepLake(dataset_path=dataset_path, embedding_function=embeddings)

# add documents to our Deep Lake dataset
db.add_documents(docs)

Another great feature of PyCharm I love is the ability to add TODO notes directly in Python comments. Once you type TODO in capital letters, all such notes are collected in a dedicated section of PyCharm where you can see them all:

# TODO: use your organization id here. (by default, org id is your username)

You can click on them, and PyCharm shows you directly where they are in your code. I find it very convenient and use it all the time:

TODO notes in Python comments in PyCharm

If you execute the code up to this point, you should see the following output if everything works normally:

Chatbot code execution in PyCharm

To find the most similar chunks to a given query, we can utilize the similarity_search method provided by the Deep Lake vector store:

# Check the top relevant documents to a specific query
query = "how to check disk usage in linux?"
docs = db.similarity_search(query)
print(docs[0].page_content)

Step 3: Let’s build the prompt for GPT-3

We will design a prompt template that integrates role-prompting, pertinent knowledge base data, and the user’s inquiry. This template establishes the chatbot’s persona as an outstanding customer support agent. It accepts two input variables: `chunks_formatted`, containing the pre-formatted excerpts from the articles, and `query`, representing the customer’s question. The goal is to produce a precise response based solely on the given chunks, avoiding any fabricated or incorrect information.
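Here is a sketch of such a template, consistent with the variables described above and with the version used later in Step 5; it defines the `prompt` object that the next step’s code formats:

from langchain import PromptTemplate

# The wording below is illustrative; adjust it to your own support persona
template = """You are an exceptional customer support chatbot that gently answers questions.

You know the following context information.

{chunks_formatted}

Answer the following question from a customer. Use only information from the previous context information. Do not invent stuff.

Question: {query}

Answer:"""

prompt = PromptTemplate(
    input_variables=["chunks_formatted", "query"],
    template=template,
)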

Step 4: Building the chatbot functionality

To generate a response, we begin by retrieving the top-k (e.g., top-3) chunks that are most similar to the user’s query. These chunks are then formatted into a prompt, which is sent to the GPT-3 model with a temperature setting of 0.

# user question
query = "How to check disk usage in linux?"

# retrieve relevant chunks
docs = db.similarity_search(query)
retrieved_chunks = [doc.page_content for doc in docs]

# format the prompt
chunks_formatted = "\n\n".join(retrieved_chunks)
prompt_formatted = prompt.format(chunks_formatted=chunks_formatted, query=query)

# generate answer
llm = OpenAI(model="gpt-3.5-turbo-instruct", temperature=0)
answer = llm(prompt_formatted)
print(answer)

If everything works fine, your output should be:

To upload a PDF to ChatGPT, first log into the website and click the paperclip icon next to the text input field. Then, select the PDF from your local hard drive, Google Drive, or Microsoft OneDrive. Once attached, type your query or question into the prompt field and click the upload button. Give the system time to analyze the PDF and provide you with a response.

Step 5: Build conversational history

To keep track of the ongoing conversation, we add a ConversationBufferMemory to the chain. Here is the complete code:

# Imports used in this step (in addition to those from Step 1)
from langchain.chains import LLMChain
from langchain.memory import ConversationBufferMemory

# Create conversational memory
memory = ConversationBufferMemory(memory_key="chat_history", input_key="input")

# Define a prompt template that includes memory
template = """You are an exceptional customer support chatbot that gently answers questions.

{chat_history}

You know the following context information.

{chunks_formatted}

Answer the following question from a customer. Use only information from the previous context information. Do not invent stuff.

Question: {input}

Answer:"""

prompt = PromptTemplate(
    input_variables=["chat_history", "chunks_formatted", "input"],
    template=template,
)

# Initialize the OpenAI model
llm = OpenAI(openai_api_key="YOUR API KEY", model="gpt-3.5-turbo-instruct", temperature=0)

# Create the LLMChain with memory
chain = LLMChain(
    llm=llm,
    prompt=prompt,
    memory=memory
)

# User query
query = "What was the 5th point about on the question how to remove spotify account?"

# Retrieve relevant chunks
docs = db.similarity_search(query)
retrieved_chunks = [doc.page_content for doc in docs]

# Format the chunks for the prompt
chunks_formatted = "\n\n".join(retrieved_chunks)

# Prepare the input for the chain
input_data = {
    "input": query,
    "chunks_formatted": chunks_formatted,
    "chat_history": memory.buffer
}

# Simulate a conversation
response = chain.predict(**input_data)

print(response)

Let’s walk through the code in a more conversational manner.

To start with, we set up a conversational memory using `ConversationBufferMemory`. This allows our chatbot to remember the ongoing chat history, using `input_key="input"` to manage the incoming user inputs.

Next, we design a prompt template. This template is like a script for the chatbot, including sections for chat history, the chunks of information we’ve gathered, and the current user question (input). This structure helps the chatbot know exactly what context it has and what question it needs to answer.

Then, we move on to initializing our language model chain, or `LLMChain`. Think of this as assembling the components: we take our prompt template, the language model, and the memory we set up earlier, and combine them into a single workflow.

When it’s time to handle a user query, we prepare the input. This involves creating a dictionary that includes the user’s question (`input`) and the relevant information chunks (`chunks_formatted`). This setup ensures that the chatbot has all the details it needs to craft a well-informed response.

Finally, we generate a response. We call the `chain.predict` method, passing in our prepared input data. The method processes this input through the workflow we’ve built, and out comes the chatbot’s answer, which we then display.

This approach allows our chatbot to maintain a smooth, informed conversation, remembering past interactions and providing relevant answers based on the context.

Another PyCharm trick that helped me a lot while building this functionality is the ability to hover over a method, hold Ctrl, and click it to jump straight to its definition.

In conclusion

GPT-3 excels at creating conversational chatbots capable of answering specific questions based on contextual information provided in the prompt. However, ensuring the model generates answers solely based on this context can be challenging, as it often tends to hallucinate (i.e., generate new, potentially false information). The impact of such false information varies depending on the use case.

In summary, we developed a context-aware question-answering system using LangChain, following the provided code and strategies. The process included splitting documents into chunks, computing their embeddings, implementing a retriever to find similar chunks, crafting a prompt for GPT-3, and using the GPT-3 model for text generation. This approach showcases the potential of leveraging GPT-3 to create powerful and contextually accurate chatbots while also emphasizing the importance of being vigilant about the risk of generating false information.

About the author

Dido Grigorov

Dido is a seasoned Deep Learning Engineer and Python programmer with an impressive 17 years of experience in the field. He is currently pursuing advanced studies at the prestigious Stanford University, where he is enrolled in a cutting-edge AI program, led by renowned experts such as Andrew Ng, Christopher Manning, Fei-Fei Li and Chelsea Finn, providing Dido with unparalleled insights and mentorship.

Dido’s passion for Artificial Intelligence is evident in his dedication to both work and experimentation. Over the years, he has developed a deep expertise in designing, implementing, and optimizing machine learning models. His proficiency in Python has enabled him to tackle complex problems and contribute to innovative AI solutions across various domains.
