Chat with Your Data: A Simple Guide Using Amazon Bedrock, LangChain, and Streamlit

Segment your data into chunks and store them in a vector database. Generate embeddings with Amazon Bedrock using LangChain. Build a Streamlit website to chat with your data.

In my previous article, I introduced you to Amazon Bedrock using Anthropic’s Claude model and guided you through building a basic Q&A application. Now, in this follow-up, I’ll outline the steps to:

  1. Segment your data into chunks and store them in a vector database (Pinecone).
  2. Generate embeddings with Amazon Bedrock using LangChain.
  3. Build a Streamlit website to chat with your data.

The scripts below are presented in order and can be combined into either a Jupyter notebook or a Python script.

This is a relatively high-level tutorial, and I will not be explaining individual functions or the inner workings of LangChain to the Nth degree. Some prior experience with Python is expected if you want to fully understand it.

Prerequisites:

  1. A free (or paid) Pinecone account
  2. An AWS account with an IAM user that has Bedrock permissions, with credentials configured in your local development environment.
  3. Python working in your development environment. I use 3.11.

Step 1: Check Bedrock is working

For this tutorial, I am assuming you are familiar with the AWS SDK for Python (boto3) and have previously set up your environment to use your IAM account with secret keys.
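
As a reminder, boto3 reads credentials from the standard locations, such as the AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY environment variables or the shared credentials file at ~/.aws/credentials. A minimal sketch with placeholder values (the region is passed explicitly in the client call below):

[default]
aws_access_key_id = YOUR_AWS_ACCESS_KEY_ID
aws_secret_access_key = YOUR_AWS_SECRET_ACCESS_KEY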

This is my standard script for connecting to Amazon Bedrock with the Claude v2 model and getting a response from it to check things are working. If everything is set up properly, you should get a response from Claude. If not, fix your environment and try again. I won’t explain how it works, as that was covered in my previous article.

I would recommend building this in a Jupyter notebook so you can run each step separately, but there is nothing to stop you from building it in a Python file.

# Standard Claude prompt and response

import boto3
import json

bedrock = boto3.client(service_name='bedrock-runtime', region_name='us-east-1')

modelId = 'anthropic.claude-v2'
accept = 'application/json'
contentType = 'application/json'
body = json.dumps({
    "prompt": "Human: Tell me about the English Premier League. Assistant:",
    "max_tokens_to_sample": 500
})

response = bedrock.invoke_model(body=body, modelId=modelId, accept=accept, contentType=contentType)
response_body = json.loads(response.get('body').read())

print(response_body.get('completion'))

Step 2: Split your data into chunks

For this step, I will use LangChain to split our source data into chunks and store them in a Pinecone vector database.

In a new Python script or Jupyter notebook, I first set up my environment by importing the relevant packages and creating the Amazon Bedrock connection.

import boto3
import json
import time
import os
from dotenv import load_dotenv

from langchain.embeddings import BedrockEmbeddings
from langchain.llms.bedrock import Bedrock
from langchain.vectorstores import Pinecone
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.document_loaders import DirectoryLoader
from langchain.chains.question_answering import load_qa_chain

import pinecone

load_dotenv()

bedrock_runtime = boto3.client(
    service_name = "bedrock-runtime",
    region_name = "us-east-1"
)

modelId = 'anthropic.claude-v2'
accept = 'application/json'
contentType = 'application/json'
body = json.dumps({
    "max_tokens_to_sample": 40000,
    "temperature": 0.1,
    "top_p": 0.9,
})

All my data (in this instance, two PDFs: one about the English Premier League and one containing the FA Laws of the Game) is stored in a directory called ‘data’. I am using LangChain’s DirectoryLoader to load the entire directory at once. Once loaded, I use LangChain’s RecursiveCharacterTextSplitter to split the data into small chunks that can be stored in our vector database.

There are two key variables in this code to discuss: chunk_size and chunk_overlap. There is a lot of reading material available on choosing optimal values for both.

  • Chunk Size: the number of characters included in each chunk. I have chosen relatively small chunks of 256 characters because, in my testing, this performs better than larger chunks while remaining cost-effective.
  • Chunk Overlap: the number of characters that adjacent chunks share. I chose 25, so roughly 10% of each chunk overlaps with its neighbour. Your mileage may vary, and I suggest testing and researching this further for a production-grade application (see the toy example after this list).
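
To see what the overlap looks like in practice, here is a tiny, self-contained sketch (the sentence and the sizes are arbitrary); it prints the resulting chunks so you can inspect where adjacent chunks share text at their boundaries:

from langchain.text_splitter import RecursiveCharacterTextSplitter

# Toy example: tiny chunks with a small overlap so adjacent chunks share text at their boundaries
toy_splitter = RecursiveCharacterTextSplitter(chunk_size=20, chunk_overlap=5)
print(toy_splitter.split_text("The quick brown fox jumps over the lazy dog near the river bank"))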

The loading and splitting script below could take a while to run depending on the size of your input documents. My documents total 26 MB and are processed in around 52 seconds on my MacBook Air M2.

Hint: Start with a single, small document to ensure it’s working before chunking your entire dataset.

# Define the path to the directory containing the PDF files (the 'data' folder).
directory = 'data'

# Function to load documents from the specified directory.
def load_docs(directory):
    # Create an instance of the DirectoryLoader with the provided directory path.
    loader = DirectoryLoader(directory)
    # Use the loader to load the documents from the directory and store them in 'documents'.
    documents = loader.load()
    # Return the loaded documents.
    return documents


# Call the load_docs function to retrieve the documents from the specified directory.
documents = load_docs(directory)

# Function to split the loaded documents into semantically separate chunks.
def split_docs(documents, chunk_size=256, chunk_overlap=25):
    # Create an instance of the RecursiveCharacterTextSplitter with specified chunk size and overlap.
    text_splitter = RecursiveCharacterTextSplitter(chunk_size=chunk_size, chunk_overlap=chunk_overlap)
    # Use the text splitter to split the documents into chunks and store them in 'docs'.
    docs = text_splitter.split_documents(documents)
    # Return the split documents.
    return docs

# Call the split_docs function to break the loaded documents into chunks.
# The chunk_size and chunk_overlap parameters can be adjusted based on specific requirements.
docs = split_docs(documents)
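
As an optional sanity check, you can print how many chunks were produced and look at one of them before moving on:

# Optional sanity check: confirm the documents loaded and see what a chunk looks like
print(f"Loaded {len(documents)} documents and produced {len(docs)} chunks")
print(docs[0].page_content)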

Step 3: Store the chunks in a Pinecone database

Note: I am using Pinecone throughout this tutorial as it offers a free database tier for individuals. It could be replaced by various other vector databases such as Chroma, Weaviate, or even AWS OpenSearch.

Log into your Pinecone account and create an index, giving it a dimension of 1536 (the output size of the default Bedrock Titan embeddings model) with the dotproduct metric. As with chunk size and overlap, there are various strategies for tuning these settings for optimal performance. Take note of your index name (this can be anything and will be required later).

Head to the API keys section and create a set of API keys for your environment. I use python-dotenv to store environment variables; you can use whichever method you prefer. When it comes to Streamlit later in the tutorial, I will switch to its preferred method, which allows us to deploy to Streamlit’s servers.
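
For reference, a .env file for this script only needs the two values read below (placeholders shown):

PINECONE_API=YOUR_PINECONE_API_KEY
PINECONE_ENV=YOUR_PINECONE_ENVIRONMENT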

# Initiate the pinecone client

pinecone_api = os.getenv("PINECONE_API")
pinecone_env = os.getenv("PINECONE_ENV")

pinecone.init(
    api_key = pinecone_api,
    environment = pinecone_env
)

index_name = "yourindexname" # change to your index name

To create an index on Pinecone programmatically, or to delete it, you can use this block of code. While testing, I find it’s sometimes easier to check if the index exists, then drop and re-create it, especially if you are switching around your input data.

if index_name in pinecone.list_indexes():
    pinecone.delete_index(index_name)

pinecone.create_index(name=index_name, dimension=1536, metric="dotproduct")
# wait for index to finish initialization
while not pinecone.describe_index(index_name).status["ready"]:
    time.sleep(1)

Create embeddings with Bedrock and store in Pinecone

To create our embeddings and store them in the vector index, this block of code is all you need. Note that it could take some time to run depending on how large your source data is. My text-heavy 26 MB source data took around 35 minutes to process.

Pinecone.from_texts takes our chunked documents, embeds them using Bedrock, and stores the resulting vectors in the Pinecone index.

The embedding step is where you will start to incur costs on AWS. I would highly advise checking the Bedrock pricing for both the embeddings model and Claude against the size of the data you are processing.

Hint: Start with a single, small document to ensure it’s working before embedding your entire dataset.

llm = Bedrock(
    model_id=modelId,
    client=bedrock_runtime
)
bedrock_embeddings = BedrockEmbeddings(client=bedrock_runtime)

docsearch = Pinecone.from_texts(
    [t.page_content for t in docs],
    bedrock_embeddings,
    index_name = index_name
)
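
Once it finishes, you can optionally confirm the vectors actually landed in the index. This is a small sketch using the Pinecone client initialised earlier:

# Optional: check how many vectors are now stored in the index
index = pinecone.Index(index_name)
print(index.describe_index_stats())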

To test that it works, you can run this simple piece of code and start asking questions of your data. Each run of this block will incur a small AWS charge based on your input.

chain = load_qa_chain(llm, chain_type = "stuff")
query = "Explain the offside law so a 5 year old can understand it"
docs = docsearch.similarity_search(query)
chain.run(input_documents = docs, question = query)


" I''m afraid I don''t have enough context to fully explain the offside law in soccer to a 5 year old. The pieces of information provided discuss some specific scenarios related to offside, but don''t provide a full explanation of the rule itself. Without more complete information about the offside law, I don''t want to try to make up an explanation that could be incorrect or confusing. I''m sorry I can''t provide a helpful answer without more complete context about the specifics of the offside rule."

It works! The output of my test question relates to content retrieved from my documents (as does the follow-up question below), which shows it is answering from my data.

However, you need to verify that it is answering questions specifically from your dataset and not from its general ‘knowledge’. Ask a question about your dataset where you already know the answer, then follow up with a question that you know is not covered by the dataset; it should tell you it can’t find the answer.

"What is considered a red card offence?"

' Based on the provided context, it seems a direct red card offence would be something like:
- A penalty kick offence that was not penalised when it should have been (e.g. a deliberate handball to stop a goal)
- A penalty kick being incorrectly awarded when there was no offence
- Delaying the restart of play to show a card

The context indicates a red card is shown directly for serious offences, not as a second yellow card. But without more specifics from the passage, those are the clearest examples of red card offences that I can determine from the information provided. Please let me know if you need any clarification!'
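
To run the other half of that check, ask something you know is not in the documents, reusing the same chain as before (the query here is just an example); the model should say it can’t find the answer in the provided context:

# A question that is deliberately outside the source documents
query = "What is the recipe for a Victoria sponge cake?"
docs = docsearch.similarity_search(query)
chain.run(input_documents = docs, question = query)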

Step 4: Integrate your model into Streamlit

This part of the tutorial largely mirrors Streamlit’s own guide to building a chatbot; however, this guide focuses on an implementation that uses an LLM with your own data, something that is not covered by Streamlit’s guide.

For this section, you must build this in a Python file rather than a Jupyter notebook, as Streamlit cannot run from a notebook.

Environment setup:

  1. Create a new Python file called app.py.
  2. Create a new Python file called utils.py.
  3. Create a new folder called .streamlit containing a file called secrets.toml.
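
Your project folder should now look something like this (the data directory is the same one used for chunking and embedding earlier; the PDF name is a placeholder):

app.py
utils.py
.streamlit/
    secrets.toml
data/
    your-source-documents.pdf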

app.py

This will create the Streamlit application and allow a user to interact with the model you previously created. Although this is similar to Streamlit’s example application, there are a few key updates:

  1. We utilise LangChain, Bedrock and Pinecone for question answering over your data.
  2. Line breaks are preserved in the streaming output.
  3. secrets.toml is used for environment variables and secret keys.
  4. A new utils.py file is created for better management of functions.
  5. A ‘reset chat’ function is included so a user doesn’t need to clear the cache each time they want a new conversation.

import streamlit as st
from langchain.llms.bedrock import Bedrock
from langchain.vectorstores import Pinecone
from langchain.embeddings import BedrockEmbeddings
from langchain.chains.question_answering import load_qa_chain

import boto3
import time
import re

import pinecone
from utils import get_data_files, reset_conversation

# Set the page title
st.set_page_config(page_title="Bedrock with Langchain on Streamlit", page_icon="🇸🇦", layout="wide", initial_sidebar_state="auto", menu_items=None)

# Create a sidebar and a button to reset the chat, using our reset_conversation function from utils.py
# Also display a list of all files in the data folder that were used to train our model
st.sidebar.title("Additional Options")
with st.sidebar:
    st.button('New Chat', on_click=reset_conversation)
    st.write("Data files Loaded:")
    for file in get_data_files():
        st.markdown("- " + file)

# Set variables for the Pinecone and AWS APIs
pinecone_api = st.secrets["PINECONE_API"]
pinecone_env = st.secrets["PINECONE_ENV"]
pinecone_index = st.secrets["pinecone_index_name"]

AWS_SECRET_ACCESS_KEY = st.secrets["AWS_SECRET_ACCESS_KEY"]
AWS_ACCESS_KEY_ID = st.secrets["AWS_ACCESS_KEY_ID"]
AWS_DEFAULT_REGION = st.secrets["AWS_DEFAULT_REGION"]

# Create a bedrock runtime client
bedrock_runtime = boto3.client(
    service_name = "bedrock-runtime",
    region_name = "us-east-1"
)

# Create a Pinecone client
pinecone.init(
    api_key = pinecone_api,
    environment = pinecone_env
)


# Set an LLM connection using Claude, and choose the maximum number of tokens.
llm = Bedrock(
    model_id='anthropic.claude-v2',
    client=bedrock_runtime
)
llm.model_kwargs = {"max_tokens_to_sample": 1000}


# Search an existing pinecone index (as it was trained in the previous step)

bedrock_embeddings = BedrockEmbeddings(client=bedrock_runtime)
docsearch = Pinecone.from_existing_index(pinecone_index, bedrock_embeddings)

chain = load_qa_chain(llm, chain_type = "stuff")


# Set the title on the page
st.title("Bedrock with Langchain and Streamlit")


# Initialise chat history
if "messages" not in st.session_state:
    st.session_state.messages = []


# Display chat messages from history on app rerun
for message in st.session_state.messages:
    with st.chat_message(message["role"]):
        st.markdown(message["content"])


# Create a function to generate a response from the model
def generate_response(input_text):
    # Run a similarity search over your documents using the user's input text
    docs = docsearch.similarity_search(input_text)

    # Write the user's input text onto the chat window and store it in the chat history
    with st.chat_message("user"):
        st.write(input_text)
        st.session_state.messages.append({"role": "user", "content": input_text})

    # Take the output message and display in the chat box
    with st.chat_message("assistant"):
        st.toast("Running...", icon="⏳")

        response = chain.run(input_documents = docs, question = input_text)
        message_placeholder = st.empty()
        full_response = ""

        # Simulate a stream of the response with a small delay. This is not true streaming. Splitting with a capturing group keeps the whitespace, so line breaks are preserved in the output.
        for chunk in re.split(r'(\s+)', response):
            full_response += chunk
            time.sleep(0.05)
            # Add a blinking cursor to simulate typing
            message_placeholder.markdown(full_response + "▌")
        message_placeholder.markdown(full_response)

        st.session_state.messages.append({"role": "assistant", "content": full_response})

# Create an input box to take the user's input question
prompt = st.chat_input("Ask a question...")

if prompt:
    generate_response(prompt)

secrets.toml

Streamlit uses a file named secrets.toml to import environment variables and secrets. Although you can also use python-dotenv (or your preferred method), using secrets.toml gives you native Streamlit functionality and makes deployment easier later on.

AWS_ACCESS_KEY_ID = "YOUR AWS ACCESS KEY"
AWS_SECRET_ACCESS_KEY = "YOUR AWS SECRET ACCESS KEY"
AWS_DEFAULT_REGION = "us-east-1"

PINECONE_API = "YOUR PINECONE API KEY"
PINECONE_ENV = "YOUR PINECONE ENV SET"
pinecone_index_name = "YOUR PINECONE INDEX NAME"

utils.py

My utils file has two basic functions used by the application: one to get the file names of the PDFs we loaded into the vector database so they can be displayed on the front end, and a second to reset the chat conversation. The reset function uses Streamlit’s session state to set the conversation and chat history to None and to empty the message list.

import os
import streamlit as st

def get_data_files():
    # Walk the data directory and collect the file names so they can be listed in the sidebar
    data_files = []
    for dirname, _, filenames in os.walk("data"):
        for filename in filenames:
            data_files.append(filename)
    return data_files


def reset_conversation():
    st.session_state.conversation = None
    st.session_state.chat_history = None
    st.session_state.messages = []
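
With all three files in place, launch the app locally from the project root:

streamlit run app.py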

The finished application running on Streamlit:

Finishing up

You’ve now successfully split source data into chunks, stored it in a Pinecone vector database, and built a mini chatbot using Streamlit, LangChain, and Amazon Bedrock.

You may have noticed one key thing — this is not a true chatbot. Each interaction is completely separate. My next article will cover how to use memory within LangChain and turn it into a true chatbot that you can ask follow-up questions.

You can use Streamlit’s built-in functionality to deploy to streamlit.io. Your secrets.toml can be copied exactly into their environment variables section. A requirements.txt file is recommended so the deployment can find all the packages used.
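
As a starting point, a requirements.txt along these lines covers the packages used in this tutorial; exact version pins, and any extra dependencies DirectoryLoader needs to parse your particular file types, will depend on your setup:

boto3
langchain
pinecone-client
python-dotenv
streamlit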

If you have any questions, please contact me on LinkedIn.
