LlamaIndex published an article showing how to set up and run Ollama on your local computer (here). In that article the llamaindex package was used in conjunction with the Qdrant vector database to enable search and answer generation based on documents stored on your local computer. In this article we are going to explore the chat options that llamaindex offers through a Python script, as well as the built-in llamaindex-cli rag option, which works only with ChromaDB.
Code Walk-Through for Qdrant
import sys
import os
import qdrant_client
from pathlib import Path
from llama_index import (
    VectorStoreIndex,
    ServiceContext,
    download_loader)
from llama_index.llms import Ollama
from llama_index.storage.storage_context import StorageContext
from llama_index.vector_stores.qdrant import QdrantVectorStore


def read_single_json(path: str):
    JSONReader = download_loader("JSONReader")
    loader = JSONReader()
    return loader.load_data(Path(path))


def read_single_pdf(path: str):
    PDFReader = download_loader("PDFReader")
    loader = PDFReader()
    return loader.load_data(Path(path))


def create_qdrant_client(path: str):
    return qdrant_client.QdrantClient(path=path)


def create_collection(data_filename: str):
    basename = data_filename.split(".")[0].replace(" ", "_")
    extension = data_filename.split(".")[1]
    path = os.path.join("data", data_filename)
    match extension:
        case "pdf":
            loaded_documents = read_single_pdf(path)
        case "json":
            loaded_documents = read_single_json(path)
        case _:
            # guard against unsupported file types
            raise ValueError(f"Unsupported file extension: {extension}")
    return loaded_documents, basename


def initialize_qdrant(documents, client, collection_name, llm_model):
    """Define the vector database and index.

    :return: query_engine index object
    """
    vector_store = QdrantVectorStore(client=client, collection_name=collection_name)
    service_context = ServiceContext.from_defaults(llm=llm_model, embed_model="local")
    storage_context = StorageContext.from_defaults(vector_store=vector_store)
    index = VectorStoreIndex.from_documents(documents,
                                            service_context=service_context,
                                            storage_context=storage_context)
    query_engine = index.as_query_engine(streaming=True)
    return query_engine


if __name__ == "__main__":
    # definition of variables
    data_filename = "some.pdf"
    llm_model = Ollama(model="mixtral")
    client = create_qdrant_client(path="./qdrant_data")
    documents, collection_name = create_collection(data_filename)
    query_engine = initialize_qdrant(documents, client, collection_name, llm_model)

    # main CLI interaction loop
    while True:
        query_message = input("Q: ")
        response = query_engine.query(query_message)
        response.print_response_stream()
        sys.stdout.flush()
        sys.stdout.write("\n")
The read_single_json and read_single_pdf functions specify the loader appropriate for each document type; keeping them separate makes the implementation more explicit. The create_collection function prepares the loaded document set (either a JSON file or a PDF file). It identifies the file type by splitting the file name on the dot and taking the second part (the extension), and depending on whether this is 'pdf' or 'json' it calls the appropriate reader function defined earlier.
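If you would rather ingest a whole folder than a single file, llama_index also provides a generic directory reader. The snippet below is only a sketch of that alternative; the read_folder helper and the "data" folder default are illustrative and not part of the script above.

from llama_index import SimpleDirectoryReader


def read_folder(path: str = "data"):
    # SimpleDirectoryReader picks a suitable loader per file based on its extension
    return SimpleDirectoryReader(path).load_data()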
After the documents are loaded, the create_qdrant_client function initializes a Qdrant client.
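The path argument runs Qdrant in its embedded, on-disk mode. If a Qdrant server is already running (for example in Docker), the client can point at it instead; the function name and URL below are only an illustration of that option, not something the article's setup requires.

def create_qdrant_client_remote(url: str = "http://localhost:6333"):
    # Connect to a running Qdrant server instead of the embedded local store
    return qdrant_client.QdrantClient(url=url)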
In the initialize_qdrant function we set up the vector store for the chatbot. We create instances of the QdrantVectorStore, ServiceContext, and StorageContext objects. We then use the documents from create_collection, our client, the collection name, and llm_model (the language model from Ollama) to initialize the VectorStoreIndex. Finally, the index is compiled into a query engine object that processes the queries, fetches relevant information, and returns the most suitable answers. Importantly, the streaming option can be used for a nicer word-by-word display of the answer text.
query_engine = index.as_query_engine(streaming=True)
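Because the client persists its vectors under ./qdrant_data, re-embedding the documents on every run is not strictly necessary. A rough sketch of reusing an already populated collection is shown below; the load_existing_index helper is hypothetical and relies on VectorStoreIndex.from_vector_store, which rebuilds the index directly from the stored vectors.

def load_existing_index(client, collection_name, llm_model):
    # Rebuild the index from vectors already stored in the Qdrant collection
    vector_store = QdrantVectorStore(client=client, collection_name=collection_name)
    service_context = ServiceContext.from_defaults(llm=llm_model, embed_model="local")
    index = VectorStoreIndex.from_vector_store(vector_store,
                                               service_context=service_context)
    return index.as_query_engine(streaming=True)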
The while loop runs indefinitely, so the script can accept a new question after each answer.
if __name__ == "__main__":
    data_filename = "some.pdf"
    llm_model = Ollama(model="mixtral")
    client = create_qdrant_client(path="./qdrant_data")
    documents, collection_name = create_collection(data_filename)
    query_engine = initialize_qdrant(documents,
                                     client,
                                     collection_name,
                                     llm_model)

    # main CLI interaction loop
    while True:
        query_message = input("Q: ")
        response = query_engine.query(query_message)
        response.print_response_stream()
        sys.stdout.flush()
        sys.stdout.write("\n")
The user input is passed to the query engine, which processes the question, retrieves the best response, and prints it. The sys.stdout calls are needed to handle the streamed printout in the terminal window.
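As written, the loop is stopped with Ctrl+C. One small, optional extension is a quit command; the keywords below ('exit', 'quit') are just an illustrative convention, not part of the original script.

while True:
    query_message = input("Q: ")
    # Leave the loop when the user types "exit" or "quit"
    if query_message.strip().lower() in ("exit", "quit"):
        break
    response = query_engine.query(query_message)
    response.print_response_stream()
    sys.stdout.flush()
    sys.stdout.write("\n")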
llamaindex-cli RAG with ChromaDB
Since the llamaindex package is installed in the Python virtual environment, llamaindex-cli can also be used without writing any Python scripts. This option works only in conjunction with ChromaDB (pip install chromadb). Running the command:
llamaindex-cli rag --files "./data/*pdf"
will use all the PDFs in the provided folder to build the knowledge vector database.
Querying the index is as simple as:
llamaindex-cli rag --question "What are the key takeaways from the documents?"
Alternatively, a chat mode is built in as well, provided that the first step of supplying the files for the RAG has already been run. The chat mode is started with:
llamaindex-cli rag --chat
Conclusion
The Python script implementation gives more freedom to extend the chat options and build upon them. The llamaindex-cli rag option appears to be a nice feature without much of a purpose. But then again, this is just my opinion ;).