
Chatting with Your Documents in the CLI with Ollama and LlamaIndex

Explore the chat options that LlamaIndex offers with a Python script, as well as the built-in llamaindex-cli rag option that uses only Chromadb.

LlamaIndex published an article showing how to set up and run Ollama on your local computer (here). In that article, the llamaindex package was used in conjunction with the Qdrant vector database to enable search and answer generation based on documents stored on a local computer. In this article we are going to explore the chat options that LlamaIndex offers with a Python script, as well as the built-in llamaindex-cli rag option that uses only Chromadb.

Code Walk-Through for Qdrant

import sys  
import os  
import qdrant_client  
from pathlib import Path  
from llama_index import (  
    VectorStoreIndex,  
    ServiceContext,  
    download_loader)  
from llama_index.llms import Ollama  
from llama_index.storage.storage_context import StorageContext  
from llama_index.vector_stores.qdrant import QdrantVectorStore  
  
  
def read_single_json(path: str):  
    JSONReader = download_loader("JSONReader")  
    loader = JSONReader()  
    return loader.load_data(Path(path))  
  
  
def read_single_pdf(path: str):  
    PDFReader = download_loader("PDFReader")  
    loader = PDFReader()  
    return loader.load_data(Path(path))  
  
  
def create_qdrant_client(path: str):
    # store the Qdrant collection locally under the given path
    return qdrant_client.QdrantClient(path=path)
  
  
def create_collection(data_filename: str):
    # derive the collection name from the file name and pick a loader by extension
    basename = data_filename.split(".")[0].replace(" ", "_")
    extension = data_filename.split(".")[1]
    path = os.path.join("data", data_filename)
    match extension:
        case "pdf":
            loaded_documents = read_single_pdf(path)
        case "json":
            loaded_documents = read_single_json(path)
        case _:
            raise ValueError(f"Unsupported file type: {extension}")
    return loaded_documents, basename
  
  
def initialize_qdrant(documents, client, collection_name, llm_model):  
    """define the vector database and index  
    :return  
        query_engine index object  
    """  
    vector_store = QdrantVectorStore(client=client, collection_name=collection_name)  
    service_context = ServiceContext.from_defaults(llm=llm_model, embed_model="local")  
    storage_context = StorageContext.from_defaults(vector_store=vector_store)  
    index = VectorStoreIndex.from_documents(documents,  
                                            service_context=service_context,  
                                            storage_context=storage_context)  
    query_engine = index.as_query_engine(streaming=True)  
    return query_engine  
  
  
if __name__ == "__main__":  
    # definition of variables  
    data_filename = "some.pdf"  
    llm_model = Ollama(model="mixtral")  
    client = create_qdrant_client(path="./qdrant_data")
    documents, collection_name = create_collection(data_filename)  
    query_engine = initialize_qdrant(documents, client, collection_name, llm_model)  
  
    # main CLI interaction loop  
    while True:  
        query_message = input("Q: ")  
        response = query_engine.query(query_message)  
        response.print_response_stream()  
        sys.stdout.flush()  
        sys.stdout.write("\n")

The read_single_json and read_single_pdf functions wrap the loader specific to each document type; separating them makes the implementation more explicit. The create_collection function prepares the loaded document set (either a JSON file or a PDF file). It identifies the file type by splitting the file name on the dot and taking the second part (the extension). Depending on whether this is 'pdf' or 'json', it then calls the appropriate reader function defined earlier.
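
For example (with a hypothetical file name), calling create_collection for a PDF placed in the data folder dispatches to read_single_pdf and derives the collection name from the file name:

# Hypothetical example: "annual report.pdf" placed in ./data
documents, collection_name = create_collection("annual report.pdf")
print(collection_name)  # -> "annual_report" (spaces replaced, extension dropped)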

Once the documents are loaded, the create_qdrant_client function initializes a Qdrant client.
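
Because the client is created with a path argument, the collection is persisted on disk under ./qdrant_data between runs. For quick experiments, an in-memory client is a possible alternative (a minimal sketch; nothing is written to disk in this case):

import qdrant_client

# In-memory Qdrant instance: handy for testing, data is lost when the script exits
client = qdrant_client.QdrantClient(":memory:")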

In the initialize_qdrant function we set up the vector store for the chatbot. We create instances of the QdrantVectorStore, ServiceContext, and StorageContext objects, and use the documents from create_collection, the client, the collection name, and the llm_model (the language model from Ollama) to initialize the VectorStoreIndex. Finally, the index is compiled into a query engine object that processes the queries, fetches relevant information, and returns the most suitable answers. Importantly, the streaming option can be used for a nicer word-by-word display of the answer text.

query_engine = index.as_query_engine(streaming=True)
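
If the word-by-word display is not needed, the streaming flag can simply be left out and the complete answer printed in one go (a minimal sketch using the same index object and an illustrative question):

# Non-streaming variant: query() returns a Response object whose string
# representation is the full answer text
query_engine = index.as_query_engine()
response = query_engine.query("What is the document about?")
print(response)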

The while loop lets the script receive a new question after each answer, running as an endless loop:

    # main CLI interaction loop  
    while True:  
        query_message = input("Q: ")  
        response = query_engine.query(query_message)  
        response.print_response_stream()  
        sys.stdout.flush()  
        sys.stdout.write("\n")

The user input is passed to the query engine, which processes the question, retrieves the most relevant response, and prints it. The sys.stdout calls handle the streamed printout in the terminal window.
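
The loop can optionally be made a bit more forgiving, for example by letting the user leave the chat with an exit keyword or Ctrl+C instead of killing the process (a sketch, not part of the original script):

# Optional variant of the CLI loop: type "exit"/"quit", or press Ctrl+C / Ctrl+D, to leave
while True:
    try:
        query_message = input("Q: ")
    except (EOFError, KeyboardInterrupt):
        break
    if query_message.strip().lower() in {"exit", "quit"}:
        break
    response = query_engine.query(query_message)
    response.print_response_stream()
    sys.stdout.flush()
    sys.stdout.write("\n")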

Llamaindex-cli RAG with Chromadb

Since the llamaindex package is installed in the Python virtual environment, llamaindex-cli can also be used without writing any Python scripts. This option works only in conjunction with Chromadb (pip install chromadb). Running the command:

llamaindex-cli rag --files "./data/*pdf"

will use all the PDFs in the provided folder to build the knowledge vector database.

Querying the index is as simple as:

llamaindex-cli rag --question "What are the key takeaways from the documents?"

Alternatively, a chat option is built in as well, provided that the first step of supplying the files for the RAG has been run. The chat is started with:

llamaindex-cli rag --chat 


Conclusion

The Python script implementation gives more freedom to extend the chat options and to build on them. The llamaindex-cli rag option comes across as a nice feature without much of a purpose. But then again, this is just my opinion ;).



