
Chatting with Your Documents in the CLI with Ollama and LlamaIndex

Explore the chat options that LlamaIndex offers with a Python script, as well as the built-in llamaindex-cli rag option that uses only Chromadb.

LlamaIndex published an article showing how to set up and run Ollama on your local computer (here). In that article, the llamaindex package was used in conjunction with the Qdrant vector database to enable search and answer generation based on documents stored on a local computer. In this article we are going to explore the chat options that LlamaIndex offers with a Python script, as well as the built-in llamaindex-cli rag option that uses only Chromadb.

Code Walk-Through for Qdrant

import sys  
import os  
import qdrant_client  
from pathlib import Path  
from llama_index import (  
    VectorStoreIndex,  
    ServiceContext,  
    download_loader)  
from llama_index.llms import Ollama  
from llama_index.storage.storage_context import StorageContext  
from llama_index.vector_stores.qdrant import QdrantVectorStore  
  
  
def read_single_json(path: str):  
    JSONReader = download_loader("JSONReader")  
    loader = JSONReader()  
    return loader.load_data(Path(path))  
  
  
def read_single_pdf(path: str):  
    PDFReader = download_loader("PDFReader")  
    loader = PDFReader()  
    return loader.load_data(Path(path))  
  
  
def create_qdrant_client(path: str):
    # store the Qdrant collection locally under the given path
    return qdrant_client.QdrantClient(path=path)
  
  
def create_collection(data_filename: str):
    # derive the collection name from the file name and pick a loader by extension
    basename = data_filename.split(".")[0].replace(" ", "_")
    extension = data_filename.split(".")[1]
    path = os.path.join("data", data_filename)
    match extension:
        case "pdf":
            loaded_documents = read_single_pdf(path)
        case "json":
            loaded_documents = read_single_json(path)
        case _:
            raise ValueError(f"Unsupported file type: {extension}")
    return loaded_documents, basename
  
  
def initialize_qdrant(documents, client, collection_name, llm_model):  
    """define the vector database and index  
    :return  
        query_engine index object  
    """  
    vector_store = QdrantVectorStore(client=client, collection_name=collection_name)  
    service_context = ServiceContext.from_defaults(llm=llm_model, embed_model="local")  
    storage_context = StorageContext.from_defaults(vector_store=vector_store)  
    index = VectorStoreIndex.from_documents(documents,  
                                            service_context=service_context,  
                                            storage_context=storage_context)  
    query_engine = index.as_query_engine(streaming=True)  
    return query_engine  
  
  
if __name__ == "__main__":  
    # definition of variables  
    data_filename = "some.pdf"  
    llm_model = Ollama(model="mixtral")  
    client = create_qdrant_client(path="./qdrant_data")
    documents, collection_name = create_collection(data_filename)  
    query_engine = initialize_qdrant(documents, client, collection_name, llm_model)  
  
    # main CLI interaction loop  
    while True:  
        query_message = input("Q: ")  
        response = query_engine.query(query_message)  
        response.print_response_stream()  
        sys.stdout.flush()  
        sys.stdout.write("\n")

The read_single_json and read_single_pdf functions wrap the loader specific to each document type; separating them makes the implementation more explicit. The create_collection function prepares the loaded document set (either a JSON file or a PDF file). It identifies the file type by splitting the file name on the dot and taking the second part (the extension). Depending on whether this is 'pdf' or 'json', it then calls the appropriate reader function defined earlier.
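
For example (with a hypothetical file name), calling create_collection for a PDF placed in the data folder dispatches to read_single_pdf and derives the collection name from the file name:

# Hypothetical example: "annual report.pdf" placed in ./data
documents, collection_name = create_collection("annual report.pdf")
print(collection_name)  # -> "annual_report" (spaces replaced, extension dropped)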

Once the documents are loaded, the create_qdrant_client function initializes a Qdrant client.
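
Because the client is created with a path argument, the collection is persisted on disk under ./qdrant_data between runs. For quick experiments, an in-memory client is a possible alternative (a minimal sketch; nothing is written to disk in this case):

import qdrant_client

# In-memory Qdrant instance: handy for testing, data is lost when the script exits
client = qdrant_client.QdrantClient(":memory:")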

In the initialize_qdrant function we set up the vector store for the chatbot. We create instances of the QdrantVectorStore, ServiceContext, and StorageContext objects, and use the documents from create_collection, the client, the collection name, and the llm_model (the language model from Ollama) to initialize the VectorStoreIndex. Finally, the index is compiled into a query engine object that processes the queries, fetches relevant information, and returns the most suitable answers. Importantly, the streaming option can be used for a nicer word-by-word display of the answer text.

query_engine = index.as_query_engine(streaming=True)
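
If the word-by-word display is not needed, the streaming flag can simply be left out and the complete answer printed in one go (a minimal sketch using the same index object and an illustrative question):

# Non-streaming variant: query() returns a Response object whose string
# representation is the full answer text
query_engine = index.as_query_engine()
response = query_engine.query("What is the document about?")
print(response)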

The while loop lets the script receive a new question after each answer, running as an endless loop:

    # main CLI interaction loop  
    while True:  
        query_message = input("Q: ")  
        response = query_engine.query(query_message)  
        response.print_response_stream()  
        sys.stdout.flush()  
        sys.stdout.write("\n")

The user input is passed to the query engine, which processes the question, retrieves the most relevant response, and prints it. The sys.stdout calls handle the streamed printout in the terminal window.
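
The loop can optionally be made a bit more forgiving, for example by letting the user leave the chat with an exit keyword or Ctrl+C instead of killing the process (a sketch, not part of the original script):

# Optional variant of the CLI loop: type "exit"/"quit", or press Ctrl+C / Ctrl+D, to leave
while True:
    try:
        query_message = input("Q: ")
    except (EOFError, KeyboardInterrupt):
        break
    if query_message.strip().lower() in {"exit", "quit"}:
        break
    response = query_engine.query(query_message)
    response.print_response_stream()
    sys.stdout.flush()
    sys.stdout.write("\n")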

Llamaindex-cli RAG with Chromadb

Since the llamaindex package is installed in the Python virtual environment, llamaindex-cli can also be used without writing any Python scripts. This option works only in conjunction with Chromadb (pip install chromadb). Running the command:

llamaindex-cli rag --files "./data/*pdf"

will use all the PDFs in the provided folder to build the knowledge vector database.

Querying the index is as simple as:

llamaindex-cli rag --question "What are the key takeaways from the documents?"

Alternatively, a chat option is built in as well, provided that the first step of supplying the files for the RAG has been run. The chat is started with:

llamaindex-cli rag --chat 


Conclusion

The Python script implementation gives more freedom to extend the chat options and to build on them. The llamaindex-cli rag option comes across as a nice feature without much of a purpose. But then again, this is just my opinion ;).



