Your Local LLM using FastAPI


FastAPI is a modern, fast, and easy-to-use web framework for building APIs with Python. It is based on standard Python type hints and supports features such as data validation, automatic documentation, authentication, and testing. FastAPI is built on Starlette and Pydantic, is typically served with Uvicorn, and works well with other popular Python libraries such as SQLAlchemy.

One of the strengths of FastAPI is that it can be used to create web interfaces for your local large language model (LLM). An LLM is a large neural network that generates natural language in response to prompts. LLMs can be used for tasks such as writing text, summarizing, translating, answering questions, and more. However, running an LLM locally can be challenging, as it requires substantial computing resources and some technical skill.

Using FastAPI, you can create web applications that let you interact with your local LLM via a browser or API client, and you can customize their appearance and functionality to suit your needs. In this article, we will show you how to use FastAPI to create a simple web interface for a local LLM loaded with Hugging Face Transformers.

Hugging Face Transformers is a library that provides access to thousands of pretrained models for natural language processing tasks. You can use it to load and run an LLM on your local machine, as well as fine-tune it on your own data. Popular models available through Hugging Face Transformers include GPT-2, BERT, T5, and XLNet. To use FastAPI with Hugging Face Transformers, you need to install the following packages:

  • fastapi: The web framework that we will use to create our web application.
  • uvicorn: The ASGI server that we will use to run our web application.
  • transformers: The library we will use to load and run our local LLM.
  • torch: The backend that performs tensor operations for the model; it can also run our local LLM on a GPU if one is available.

You can install these packages using pip:

pip install fastapi uvicorn transformers torch
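
To confirm that the installation worked before writing any application code, a quick check along these lines can help. It is a minimal sketch that relies only on the standard library's importlib.metadata; the helper name installed_versions and the REQUIRED_PACKAGES list are illustrative and not part of any of the libraries above.

```python
from importlib.metadata import PackageNotFoundError, version

# The four packages installed with pip in this article.
REQUIRED_PACKAGES = ["fastapi", "uvicorn", "transformers", "torch"]


def installed_versions(packages):
    """Map each package name to its installed version, or None if it is missing."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


if __name__ == "__main__":
    for name, ver in installed_versions(REQUIRED_PACKAGES).items():
        print(f"{name}: {ver or 'NOT INSTALLED'}")
```

Any package reported as missing can be installed again with the pip command above.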

After installing the packages, create a Python file called app.py and paste the following code:

# Import the libraries
from fastapi import FastAPI, Request
from transformers import pipeline
from pydantic import BaseModel

# Create a FastAPI app
app = FastAPI()

# Create a class for the input data
class InputData(BaseModel):
    prompt: str

# Create a class for the output data
class OutputData(BaseModel):
    response: str

# Load a local LLM using Hugging Face Transformers
# You can change the task and the model name according to your needs,
# for example "summarization" with "t5-base". A question-answering pipeline
# needs a model fine-tuned for that task; the base "bert-base-cased" checkpoint is not.
model = pipeline("text-generation", model="gpt2")

# Create a route for the web application
@app.post("/generate", response_model=OutputData)
def generate(request: Request, input_data: InputData):
    # Get the prompt from the input data
    prompt = input_data.prompt

    # Generate a response from the local LLM using the prompt
    response = model(prompt)[0]["generated_text"]

    # Return the response as output data
    return OutputData(response=response)

In the code above, we perform the following steps:

  • Import the libraries we need for our web application.
  • Create a FastAPI application object.
  • Create classes for the input and output data using Pydantic models. The input data consists of a prompt string that our local LLM will use to generate text. The output data consists of the response string that we return from our local LLM.
  • Load a local LLM using the Hugging Face Transformers pipeline function. We use the “text-generation” task and the “gpt2” model as examples. You can change these parameters according to your needs and preferences. You can also pass the device argument to use a GPU instead of the CPU if one is available.
  • Create a route for our web app using the @app.post decorator. We set the path to “/generate” and the response model to OutputData. We also define a function called generate that takes a request object and an input data object as parameters.
  • Get the prompt from the input data object using dot notation.
  • Generate a response from our local LLM by calling the model. We pass the prompt as an argument and take the first element of the returned list. We then read the “generated_text” key from that dictionary using bracket notation.
  • Return the response as an output data object using the OutputData constructor.

To run our web application, we can use uvicorn from the command line:

uvicorn app:app --reload

This will start our web application at http://127.0.0.1:8000. We can also access the interactive documentation at http://127.0.0.1:8000/docs, where we can test our web application from the browser. To try it, open the /generate endpoint, click “Try it out”, enter a prompt in the request body, and click the Execute button. This sends a POST request to our web application and returns a response from our local LLM. For example, if we enter “Hello, world!” as our prompt, we might get something like this in our response:

{
  "response": "Hello, world! This is a text generated by a local LLM using FastAPI and Hugging Face Transformers."
}
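
The request that the documentation page sends can also be issued from a script. The following is a minimal sketch using only Python's standard library, assuming the server is running at the default address shown above. The names build_request and ask_llm are hypothetical helpers, and the 60-second timeout is an arbitrary choice.

```python
import json
import urllib.request

# Default address used by uvicorn in this article, plus the /generate route.
API_URL = "http://127.0.0.1:8000/generate"


def build_request(prompt, url=API_URL):
    """Build a POST request whose JSON body matches the InputData model."""
    body = json.dumps({"prompt": prompt}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask_llm(prompt, url=API_URL, timeout=60):
    """Send the prompt to the running server and return the "response" field."""
    with urllib.request.urlopen(build_request(prompt, url), timeout=timeout) as reply:
        return json.loads(reply.read().decode("utf-8"))["response"]


# Example call (needs the uvicorn server to be running):
# print(ask_llm("Hello, world!"))
```

Keeping the request construction in its own function makes it possible to inspect the method, headers, and body without a running server.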

We can also use API clients such as Postman or curl to send requests to our web application and get responses from our local LLM. For example, we can use curl from the command line:

curl -X POST -H "Content-Type: application/json" -d '{"prompt":"Hello, world!"}' http://127.0.0.1:8000/generate

This sends a POST request with a JSON body containing our prompt to the web application and returns a JSON response containing the text generated by our local LLM.

This article has shown you how to use FastAPI to create a simple web interface for your local LLM using Hugging Face Transformers. FastAPI is a good choice for building web applications around a local LLM, as it provides a fast, easy, and flexible way to interact with the model via a browser or API client. You can also customize the web application according to your needs and preferences, such as changing the model, task, route, or view.
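
As one way to change the model, task, or device without editing app.py each time, the pipeline arguments could be read from the environment. This is only a sketch under stated assumptions: the environment variable names LLM_TASK and LLM_MODEL and the helpers pick_device, pipeline_settings, and load_model are made up for illustration, and the defaults are the values used in this article.

```python
import os


def pick_device():
    """Return 0 (first CUDA GPU) when PyTorch can see one, otherwise -1 (CPU)."""
    try:
        import torch
    except ImportError:
        return -1
    return 0 if torch.cuda.is_available() else -1


def pipeline_settings(env=None):
    """Collect keyword arguments for pipeline(), defaulting to the article's values."""
    env = os.environ if env is None else env
    return {
        "task": env.get("LLM_TASK", "text-generation"),  # hypothetical variable name
        "model": env.get("LLM_MODEL", "gpt2"),           # hypothetical variable name
        "device": pick_device(),
    }


def load_model(env=None):
    """Create the pipeline; transformers is imported lazily so the helpers above stay light."""
    from transformers import pipeline
    return pipeline(**pipeline_settings(env))
```

In app.py, the line that creates the model could then become model = load_model(). Note that changing the task also changes the shape of the output: a summarization pipeline, for example, returns its text under the "summary_text" key rather than "generated_text", so the route would need to read the matching key.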
