How to Use Llama 2 with an API on AWS to Power Your AI Apps

Meta just released a badass new LLM called Llama 2.

And if you are anything like us, you just can't wait to get your hands dirty and build with it.

The first step to building with any kind of LLM is to host it somewhere and use it through an API. Then your developers can easily integrate it into your applications.

Why should I use Llama 2 when I can use the OpenAI API?

Three things:

  1. Security: keep sensitive data away from third-party vendors
  2. Reliability: ensure your applications have guaranteed uptime
  3. Consistency: the model never changes out from under you, so the same question gets stable results over time

What this guide will cover

  1. Part I: Hosting the Llama 2 model on AWS SageMaker
  2. Part II: Using the model through an API with AWS Lambda and AWS API Gateway

Step 0: Log in or Sign up for an AWS account

  1. Go to https://aws.amazon.com/ and log in or sign up for an account.
  2. If you sign up for a new account, you will automatically get Free Tier access, which includes some SageMaker credits. Keep an eye on your usage, though: depending on the instance type you select, the bill can get absurdly high.

Part I: Hosting the Model

Step 1: Go to AWS SageMaker

Once you are in your AWS dashboard, search for SageMaker in the search bar and click on it to go to AWS SageMaker.

AWS SageMaker is AWS's solution for deploying and hosting machine learning models.

Step 2: Set up a domain on AWS SageMaker

  1. Click on Domains in the left sidebar.

  2. Click on Create a Domain.

  3. Make sure the Quick Setup box is selected.

  4. Fill out the form with a domain name of your choosing and leave the remaining options at their default values.

If you are new to this, choose Create a new role in the Execution role category. Otherwise, pick a role that you may have created before.

  5. Click Submit on the form to create your domain.

  6. When the domain is finished being created, you will see a confirmation screen.

Note down the user name shown here, as it will be needed to deploy the model in the next step.

If your domain fails to create, it is most likely due to user permissions or VPC configuration.
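If you prefer to script this step rather than click through the console, the same domain can be created with boto3. This is a minimal sketch; the domain name, role ARN, VPC, and subnet IDs below are hypothetical placeholders you would replace with your own:

import boto3

sm = boto3.client('sagemaker')

# All names and IDs below are placeholders for illustration
sm.create_domain(
    DomainName='llama-2-demo',
    AuthMode='IAM',  # matches the Quick Setup default
    DefaultUserSettings={
        'ExecutionRole': 'arn:aws:iam::123456789012:role/MySageMakerRole'
    },
    SubnetIds=['subnet-0abc1234def567890'],
    VpcId='vpc-0abc1234def567890',
)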

Step 3: Start a SageMaker Studio Session

  1. Click on the Studio link in the left sidebar once your domain is finished being created.

  2. Select the domain name and the user profile you created previously and click Open Studio.

This will take you to a JupyterLab studio session.

Step 4: Select the Llama-2-7b-chat model

We are going to deploy the chat-optimized, 7-billion-parameter version of the Llama 2 model.

There is a more powerful 70B version, which is much more robust, but it would be too costly for demo purposes, so we will go with the smaller model.

  1. Click on Models, notebooks, and solutions in the left sidebar under the SageMaker JumpStart tab.

  2. Search for the Llama 2 model in the search bar. We are looking for the 7B chat model. Click on the model.

If you do not see this model, you may need to shut down and restart your Studio session.

  3. This will take you to the model page. You can change the deployment settings to best suit your use case, but we will simply proceed with the default SageMaker settings and Deploy the model as is.

The 70B version needs a powerful server, so your deployment might error out if your account does not have access to one. In that case, submit an increase request through AWS Service Quotas.

  4. Wait 5-10 minutes for the deployment to finish and the confirmation screen to be shown.

Note down the model's Endpoint name, since you will need it to use the model through an API.
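If you would rather deploy from code than through the JumpStart UI, the SageMaker Python SDK offers an equivalent path. A minimal sketch; the model ID below is an assumption, so verify it against the JumpStart catalog in your region:

from sagemaker.jumpstart.model import JumpStartModel

# Model ID is assumed; check the JumpStart catalog for the exact ID
model = JumpStartModel(model_id="meta-textgeneration-llama-2-7b-f")

# accept_eula acknowledges Meta's license terms for Llama 2
predictor = model.deploy(accept_eula=True)

# Note this down for Part II
print(predictor.endpoint_name)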

And with that, you are now done with Part I of hosting the model. Have a beverage or snack of your choice to celebrate!


Part II: Using the Model with an API

Step 1: Go to AWS Lambda to create a Lambda Function

A Lambda function will be used to call your LLM's endpoint.

  1. Search for the Lambda service in the AWS console search bar and click on it.

  2. Click on Create function.

  3. Enter a function name (it doesn't matter what), choose Python 3.10 as the runtime and x86_64 as the architecture, then click on Create function.

Step 2: Specify your model's endpoint name

Enter the LLM's endpoint name from the last step of Part I as an environment variable.

  1. Click on the Configuration tab in your newly created function.

  2. Click on Environment variables and click Edit.

  3. Click on Add environment variable on the next screen.

  4. Enter ENDPOINT_NAME as the key and your model's endpoint name as the value. Click Save.

You can use any key you wish, but it needs to match what we write in the code that calls the endpoint later.

Step 3: Write the code that will call the Llama model

  1. Go back to the Code tab and copy and paste the following code there.

import os
import json
import boto3

# Grab the endpoint name from the Lambda environment variables
ENDPOINT_NAME = os.environ['ENDPOINT_NAME']
runtime = boto3.client('runtime.sagemaker')

def lambda_handler(event, context):
    # Forward the incoming request body to the SageMaker endpoint
    response = runtime.invoke_endpoint(EndpointName=ENDPOINT_NAME,
                                       ContentType='application/json',
                                       Body=event['body'],
                                       CustomAttributes="accept_eula=true")

    result = json.loads(response['Body'].read().decode())

    return {
        "statusCode": 200,
        "body": json.dumps(result)
    }

  2. Click Deploy after the code is successfully inserted.
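Before wiring up API Gateway, you can sanity-check the endpoint directly from your own machine with boto3. A minimal sketch; the endpoint name is a placeholder for the one you noted in Part I, and your local AWS credentials need permission to invoke SageMaker endpoints:

import json
import boto3

runtime = boto3.client('runtime.sagemaker')

payload = {
    "inputs": [[{"role": "user", "content": "Say hello in one sentence"}]],
    "parameters": {"max_new_tokens": 64}
}

response = runtime.invoke_endpoint(
    EndpointName="YOUR-ENDPOINT-NAME",  # placeholder: your endpoint from Part I
    ContentType="application/json",
    Body=json.dumps(payload),
    CustomAttributes="accept_eula=true",
)

print(json.loads(response['Body'].read().decode()))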

Step 4: Connect your new Lambda function to AWS API Gateway

  1. Go to your Lambda function's home screen and click Add Trigger.

  2. Select the API Gateway menu item in the Add trigger dialog.

  3. Fill out the API Gateway dialog (create a new API; an open security setting is fine for a quick test, but lock it down for production) and click on Add.

  4. After the API endpoint has been successfully created, you can view the API URL under the Configuration tab and Triggers sidebar.

Step 5: Test your brand spanking new LLM API

  1. Make a POST request to your API URL with the following JSON body:
{
 "inputs": [
  [
   {"role": "system", "content": "You are chat bot who writes songs"},
   {"role": "user", "content": "Write a rap about Barbie"}
  ]
 ],
 "parameters": {"max_new_tokens":256, "top_p":0.9, "temperature":0.6}
}

The inputs key stores an array, with each element being one conversation between you and the chatbot.

Each conversation is itself an array of JSON message objects, where role signifies the "person" speaking in the conversation and content holds the prompt text. A multi-turn example is sketched below.
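For a multi-turn exchange, you append the model's previous reply as an assistant message before the next user turn. A sketch of the same request body with one extra round (the assistant content is abbreviated):

{
 "inputs": [
  [
   {"role": "system", "content": "You are chat bot who writes songs"},
   {"role": "user", "content": "Write a rap about Barbie"},
   {"role": "assistant", "content": "Yo, listen up, I got a story to tell..."},
   {"role": "user", "content": "Now make it shorter"}
  ]
 ],
 "parameters": {"max_new_tokens":256, "top_p":0.9, "temperature":0.6}
}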

  2. Check the response status code and the response JSON from the API. The status code should be 200, and the response JSON will look like the following:

[{"generation": {"role": "assistant", 
"content": " Yo, listen up, I got a story to tell 
''Bout a doll that''s fly, and she''s doin'' well
Barbie, Barbie, she''s the queen of the scene
From Malibu to Kenya, she''s always lookin'' supreme
She''s got the style, she''s got the grace
She''s got the swag, she''s got the pace
She''s got the cars, she''s got the cash
She''s livin'' large, she''s got the flash
She''s got the perfect body, she''s got the perfect face
She''s got the perfect hair, she''s got the perfect pace
She''s got the perfect life, she''s got the perfect vibe
She''s got it all, she''s got the perfect ride
She''s got Ken by her side, he''s her main man
He''s got the skills, he''s got the plan
They''re livin'' large, they''re got the fame
They''re the perfect couple, they''re got the game
So listen up,"}}]

You can use the following Python code to test the API. Replace the value of api_url with the API URL that you created in Step 4 above.

import requests

api_url = 'https://spip03jtgd.execute-api.us-east-1.amazonaws.com/default/call-bloom-llm'

json_body = {
 "inputs": [
  [
   {"role": "system", "content": "You are chat bot who writes songs"},
   {"role": "user", "content": "Write a rap about Barbie"}
  ]
 ],
 "parameters": {"max_new_tokens":256, "top_p":0.9, "temperature":0.6}
}

r = requests.post(api_url, json=json_body)

print(r.json())
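The generated text itself sits under the generation key of the first list element (as in the sample output above), so you can pull it out like this:

result = r.json()
print(result[0]["generation"]["content"])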

Potential Errors

You might receive a few errors in this scenario:

  1. Permissions: if your Lambda's execution role does not have permission to call the SageMaker InvokeEndpoint action, you will not be able to call the endpoint (see the policy sketch after this list).
  2. Timeout: depending on your prompt and parameters, you may receive a timed-out error. Unlike permissions, this is an easy fix: click on Configuration, then General configuration, then Edit, and increase the timeout value.
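For the permissions error, here is a minimal IAM policy sketch you can attach to the Lambda's execution role; in production, scope Resource down to your endpoint's ARN instead of a wildcard:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": "sagemaker:InvokeEndpoint",
      "Resource": "*"
    }
  ]
}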


Conclusion

This post showed you how to host one of the most capable open-source LLMs available today and use it through your own API.

There are many reasons you should consider using your own hosted open-source LLM as an API, such as:

  1. Security
  2. Reliability
  3. Consistency
