
Is Hosting Your Own LLM Cheaper than OpenAI? Hint: It Could Be

OpenAI Pricing

OpenAI charges per token; 750 words is approximately 1,000 tokens.

The price per token also depends on the model. For example:

  1. GPT-4 (newer) costs $0.03 (3 cents) per 1,000 tokens.
  2. GPT-3.5 (older) costs $0.0015 (0.15 cents) per 1,000 tokens.

Now, let's see this pricing in action to extrapolate monthly costs for a sample application.

Take the example of an AI application that writes blog posts of up to 1,500 words for users.

At 750 words per 1,000 tokens, a 1,500-word post is about 2,000 tokens. That means one blog post will cost a maximum of 6 cents using GPT-4 (the better model).

If your application receives 1,000 requests per day to write blog posts, you are looking at an average cost of approximately $1,000 per month.*
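As a sanity check, the arithmetic above can be sketched in a few lines of Python. The 750-words-per-1,000-tokens ratio and the GPT-4 rate are the figures quoted above; check OpenAI's current pricing page before relying on them.

```python
# Rough OpenAI cost estimate using the figures quoted above.
WORDS_PER_1K_TOKENS = 750        # rule-of-thumb ratio, not exact
GPT4_PRICE_PER_1K_TOKENS = 0.03  # USD, at the rates cited here

def tokens_for_words(words: int) -> float:
    """Approximate token count for a given word count."""
    return words / WORDS_PER_1K_TOKENS * 1000

def cost_per_post(words: int, price_per_1k: float = GPT4_PRICE_PER_1K_TOKENS) -> float:
    """Dollar cost of generating one post of `words` words."""
    return tokens_for_words(words) / 1000 * price_per_1k

# Worst case: every one of 1,000 daily posts hits the 1,500-word cap.
max_post_cost = cost_per_post(1500)      # $0.06
monthly_max = max_post_cost * 1000 * 30  # ~$1,800/month worst case
```

The $1,000 average figure is lower than this worst case because most requests are assumed to be shorter than the 1,500-word cap (see the appendix).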

Now, let's see how much it would cost to host your own LLM on AWS.

Host your own LLM Pricing

Server type is the primary cost factor for hosting your own LLM on AWS. Different models require different server types.

If we choose the Llama-2 7B (7-billion-parameter) model, then we need at least an EC2 g5.2xlarge instance, which costs approximately $850 per month.

We also need to expose the model through an API (using AWS API Gateway and AWS Lambda), but at 1,000 requests per day this will cost less than $100 per month.

So we can estimate the AWS hosting to also cost approximately $1,000 per month.

Wow, who would have thunk it!

Playing around with cost variables

Now let's see what happens when we change some of these cost variables.

Since OpenAI's pricing is per token, doubling your usage to 2,000 requests per day also doubles your costs to $2,000 per month.

BUT our AWS setup will be able to handle that without any further scaling, keeping our costs stable around $1,000 per month.
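The crossover is easy to sketch: the OpenAI bill grows linearly with traffic, while the AWS bill stays flat until you need a bigger instance. A minimal sketch using this article's estimates (the ~$0.033 average cost per post is implied by the $1,000/month figure at 1,000 requests/day):

```python
# Flat AWS bill vs. linear OpenAI bill (this article's estimates).
AWS_MONTHLY = 1000.0                      # g5.2xlarge + API Gateway/Lambda
AVG_COST_PER_POST = 1000.0 / (1000 * 30)  # ~$0.033, implied by $1,000/month

def openai_monthly(requests_per_day: int) -> float:
    """Estimated OpenAI bill; scales linearly with traffic."""
    return requests_per_day * 30 * AVG_COST_PER_POST

# At 2,000 requests/day, OpenAI costs ~$2,000 while AWS stays ~$1,000.
break_even_rpd = AWS_MONTHLY / (30 * AVG_COST_PER_POST)  # ~1,000 requests/day
```

Above the break-even volume, every extra request widens the gap in AWS's favor, as long as the instance can keep up with the load.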

So, being the savvy businessperson you are, you go with the AWS setup for your 2,000-requests-per-day application.

However, users start complaining that the quality of the blog posts is not as good as ChatGPT.

It turns out that the Llama-2 7B is not good enough for your use case.

You experiment and notice that Llama-2 13B is much better suited.

Llama-2 13B requires a much more powerful server, which brings your costs up to around $5,000 per month, $3,000 more than the OpenAI API.

Uh, oh!

Conclusion

So where does that leave us? With some generalized heuristics of course!

  1. Experiment with different models to see which one produces the best results.
  2. Figure out how much text these models are expected to consume and generate.
  3. If the amount of text will be consistent and low, and security is not an issue, then go with OpenAI.
  4. Otherwise, run the numbers for AWS.
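Heuristic 4 can be captured in a hypothetical helper that runs both estimates side by side. The 1,100-tokens-per-request average below is an assumption chosen to match this article's ~$1,000/month OpenAI figure; plug in your own measurements.

```python
# Hypothetical side-by-side estimator; all inputs are your own numbers.
def monthly_costs(requests_per_day: int,
                  avg_tokens_per_request: float,
                  price_per_1k_tokens: float,
                  aws_instance_monthly: float,
                  aws_api_monthly: float = 100.0) -> dict:
    """Return estimated monthly bills for the OpenAI API vs. self-hosting."""
    openai = (requests_per_day * 30 * avg_tokens_per_request / 1000
              * price_per_1k_tokens)
    aws = aws_instance_monthly + aws_api_monthly
    return {"openai": round(openai, 2), "aws": round(aws, 2)}

# The 13B scenario above: 2,000 req/day at GPT-4 rates vs. a ~$4,900 instance.
monthly_costs(2000, 1100, 0.03, 4900)  # OpenAI ~$2,000 vs. AWS ~$5,000
```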

You can use our handy dandy LLM hosting calculator here to run the numbers for AWS LLM hosting.

Appendix

We estimated the monthly cost for OpenAI with the following breakdown of requests.

The table shows the percentage of requests we assume arrive at each word count.



