Discover the potential of LangChain, a framework for building applications on top of language models that lets developers assemble intricate Natural Language Processing (NLP) pipelines with ease. When constructing these pipelines, monitoring your token usage is imperative, particularly for calls that rely on paid APIs such as OpenAI's GPT models.
Join me in this tutorial as we explore how to track token usage in your NLP calls, including the associated token costs, configurable to match your organization's pricing arrangements, using LangChain. It's worth highlighting that this tracking feature is currently limited to the OpenAI API. We'll also delve into customizing the per-1K-token cost, since enterprises working with OpenAI often have their own pricing structures.
What are tokens and how are they calculated?
Tokens, the building blocks of words, play a crucial role in OpenAI API processing. A token corresponds to roughly 4 characters or ¾ of a word in English. Here are some helpful rules of thumb for understanding tokens in terms of length:
- 1 token ≈ 4 characters in English
- 1 token ≈ ¾ of a word
- 100 tokens ≈ 75 words
- 1–2 sentences ≈ 30 tokens
- 1 paragraph ≈ 100 tokens
To understand tokenization, you can use OpenAI's interactive Tokenizer tool, which not only computes the token count but also shows how a piece of text is segmented into tokens. Alternatively, for programmatic tokenization, consider Tiktoken, a fast tokenizer designed specifically for OpenAI models.
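As a minimal sketch of the programmatic route (assuming the tiktoken package is installed, e.g. via pip install tiktoken):

import tiktoken

# Look up the tokenizer that matches a given OpenAI model
encoding = tiktoken.encoding_for_model("gpt-3.5-turbo")

# Encode a sample string and count the resulting tokens
tokens = encoding.encode("How to create a red sauce pasta")
print(len(tokens))  # number of tokens the model will see for this text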
Why is token calculation important?
The token consumption in a single API call varies depending on the chosen LLM (Large Language Model).
For instance, gpt-3.5-turbo has a maximum limit of 4,096 tokens shared between the input prompt and the completion. If the input prompt already uses 3,500 tokens, only 596 tokens remain for the completion/output response.
This underscores the importance of managing token usage effectively to stay within the limits of the chosen language model.
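As a minimal sketch of such a budget check, assuming gpt-3.5-turbo's 4,096-token window and the tiktoken tokenizer shown earlier:

import tiktoken

MAX_CONTEXT_TOKENS = 4096  # combined prompt + completion limit for gpt-3.5-turbo

encoding = tiktoken.encoding_for_model("gpt-3.5-turbo")
prompt = "Question: How to create a red sauce pasta\nAnswer: Let's think step by step."
prompt_tokens = len(encoding.encode(prompt))

# Whatever the prompt does not consume is left for the completion.
# Note: the chat format adds a few extra tokens per message, so treat
# this as an approximation rather than an exact budget.
completion_budget = MAX_CONTEXT_TOKENS - prompt_tokens
print(f"Prompt: {prompt_tokens} tokens; completion budget: {completion_budget} tokens")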
Cost calculation using LangChain
To calculate token consumption for both the prompt and the completion, as well as the associated cost, you can simply leverage the callback provided by LangChain. For example:
from langchain.chains import LLMChain
from langchain_community.chat_models import ChatOpenAI
from langchain.prompts import PromptTemplate
from langchain.callbacks import get_openai_callback
import os

os.environ["OPENAI_API_KEY"] = "<Your Open AI Key>"

llm = ChatOpenAI(model_name="gpt-3.5-turbo")
template = """Question: {question}
Answer: Let's think step by step."""
prompt = PromptTemplate(template=template, input_variables=["question"])
llm_chain = LLMChain(prompt=prompt, llm=llm)

with get_openai_callback() as cb:
    question = "How to create a red sauce pasta"
    llm_chain.run(question)
    print(cb)
---------
Output:
Tokens Used: 349
Prompt Tokens: 26
Completion Tokens: 323
Successful Requests: 1
Total Cost (USD): $0.000685
If we look closely at the above output, we can verify the cost manually against the standard pricing for gpt-3.5-turbo: $0.0015 per 1K input tokens and $0.0020 per 1K output tokens.
Total cost: (($0.0015 * 26) + ($0.0020 * 323)) / 1000 = $0.000685
Bingo! The cost reported by the LangChain callback matches the cost we calculated manually using OpenAI's standard pricing.
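Note that printing cb is just a convenience. The handler also exposes each figure as an attribute (these field names come from OpenAICallbackHandler, as we'll see in the custom handler later in this article), so you can feed them into your own logging or budgeting logic:

with get_openai_callback() as cb:
    llm_chain.run(question)
    print(cb.total_tokens)       # 349
    print(cb.prompt_tokens)      # 26
    print(cb.completion_tokens)  # 323
    print(cb.total_cost)         # 0.000685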
Now let's contemplate a scenario where we have entered into an enterprise deal with OpenAI, securing a 10% discount on the standard pricing. The question at hand is whether LangChain can integrate this discounted price into its calculations.
Custom pricing for tokens in LangChain
As observed, we have a discount applied to the standard pricing. Now, let’s explore the adjustments needed in the LangChain functions to seamlessly integrate this custom pricing.
To begin, we create a custom callback function to incorporate our specific pricing; let's name it ‘get_custom_openai_callback’. We also define an ‘OpenAICustomCallbackHandler’ class that inherits from ‘OpenAICallbackHandler’ and override its ‘on_llm_end’ method.
from __future__ import annotations

from abc import ABC
from contextlib import contextmanager
from typing import Any, Generator

from langchain_community.callbacks.manager import openai_callback_var
from langchain_community.callbacks.openai_info import (
    MODEL_COST_PER_1K_TOKENS,
    OpenAICallbackHandler,
    standardize_model_name,
)
from langchain.schema import LLMResult


def get_openai_token_cost_for_model(
    model_name: str, num_tokens: int, is_completion: bool = False
) -> float:
    """Get the cost in USD for a given model and number of tokens.

    Args:
        model_name: Name of the model.
        num_tokens: Number of tokens.
        is_completion: Whether the tokens are completion tokens. Defaults to False.

    Returns:
        Cost in USD.
    """
    model_name = standardize_model_name(model_name, is_completion=is_completion)
    if model_name not in MODEL_COST_PER_1K_TOKENS:
        raise ValueError(
            f"Unknown model: {model_name}. Please provide a valid OpenAI model name. "
            "Known models are: " + ", ".join(MODEL_COST_PER_1K_TOKENS.keys())
        )
    return MODEL_COST_PER_1K_TOKENS[model_name] * (num_tokens / 1000)


class OpenAICustomCallbackHandler(OpenAICallbackHandler, ABC):
    def on_llm_end(self, response: LLMResult, **kwargs: Any) -> None:
        """Collect token usage and cost once the LLM call finishes."""
        if response.llm_output is None:
            return None
        self.successful_requests += 1
        if "token_usage" not in response.llm_output:
            return None
        token_usage = response.llm_output["token_usage"]
        completion_tokens = token_usage.get("completion_tokens", 0)
        prompt_tokens = token_usage.get("prompt_tokens", 0)
        model_name = standardize_model_name(response.llm_output.get("model_name", ""))
        if model_name in MODEL_COST_PER_1K_TOKENS:
            completion_cost = get_openai_token_cost_for_model(
                model_name, completion_tokens, is_completion=True
            )
            prompt_cost = get_openai_token_cost_for_model(model_name, prompt_tokens)
            self.total_cost += prompt_cost + completion_cost
        self.total_tokens += token_usage.get("total_tokens", 0)
        self.prompt_tokens += prompt_tokens
        self.completion_tokens += completion_tokens


@contextmanager
def get_custom_openai_callback() -> Generator[OpenAICustomCallbackHandler, None, None]:
    """Get the custom OpenAI callback handler in a context manager,
    which conveniently exposes token and cost information.

    Returns:
        OpenAICustomCallbackHandler: The custom OpenAI callback handler.

    Example:
        with get_custom_openai_callback() as cb:
            # Use the OpenAI callback handler
    """
    cb = OpenAICustomCallbackHandler()
    openai_callback_var.set(cb)
    yield cb
    openai_callback_var.set(None)
Upon closer examination of the function above, it becomes evident that the data LangChain uses for price calculation lives in the MODEL_COST_PER_1K_TOKENS dictionary.
So instead of importing it from langchain_community.callbacks.openai_info, we can define our own version.
We can create our file as:
Standard Pricing
MODEL_COST_PER_1K_TOKENS = {
# GPT-3.5 input
"gpt-3.5-turbo": 0.0015,
# GPT-3.5 output
"gpt-3.5-turbo-completion": 0.002,
}
Custom pricing with 10% discount
MODEL_COST_PER_1K_TOKENS = {
# GPT-3.5 input
"gpt-3.5-turbo": 0.00135,
# GPT-3.5 output
"gpt-3.5-turbo-completion": 0.0018,
}
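Each entry is simply the standard rate multiplied by 0.9 to reflect the 10% discount:
$0.0015 × 0.9 = $0.00135 per 1K input tokens
$0.0020 × 0.9 = $0.0018 per 1K output tokens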
Now let's look at the full code in action:
from __future__ import annotations

from abc import ABC
from contextlib import contextmanager
from typing import Any, Generator

from langchain_community.callbacks.manager import openai_callback_var
from langchain_community.callbacks.openai_info import (
    OpenAICallbackHandler,
    standardize_model_name,
)
from langchain.schema import LLMResult

# Custom pricing with 10% discount
MODEL_COST_PER_1K_TOKENS = {
    # GPT-3.5 input
    "gpt-3.5-turbo": 0.00135,
    # GPT-3.5 output
    "gpt-3.5-turbo-completion": 0.0018,
}


def get_openai_token_cost_for_model(
    model_name: str, num_tokens: int, is_completion: bool = False
) -> float:
    """Get the cost in USD for a given model and number of tokens.

    Args:
        model_name: Name of the model.
        num_tokens: Number of tokens.
        is_completion: Whether the tokens are completion tokens. Defaults to False.

    Returns:
        Cost in USD.
    """
    model_name = standardize_model_name(model_name, is_completion=is_completion)
    if model_name not in MODEL_COST_PER_1K_TOKENS:
        raise ValueError(
            f"Unknown model: {model_name}. Please provide a valid OpenAI model name. "
            "Known models are: " + ", ".join(MODEL_COST_PER_1K_TOKENS.keys())
        )
    return MODEL_COST_PER_1K_TOKENS[model_name] * (num_tokens / 1000)


class OpenAICustomCallbackHandler(OpenAICallbackHandler, ABC):
    def on_llm_end(self, response: LLMResult, **kwargs: Any) -> None:
        """Collect token usage and cost once the LLM call finishes."""
        if response.llm_output is None:
            return None
        self.successful_requests += 1
        if "token_usage" not in response.llm_output:
            return None
        token_usage = response.llm_output["token_usage"]
        completion_tokens = token_usage.get("completion_tokens", 0)
        prompt_tokens = token_usage.get("prompt_tokens", 0)
        model_name = standardize_model_name(response.llm_output.get("model_name", ""))
        if model_name in MODEL_COST_PER_1K_TOKENS:
            completion_cost = get_openai_token_cost_for_model(
                model_name, completion_tokens, is_completion=True
            )
            prompt_cost = get_openai_token_cost_for_model(model_name, prompt_tokens)
            self.total_cost += prompt_cost + completion_cost
        self.total_tokens += token_usage.get("total_tokens", 0)
        self.prompt_tokens += prompt_tokens
        self.completion_tokens += completion_tokens


@contextmanager
def get_custom_openai_callback() -> Generator[OpenAICustomCallbackHandler, None, None]:
    """Get the custom OpenAI callback handler in a context manager,
    which conveniently exposes token and cost information.

    Returns:
        OpenAICustomCallbackHandler: The custom OpenAI callback handler.

    Example:
        with get_custom_openai_callback() as cb:
            # Use the OpenAI callback handler
    """
    cb = OpenAICustomCallbackHandler()
    openai_callback_var.set(cb)
    yield cb
    openai_callback_var.set(None)
Once we have this custom callback, let's use it in our actual code:
from langchain.chains import LLMChain
from langchain_community.chat_models import ChatOpenAI
from langchain.prompts import PromptTemplate
from <path_to_above_code_file> import get_custom_openai_callback
import os

os.environ["OPENAI_API_KEY"] = "<Your Open AI Key>"

llm = ChatOpenAI(model_name="gpt-3.5-turbo", max_tokens=323)
template = """Question: {question}
Answer: Let's think step by step."""
prompt = PromptTemplate(template=template, input_variables=["question"])
llm_chain = LLMChain(prompt=prompt, llm=llm)

with get_custom_openai_callback() as cb:
    question = "How to create a red sauce pasta"
    llm_chain.run(question)
    print(cb)
---------
Output:
Tokens Used: 349
Prompt Tokens: 26
Completion Tokens: 323
Successful Requests: 1
Total Cost (USD): $0.0006165000000000001
Total cost: (($0.00135 * 26) + ($0.0018 * 323)) / 1000 = $0.0006165000000000001
If we look at the above price, it is exactly 10% below the earlier price of $0.000685 ($0.000685 × 0.9 = $0.0006165); the long trailing digits are just floating-point noise.
Voila! We have successfully modified the LangChain function to calculate the cost based on the pricing table we supply.
Key Points
- The pricing calculation logic in LangChain is currently designed exclusively for OpenAI models.
- At present, LangChain does not support pricing and token calculation for embeddings.
- It is necessary to modify the values in MODEL_COST_PER_1K_TOKENS based on the specific models used in your application, as sketched below. Refer to the predefined values in langchain_community.callbacks.openai_info for guidance in making the necessary adjustments.
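For instance, here is a minimal sketch of a pricing table covering more than one model, following the same "-completion" key convention that standardize_model_name relies on. The GPT-4 figures are illustrative placeholders only; substitute the rates from your own agreement:

MODEL_COST_PER_1K_TOKENS = {
    # GPT-3.5 input / output at our discounted rate
    "gpt-3.5-turbo": 0.00135,
    "gpt-3.5-turbo-completion": 0.0018,
    # GPT-4 input / output (illustrative placeholder rates, not actual pricing)
    "gpt-4": 0.027,
    "gpt-4-completion": 0.054,
}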