
Customise Token Pricing Calculation in LangChain: A Step-by-Step Guide

Explore the ins and outs of tracking token usage in your NLP calls, including the associated token costs using LangChain.


Discover the potential of LangChain, an advanced framework for building applications on top of language models, which empowers developers to streamline intricate Natural Language Processing (NLP) pipelines with ease. When constructing these pipelines, monitoring your token usage is imperative, particularly for calls that hinge on paid APIs like OpenAI’s GPT-3.5.

Join me in this tutorial as we explore the ins and outs of tracking token usage in your NLP calls, including the associated token costs (configurable to match your organization’s pricing arrangements), using LangChain. It’s crucial to highlight that this tracking feature is currently limited to the OpenAI API, providing you with enhanced control over your resource management. Additionally, we’ll delve into customizing the per-1K-token cost, acknowledging the unique pricing structures for enterprises leveraging OpenAI.

What are tokens and how are they calculated?

Tokens, the building blocks of text, play a crucial role in OpenAI API processing. A token corresponds to about 4 characters, or roughly ¾ of a word, in English. Here are some helpful rules of thumb for understanding tokens in terms of length:

  • 1 token ≈ 4 characters in English
  • 1 token ≈ ¾ of a word
  • 100 tokens ≈ 75 words
  • 1–2 sentences ≈ 30 tokens
  • 1 paragraph ≈ 100 tokens

To understand tokenization, you can use OpenAI’s interactive Tokenizer tool. It not only computes the token count but also shows how a piece of text is segmented into tokens. Alternatively, for programmatic tokenization, consider tiktoken, an efficient tokenizer designed specifically for OpenAI models.
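
For instance, here is a minimal sketch of counting tokens programmatically with tiktoken (the sample sentence is arbitrary):

import tiktoken

# Load the encoding used by gpt-3.5-turbo
enc = tiktoken.encoding_for_model("gpt-3.5-turbo")

text = "LangChain makes it easy to track token usage."
tokens = enc.encode(text)  # list of integer token ids

print(len(tokens))         # the token count for this text
print(enc.decode(tokens))  # decoding round-trips to the original text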

Why is token calculation important?

The token consumption in a single API call varies depending on the chosen LLM (Large Language Model).

For instance, gpt-3.5-turbo has a maximum limit of 4,096 tokens shared between the input prompt and the completion. If the input prompt already uses 3,500 tokens, the remaining capacity for the completion/output response is restricted to 596 tokens.

This underscores the importance of managing token usage effectively to stay within the specified limits of the chosen language model.
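
As a quick sanity check before making a call, you can estimate how much of the context window a prompt will consume. Here is a small sketch using tiktoken; the 4,096 figure is the gpt-3.5-turbo limit discussed above, and note that chat models also add a few tokens of per-message overhead, so treat this as an estimate:

import tiktoken

CONTEXT_WINDOW = 4096  # gpt-3.5-turbo shares this budget between prompt and completion

def remaining_completion_budget(prompt: str, model: str = "gpt-3.5-turbo") -> int:
    """Return an estimate of how many tokens are left for the completion."""
    enc = tiktoken.encoding_for_model(model)
    return CONTEXT_WINDOW - len(enc.encode(prompt))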

Cost calculation using LangChain

To calculate the token consumption of both the prompt and the completion, as well as the resulting cost, you can simply leverage the callback function provided by LangChain. For example:

from langchain.chains import LLMChain  
from langchain_community.chat_models import ChatOpenAI  
from langchain.prompts import PromptTemplate  
from langchain.callbacks import get_openai_callback  
import os  
os.environ["OPENAI_API_KEY"] = "<Your OpenAI Key>"
llm = ChatOpenAI(model_name="gpt-3.5-turbo")  
template = """Question: {question}  
  
Answer: Let's think step by step."""  
  
prompt = PromptTemplate(template=template, input_variables=["question"])  
llm_chain = LLMChain(prompt=prompt, llm=llm)  
with get_openai_callback() as cb:  
    question = "How to create a red sauce pasta"  
    llm_chain.run(question)  
    print(cb)  
---------  
Output:  
Tokens Used: 349  
 Prompt Tokens: 26  
 Completion Tokens: 323  
Successful Requests: 1  
Total Cost (USD): $0.000685

Let’s look closely at the above output and try to calculate the cost manually, using the standard gpt-3.5-turbo pricing of $0.0015 per 1K input tokens and $0.0020 per 1K output tokens.

Total cost: (($0.0015 * 26) + ($0.0020 * 323)) / 1000 = $0.000685
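
Or, the same arithmetic in Python, if you want to verify it yourself:

# Manual check against the callback's figure, using standard pricing
prompt_cost = 0.0015 * 26 / 1000       # 26 prompt tokens at $0.0015 / 1K
completion_cost = 0.0020 * 323 / 1000  # 323 completion tokens at $0.0020 / 1K
print(prompt_cost + completion_cost)   # ≈ 0.000685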

Bingo! The cost reported by the LangChain callback function matches the cost we calculated manually using OpenAI’s standard pricing.

Let’s contemplate a scenario where we have entered into an enterprise deal with OpenAI, securing a 10% discount on the standard pricing. The question at hand is whether LangChain can incorporate this discounted price into its calculations.

Custom token pricing in LangChain

As observed, we have a discount applied to the standard pricing. Now, let’s explore the adjustments needed in the LangChain functions to seamlessly integrate this custom pricing.

To begin, we create a custom callback function to incorporate our specific pricing; let’s name it ‘get_custom_openai_callback’. We also define an ‘OpenAICustomCallbackHandler’ class, inheriting from ‘OpenAICallbackHandler’, and override its ‘on_llm_end’ method.

from __future__ import annotations
from abc import ABC
from contextlib import contextmanager
from typing import Any, Generator
from langchain_community.callbacks.manager import openai_callback_var
from langchain_community.callbacks.openai_info import OpenAICallbackHandler, standardize_model_name, MODEL_COST_PER_1K_TOKENS
from langchain.schema import LLMResult

def get_openai_token_cost_for_model(model_name: str, num_tokens: int, is_completion: bool = False) -> float:
    """Get the cost in USD for a given model and number of tokens.

    Args:
        model_name: Name of the model.
        num_tokens: Number of tokens.
        is_completion: Whether the tokens are completion tokens. Defaults to False.

    Returns:
        Cost in USD.
    """
    model_name = standardize_model_name(model_name, is_completion=is_completion)
    if model_name not in MODEL_COST_PER_1K_TOKENS:
        raise ValueError(
            f"Unknown model: {model_name}. Please provide a valid OpenAI model name. "
            "Known models are: " + ", ".join(MODEL_COST_PER_1K_TOKENS.keys())
        )
    return MODEL_COST_PER_1K_TOKENS[model_name] * (num_tokens / 1000)

class OpenAICustomCallbackHandler(OpenAICallbackHandler, ABC):
    def on_llm_end(self, response: LLMResult, **kwargs: Any) -> None:
        """Collect token usage and accumulate the cost."""
        if response.llm_output is None:
            return None
        self.successful_requests += 1
        if "token_usage" not in response.llm_output:
            return None
        token_usage = response.llm_output["token_usage"]
        completion_tokens = token_usage.get("completion_tokens", 0)
        prompt_tokens = token_usage.get("prompt_tokens", 0)
        model_name = standardize_model_name(response.llm_output.get("model_name", ""))
        if model_name in MODEL_COST_PER_1K_TOKENS:
            completion_cost = get_openai_token_cost_for_model(model_name, completion_tokens, is_completion=True)
            prompt_cost = get_openai_token_cost_for_model(model_name, prompt_tokens)
            self.total_cost += prompt_cost + completion_cost
        self.total_tokens += token_usage.get("total_tokens", 0)
        self.prompt_tokens += prompt_tokens
        self.completion_tokens += completion_tokens

@contextmanager
def get_custom_openai_callback() -> Generator[OpenAICustomCallbackHandler, None, None]:
    """Get the custom OpenAI callback handler in a context manager,
    which conveniently exposes token and cost information.

    Returns:
        OpenAICustomCallbackHandler: The custom OpenAI callback handler.

    Example:
        with get_custom_openai_callback() as cb:
            # Use the callback handler
    """
    cb = OpenAICustomCallbackHandler()
    openai_callback_var.set(cb)
    yield cb
    openai_callback_var.set(None)

Upon closer examination of the function above, it becomes evident that the data LangChain uses for price calculation lives in the ‘MODEL_COST_PER_1K_TOKENS’ dictionary.

So, instead of importing it from langchain_community.callbacks.openai_info, we can override it with our own values.

We can create our file as:

# Standard pricing

MODEL_COST_PER_1K_TOKENS = {
    # GPT-3.5 input
    "gpt-3.5-turbo": 0.0015,
    # GPT-3.5 output
    "gpt-3.5-turbo-completion": 0.002,
}

# Custom pricing with 10% discount

MODEL_COST_PER_1K_TOKENS = {
    # GPT-3.5 input
    "gpt-3.5-turbo": 0.00135,
    # GPT-3.5 output
    "gpt-3.5-turbo-completion": 0.0018,
}
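
As a side note, rather than hard-coding the discounted figures, you could derive them from the standard table. A minimal sketch, assuming a flat 10% discount across all models:

STANDARD_COST_PER_1K_TOKENS = {
    "gpt-3.5-turbo": 0.0015,
    "gpt-3.5-turbo-completion": 0.002,
}

DISCOUNT = 0.10  # our hypothetical enterprise discount

MODEL_COST_PER_1K_TOKENS = {
    model: round(rate * (1 - DISCOUNT), 6)
    for model, rate in STANDARD_COST_PER_1K_TOKENS.items()
}
# {'gpt-3.5-turbo': 0.00135, 'gpt-3.5-turbo-completion': 0.0018}

This keeps the discount defined in one place if your deal covers many models.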

Now let’s look at the full code in action:

from __future__ import annotations
from abc import ABC
from contextlib import contextmanager
from typing import Any, Generator
from langchain_community.callbacks.manager import openai_callback_var
from langchain_community.callbacks.openai_info import OpenAICallbackHandler, standardize_model_name
from langchain.schema import LLMResult

# Custom pricing with 10% discount
MODEL_COST_PER_1K_TOKENS = {
    # GPT-3.5 input
    "gpt-3.5-turbo": 0.00135,
    # GPT-3.5 output
    "gpt-3.5-turbo-completion": 0.0018,
}

def get_openai_token_cost_for_model(model_name: str, num_tokens: int, is_completion: bool = False) -> float:
    """Get the cost in USD for a given model and number of tokens.

    Args:
        model_name: Name of the model.
        num_tokens: Number of tokens.
        is_completion: Whether the tokens are completion tokens. Defaults to False.

    Returns:
        Cost in USD.
    """
    model_name = standardize_model_name(model_name, is_completion=is_completion)
    if model_name not in MODEL_COST_PER_1K_TOKENS:
        raise ValueError(
            f"Unknown model: {model_name}. Please provide a valid OpenAI model name. "
            "Known models are: " + ", ".join(MODEL_COST_PER_1K_TOKENS.keys())
        )
    return MODEL_COST_PER_1K_TOKENS[model_name] * (num_tokens / 1000)

class OpenAICustomCallbackHandler(OpenAICallbackHandler, ABC):
    def on_llm_end(self, response: LLMResult, **kwargs: Any) -> None:
        """Collect token usage and accumulate the cost using our custom pricing."""
        if response.llm_output is None:
            return None
        self.successful_requests += 1
        if "token_usage" not in response.llm_output:
            return None
        token_usage = response.llm_output["token_usage"]
        completion_tokens = token_usage.get("completion_tokens", 0)
        prompt_tokens = token_usage.get("prompt_tokens", 0)
        model_name = standardize_model_name(response.llm_output.get("model_name", ""))
        if model_name in MODEL_COST_PER_1K_TOKENS:
            completion_cost = get_openai_token_cost_for_model(model_name, completion_tokens, is_completion=True)
            prompt_cost = get_openai_token_cost_for_model(model_name, prompt_tokens)
            self.total_cost += prompt_cost + completion_cost
        self.total_tokens += token_usage.get("total_tokens", 0)
        self.prompt_tokens += prompt_tokens
        self.completion_tokens += completion_tokens

@contextmanager
def get_custom_openai_callback() -> Generator[OpenAICustomCallbackHandler, None, None]:
    """Get the custom OpenAI callback handler in a context manager,
    which conveniently exposes token and cost information.

    Returns:
        OpenAICustomCallbackHandler: The custom OpenAI callback handler.

    Example:
        with get_custom_openai_callback() as cb:
            # Use the callback handler
    """
    cb = OpenAICustomCallbackHandler()
    openai_callback_var.set(cb)
    yield cb
    openai_callback_var.set(None)

Once we have this custom callback, let’s use it in our actual code:

from langchain.chains import LLMChain  
from langchain_community.chat_models import ChatOpenAI  
from langchain.prompts import PromptTemplate  
from <path_to_above_code_file> import get_custom_openai_callback  
import os  
os.environ["OPENAI_API_KEY"] = "<Your OpenAI Key>"
llm = ChatOpenAI(model_name="gpt-3.5-turbo", max_tokens=323)  # cap the completion at 323 tokens to reproduce the earlier run
template = """Question: {question}  
  
Answer: Let's think step by step."""  
  
prompt = PromptTemplate(template=template, input_variables=["question"])  
llm_chain = LLMChain(prompt=prompt, llm=llm)  
with get_custom_openai_callback() as cb:  
    question = "How to create a red sauce pasta"  
    llm_chain.run(question)  
    print(cb)  
---------  
Output:  
Tokens Used: 349  
 Prompt Tokens: 26  
 Completion Tokens: 323  
Successful Requests: 1  
Total Cost (USD): $0.0006165000000000001

Total cost: (($0.00135 * 26) + ($0.0018 * 323)) / 1000 = $0.0006165000000000001

If we look at the above price, it is exactly 10% off the earlier price of $0.000685 ($0.000685 * 0.9 = $0.0006165).

Voila! We have successfully modified the LangChain function to calculate the cost based on the pricing table that we supply.

Key Points

  • The pricing calculation logic in LangChain is currently designed exclusively for OpenAI models.
  • At present, LangChain does not support pricing and token calculation for embeddings.
  • You need to adjust the values in MODEL_COST_PER_1K_TOKENS to match the specific models used in your application. Refer to the predefined values in langchain_community.callbacks.openai_info for guidance in making the necessary adjustments.


