Solving Math Based (Hard) Problems with Large Language Models (LLMs)

Problem Statement

Math problems are not the strongest suit of Large Language Models (LLMs) and Generative AI solutions in general. However, every so often we end up in scenarios and situations where “math” saves the day. Of late, I have been on a couple of projects with a few requirements around math-based solutions within a larger LLM solution.

This gave me a chance to think through what options we have and what the potential answers can be. This blog is a jumpstart on that thinking. I plan to add code snippets and a solution overview for each of the topics to provide a comparative analysis of the different options.

Solution

Let me start by dividing the problem area into two segments. First, we can talk about the hard-core, direct math problems. In this scenario, the application needs to perform calculations and solve equations. We can try to understand what options we have here.

Second, we can discuss logical reasoning problems. Here the calculations might take a back seat; what is more important and critical is the ability to solve a problem in logical steps. Mind you, some of the steps might include direct calculations, but the focus here is to generate logical reasoning and solve a problem through a series of logical steps.

Finally, we can wrap up this blog (as of today, with a promise of updating it with code, etc.) with what exists in the market. Some of the options might not be as impressive as others, but nonetheless, we need to include them for completeness!

Segment 1

In this section, where we are solving classic math problems, we have three options.

  • One, we can use dedicated packages from Python libraries like LangChain to build an LLM chain specifically for math. Chains like LLMMathChain are specially designed to take out the guesswork and generate end-to-end answers. For example, a question like “what is 13 raised to the power of 0.32” yields a correct and accurate result (see the first sketch after this list).
  • Two, we can use external calculator tools that can be added to LLM agents to invoke special functions and generate the same results. The difference is that with these tools (like ‘pal-math’), we can use zero-shot prompting to get the required results. These tools are independent packages that are maintained and validated for math problems and would typically generate better results than custom code (see the second sketch after this list).
  • Three, we can go a bit more classic and use an output parser or tagging chain to generate and validate the response from the LLM in a specific structure and format. This ensures that we can take the result from the LLM and pass it to any Python class or function (see the third sketch after this list).
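Here is a minimal sketch of the first option, assuming the langchain and openai packages are installed and an OpenAI API key is configured in the environment:

```python
# A minimal LLMMathChain sketch: the chain translates the question into a
# numeric expression and evaluates it, instead of letting the model guess.
from langchain.chains import LLMMathChain
from langchain.llms import OpenAI

llm = OpenAI(temperature=0)  # deterministic output suits math
math_chain = LLMMathChain.from_llm(llm=llm, verbose=True)

print(math_chain.run("What is 13 raised to the power of 0.32?"))
```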
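For the second option, here is a sketch of a zero-shot agent loaded with the ‘pal-math’ tool (same assumptions as above; the word problem is just an illustration):

```python
# A zero-shot ReAct agent that delegates math to the 'pal-math' tool.
from langchain.agents import AgentType, initialize_agent, load_tools
from langchain.llms import OpenAI

llm = OpenAI(temperature=0)
tools = load_tools(["pal-math"], llm=llm)
agent = initialize_agent(
    tools, llm, agent=AgentType.ZERO_SHOT_REACT_DESCRIPTION, verbose=True
)

print(agent.run(
    "Marcia has two more pets than Cindy. If Cindy has 4 pets, "
    "how many pets does Marcia have?"
))
```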
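And for the third option, a sketch using LangChain’s StructuredOutputParser; the schema fields are my own illustration:

```python
# Ask the model for a fixed structure, then parse the reply into a dict
# that any downstream Python class/function can consume.
from langchain.output_parsers import ResponseSchema, StructuredOutputParser

schemas = [
    ResponseSchema(name="answer", description="the final numeric result"),
    ResponseSchema(name="steps", description="the calculation steps taken"),
]
parser = StructuredOutputParser.from_response_schemas(schemas)

# Append these instructions to the prompt sent to the LLM...
format_instructions = parser.get_format_instructions()
# ...and then validate/convert the raw LLM reply:
# result = parser.parse(llm_output)  # -> {"answer": ..., "steps": ...}
```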

Segment 2

In this section, where we are solving logical reasoning problems, we have four options.

  • One, we can use PALChain, a specialized package that can help decipher and deduce a natural language problem into a series of math and logic questions and then solve them in sequence. Internally, this performs two steps: first, call the LLM, and second, use the response to construct the result (see the first sketch after this list).
  • Two, we can use the OpenAI function calling feature to generate a structured set of input parameters for our custom, registered functions. This is a new and novel way to solve logical reasoning and math problems. The LLM does the hard work of organizing the inputs the right way, but the business logic needs to be implemented independently (and sometimes can be solved by another LLM call!); see the second sketch after this list.
  • Three, we can use a Python formatter class (like Pydantic) to programmatically format the results from the LLM call to suit our needs and solve the math problem. This process is simplest and most efficient if we are using state-of-the-art models like GPT-4 (see the third sketch after this list).
  • Four, for more specific logical reasoning problems, we can use “chain of thought” or “tree of thought” prompting techniques that help structure our prompts and LLM interactions in a more logical and sequential way to solve a hard problem (see the final sketch after this list).
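A minimal PALChain sketch for the first option (in recent releases PALChain lives in langchain_experimental; older versions exposed it from langchain.chains):

```python
# PALChain writes a small Python program for the word problem, runs it,
# and returns the computed answer.
from langchain_experimental.pal_chain import PALChain
from langchain.llms import OpenAI

llm = OpenAI(temperature=0)
pal_chain = PALChain.from_math_prompt(llm, verbose=True)

question = (
    "The cafeteria had 23 apples. They used 20 for lunch and bought 6 "
    "more. How many apples do they have now?"
)
print(pal_chain.run(question))
```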
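For the second option, here is a hedged sketch of OpenAI function calling in the pre-1.0 openai SDK style; solve_quadratic is a hypothetical function we would implement ourselves:

```python
# The model fills in structured arguments; the business logic stays in
# our own code (solve_quadratic is illustrative, not a real library call).
import json
import openai

functions = [{
    "name": "solve_quadratic",
    "description": "Solve a*x^2 + b*x + c = 0 for x",
    "parameters": {
        "type": "object",
        "properties": {
            "a": {"type": "number"},
            "b": {"type": "number"},
            "c": {"type": "number"},
        },
        "required": ["a", "b", "c"],
    },
}]

response = openai.ChatCompletion.create(
    model="gpt-4",
    messages=[{"role": "user", "content": "Solve 2x^2 - 4x - 6 = 0"}],
    functions=functions,
    function_call="auto",
)

# Extract the structured arguments and hand them to our own solver.
call = response["choices"][0]["message"]["function_call"]
args = json.loads(call["arguments"])  # e.g. {"a": 2, "b": -4, "c": -6}
```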
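For the third option, a sketch with LangChain’s PydanticOutputParser; the MathAnswer model is my own illustration:

```python
# Validate the LLM's reply against a typed Pydantic model before any
# downstream math logic touches it.
from langchain.output_parsers import PydanticOutputParser
from pydantic import BaseModel, Field

class MathAnswer(BaseModel):
    answer: float = Field(description="the final numeric answer")
    reasoning: str = Field(description="the steps used to reach the answer")

parser = PydanticOutputParser(pydantic_object=MathAnswer)

# parser.get_format_instructions() goes into the prompt; then:
# result = parser.parse(llm_output)  # -> a validated MathAnswer instance
```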
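Finally, a minimal chain-of-thought prompt for the fourth option (the model name is an assumption; any capable chat model works, again with the pre-1.0 openai SDK):

```python
# Appending "Let's think step by step" nudges the model to reason in
# explicit, sequential steps before giving the final answer.
import openai

prompt = (
    "A store sold 45 apples on Monday and twice as many on Tuesday. "
    "How many apples were sold in total?\n"
    "Let's think step by step."
)
response = openai.ChatCompletion.create(
    model="gpt-4",
    messages=[{"role": "user", "content": prompt}],
)
print(response["choices"][0]["message"]["content"])
```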

Finally, let us also talk about some of the newer third-party options we have (both licensed and open source). First, we can use MathGPT, which used GPT-3 behind the scenes and was quite popular. Or we can custom train/fine-tune a foundation model like Llama 7B or 70B (if we have enough GPUs); in my opinion, this is the best route if GPT-4 with the above processes is not living up to the challenge. Or, lastly, we can use something like the Math Chat APIs.

Conclusion

Either way, math-based problems remain a research item for many use cases. Having said that, I have implemented a few of these techniques in production with acceptable success. We need to vet what works and what does not for a given use case. So keep researching and experimenting, and do drop a note on what worked for you!
