
ChatGPT Prompt Engineering, “Let’s Think Step by Step”, and Other Magic Phrases


AI image of a colourful robot zebra

Providing examples and context in your prompts can help LLMs reply more accurately.

Hi folks, in this week’s post, I’m going to discuss “prompt engineering”, what it is and some techniques and “magic phrases” you can use to get better results from Large Language Models (LLMs) like ChatGPT.

Whether you’re a power user, researcher, or entrepreneur setting up a new AI business, “Prompt Engineering”, is a topic you’ll need to master to get the best results. This article will give you a solid grounding and signpost you to further resources.

If you’ve ever interacted with ChatGPT or any other Large Language Model (LLM), you may have encountered instances where the response was not exactly what you expected, or, indeed, up to scratch.

You might have been given incorrect, or in some instances, made-up information, as I’ve discussed at some length in previous newsletters, or even stumbled across some mysterious “unspeakable tokens” that ChatGPT is unable to reproduce.

One of the problems is that LLMs have got so damned complicated, and the “black box” neural nets through which they derive their answers are so opaque, that users are still figuring out how to get the best results from them.

Typing your question or query into an LLM like ChatGPT is also known as “prompting”, and simply changing the way you phrase your prompt can give you very different results.

For example:

Remember this simple question I posed to ChatGPT in the last post, which it got wrong (and still gets wrong as of this writing)?

Screen grab of ChatGPT getting the wrong answer to a question

Nice try, but we can do a lot better with a bit of “prompt engineering”. Read on!

What “magic text” can you add to your prompt to make it reply with the correct answer? Find out below how to get exactly the right result every time with a little bit of “zero-shot chain of thought” Prompt Engineering (… and it’s not as complex as it sounds, honest!)

Enter the “Prompt Engineer”

Because of the variability of results from LLMs like ChatGPT, the AI industry has recently created a new specialist role, the “Prompt Engineer”, whose job is to extract the maximum value from AI models.

The role, typically filled by people with a “hacker mindset”, consists of constructing carefully phrased prompts that tell the AI model exactly what you want and how you want it.

The role is now so in demand that AI companies, flush with cash, are paying hundreds of thousands of dollars to people with the right skills or aptitude.

Screen grab of a Job ad at Anthropic AI

What are some techniques to use in Prompt Engineering?

It’s tempting to imagine that the work of a Prompt Engineer involves spending all day long at a computer shooting riddles at ChatGPT to see if it can solve them, and then adjusting the model to get the best results.

Whilst most AI companies have teams responsible for testing the pre-trained model and implementing safety measures to reduce bias and hateful outcomes, this process typically occurs before the AI model is made available to the general public.


Prompt engineering, on the other hand, is a more dynamic process that involves guiding the model more strongly, either on a prompt-by-prompt basis (known as “few-shot learning”; more on this later) or by creating a fine-tuned model with specialised “prompt-completion pairs” uploaded from a file (again, more later).

While ChatGPT is based on a natural language generation model and doesn’t require specific prompts, it does perform much better when you provide more context and specific examples.

In fact, the more information you provide, the better ChatGPT and other LLMs can understand what you’re asking, and the more accurate the response they can provide in return.

In the language models developed by OpenAI, there are two primary techniques used to activate their vast store of knowledge and improve the accuracy of responses to prompts.

These techniques are known as “few-shot learning” and “fine-tuning”.

Let’s take a look at them, next.

Few-shot learning

In the oddly named “few-shot” learning, the model is given a small number of explicit examples that strongly guide its generation towards the desired task, such as recognising and classifying a new object or concept.

Typically fewer than 10 examples are used, but numbers can vary. When only one example is provided, you may also hear it called “one-shot” learning.

Few-shot learning in OpenAI models can be implemented at both the ChatGPT prompt, as well as programmatically by calling the OpenAI API (Application Programming Interface) “completion” endpoint.

TECH TIP

If you’re a coder, you’ll find the OpenAI REST API super easy to access; you just need to register and get an API token first. Tip: if you’re stuck as to what code to write, just ask ChatGPT to write it for you in the language of your choice (Python, JavaScript, C#, etc.) and copy and paste it into your editor :)

For example, using prompt-based few-shot learning in ChatGPT to classify movie genres, you can simply add a selection of examples to your prompt, as follows:

Screen grab of ChatGPT with few-shot learning examples

Providing these examples as context will help ChatGPT correctly classify the new movie description, “A group of astronauts on a mission to save humanity from extinction by colonizing a new planet.”, as “Sci-Fi/Adventure” genre.
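The few-shot prompt in the screen grab above can be assembled programmatically. Here’s a minimal sketch of such a prompt builder; the example descriptions and genre labels are invented purely for illustration:

```python
# A minimal sketch of building a prompt-based few-shot classification prompt.
# The example movies and genre labels below are illustrative, not from any dataset.

def build_few_shot_prompt(examples, new_description):
    """Assemble a few-shot classification prompt from (description, genre) pairs."""
    lines = ["Classify the genre of each movie description."]
    for description, genre in examples:
        lines.append(f'Description: "{description}"\nGenre: {genre}')
    # Leave the final genre blank for the model to complete.
    lines.append(f'Description: "{new_description}"\nGenre:')
    return "\n\n".join(lines)

examples = [
    ("A detective hunts a serial killer through a rain-soaked city.", "Crime/Thriller"),
    ("Two strangers fall in love over one summer in Italy.", "Romance"),
    ("A hobbit sets out on a quest to destroy a magical ring.", "Fantasy/Adventure"),
]

prompt = build_few_shot_prompt(
    examples,
    "A group of astronauts on a mission to save humanity from "
    "extinction by colonizing a new planet.",
)
print(prompt)
```

Pasting the resulting text into ChatGPT gives the model the context it needs to complete the final, blank “Genre:” line.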

Of course, it wouldn’t make much sense to type all this out just to get a single answer, but with ChatGPT’s conversational interface, you can continue to ask it questions until it, err … bombs out! (well, it is a preview/beta version, after all ;).

Screen grab of ChatGPT guessing the correct genre of movie from a description

A more common approach for supplying few-shot learning examples, however, is to encode them in a call to OpenAI’s “completion API”, as discussed above, which allows you to tune prompts for all sorts of use cases, from a friendly chatbot to an expert wine sommelier that pairs wine with food (a use case that really has wine writers running scared).
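Here’s a hedged sketch of what such a call might look like with the sommelier use case. The model name, parameters, and wine pairings are all illustrative assumptions; check OpenAI’s current API documentation, as endpoints and SDK versions change over time:

```python
# A sketch of encoding few-shot examples in a call to OpenAI's completion
# endpoint. The model name, parameters, and pairings are illustrative only.
import os

few_shot_prompt = (
    "You are an expert sommelier. Suggest a wine pairing for each dish.\n\n"
    "Dish: Grilled ribeye steak\nWine: Cabernet Sauvignon\n\n"
    "Dish: Oysters on the half shell\nWine: Chablis\n\n"
    "Dish: Mushroom risotto\nWine:"
)

request = {
    "model": "text-davinci-003",   # assumed model name, for illustration
    "prompt": few_shot_prompt,
    "max_tokens": 20,
    "temperature": 0.2,            # low temperature for more consistent answers
}

# Only call the API if a key is configured (requires the openai package).
if os.environ.get("OPENAI_API_KEY"):
    import openai
    openai.api_key = os.environ["OPENAI_API_KEY"]
    response = openai.Completion.create(**request)
    print(response["choices"][0]["text"].strip())
```

The trailing “Wine:” invites the model to complete the final pairing in the same pattern as the examples.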

ChatGPT and wine: extinction-level event for wine writers and sommeliers?

STOP PRESS!

As I’ve been writing this, a Stanford University student, Kevin Liu, claims to have “hacked” (using prompt engineering) the new Microsoft Bing Chat, which is powered by OpenAI, to reveal its “origin” prompts.

The Microsoft developers allegedly called the new Bing Chat app “Sydney”. The conversation Kevin had with Sydney, aka Bing Chat (if it’s not an early April Fool’s joke), is hilarious; the full transcript is worth a look on Twitter.

Here’s a sample of the “prompts” that Kevin alleges the Microsoft engineers used to craft Bing Chat on the OpenAI model:

Kevin found instant viral fame with his “Sydney” aka Bing Chat Tweet

A sample of the alleged prompts used to craft Microsoft’s new Bing Chat.

The “genius in the room” mental model

Jessica Shieh, an AI Strategist at OpenAI, recommends an approach when learning prompt engineering called the “genius in the room” mental model.

The genius in the room model assumes that the AI model (the genius) doesn’t know anything about you other than what you write on a piece of paper and slide under the door (the prompt).

Once you visualise this, you get a more realistic idea of what an LLM like ChatGPT is capable of and what it requires to reply with an accurate result.


Using this mental model, it becomes obvious that the more context you provide the “genius”, the better the answers you will get from it. And so it is when writing a prompt for an AI model.

Jessica recommends three best practices when constructing a prompt to extract the most relevant answers from ChatGPT, as follows:

  1. Explain the problem you want the model to solve
  2. Articulate the output you want — in what format (“answer in a bulleted list”), in what tone/style (“answer the question as a patient math teacher…”)
  3. Provide the unique knowledge needed for the task
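The three practices above can be assembled into a single prompt. Here’s a minimal sketch; the sourdough problem and its details are invented purely to illustrate the structure:

```python
# A sketch of Jessica Shieh's three best practices combined into one prompt.
# The baking scenario and all its details are invented for illustration.

# 1. Explain the problem you want the model to solve.
problem = "Explain why my sourdough loaf came out dense."

# 2. Articulate the output you want: format and tone/style.
output_format = "Answer in a bulleted list, in the tone of a patient baking teacher."

# 3. Provide the unique knowledge needed for the task.
unique_knowledge = (
    "My starter is two weeks old, I proofed the dough for 2 hours at 18C, "
    "and I used 60% hydration."
)

prompt = f"{problem}\n\n{output_format}\n\nRelevant details: {unique_knowledge}"
print(prompt)
```

Each of the three parts maps directly to one of the numbered practices, so the “genius” behind the door gets the problem, the desired output, and the context in one note.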

Zero-Shot Chain of Thought (CoT) Prompting

You may also hear people talking about “zero-shot” learning, where a model is able to classify new concepts or objects that it has not encountered before.

For instance, if an AI model is pre-trained to recognise horses and is given an image of a zebra, it can still identify the zebra as a type of striped horse, despite never having seen one before.


In all cases, the goal with “x-shot” learning (whether that be “zero”, “one”, or “few”) is to learn from limited data and make accurate predictions on new, unseen examples.

In ChatGPT, it’s been noted that to get more accurate answers, you can use an incredibly simple technique, a hack even, known as “zero-shot chain of thought” prompting.

To use this technique, all you need to do is append the words:

“let’s think step by step”, or

“thinking aloud”

at the end of your prompt!

Almost as if by magic, from simply adding this additional text to the prompt, ChatGPT acquires the context to help it extract more accurate answers.

(There’s an example of how to use this further down the article).
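If you’re sending prompts programmatically, the trick is a one-line transformation. Here’s a minimal sketch; the helper function name is my own invention:

```python
# A minimal sketch of the zero-shot chain-of-thought hack: append the
# trigger phrase to an otherwise unchanged prompt.

def add_chain_of_thought(prompt, trigger="Let's think step by step."):
    """Append a CoT trigger phrase, keeping the prompt otherwise unchanged."""
    return f"{prompt.rstrip()}\n\n{trigger}"

cot_prompt = add_chain_of_thought(
    'What is the 4th word in the phrase "I am not what I am"?'
)
print(cot_prompt)
```

That’s all there is to it: the extra phrase nudges the model into showing its working, which tends to produce more accurate final answers.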

Fine-tuning

Unlike the “x-shot” learning techniques mentioned above, which only guide the model based on a limited number of examples, fine-tuning involves training your model on a much larger set of data; the drawback is that it requires some coding/command-line skills to implement.

This additional training generally allows the model to perform better and achieve better accuracy on a wider range of specific tasks than raw, or few-shot trained models.


Fine-tuning is typically used to tune a pre-trained base model, like OpenAI’s powerful davinci model, to a specific use case, for example, digital marketing, contract law, or some other domain. The fine-tuned model can be used internally, sold as an AI service to clients, or offered as part of a larger SaaS product.


Once you’ve fine-tuned your model, you won’t have to worry about providing examples in the prompt anymore. This not only saves costs, as each query costs money based on the number of tokens used, but also makes the model more efficient, reducing latency time for requests. The model is also private to you or your corporation and can be accessed only by those with the secret API token.

To get the best results from fine-tuning, it’s essential to provide a solid foundation of high-quality examples in JSONL text file format. The more examples you provide, the better your model will perform.

A minimum of a few hundred examples should be your starting point, and for even better results, these examples should be reviewed and approved by domain experts.

The good news is that as you add more examples, OpenAI claims you’ll see a corresponding linear increase in performance. So, if you’re looking to get the most out of your fine-tuning efforts, the more examples you can provide, the better (note that the maximum fine-tune file size is between 80–100 MB).
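Preparing the JSONL file is straightforward. Here’s a sketch using the prompt/completion field names from OpenAI’s fine-tuning format; the wine-pairing examples and filename are invented for illustration, and the CLI command at the end may have changed since this was written:

```python
# A sketch of preparing fine-tuning data in JSONL format (one JSON object
# per line). The wine-pairing examples are invented for illustration.
import json

pairs = [
    {"prompt": "Dish: Grilled ribeye steak\nWine:", "completion": " Cabernet Sauvignon"},
    {"prompt": "Dish: Oysters on the half shell\nWine:", "completion": " Chablis"},
    # ...in practice, at least a few hundred expert-reviewed examples
]

with open("wine_pairings.jsonl", "w", encoding="utf-8") as f:
    for pair in pairs:
        f.write(json.dumps(pair) + "\n")

# The file can then be uploaded and a fine-tune started with OpenAI's CLI,
# e.g. (check the current docs, as the command may have changed):
#   openai api fine_tunes.create -t wine_pairings.jsonl -m davinci
```

Each line is an independent JSON object, which is what lets the training pipeline stream through files of up to the 80-100 MB limit mentioned above.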

Now, back to our failed ChatGPT conundrum

If we now return to the simple question,

“What is the 4th word in the phrase ‘I am not what I am’?”

By using the zero-shot chain of thought hack I covered above, and simply appending “let’s think step by step” to the prompt, guess what we get?


ChatGPT failed to answer this question correctly until I appended the words “let’s think step by step” to the prompt!

Bingo!

It’s as if, almost by magic, we now get the correct result every time.

Try it yourself, and let me know if it works.
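As a sanity check, the step-by-step reasoning ChatGPT walks through mirrors what a one-liner would compute:

```python
# Verify the expected answer: split the phrase into words and take the 4th.
phrase = "I am not what I am"
words = phrase.split()        # ["I", "am", "not", "what", "I", "am"]
fourth_word = words[3]        # lists are zero-indexed, so index 3 is the 4th word
print(fourth_word)            # → what
```

So the correct answer the model should reason its way to is “what”.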

(Note: if you experiment with different combinations of prompts before trying this out, be sure you start a fresh chat each time so as not to influence the next attempt, as your previous conversation with ChatGPT might alter the results).

Summary

Prompt engineering is a rapidly changing and advancing area of LLMs. What doesn’t work well in a model today, like math and logic, may well work tomorrow!

To get the best from an LLM, like those offered by OpenAI, and especially if you are developing your own AI-powered apps, either for internal or client use, you will need to guide the model further through either the few-shot learning or fine-tuning techniques discussed here.

What I’ve covered here is just the tip of the Prompt Engineering iceberg, however, and I encourage you to explore more if you are interested (see resources below).

Who knows, it might one day lead to a new, but very well-paid career. However, thinking out loud for a moment, it’s unclear to me just how long this new career will last.

With LLMs advancing in power and sophistication (some say smashing Moore’s Law, doubling in power not every 2 years but every 6 months!), surely, at some point, they will become so smart that they can anticipate our needs without all the prompt engineering hacks?

Only time will tell!

Resources

From OpenAI

From around the web

<blockquote class="twitter-tweet">
  <p lang="en" dir="ltr">
    The entire prompt of Microsoft Bing Chat?! (Hi, Sydney.)
    <a href="https://t.co/ZNywWV9MNB">pic.twitter.com/ZNywWV9MNB</a>
  </p>
  &mdash; Kevin Liu (@kliu128)
  <a
    href="https://twitter.com/kliu128/status/1623472922374574080?ref_src=twsrc%5Etfw"
    >February 9, 2023</a
  >
</blockquote>
<script
  async
  src="https://platform.twitter.com/widgets.js"
  charset="utf-8"
></script>
