Over the past few months working as an Agentic AI Engineer at a startup, I learned something:
Better prompts do not solve most problems with LLMs.
Beyond that, four lessons stood out that I think everyone working with ChatGPT and Large Language Models should know:
#1. Prompting Isn’t Everything
Contrary to what may seem obvious at first, prompting isn’t going to fix all your problems, although it can mitigate them.
Other factors also affect whether the model can generate high-quality outputs or hallucinate, such as tokenization strategies, the number of dimensions used for word embeddings, training data, etc.
For example, if a model hasn’t been trained heavily on coding data, it won’t generate good-quality code, if it can generate code at all.
The Fix: Research the model
Look at the model’s documentation and other people’s reviews of the specific model for similar use cases. If you think your choice is good, go ahead and use it.
Otherwise, there are tons online to choose from!
#2. Just Simplify the Agent’s Task
The quote “A jack of all trades is a master of none, but oftentimes better than a master of one” usually holds for engineering roles, but in the case of LLMs and Agentic AI, it backfires beautifully.
Agents follow the “Do one thing and do it best” approach much better than the “Do everything in a mediocre way” approach.
I ran into a problem where my manager had assigned two duties to a single agent: do X, then do Y.
The Problem: The agent did X without a problem, but skipped Y, or sometimes, went straight to Y, skipping X.
The Fix: Single-Task Agents
Agents work best when their roles and responsibilities are clearly defined with explicit boundaries — where their responsibilities begin and where they end.
So, after two weeks of discussion with my manager, I convinced him to let me split the responsibilities into two separate agents.
Worked like a charm!
So, the next time you think your agent isn’t performing up to your expectations, think whether the work assigned to it can be further divided into smaller tasks. If yes, time to create more specialized agents.
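As an illustrative sketch (the `call_llm` stand-in and the summarize/classify tasks are my own hypothetical examples, not the actual system I worked on), splitting one “do X, then do Y” agent into two single-task agents chained explicitly can look like this:

```python
# Hypothetical sketch: two single-task agents chained by plain function
# composition, so neither step can be skipped or reordered by the model.
# `call_llm` is a placeholder for whatever client/SDK you actually use.

def call_llm(system_prompt: str, user_input: str) -> str:
    # Placeholder for a real API call (e.g. an OpenAI or Anthropic client).
    return f"[{system_prompt} -> {user_input}]"

def summarize_agent(ticket: str) -> str:
    # Agent 1: does X (summarize) and nothing else.
    return call_llm("You summarize support tickets in one sentence.", ticket)

def classify_agent(summary: str) -> str:
    # Agent 2: does Y (classify) and nothing else.
    return call_llm("Label this ticket summary as 'bug', 'billing', or 'other'.", summary)

def pipeline(ticket: str) -> str:
    # Explicit hand-off in code: X always runs before Y.
    return classify_agent(summarize_agent(ticket))

print(pipeline("App crashes when I tap checkout"))
```

The key design choice is that the ordering of X and Y lives in your code, not in a prompt the model can ignore.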
#3. Maintaining the Context Window is a Nightmare
What troubled me most was that the LLM’s outputs were either too vague or too restrictive.
The prompts, along with parameters like max tokens and temperature, either gave the model too much freedom or restricted it too heavily.
And neither approach produced the desired results.
The results were either a mess of inconsistencies or so simplistic that it looked like the model didn’t even try.
The Problem: With every API call, the context window filled up just retaining context from earlier calls. There was no room to feed in the current context without dropping the previous one, and as the window overflowed, the outputs became inconsistent.
The Fix: DSL
Domain-specific language (DSL) came to the rescue.
Feeding the LLMs just the right amount of information can be a hassle and can take a lot of trial and error to get it right.
DSL fixed that problem: its syntax was simple enough for the LLM to know what to do, and expressive enough for it to infer the information needed to generate the right amount of code.
You don’t need a fully custom-made DSL for your use case. You only need easy syntax for the LLM to understand it without getting confused. Often, you don’t even need to explain the keywords — the model can infer their meaning from context.
Below is an example of the DSL I worked on:
SCREEN cart
  TITLE "Your Cart"
  ROUTE /cart
  AUTH roles=[user]
  CONST {
    CART_TITLE = "Your Cart"
    EMPTY_CART_MESSAGE = "Your cart is empty."
    CHECKOUT_BUTTON_LABEL = "Proceed to Checkout"
    CONTINUE_BROWSING_LABEL = "Continue Browsing"
    TOTAL_LABEL = "Total"
    REMOVE_ITEM_LABEL = "Remove"
  }
  STATE {
    items: array<CartItem>
    total_price: number
  }
  HANDLERS {
    fetchCartItems(reads: none; writes: none)
  }
  VIEW items_view {
    when items not empty
    show list of Card
      display name
      display quantity
      display price
  }
  VIEW items_empty {
    when items empty
    show text EMPTY_CART_MESSAGE
  }
  SECTION text {
    show text CART_TITLE
  }
  SECTION lists {
    list items
  }
  SECTION navigation {
    nav /checkout label=CHECKOUT_BUTTON_LABEL
    nav /restaurants label=CONTINUE_BROWSING_LABEL
  }
  FLOWS {
    on_load: fetchCartItems
  }
This DSL dropped the token count for each API call from ~20,000 tokens to ~1,600 and resulted in better outputs.
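As a rough sketch of how a spec like this slots into a generation prompt (the prompt wording, the `build_prompt` helper, and the shortened spec are my own illustrative assumptions, not the production setup): the DSL carries the structure, so the natural-language instructions can stay short.

```python
# Hypothetical sketch: embed a compact DSL spec in the prompt instead of a
# long natural-language description of the screen. The spec below is a
# trimmed, made-up example in the same style as the article's DSL.
DSL_SPEC = """\
SCREEN cart
  TITLE "Your Cart"
  ROUTE /cart
  STATE {
    items: array<CartItem>
  }
"""

def build_prompt(dsl_spec: str) -> str:
    # Short instructions + spec: the model infers keywords like SCREEN,
    # STATE, and VIEW from context, so they need no explanation.
    return (
        "Generate the UI code for the screen described by this spec.\n"
        "Spec:\n" + dsl_spec
    )

print(build_prompt(DSL_SPEC))
```

Because the spec replaces paragraphs of prose, the prompt shrinks dramatically while still pinning down exactly what to build.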
#4. One/Few-Shot Prompting Is Great for Structural Adherence
Ever run into a problem where the LLM was generating the response you wanted, but not in the structure you wanted?
Well, one/few shot prompting is your best friend in this situation!
LLMs are great at following examples, just like us!
If you think the LLM isn’t giving you the response in your desired structure, just feed it a couple of examples within the prompt, and you’re good to go.
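A minimal sketch of what that looks like (the review-extraction task, the examples, and the `build_few_shot_prompt` helper are all hypothetical): two worked examples pin down the exact output shape before the real input is appended, so the model imitates the structure.

```python
# Hypothetical few-shot prompt: two worked examples fix the JSON shape
# the model should produce for the real input appended at the end.
FEW_SHOT_PROMPT = """\
Extract the product and sentiment as JSON.

Review: "The headphones sound amazing."
Output: {"product": "headphones", "sentiment": "positive"}

Review: "This kettle broke after a week."
Output: {"product": "kettle", "sentiment": "negative"}

Review: "{review}"
Output:"""

def build_few_shot_prompt(review: str) -> str:
    # str.replace instead of str.format: the JSON braces in the examples
    # would otherwise be treated as format fields.
    return FEW_SHOT_PROMPT.replace("{review}", review)

print(build_few_shot_prompt("The blender is too loud."))
```

The examples do the heavy lifting: no schema description is needed because the model copies the pattern it just saw.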
Conclusion
LLMs have embedded themselves in the software engineering industry, and there’s no denying that.
No matter where you go, there will always be some LLM working in the background, automating parts of the workflow.
The best approach is to treat these problems as system design problems: design each agent with only one task in mind, manage the context window with proper structure, and ensure LLMs generate consistent outputs using one/few-shot prompting.
Until next time,
Bye!