LLMs struggle to access and manipulate knowledge effectively because they cannot readily expand or update their memory. They may also produce erroneous outputs known as “hallucinations” and often fail to provide clear insight into how they arrived at an answer.
To address these limitations, Retrieval Augmented Generation (RAG) has gained significant attention and is redefining the way we approach text generation tasks.
In this easy-to-follow guide, we are going to walk through a complete example of how I use AutoGen with Retrieval Augmented Generation (RAG).
Before we start! 🦸🏻♀️
If you like this topic and you want to support me:
- Clap my article 50 times; that will really help me out.👏
- Follow me on Medium and subscribe to get my latest articles🫶
- If you prefer video tutorials, please subscribe to my YouTube channel, where I have started converting most of my articles into visual demonstrations.
What is Retrieval Augmented Generation (RAG)?
RAG is an AI framework that retrieves facts from an external knowledge base and helps pre-trained large language models generate more accurate, up-to-date information and reduce hallucinations.
Retrieval-Augmented Generation, or RAG, combines the concepts of searching (‘Retrieval’), enhancement (‘Augmented’), and sentence creation (‘Generation’). When these elements are integrated, RAG becomes text generation enhanced by search.
To summarize a little more neatly: the technique searches for facts in an external database and has a large language model (LLM) generate an answer based on the retrieved information.
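To make the flow concrete, here is a minimal sketch of the retrieve-then-generate pattern in Python. This is not AutoGen code; search_knowledge_base and call_llm are hypothetical placeholders for whatever retriever and LLM client you use.

# Minimal sketch of the RAG pattern (hypothetical helpers, not AutoGen's API).
def search_knowledge_base(question: str, k: int = 3) -> list[str]:
    # Placeholder: in practice this queries a vector store such as ChromaDB.
    raise NotImplementedError

def call_llm(prompt: str) -> str:
    # Placeholder: in practice this calls a model such as gpt-3.5-turbo.
    raise NotImplementedError

def rag_answer(question: str) -> str:
    # 1. Retrieval: fetch facts related to the question from an external source.
    facts = search_knowledge_base(question)
    # 2. Augmentation: put the retrieved facts into the prompt.
    prompt = ("Answer using only this context:\n" + "\n".join(facts)
              + "\n\nQuestion: " + question)
    # 3. Generation: let the LLM write the answer from the augmented prompt.
    return call_llm(prompt)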
Why Do We Need Retrieval Augmented Generation (RAG)?
An LLM acquires its knowledge from the large amount of text data that exists in the world at pre-training time.
Therefore, it cannot answer from information that did not exist at training time, or from information that existed but was never included in its training data. The same is true for humans, isn’t it?
With RAG, on the other hand, the LLM can answer such questions without having studied them in advance.
Example: Comparing RAG to an exam
I’m sure you all have experience studying for exams. You gain a lot of knowledge in the process of studying. This is the same as pre-training for an LLM.
Then you are asked questions on the exam and have to answer them. This is similar to a human asking the LLM a question and the LLM answering it.
Now, what would you do if you were asked a question you didn’t study for? Normally, you can’t solve it.
This is where RAG comes into play. RAG is like being allowed to search the internet on your smartphone during the exam, or being handed a reference book about the problem. With tools like these, an LLM can answer questions it was never trained on.
Let’s Try Retrieval Augmented Generation (RAG) with Autogen
1. Data Preparation
Here, I decided to use the Wikipedia article on Artificial intelligence as the knowledge source. For downloading it, let’s try using AutoGen itself.
from autogen import AssistantAgent, UserProxyAgent

config_list = [
    {
        "model": "gpt-3.5-turbo",
        "api_key": "<Enter OpenAI API key>"
    }
]

# The assistant writes the scraping code; the user proxy executes it locally.
assistant = AssistantAgent("assistant", llm_config={"config_list": config_list})
user_proxy = UserProxyAgent(
    "user_proxy",
    code_execution_config={
        "work_dir": "coding",
        "use_docker": False,
    },
)
user_proxy.initiate_chat(
    assistant,
    message="""Extract the text data of the following Wikipedia
page and save it as a file named Artificial_intelligence.txt.
https://en.wikipedia.org/wiki/Artificial_intelligence""",
)
The obtained file contains the raw Wikipedia text. Since I only needed the article content itself, I manually deleted everything that didn’t belong to the main body (references, navigation, and so on). Preparation is now complete.
2. Installation
RAG support ships as an optional extra of AutoGen, so you need to install the following package:
pip install "pyautogen[retrievechat]"
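If the install worked, the retrieval agents should import cleanly; a quick sanity check (these are the same imports used in the script below):

# Sanity check: these modules are only importable once the
# "retrievechat" extra has been installed.
from autogen.agentchat.contrib.retrieve_assistant_agent import RetrieveAssistantAgent
from autogen.agentchat.contrib.retrieve_user_proxy_agent import RetrieveUserProxyAgent
print("retrievechat is available")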
The code is below. Use RetrieveAssistantAgent and RetrieveUserProxyAgent as the agents. RetrieveUserProxyAgent takes a retrieval-related parameter called retrieve_config.
The task has qa and code modes. ChromaDB is used as the client, and if you want to use another DB, you can override retrieve_docs (a sketch of this follows the script).
chunk_mode has one_line and multi_lines options, but the details weren’t written in the manual, so I guess I’ll have to read the code.
As for embedding_model, the default all-MiniLM-L6-v2 can be used, and the higher-performance all-mpnet-base-v2 is also available. It seems other sentence-transformers models work as well, for example as shown below.
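Switching to the larger model (and the other chunking mode) is just a matter of changing those keys. The dict below is a hypothetical set of overrides to merge into the retrieve_config shown in the full script that follows, not a separate AutoGen API:

# Hypothetical variation on the retrieve_config used below:
# trade embedding speed for retrieval quality, and chunk by paragraphs.
retrieve_config_overrides = {
    "embedding_model": "all-mpnet-base-v2",  # higher performance, slower than MiniLM
    "chunk_mode": "multi_lines",             # the other documented chunk mode
}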
import chromadb
from autogen.agentchat.contrib.retrieve_assistant_agent import RetrieveAssistantAgent
from autogen.agentchat.contrib.retrieve_user_proxy_agent import RetrieveUserProxyAgent

config_list = [
    {
        "model": "gpt-3.5-turbo",
        "api_key": "<Enter OpenAI API key>"
    }
]

assistant = RetrieveAssistantAgent(
    name="assistant",
    system_message="You are a helpful assistant.",
    llm_config={
        "request_timeout": 600,
        "seed": 42,
        "config_list": config_list,
    },
)

# The Wikipedia text file that was saved earlier.
corpus_file = "Artificial_intelligence.txt"

ragproxyagent = RetrieveUserProxyAgent(
    name="ragproxyagent",
    human_input_mode="NEVER",
    max_consecutive_auto_reply=10,
    retrieve_config={
        "task": "qa",                                      # question-answering mode
        "docs_path": corpus_file,                          # document(s) to index
        "chunk_token_size": 2000,                          # max tokens per chunk
        "model": "gpt-3.5-turbo",
        "client": chromadb.PersistentClient(path="./db"),  # default vector store
        "collection_name": "natural-questions",
        "chunk_mode": "one_line",
        "embedding_model": "all-MiniLM-L6-v2",
    },
)

ragproxyagent.initiate_chat(
    assistant,
    problem="What is Artificial_intelligence?",
    n_results=30,
)
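As noted above, ChromaDB is only the default client; the documented hook for plugging in another database is to override retrieve_docs. Here is a hedged sketch of what that could look like: my_search is a hypothetical stand-in for your own DB query, and the ChromaDB-style result shape ({"ids": ..., "documents": ...}) is an assumption based on the default client.

# Hedged sketch of swapping in a custom document store by overriding
# retrieve_docs. `my_search` is a hypothetical stand-in for your own DB.
from autogen.agentchat.contrib.retrieve_user_proxy_agent import RetrieveUserProxyAgent

def my_search(problem: str, n_results: int) -> list[tuple[str, str]]:
    # Hypothetical: return (doc_id, text) pairs from your own database.
    raise NotImplementedError

class MyDBRetrieveAgent(RetrieveUserProxyAgent):
    def retrieve_docs(self, problem: str, n_results: int = 20, search_string: str = ""):
        hits = my_search(problem, n_results)
        # Assumption: store results in the same shape the ChromaDB client produces.
        self._results = {
            "ids": [[doc_id for doc_id, _ in hits]],
            "documents": [[text for _, text in hits]],
        }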
The results were as follows. It looks like it works fine.
Artificial intelligence (AI) is the intelligence of machines or software,
as opposed to the intelligence of humans or animals. It is also the
field of study in computer science that develops and studies
intelligent machines. "AI" may also refer to the machines themselves.
Now let’s try another question. I’m sure everyone is curious how it handles this one.
assistant.reset()
ragproxyagent.initiate_chat(
    assistant,
    problem="Could you please tell me about social intelligence?",
    n_results=30,
)
Affective computing is an interdisciplinary umbrella that comprises systems
that recognize, interpret, process or simulate human feeling, emotion and
mood.[63] For example, some virtual assistants are programmed to speak
conversationally or even to banter humorously; it makes them appear more
sensitive to the emotional dynamics of human interaction, or to otherwise
facilitate human–computer interaction.
Next, let’s try this one. What happens if you ask a question outside the scope of the Artificial intelligence article?
assistant.reset()
ragproxyagent.initiate_chat(
    assistant,
    problem="Discord vs telegram which one is the best ?",
    n_results=30,
)
First attempt:
UPDATE CONTEXT
Second attempt:
UPDATE CONTEXT
Third attempt (if it still doesn’t know, does the question get passed through as is?):
Discord vs telegram which one is the best ?
It’s quite remarkable how the system behaves when an answer isn’t found in the retrieved documents: the retrieval step is simply executed again. Isn’t that excellent?
The process is detailed in an extensive log, which shows the specific context used for each attempt.
When the system doesn’t find the needed information, the assistant responds with an “UPDATE CONTEXT” message, as seen above; the proxy then updates the retrieved context and restarts the process.
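In pseudocode, the loop described above looks roughly like this. It is an illustrative sketch of the observed behavior, not AutoGen’s actual source; ask_llm and retrieve are hypothetical callables.

# Illustrative sketch (not AutoGen's source) of the UPDATE CONTEXT retry loop.
def answer_with_retries(ask_llm, retrieve, question, n_results=30, max_rounds=3):
    seen = 0
    for _ in range(max_rounds):
        # Fetch the next batch of chunks the model hasn't seen yet.
        chunks = retrieve(question, n_results=n_results, skip=seen)
        reply = ask_llm(question, context=chunks)
        if reply.strip() != "UPDATE CONTEXT":
            return reply      # the model found an answer in the current context
        seen += len(chunks)   # otherwise, move on to fresh chunks
    return question           # context exhausted: the question passes through as is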
Conclusion:
I’ve experimented with RAG in other libraries as well, but what sets AutoGen’s RAG apart is its agent-like behavior, notably the way it retrieves again when the initial results don’t contain an answer. That is what makes it so efficient.
In our next exploration of RAG, we’ll go deeper by using a library reference manual to generate code.
More ideas on My Homepage:
🧙♂️ We are AI application experts! If you want to collaborate on a project, drop an inquiry here, stop by our website, or book a consultation with us.