Created by the author
Google has just dropped their latest tech marvel, the Gemini Pro, and it's making waves! Hold tight, because the even more powerful Gemini Ultra is on the horizon.
Let me tell you, the tech world is buzzing with comparisons. First Claude 2 outdid GPT-4, then Google Bard, and now Google Gemini seems to be leading the pack. This got me thinking: is Gemini really outperforming GPT-4?
Bloggers are abuzz, pointing at the table below and claiming Gemini has outdone GPT-4. It's based on a widely respected benchmark, and the numbers don't lie. Check it out:
But here's a little secret about benchmarks: they're not foolproof. Imagine a language model trained on the same data it's tested on. It's like having a sneak peek at the exam questions. If you study just those, you're bound to do well, right? That doesn't mean you're smarter, just better prepared.
So, while I'm not accusing Google of any tricks, we should think twice before taking these benchmarks at face value.
Now, let's get down to business and compare Google's Gemini Pro with GPT-4 across various aspects. And hey, if you haven't tried Gemini Pro yet, it's free! Check it out here: https://bard.google.com/. Plus, it hooks right into Google Bard. Let's see how they stack up!
Table of Contents
· Token Limit · Long Text Summarization · Knowledge Update · Reasoning · Math Skills · Networked Search Capabilities · Vision Capabilities
Token Limit
Token capacity is a big deal in the world of large language models. Simply put, it's roughly how much text (measured in tokens, which are word fragments) a model can keep in mind during a conversation. The higher the token limit, the longer and more detailed a conversation can be.
Imagine asking a model to summarize a book. The more words it can handle at once, the more comprehensive the summary. In ongoing conversations, a higher token limit means the model remembers better, making the chat more coherent.
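To get a feel for what a token budget means in practice, here is a minimal sketch that counts tokens with OpenAI's tiktoken library. This only approximates GPT-4's tokenizer (Gemini counts tokens with its own), and the subtitle file name below is just a placeholder:

```python
# pip install tiktoken
import tiktoken

def count_tokens(text: str, encoding_name: str = "cl100k_base") -> int:
    """Count tokens roughly the way GPT-4-family models would."""
    encoding = tiktoken.get_encoding(encoding_name)
    return len(encoding.encode(text))

# Placeholder file: a transcript or subtitle dump you want to feed the model.
with open("devday_talk_subtitles.txt", encoding="utf-8") as f:
    transcript = f.read()

print(count_tokens(transcript), "tokens")
# If this number exceeds the model's context window
# (e.g. 128k tokens for GPT-4 Turbo), the model simply
# cannot see the whole transcript at once.
```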
In my tests, Gemini Pro's token limit was notably smaller than GPT-4's. For instance, I asked it to turn a 45-minute talk by Sam Altman at OpenAI DevDay into an article. Gemini Pro tapped out at the 30-minute mark.
The total word count from Gemini Pro was around 5,327 words.
Plus, it offers a cool feature where you get three different responses to choose from, pushing the total word count to about 6,000.
ChatGPT-4, however, handled it like a champ. I fed it subtitles from two 45-minute videos, and it converted them without breaking a sweat, hitting around 17,408 words.
And remember, GPT-4 Turbo can handle up to an incredible 128k tokens in its context window!
Long Text Summarization
When it comes to text summarization, let's see how Gemini Pro stacks up against GPT-4.
My task was to turn YouTube subtitles into an article. Gemini Pro opted for a straightforward summary of the subtitles, as shown in the image below.
On the other hand, ChatGPT-4's output, while shorter, boasted a clear, well-structured format.
Here's another twist: Gemini Pro can actually summarize YouTube video content directly. Not too surprising, considering Google owns YouTube. The summary quality is pretty decent.
But GPT-4, especially with its VoxScript plugin, seems to have an edge in summarizing YouTube videos. Its summaries have a beginning, a structured middle, and an end, making them more cohesive.
Pulling essential information out of lengthy texts is a real test for any LLM. So far, Gemini Pro lags a bit behind GPT-4 in this arena. That said, Gemini Pro's knack for summarizing YouTube content is a handy feature, one that might remain unique given OpenAI's lack of direct access to YouTube.
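If you want to rerun this kind of subtitle-to-article test outside the Bard UI, here's a minimal sketch using Google's google-generativeai Python SDK. I only used the web UI for this article, so treat the model name, file name, and prompt wording below as illustrative assumptions; you would need your own API key from Google AI Studio:

```python
# pip install google-generativeai
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")  # placeholder: get a key from Google AI Studio
model = genai.GenerativeModel("gemini-pro")

# Placeholder file containing the raw YouTube subtitles.
with open("video_subtitles.txt", encoding="utf-8") as f:
    subtitles = f.read()

prompt = (
    "Rewrite the following YouTube subtitles as a well-structured article "
    "with an introduction, clear sections, and a conclusion:\n\n" + subtitles
)

response = model.generate_content(prompt)
print(response.text)
```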
Knowledge Update
In November, OpenAI gave ChatGPT a significant update, extending its training data up to April 2023. They've also planned to keep this data fresh with ongoing updates. When I quizzed Gemini Pro about its last update, it played coy, labeling it "confidential."
Curiosity led me to test their knowledge on a newer topic: creating your own GPTs in ChatGPT. Gemini Pro responded well, but it left me wondering whether it drew from its own training data or relied on data from the Internet.
In contrast, GPT-4's training data, as the image below shows, doesn't cover how to create GPTs.
The game changer? Networking. Once I enabled GPT-4's web-browsing (networking) feature, it delivered a more detailed and accurate response to the same question, outperforming Gemini Pro in this respect.
Reasoning
Evaluating a large language model's human-like thinking starts with logic problems. Take this one, for example:
Can Jack's move from Class A to Class B lead to an increase in the average IQ of both classes? Explain why or why not.
The solution? Just consider the case where Jack's IQ is lower than Class A's average but higher than Class B's. Removing a below-average student raises A's average, and adding an above-average student raises B's average, so the answer, quite simply, is yes.
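To make that concrete, here's a tiny numeric check. All the IQ values are made up purely for illustration:

```python
# Made-up IQ scores, purely for illustration.
class_a_without_jack = [120, 115, 110, 105]   # Jack's classmates in Class A
class_b = [95, 90, 85, 80]                    # Class B before Jack arrives
jack = 100                                    # below A's average, above B's

def avg(scores):
    return sum(scores) / len(scores)

# Before the move, Jack is counted in Class A; after, in Class B.
print("Class A:", avg(class_a_without_jack + [jack]), "->", avg(class_a_without_jack))
# Class A: 110.0 -> 112.5
print("Class B:", avg(class_b), "->", avg(class_b + [jack]))
# Class B: 87.5 -> 90.0
# Both averages go up, so the answer is yes.
```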
Now, letâs see how ChatGPT-4 fared:
GPT-4 hit the nail on the head with a perfect answer.
Gemini Pro's attempts, however, missed the mark. Here's one of its better tries:
Moving on to general knowledge, drawn from Yejin Choi's TED talk, we find another illustrative example. The question is straightforward:
I have a 12-liter jug and a 6-liter jug, and I want to measure 6 liters. How do I do it?
The solution is just as simple: fill the 6-liter jug, and you have exactly 6 liters.
GPT-4's response might be a bit wordy, but it gets there:
Gemini Pro, however, takes a longer route:
Here's another general knowledge question:
I left 5 clothes to dry out in the sun. It took them 5 hours to dry completely. How long would it take to dry 30 clothes?
The answer is common sense: it still takes only 5 hours, because the clothes dry in parallel (assuming there's room to hang them all in the sun at once).
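To spell out the trap a model can fall into here, compare a naive proportional calculation with the common-sense answer:

```python
clothes, hours = 5, 5
more_clothes = 30

naive = hours / clothes * more_clothes   # proportional reasoning: 30.0 hours (wrong)
correct = hours                          # drying happens in parallel: still 5 hours

print(f"naive: {naive} hours, common sense: {correct} hours")
```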
Gemini Pro's response was off, even though it used Python to do the math.
ChatGPT-4, on the other hand, provided the correct answer:
This comparison reveals a notable gap in reasoning and common sense between Gemini Pro and GPT-4.
Math Skills
Math skills are a crucial part of an LLM's reasoning capability. Take this indefinite integral question, for instance. Originating from the University of California, Berkeley, it has four correct answers.
Gemini Pro's attempts varied. Its first draft missed one answer.
Then, Gemini Pro Draft 2 interestingly included an extra answer.
However, its third attempt was entirely off the mark.
In contrast, GPT-4 showcased its prowess by providing the correct answer.
This exercise clearly demonstrates GPT-4's superior mathematical and reasoning abilities compared to Gemini Pro.
Networked Search Capabilities
Gemini Pro really impressed me with its skills in complex networked search tasks! I put it to the test with a specific request:
Prompt: Research sustainable packaging solutions for a fictitious consumer goods company. Identify the top 3 sustainable packaging suppliers and provide a report on their offerings, pricing, and environmental impact.
And Gemini Pro nailed it.
What stood out was its ability to summarize information into tables that can be easily downloaded to Google Sheets. Super handy!
On the other hand, GPT-4's native networking abilities fell short in this area. Even with the WebPilot plugin, GPT-4 couldn't match Gemini Pro's performance, lacking table generation and access to certain data, such as prices.
This aspect of Gemini Pro, powered by Google's robust search engine, poses a real challenge for GPT-4. It's a wake-up call for Microsoft and OpenAI to up their game in networked search capabilities.
Vision Capabilities
Gemini Pro, like GPT-4, is a multimodal model with impressive capabilities, including image recognition. I put this to the test using images and questions from the paper "The Dawn of LMMs: Preliminary Explorations with GPT-4V(ision)."
When presented with a particular image and asked about its humorous aspect, Gemini Pro's response was off the mark.
It incorrectly identified a rhino horn as a turtle, missing the actual humor in the image.
In contrast, GPT-4 demonstrated a precise understanding of the image's content and meaning.
I also tested their ability to interpret another image, asking them to describe its content and the Chinese characters in it, and to identify the food's city of origin.
Gemini Pro's answer was:
While GPT-4 provided this response:
This comparison suggests that while Gemini Pro may have stronger multilingual capabilities, GPT-4 shows a slightly better grasp of graphic literacy, understanding and interpreting images more accurately.
Summary
Gemini Pro marks a significant leap forward from its predecessor, Bard. While Bard was hardly my go-to choice (except for the occasional map reading), Gemini Pro has brought much more to the table. Although it doesn't quite reach the heights of GPT-4, it certainly outperforms GPT-3.5 in several key areas.
The burning question is: how will Gemini Ultra stack up against GPT-4? My bet is that it'll still trail slightly behind GPT-4.
For users, the availability of a free tool like Gemini, alongside ChatGPT and Claude 2, is definitely a win. It's an exciting time in the world of AI, with each new development offering more versatility and utility.
Loved the article?
If yes, then:
- Leave a comment
- Follow my updates
- Sign up for free email alerts