Created by the author
Google has just dropped their latest tech marvel, the Gemini Pro, and it’s making waves! Hold tight, because the even more powerful Gemini Ultra is on the horizon.
Let me tell you, the tech world is buzzing with comparisons. First, Claude 2 outdid GPT-4, then Google Bard, and now, Google Gemini seems to be leading the pack. This got me thinking — is Gemini really outperforming GPT-4?
Bloggers are abuzz, pointing at the table below, claiming Gemini has outdone GPT-4. It’s based on a benchmark that’s widely respected, and the numbers don’t lie. Check it out:
But here’s a little secret about benchmarks — they’re not foolproof. Imagine if a language model is trained with the same data it’s tested on. It’s like having a sneak peek at the exam questions. If you study just those, you’re bound to do well, right? That doesn’t mean you’re smarter, just better prepared.
So, while I’m not accusing Google of any tricks, we should think twice before taking these benchmarks at face value.
Now, let’s get down to business and compare Google’s Gemini Pro with GPT-4, looking at various aspects. And hey, if you haven’t tried Gemini Pro yet, it’s free! Check it out here: https://bard.google.com/. Plus, it hooks right into Google Bard. Let’s see how they stack up!
Table of Content
Token capacity is a big deal in the world of large language models. Simply put, it’s how many words a model can remember during a conversation. The higher the token limit, the longer and more detailed a conversation can be.
Imagine asking a model to summarize a book. The more words it can handle at once, the more comprehensive the summary. In ongoing conversations, a higher token limit means the model remembers better, making the chat more coherent.
In my tests, Gemini Pro’s token limit was notably less than GPT-4’s. For instance, I asked it to turn a 45-minute talk from Sam Altman at the OpenAI DevDay into an article. Gemini Pro tapped out at the 30-minute mark.
The total word count from Gemini Pro was around 5327 words.
Plus, it offers a cool feature where you get three different responses to choose from, pushing the total word count to about 6000.
ChatGPT-4, however, handled it like a champ. I fed it subtitles from two 45-minute videos, and it converted them without breaking a sweat, hitting around 17,408 words.
And remember, GPT-4 Turbo can handle up to an incredible 128k in its context window!
Long Text Summarization
When it comes to text summarization, let’s see how Gemini Pro stacks up against GPT-4.
My task was to turn YouTube subtitles into an article. Gemini Pro opted for a straightforward summary of the subtitles, as shown in the image below.
On the other hand, ChatGPT-4’s output, while shorter, boasted a clear, well-structured format.
Here’s another twist: Gemini Pro can actually summarize YouTube video content directly. Not too surprising, considering Google owns YouTube. The summary quality is pretty decent.
But GPT-4, especially with its VoxScript plugin, seems to have an edge in summarizing YouTube videos. Its summaries have a beginning, a structured middle, and an end, making them more cohesive.
Pulling out essential information from lengthy texts is a real test for any LLM. So far, Gemini Pro lags a bit behind GPT-4 in this arena. That said, Gemini Pro’s knack for summarizing YouTube content is a handy feature, something that might remain unique due to OpenAI’s lack of direct access to YouTube.
In November, OpenAI gave ChatGPT a significant update, extending its training data up to April 2023. They’ve also planned to keep this data fresh with ongoing updates. When I quizzed Gemini Pro about its last update, it played coy, labeling it “confidential.”
Curiosity led me to test their knowledge on a newer topic: creating your own GPTs in ChatGPT. Gemini Pro responded well, but it left me wondering whether it drew from its own training data or relied on data from the Internet.
In contrast, GPT-4’s training data, as the below image shows, doesn’t cover how to create GPTs.
The game changer? Networking. Once I enabled GPT-4’s networking feature, it delivered a more detailed and accurate response to the same question, outperforming Gemini Pro in this aspect.
Evaluating a large language model’s human-like thinking starts with logic problems. Take this one, for example:
Can Jack’s move from Class A to Class B lead to an increase in the average IQ of both classes? Explain why or why not.
The solution? Just consider the case where Jack’s IQ is lower than class A’s average but higher than class B’s. The answer, quite simply, is yes.
Now, let’s see how ChatGPT-4 fared:
GPT-4 hit the nail on the head with a perfect answer.
Gemini Pro’s attempts, however, missed the mark. Here’s one of its better tries:
Moving on to general knowledge, drawn from Yejin Choi’s TED talk, we find another illustrative example. The question is straightforward:
I have a 12-liter jug and a six-liter jug, and I want to measure six liters. How do I do it?
The solution is just as simple: you can make 6 liters in a 6-liter pot.
GPT-4’s response might be a bit wordy, but it gets there:
Gemini Pro, however, takes a longer route:
Here’s another general knowledge question:
I left 5 clothes to dry out in the sun. It took them 5 hours to dry completely. How long would it take to dry 30 clothes.
The answer is commonly known — it takes only 5 hours.
Gemini Pro’s response was off, even using Python to do the math.
ChatGPT-4, on the other hand, provided the correct answer:
This comparison reveals a notable gap in reasoning and common sense between Gemini Pro and GPT-4.
Math skills are a crucial part of a LLM’s reasoning capability. Take this indefinite calculus question, for instance. Originating from the University of California, Berkeley, this question has four correct answers.
Gemini Pro’s attempts varied. Its first draft missed one answer.
Then, Gemini Pro Draft 2 interestingly included an extra answer.
However, its third attempt was entirely off the mark.
In contrast, GPT-4 showcased its prowess by providing the correct answer.
This exercise clearly demonstrates GPT-4’s superior mathematical and reasoning abilities compared to Gemini Pro.
Networked Search Capabilities
Gemini Pro really impressed me with its skills in complex networked search tasks! I put it to the test with a specific request:
Prompt: Research sustainable packaging solutions for a fictitious consumer goods company. Identify the top 3 sustainable packaging suppliers and provide a report on their offerings, pricing, and environmental impact.
And Gemini Pro nailed it.
What stood out was its ability to summarize information into tables that can be easily downloaded to Google Sheets. Super handy!
On the other hand, GPT-4’s native networking abilities fell short in this area. Even with the WebPilot plugin, GPT-4 couldn’t match Gemini Pro’s performance, lacking in table generation and access to certain data like prices.
This aspect of Gemini Pro, powered by Google’s robust search engine, poses a real challenge for GPT-4. It’s a wake-up call for Microsoft and OpenAI to up their game in networked search capabilities.
Gemini Pro, like GPT-4, is a multimodal model with impressive capabilities, including image recognition. I put this to the test using images and questions from the Paper “The Dawn of LMMs: Preliminary Explorations with GPT-4V(ision).”
When presented with a particular image and asked about its humorous aspect, Gemini Pro’s response was off the mark.
It incorrectly identified a rhino horn as a turtle, missing the actual humor in the image.
In contrast, GPT-4 demonstrated a precise understanding of the image’s content and meaning.
I also tested their ability to interpret another image, asking them to describe its content, the Chinese characters, and identify the food’s city of origin.
Gemini Pro’s answer was:
While GPT-4 provided this response:
This comparison suggests that while Gemini Pro may have stronger multilingual capabilities, GPT-4 shows a slightly better grasp of graphic literacy, understanding and interpreting images more accurately.
Gemini Pro marks a significant leap forward from its predecessor, Bard. While Bard was hardly my go-to choice (except for the occasional map reading), Gemini Pro has brought much more to the table. Although it doesn’t quite reach the heights of GPT-4, it certainly outperforms GPT 3.5 in several key areas.
The burning question is: how will Gemini Ultra stack up against GPT-4? My bet is that it’ll still trail slightly behind GPT-4.
For users, the availability of a free tool like Gemini, alongside ChatGPT and Claude 2, is definitely a win. It’s an exciting time in the world of AI, with each new development offering more versatility and utility.
Loved the article?
If yes, then: