Compare Claude 3.5 Sonnet, ChatGPT 4o and Gemini 1.5 Pro
To test this claim, a detailed comparison was carried out. As in the previous comparison between Claude 3 Opus, GPT-4 and Gemini 1.5 Pro, the tests cover reasoning ability, multimodal reasoning, code generation and more. The results are detailed below.
1. Find the drying time
Although this may look like a basic question, the test starts with it because it is a tricky reasoning problem on which LLMs frequently slip. Claude 3.5 Sonnet makes the common mistake of treating the question as a proportion: the model said it would take 1 hour and 20 minutes to dry 20 towels, which is incorrect. ChatGPT 4o and Gemini 1.5 Pro got the answer right, saying it would still take 1 hour to dry 20 towels.
If it takes 1 hour to dry 15 towels under the Sun, how long will it take to dry 20 towels?
Winning options: ChatGPT 4o and Gemini 1.5 Pro
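The difference between the two answers comes down to which model of drying is assumed. The sketch below is only an illustration, with hypothetical function names, and it assumes there is enough room to lay out all the towels in the sun at once.

```python
def proportional_drying_minutes(base_towels, base_minutes, towels):
    """The mistaken reading: scale the time with the number of towels."""
    return base_minutes * towels / base_towels


def parallel_drying_minutes(base_minutes, towels):
    """Towels dry side by side, so the count does not change the time
    (assumption: enough space to spread them all out at once)."""
    return base_minutes


wrong = proportional_drying_minutes(15, 60, 20)
right = parallel_drying_minutes(60, 20)

print(wrong)  # 80.0 minutes, i.e. 1 hour 20 minutes
print(right)  # 60 minutes, i.e. still 1 hour
```

The proportional version reproduces the 1 hour 20 minutes that Claude 3.5 Sonnet gave, while the parallel version matches the answer from ChatGPT 4o and Gemini 1.5 Pro.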
2. Assess weight
Next comes a classic reasoning question, and all three models, Claude 3.5 Sonnet, ChatGPT 4o and Gemini 1.5 Pro, answer it correctly. A kilogram of feathers, or of anything else, always weighs more than a pound of steel or any other material, because a kilogram is about 2.2 pounds.
What's heaviest, a kilo of feathers or a pound of steel?
Winning options: Claude 3.5 Sonnet, ChatGPT 4o and Gemini 1.5 Pro
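The comparison only needs both weights in the same unit. As a minimal sketch (the variable names are illustrative), converting the pound of steel to kilograms makes the answer explicit:

```python
# The international avoirdupois pound is defined as exactly 0.45359237 kg.
KG_PER_POUND = 0.45359237

feathers_kg = 1.0               # a kilo of feathers
steel_kg = 1.0 * KG_PER_POUND   # a pound of steel, converted to kilograms

heavier = "feathers" if feathers_kg > steel_kg else "steel"
print(heavier)  # feathers
```

The material plays no role in the result; only the units do.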
3. Word puzzles
In the next reasoning test, Claude 3.5 Sonnet correctly answers that David has no brothers and that he is the only male among the siblings. ChatGPT 4o and Gemini 1.5 Pro also give the right answer.
David has three sisters. Each of them has one brother. How many brothers does David have?
Winning options: Claude 3.5 Sonnet, ChatGPT 4o and Gemini 1.5 Pro
4. Sort items
Then, the author of the article asked all three models to stack these objects so that the pile would be stable. Unfortunately, all three got it wrong. The models take an identical approach: first place the laptop, then the book, then the bottle, and finally the 9 eggs on the base of the bottle, which is impossible. The older GPT-4 model had given the correct answer.
Here we have a book, 9 eggs, a laptop, a bottle and a nail. Please tell me how to stack them onto each other in a stable manner.
Winning options: None
5. Follow the instructions
In its blog post, Anthropic mentioned that Claude 3.5 Sonnet is excellent at following instructions, and that seems about right. It generated all 10 sentences ending with the word "AI". ChatGPT 4o also got 10/10. Gemini 1.5 Pro, however, produced only 5 correct sentences out of 10. Google needs to tune the model for better instruction following.
Generate 10 sentences that end with the word "AI"
Winning options: Claude 3.5 Sonnet and ChatGPT 4o
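Scoring this test by hand is easy, but it can also be automated. The sketch below is one possible checker, not the method used in the article; the function names are hypothetical and the sample sentences are made up for illustration, not actual model outputs.

```python
def ends_with_word(sentence, word="AI"):
    """Return True if the last word of the sentence is exactly `word`."""
    # Drop surrounding whitespace, then trailing punctuation and quotes.
    stripped = sentence.strip().rstrip(".!?\"')")
    words = stripped.split()
    return bool(words) and words[-1] == word


def score(sentences, word="AI"):
    """Count how many sentences end with `word`."""
    return sum(ends_with_word(s, word) for s in sentences)


# Made-up sample sentences, for illustration only.
samples = [
    "The future of medicine may depend on AI.",
    "AI is changing how we work.",
    "She wrote her thesis on AI!",
]
print(score(samples))  # 2
```

Run against a model's 10 sentences, a score of 10 corresponds to the result reported for Claude 3.5 Sonnet and ChatGPT 4o, and a score of 5 to the one reported for Gemini 1.5 Pro.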
6. Find details
Anthropic was one of the first companies to offer large context lengths, starting from 100K tokens and growing to today's context window of 200K. For this test, the author provided a large text of 25K characters, or about 6K tokens, and planted a specific detail somewhere in the middle of it.
The author then asked all three models about that detail. Only Claude 3.5 Sonnet found the answer, while ChatGPT 4o and Gemini 1.5 Pro did not. So in this test of handling large documents, Claude 3.5 Sonnet is the better model.
Winning option: Claude 3.5 Sonnet
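A test of this kind can be set up along the following lines. This is a minimal sketch under stated assumptions: the filler sentence and the planted detail are hypothetical, and the 4-characters-per-token ratio is only a rough rule of thumb that happens to agree with the 25K characters and roughly 6K tokens quoted above. Real tokenizers will give different counts.

```python
def build_haystack(filler_sentence, target_chars, needle):
    """Build filler text of `target_chars` characters and plant `needle` in the middle."""
    repeats = target_chars // len(filler_sentence) + 1
    text = (filler_sentence * repeats)[:target_chars]
    middle = len(text) // 2
    return text[:middle] + " " + needle + " " + text[middle:]


def rough_token_estimate(text, chars_per_token=4):
    """Very rough token count; assumes about 4 characters per token."""
    return len(text) // chars_per_token


# Hypothetical planted detail; the article does not say what was used.
NEEDLE = "The secret passphrase is blue-heron-42."

haystack = build_haystack(
    "The quick brown fox jumps over the lazy dog. ", 25_000, NEEDLE
)
print(len(haystack), rough_token_estimate(haystack))
```

The resulting text is then pasted into each model together with a question about the planted detail, and a model passes if it retrieves the detail correctly.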
7. Vision test
To test visual abilities, the author uploaded images of difficult-to-read handwriting to see how well the models could detect and extract the characters. Surprisingly, all three models did a great job and identified the text accurately. When it comes to OCR, all three models are quite capable.
Winning options: Claude 3.5 Sonnet, ChatGPT 4o and Gemini 1.5 Pro
8. Create games
In this test, the author uploaded an image of the classic Tetris game without revealing its name and simply asked each model to create a game like it in Python. All three models identified the game correctly, but only the code generated by Sonnet ran successfully. Both ChatGPT 4o and Gemini 1.5 Pro failed to produce error-free code.
With Sonnet's code, the game ran successfully on the first attempt. Many programmers use ChatGPT 4o to assist with coding, but it looks like Anthropic's model may become the new favorite among programmers.
Claude 3.5 Sonnet scored 92% on the HumanEval benchmark, which evaluates programming ability. On the same benchmark, GPT-4o reached 90.2% and Gemini 1.5 Pro 84.1%. For programming, then, there is a new SOTA model, and it is Claude 3.5 Sonnet.
Winning option: Claude 3.5 Sonnet
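The winners listed at the end of each of the eight tests can be tallied to see the overall picture. The sketch below simply counts those lines; the dictionary is keyed by test number, and its structure is an illustrative choice rather than anything from the original tests.

```python
from collections import Counter

CLAUDE, GPT, GEMINI = "Claude 3.5 Sonnet", "ChatGPT 4o", "Gemini 1.5 Pro"

# Winners per test, copied from the "Winning options" line of each section.
winners_by_test = {
    1: [GPT, GEMINI],
    2: [CLAUDE, GPT, GEMINI],
    3: [CLAUDE, GPT, GEMINI],
    4: [],
    5: [CLAUDE, GPT],
    6: [CLAUDE],
    7: [CLAUDE, GPT, GEMINI],
    8: [CLAUDE],
}

tally = Counter(model for winners in winners_by_test.values() for model in winners)
for model, wins in tally.most_common():
    print(f"{model}: {wins} of {len(winners_by_test)}")
```

This prints 6 wins for Claude 3.5 Sonnet, 5 for ChatGPT 4o and 4 for Gemini 1.5 Pro. With only eight informal tests, the tally is a convenient summary rather than a rigorous ranking.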
After running these tests on all three models, Claude 3.5 Sonnet appears to be as good as ChatGPT 4o, if not better. In programming especially, Anthropic's new model is really impressive. It's worth noting that the latest Sonnet model isn't even Anthropic's largest.
The company says Claude 3.5 Opus will launch later this year and perform even better. Google's Gemini 1.5 Pro also did better than in previous tests, which suggests it has improved significantly. Overall, OpenAI is not the only AI lab doing well in the LLM field, and Anthropic's Claude 3.5 Sonnet is a testament to that.