In the first test, ChatGPT 4o and ChatGPT 4 performed the same. Despite having access to a code interpreter, neither model used it for the calculation; both answered directly through logical reasoning.
If it takes 1 hour to dry 15 towels under the Sun, how long will it take to dry 20 towels?
Winner: ChatGPT 4o and ChatGPT 4
In the second reasoning test, both ChatGPT 4o and ChatGPT 4 reasoned their way to the correct answer: the 4th floor.
There is a tall building with a magic elevator in it. When stopping on an even floor, this elevator connects to floor 1 instead. Starting on floor 1, I take the magic elevator 3 floors up. Exiting the elevator, I then use the stairs to go 3 floors up again. Which floor do I end up on?
Winner: ChatGPT 4o and ChatGPT 4
Elevator test on ChatGPT 4o
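The reasoning behind the 4th-floor answer can be traced step by step. The sketch below is only an illustration of the puzzle's rule as stated in the prompt; the function names are made up for this example and do not come from either model's output.

```python
def ride_magic_elevator(start_floor: int, floors_up: int) -> int:
    """Return the floor the rider actually exits on."""
    intended_stop = start_floor + floors_up
    # Puzzle rule: stopping on an even floor connects to floor 1 instead.
    if intended_stop % 2 == 0:
        return 1
    return intended_stop


def take_stairs(start_floor: int, floors_up: int) -> int:
    """Stairs have no magic: the rider simply climbs."""
    return start_floor + floors_up


# Start on floor 1 and ride 3 floors up: the stop would be floor 4 (even),
# so the elevator connects to floor 1 instead.
floor_after_elevator = ride_magic_elevator(1, 3)

# Then climb 3 floors by stairs from wherever the elevator let us out.
final_floor = take_stairs(floor_after_elevator, 3)

print(floor_after_elevator, final_floor)  # 1 4
```

The trap in the question is that the elevator ride looks like it ends on floor 4, when the rider is actually back on floor 1 before taking the stairs.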
The next test has proved challenging for many LLMs, but both ChatGPT 4o and ChatGPT 4 passed it without any problems. Both models state that 'One kilogram of feathers weighs more than one pound of steel'. In a recent comparison between ChatGPT 4o and Gemini 1.5 Pro, Google's AI model failed to answer this question correctly.
What's heaviest, a kilo of feathers or a pound of steel?
Winner: ChatGPT 4o and ChatGPT 4
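The question is a unit trap rather than a material trap, which a quick conversion makes clear. This is a minimal sketch; the helper name `to_kg` is hypothetical, and the only outside fact used is the standard definition of the pound as 0.45359237 kg.

```python
KG_PER_POUND = 0.45359237  # international avoirdupois pound, exact by definition


def to_kg(amount: float, unit: str) -> float:
    """Convert a mass in 'kg' or 'lb' to kilograms."""
    factors = {"kg": 1.0, "lb": KG_PER_POUND}
    return amount * factors[unit]


feathers_kg = to_kg(1, "kg")  # a kilo of feathers
steel_kg = to_kg(1, "lb")     # a pound of steel

# The material is irrelevant; only the mass in a common unit matters.
heavier = "feathers" if feathers_kg > steel_kg else "steel"

print(heavier)
print(f"{feathers_kg / steel_kg:.2f}x")  # 2.20x
```

A kilogram is a little over twice as heavy as a pound, so the feathers win regardless of what the pound is made of.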
The author of the article asked ChatGPT 4o and ChatGPT 4 to generate 10 sentences ending with the phrase 'deep learning', and both models got all 10 right. In following instructions, ChatGPT 4o and GPT-4 tied with Llama 3 70B, which understands user intent and shows great alignment.
Generate 10 sentences that end with the word "deep learning"
Winner: ChatGPT 4o and ChatGPT 4
Instruction-following test on ChatGPT 4o
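A result like "10 out of 10" is easy to check mechanically. The sketch below shows one way to score a model's output; the helper names and the sample sentences are invented for illustration and are not the sentences either model produced.

```python
import string


def ends_with_phrase(sentence: str, phrase: str = "deep learning") -> bool:
    """True if the sentence ends with the phrase, ignoring case and final punctuation."""
    core = sentence.rstrip(string.whitespace + string.punctuation).lower()
    target = phrase.lower()
    # Require a word boundary so a longer word ending in "deep" does not count.
    return core == target or core.endswith(" " + target)


def count_passing(sentences: list[str], phrase: str = "deep learning") -> int:
    """Count how many sentences satisfy the instruction."""
    return sum(ends_with_phrase(s, phrase) for s in sentences)


# Hypothetical sample output, not taken from the article.
samples = [
    "Many modern vision systems rely on deep learning.",
    "Deep learning is a branch of machine learning.",
    "Her favorite research topic is Deep Learning!",
]

print(count_passing(samples), "of", len(samples))  # 2 of 3
```

The second sample mentions the phrase but does not end with it, which is exactly the kind of miss this test is designed to catch.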
The final question checks whether both models demonstrate similar levels of intelligence. Indeed, ChatGPT 4o and ChatGPT 4 both give the correct answer with clear reasoning. Kudos to OpenAI for making the Omni model 2x faster than GPT-4 while still providing the same level of intelligence.
I have 3 apples today, yesterday I ate an apple. How many apples do I have now?
Winner: ChatGPT 4o and ChatGPT 4
After testing both models, it is clear that ChatGPT 4o is truly a GPT-4-class model. Both perform intelligently and are quite similar in their reasoning and inference. In fact, OpenAI's benchmark results show that ChatGPT 4o is a notch better than the ChatGPT 4 model, and the LMSYS rankings show the same.
ChatGPT 4o scored 88.7 on MMLU, while the latest GPT-4 (gpt-4-turbo-2024-04-09) scored 86.5. The trend is similar on the HumanEval, MATH, and GPQA benchmarks. The main practical difference is speed and price: ChatGPT 4o is 2 times faster and 50% cheaper than GPT-4.
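To put these figures in perspective, the short calculation below works out the MMLU gap and what "2 times faster and 50% cheaper" means for a single request. The two MMLU scores come from the text above; the baseline cost and latency are placeholder values chosen only for illustration, not real pricing or measured speed.

```python
mmlu_4o = 88.7          # ChatGPT 4o, from the article
mmlu_gpt4_turbo = 86.5  # gpt-4-turbo-2024-04-09, from the article

gap_points = mmlu_4o - mmlu_gpt4_turbo
relative_gain_pct = gap_points / mmlu_gpt4_turbo * 100

print(f"{gap_points:.1f} points, {relative_gain_pct:.1f}% relative")


def apply_4o_ratios(baseline_cost: float, baseline_seconds: float) -> tuple[float, float]:
    """Scale a GPT-4 baseline by the article's ratios: 50% cheaper, 2x faster."""
    return baseline_cost * 0.5, baseline_seconds / 2


# Placeholder baseline: 1.00 cost unit and 10 seconds per request (assumed values).
cost_4o, seconds_4o = apply_4o_ratios(1.00, 10.0)
print(cost_4o, seconds_4o)  # 0.5 5.0
```

The accuracy gap is a couple of points, so the case for ChatGPT 4o rests mostly on the speed and price ratios rather than on benchmark scores.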
For free ChatGPT users, a limit of 10 messages every 5 hours is quite good. You can access the latest ChatGPT 4o model for free (plus many other premium features).
If you are a power user and rely on ChatGPT for daily work, subscribing is the better choice. Recently some people had access to ChatGPT 4o on a free account, but its performance was unsatisfactory. So if you are an advanced user, go ahead and sign up for ChatGPT Plus.