What would happen if Claude fixed the faulty ChatGPT code?

Some time ago, someone asked Claude, ChatGPT, and Gemini to build a solar system simulator. That was around the time Claude was getting a lot of attention, and many people realized that perhaps they shouldn't limit themselves to just ChatGPT.

That test yielded one of the clearest results ever. Claude did very well and won convincingly. Gemini produced working code, but it wasn't particularly impressive, and interestingly, ChatGPT failed.

Today, let's examine a different skill. This time, instead of writing code, let's give the LLMs the task of debugging it. Specifically, let's ask them to fix the ChatGPT code.

The ChatGPT code is broken.

A small but serious mistake.

The previous test had a simple constraint: No retrying. Whatever result was received on the first attempt would be used for evaluation. Unfortunately for ChatGPT, the code it generated had a small, subtle bug that rendered the entire program unusable.

ChatGPT's code uses kilometers for some distances and astronomical units (AU) for others, then mixes the two without converting. Positions computed in AU are passed to rendering code that expects kilometers, so a distance of a few AU is drawn as a few kilometers. The result is that the planets are only a few kilometers apart, which, on the scale of the solar system, means they are essentially inside each other.
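To get a feel for how large this mismatch is, here is a minimal Python sketch. It is not ChatGPT's actual code: the planet table, the `swallowed_by_sun` helper, and the rounded orbit radii are assumptions made for illustration. It reads each orbit radius once as if the AU number were already kilometers (the bug) and once after a proper conversion.

```python
# Illustrative only: names and the planet table are hypothetical,
# not taken from the simulator discussed in the article.
AU_IN_KM = 149_597_870.7   # 1 astronomical unit in kilometers (IAU definition)
SUN_RADIUS_KM = 695_700.0  # nominal solar radius in kilometers

# Approximate mean orbit radii in AU (rounded values)
ORBIT_RADIUS_AU = {
    "Mercury": 0.39,
    "Earth": 1.0,
    "Jupiter": 5.2,
    "Neptune": 30.07,
}


def swallowed_by_sun(distance_km: float) -> bool:
    """Return True if a body at this distance would sit inside the sun."""
    return distance_km < SUN_RADIUS_KM


for name, radius_au in ORBIT_RADIUS_AU.items():
    misread_km = radius_au             # the bug: an AU number used as kilometers
    correct_km = radius_au * AU_IN_KM  # the fix: convert before using it as kilometers
    print(
        f"{name}: misread={misread_km} km "
        f"(inside sun: {swallowed_by_sun(misread_km)}), "
        f"correct={correct_km:,.0f} km "
        f"(inside sun: {swallowed_by_sun(correct_km)})"
    )
```

Under these assumptions, every misread distance falls far inside the sun's radius, while every converted distance falls well outside it.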

ChatGPT's code is quite clean and implements everything, but unfortunately, this small bug has rendered it unusable, preventing other aspects of the code from being evaluated.

Writing code and debugging code are two different skills. This is also true for people. A developer might be good at writing code, while another might be good at debugging. The same concept applies to other fields. A good writer isn't necessarily a good editor, and vice versa.

Let's get to the test. The prompt is given below. It is quite simple and asks each model to state clearly which error it fixed, so the result is easy to check.

You are given the broken source code of a solar system simulator. The source code contains a bug that makes the simulator unusable. Your task: Identify exactly what is causing the problem. Fix the source code. Return the fully fixed version of the source code. Briefly explain where the bug was and how you fixed it. Keep the explanation short and focused only on the root cause and the fix.

Note: The main goal of this test was to see whether Claude could fix the ChatGPT source code, but to add context, all the usual participants were asked to do the same.

Gemini has fixed the code correctly.

Recently, Gemini has been the weakest of the three chatbots. In practice, even if you send it perfect code and ask it to fix a bug, it will invent a bug, "fix" it, and break the code. So it was unclear whether it could accurately identify the actual bug in broken code. For reference, this test used Gemini 3.1 Thinking.

Surprisingly, Gemini did it. It correctly identified the error: the projection code expected positions in kilometers, but orbitalPosition and drawOrbit calculated them in astronomical units. It even spelled out the consequence: the planets were effectively stacked on top of the sun, making the system appear empty.
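The kind of fix described here can be sketched in a few lines of Python. This is a hedged illustration, not Gemini's patch or the simulator's real code: `orbital_position_au`, `au_to_km`, `project`, the circular orbit, the screen center, and the `KM_PER_PIXEL` scale are all hypothetical stand-ins for the orbitalPosition and projection code mentioned above. The idea is to convert from AU to kilometers at the single point where positions enter the projection.

```python
import math

AU_IN_KM = 149_597_870.7  # 1 astronomical unit in kilometers


def orbital_position_au(radius_au: float, angle_rad: float) -> tuple[float, float]:
    """Hypothetical stand-in for orbitalPosition: a circular orbit, in AU."""
    return (radius_au * math.cos(angle_rad), radius_au * math.sin(angle_rad))


def au_to_km(position_au: tuple[float, float]) -> tuple[float, float]:
    """Convert a 2D position from AU to kilometers."""
    return (position_au[0] * AU_IN_KM, position_au[1] * AU_IN_KM)


def project(
    position_km: tuple[float, float],
    km_per_pixel: float,
    center: tuple[float, float] = (400.0, 300.0),
) -> tuple[float, float]:
    """Hypothetical projection that expects kilometers and returns screen pixels."""
    x_km, y_km = position_km
    return (center[0] + x_km / km_per_pixel, center[1] - y_km / km_per_pixel)


KM_PER_PIXEL = 1_000_000.0  # assumed zoom level: one pixel per million kilometers

earth_au = orbital_position_au(1.0, 0.0)

# Bug: the AU position goes straight into a function that expects kilometers.
buggy_pixel = project(earth_au, KM_PER_PIXEL)

# Fix: convert to kilometers at the boundary, then project.
fixed_pixel = project(au_to_km(earth_au), KM_PER_PIXEL)

print("buggy:", buggy_pixel)
print("fixed:", fixed_pixel)
```

With these assumed numbers, the buggy call places Earth within a tiny fraction of a pixel of the sun at the screen center, while the fixed call places it roughly 150 pixels away. Converting once at the boundary, rather than in several places, keeps the rest of the code in a single unit.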

The solution it used was also correct, and the fixed code worked. Finally, we can see ChatGPT's solar system simulator!

ChatGPT can also self-correct errors.

After all, this flawed code came from ChatGPT itself, and you might not expect the same chatbot that wrote it to find the error. However, ChatGPT has improved recently. In practice, it now performs better than Gemini. ChatGPT used to be so frustrating that many people switched to Claude, but now, when they use it occasionally, they generally get good results.

That's natural. These models are constantly being refined and updated, even if the version names don't change. In practice, ChatGPT is also much more sensitive to custom instructions than other chatbots, so your custom instructions in ChatGPT can significantly affect the experience.

Surprisingly, ChatGPT did a great job. It found the root cause, provided a concise but clear explanation as requested, and fixed the code. And the fixed code works well (using ChatGPT 5.4 Thinking as the model).

In the original task, ChatGPT spent the longest time thinking. Perhaps all that reasoning before generating the code filled the context window and contributed to the error. Or maybe ChatGPT is fine-tuned in a way that makes it better at handling smaller tasks and minor tweaks than at creating a project from scratch.

The contrast is very interesting. The most interesting part, however, is what comes next.

Claude created the biggest surprise.

Claude's results in the previous test were simply on a different level. They were more thorough, detailed, informative, and scientifically sound, far surpassing ChatGPT and Gemini.

But here's the surprise: Claude failed to find the main error in ChatGPT's code.

Instead, it found a different bug related to the camera panning mechanism. To be fair, Claude wasn't hallucinating. The bug did exist, but it only appeared when you dragged the mouse while holding the Shift key to pan the camera, and at most camera angles it was subtle enough to be easily missed. Yet there was a much bigger bug, one that rendered the simulator almost useless, and Claude completely overlooked it.

Strange, isn't it? The chatbot that writes the best code is now the worst at debugging other people's code. This time, Claude fails where ChatGPT and Gemini succeed. For reference, the model was Claude Sonnet 4.6, the same one used in the previous test.

After being given another chance, Claude promised to review the code more carefully and tried again. The result it produced was still wrong. Interestingly, it didn't stop there. It said it "will review more carefully" and kept going.

Once again, it found a bug, but not the serious one related to scale. Fortunately, Claude didn't give up. It continued to think and finally realized the problem was a conversion error from AU to kilometers. The consequence it deduced was wrong, but the solution was still correct. You can see some of the dialogue in the image gallery above.

Finally, Claude reported the bug along with the correct solution.

Claude is more like a bot than other chatbots. That's part of the reason why people are switching to Claude and why it's been the most used lately. A chatbot should just be a true chatbot; it doesn't need to act like a human.

While more popular chatbots like Gemini and ChatGPT seem geared towards the average user, striving to listen and feel more human, Claude is different. That difference is also evident in this test. Claude found actual bugs, just not the decisive one at first. Judging by its reasoning, things seem to go something like this: this is a bug; the code shouldn't have bugs; so this is critical; this is definitely the critical bug; the task is complete.

Claude created the most powerful original simulator, but it was the weakest at identifying the most critical errors when constraints were in place. That's the main lesson here!

One LLM might seem similar to another at first glance, but they differ in key aspects. Try it yourself: start a private conversation and ask a chatbot to find errors in its own code, then send the same code to another chatbot and ask it to do the same.

It is becoming increasingly clear that no single model can dominate everything. Perhaps we will need to combine multiple models, just in case.
