However, speed always comes with a trade-off. When applications are created too quickly, most developers no longer have time to read and test all the code as they used to. This isn't actually a major problem in most cases, because modern coding agents are powerful enough to reliably handle much of the implementation themselves. Unless you're building a safety-critical system or require extremely high accuracy, reading every line of code isn't always necessary.
Without proper procedures, AI-generated code is prone to robustness issues—that is, the ability to function stably across a wide range of scenarios. An application might run well under normal conditions but easily fail when encountering edge cases or context changes.
This article focuses on techniques to improve code stability when working with Claude Code and AI coding agents, especially in cases where developers cannot manually review the entire source code.
Why is robust code more important than ever?
Robust Code is a term in computer science referring to code that is designed and built to withstand and handle unexpected situations, execution errors, or invalid input without crashing or ceasing to function completely.
This is almost an 'obvious' question. Everyone wants a more stable product because it directly impacts the user experience. The fewer bugs an application has, the better the overall experience.
However, another question often arises: if you want more robust code, why not just read and edit the entire code yourself to be sure?
There are two fairly practical reasons. First is the speed of product development. If you want to maintain a fast build pace with an AI coding agent, it's almost impossible to spend time reviewing every line of code as you would with traditional programming.
The second reason is that coding agents have improved significantly. If properly prompted, they can detect problems and generate more reliable code on their own without requiring much manual human intervention.
Therefore, instead of trying to control everything manually, the goal should be to build a workflow that allows the AI to increase the robustness of its code as much as possible on its own.
To ensure stable code, invest in the planning phase.
The first technique to emphasize is the use of 'plan mode'. This is one of the most important features if you want to fully exploit the power of coding agents. Instead of starting implementation immediately, plan mode forces the AI to spend more time thinking about the overall architecture and implementation before writing code.
This is especially helpful because many bugs arise when one component is modified inadvertently breaking the logic in another. With more planning time, models generally see the 'big picture' better and are less likely to create unintended side effects.
Another interesting point is that plan mode often prompts the AI to proactively ask the user questions to clarify ambiguity. According to the author, this is an extremely powerful capability that many people haven't fully utilized.
Instead of constantly asking questions to the AI, let the AI ask you questions. This approach allows the model to think more independently and only return questions when it truly needs clarification of requirements or a clearer understanding of the implementation goals.
Although initial planning might slow down the workflow slightly, the author argues that the long-term benefits far outweigh the drawbacks due to fewer bugs and a significant reduction in the number of post-implementation rework loops.
Markdown files can become 'long-term memory' for AI agents.
Another extremely effective technique is to maintain a system of markdown files within the repository. Over time, when working with the coding agent, the number of markdown files should gradually increase to record:
- How agents need to function in a project
- Bugs have appeared before.
- How the bug was fixed
- Important technical notes
A key feature is that modern coding agents typically have sufficient context to proactively read and utilize this documentation in a new session. This helps the agent avoid repeating past mistakes.
This is probably the number one tip for more efficient coding with AI. After each chat thread with the coding agent, you should ask the AI to generalize the knowledge it has just learned into markdown notes and save them in the repository. Similarly, every time a bug is detected and successfully fixed, you should save a description of the bug along with the fix.
If done consistently, the repository will gradually become a very strong knowledge base, helping the AI coding agent understand the project better and create fewer errors over time.
An excessively long context window can make AI write worse code.
One interesting point in the article is that an overly large context window can sometimes make the coding agent less efficient.
For example, Claude Code currently supports contexts up to 1 million tokens. This sounds impressive, but in the author's practical experience, model performance often drops sharply after around 300,000–400,000 tokens.
The reason is that when the context is too long, the model has to process a lot of 'noise' that isn't really relevant to the current task. And with LLM, distinguishing between noise and important information isn't always easy.
As a result, AI is more likely to make inaccurate decisions and generate error-prone code.
Therefore, unless it's absolutely necessary to maintain a lot of specific context, the author recommends working with smaller and cleaner contexts for more efficient coding agent performance.
Let someone review the AI's own code.
No matter how well-planned, coding agents will still make mistakes. That's why the importance of post-implementation verification needs to be emphasized.
One of the simplest yet most effective ways is to use another AI to review the code generated by the AI.
The proposed workflow is quite clear: create a new coding agent with a clean context, then instruct it to analyze the pull request and find potential bugs.
Interestingly, the prompt for the 'review agent' should also be improved over time. For example, you could add information about bugs that have appeared in the past, how they were discovered, and how they were fixed, so that the review agent becomes increasingly smarter.
Using a different model for review can sometimes yield better results. For example, if the code is written in Claude Code, GPT or Gemini can be used for review. Because the models 'think' differently, they sometimes detect errors that the original model missed.
Pre-commit detection remains a very important line of defense.
It's also worth mentioning pre-commit hooks—mechanisms that run automatic checks before each code commit.
This technique has been around for a long time but remains extremely useful in the age of AI coding. For example, many codebases have pre-commit hooks to check for missing translations or static errors before a commit is made. This helps detect errors immediately instead of waiting until code review or production.
However, not all errors can be detected using static rules. In those cases, you can use an AI agent as a quick 'pre-commit reviewer' before committing. The method is very simple: instruct the agent to go through the newly created implementation and ask itself:
Is this code production-ready?
It sounds simple, but according to the author, this question can sometimes help detect quite a few potential errors before the code progresses further in the development pipeline.
The future of coding agents is not just about 'writing code'.
Coding agents have improved dramatically since the advent of ChatGPT in 2022. However, they are still prone to errors if not used correctly. The future of AI coding will not only lie in smarter models, but also in the ability to 'tune workflows' to make AI as efficient as possible.
Techniques such as planning, knowledge base markdown, context management, AI code review, and pre-commit verification are likely to remain crucial even as the quality of LLMs continues to rise sharply in the coming years.
Ultimately, the goal isn't just to write code faster—but to write code that's fast enough yet stable enough for practical use.