If you've ever used AI coding agents in a software engineering project, you've probably seen their impressive capabilities. They can read a codebase's structure, understand its abstractions, find the relevant modules, and even generate new code that fits the logic of the whole system.
But when transitioning to a data science environment, especially a notebook workflow, the experience often changes quite noticeably.
AI can still write valid Python code. It can still call libraries, process dataframes, or create visualizations. However, in many cases, AI doesn't truly understand the most important aspects of the problem: why this dataset was chosen, why the data is filtered in a particular way, why the time window changes, or why one analysis result forces a complete change of research direction.
In other words, AI often treats notebooks like regular software projects, when in reality notebooks don't quite work that way.
This is also why more and more developers are realizing that current AI coding agents are often significantly more powerful in software engineering than in data science workflows.
The problem isn't just about tooling.
Initially, many believed the primary cause was insufficient tooling. For example, AI couldn't efficiently read notebooks, track state effectively, or integrate properly with data workflows.
However, an analysis of hundreds of GitHub repositories from both software engineering and data science produced an interesting finding: the issue wasn't simply about the tools. The real difference lay in how 'meaning' was stored within the code of these two fields.
In software engineering, much of the meaning lies within the system's structure. Abstractions, interfaces, modules, types, and class hierarchies all help explain the program's logic.
Meanwhile, in data science, meaning lies more in the data, context, intermediate output, analytical decisions, and the constantly changing state of the workflow. This is a crucial difference that is often overlooked when discussing AI coding agents.
Data science code often 'looks' more complex than it actually is.
One of the most interesting findings is a phenomenon that could be called 'entropy inversion'.
When Shannon entropy is measured at various levels of the codebase, data science projects often appear very chaotic on the surface. They contain numerous column names, temporary variables, dataset labels, and domain-specific identifiers, which make the code look noisy and unpredictable.
However, when analyzed in more detail at the AST (Abstract Syntax Tree) level, the picture is completely reversed.
Software engineering code typically has a much higher degree of structural diversity. Large projects produce a wide variety of behavior through abstractions, modules, interfaces, dependencies, and internal logic.
Meanwhile, data science workflows typically repeat a familiar set of actions such as loading data, filtering, grouping, aggregating, visualizing, and then further adjusting the results. In other words, data science code often appears more complex on the surface, while software engineering contains more complexity in its internal structure.
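The two levels of measurement can be sketched in a few lines of Python. This is not the method used in the repository analysis described above, whose details are not given here; it is only a minimal illustration that computes Shannon entropy over two views of the same source: the names and string labels on the surface, and the AST node types underneath. The snippet in `DS_SNIPPET` and every name inside it are invented for the example.

```python
import ast
import math
from collections import Counter


def shannon_entropy(items):
    """Shannon entropy, in bits, of the empirical distribution over items."""
    counts = Counter(items)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def surface_tokens(source):
    """Variable names, attribute names and string constants (the surface view)."""
    tokens = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Name):
            tokens.append(node.id)
        elif isinstance(node, ast.Attribute):
            tokens.append(node.attr)
        elif isinstance(node, ast.Constant) and isinstance(node.value, str):
            tokens.append(node.value)
    return tokens


def structural_tokens(source):
    """AST node type names (the structural view)."""
    return [type(node).__name__ for node in ast.walk(ast.parse(source))]


# Hypothetical notebook-style code; it is only parsed, never executed.
DS_SNIPPET = '''
df = load("orders.csv")
df = df[df["region"] == "EMEA"]
df = df[df["order_value"] < 500]
summary = df.groupby("customer_segment")["order_value"].mean()
'''

print("surface entropy:   ", shannon_entropy(surface_tokens(DS_SNIPPET)))
print("structural entropy:", shannon_entropy(structural_tokens(DS_SNIPPET)))
```

A single snippet proves nothing on its own. A meaningful comparison would aggregate both measures over many files from each kind of project.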
This is extremely important for AI agents, as most current coding agents are heavily optimized for structural reasoning rather than contextual reasoning.
Software engineering 'encapsulates' meaning within a structure.
In software engineering, many layers of meaning are compressed into the system's architecture itself.
Function names, interfaces, types, module boundaries, or design patterns all help describe the purpose of the code. An AI agent can understand a great deal about the system simply by reading the call graph or dependency structure.
That's why AI coding agents are now very powerful at tasks like refactoring code, fixing bugs, adding features, or maintaining consistency in large codebases.
Much of the crucial context is already embedded within the code structure itself.
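As a rough illustration of how much can be read from structure alone, the sketch below uses Python's standard `ast` module to list which functions call which. The source in `EXAMPLE` and its function names are hypothetical, and the approach is deliberately simple: it only looks at top-level functions and only records calls made through a plain name, not through attributes or methods.

```python
import ast


def call_graph(source):
    """Map each top-level function name to the set of plain names it calls."""
    graph = {}
    for func in ast.parse(source).body:
        if not isinstance(func, (ast.FunctionDef, ast.AsyncFunctionDef)):
            continue
        callees = set()
        for node in ast.walk(func):
            # Record calls like f(x); skip method calls like obj.f(x).
            if isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
                callees.add(node.func.id)
        graph[func.name] = callees
    return graph


# Hypothetical module source; it is only parsed, never executed.
EXAMPLE = '''
def load_config(path):
    return parse(read_file(path))

def read_file(path):
    return open(path).read()

def parse(text):
    return text.splitlines()
'''

for caller, callees in call_graph(EXAMPLE).items():
    print(caller, "->", sorted(callees))
# load_config -> ['parse', 'read_file']
# read_file -> ['open']
# parse -> []
```

Even without running anything, the names and the call edges already say what `load_config` is for. That is the kind of meaning that lives in structure.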
Data science, however, is highly dependent on external context.
Conversely, data science workflows often depend heavily on context outside of the code.
For example, a data filter might be only a few very simple lines of Python, but it exists because of business logic, dataset characteristics, edge cases, or insights gained from the previous analysis step.
Many decisions in the notebook cannot be fully understood by simply reading the source code. A comment like, "Remove outliers above the 99th percentile because they don't affect the main cohort," actually contains reasoning crucial to the entire analysis. Similarly, a chart or dataframe output within the notebook can sometimes be the very reason an analyst changes their research direction.
That's the part that current AI agents often miss.
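A small sketch makes this concrete. The filter below is a single line of logic, yet nothing in the code says why the 99th percentile was chosen; that reasoning lives only in the comment. The data, the `order_values` name, and the threshold are all invented for illustration, and the standard library stands in for a dataframe library to keep the example self-contained.

```python
import statistics

# Hypothetical order values; one extreme value dominates the tail.
order_values = [12, 15, 14, 13, 16, 15, 14, 13, 12, 5000]

# quantiles(..., n=100) returns the 99 cut points between percentiles, so the
# last element is the 99th percentile. method="inclusive" interpolates between
# observed values, so the result never exceeds the maximum of the data.
p99 = statistics.quantiles(order_values, n=100, method="inclusive")[-1]

# Remove outliers above the 99th percentile because they don't affect the
# main cohort. The code itself cannot express that reason: an agent reading
# only the next line sees an arbitrary threshold.
filtered = [v for v in order_values if v <= p99]

print(len(order_values), "->", len(filtered))
```

If the comment is deleted, the program behaves identically, but the analysis becomes much harder to audit or to extend correctly.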
Why does the notebook workflow still exist despite its many limitations?
This is a question that puzzles many software engineers: if notebooks are so difficult to maintain, why do data scientists still use them so extensively?
The answer lies in the nature of data science.
In software engineering, requirements are usually relatively clear before work on a system begins. But in data science, the research question itself often keeps changing as the work proceeds.
Analysts typically have to test hypotheses, check edge cases, review visualizations, change filters, and then continuously adjust their analytical approach.
Notebooks keep code, output, comments, and reasoning in one place. This makes the workflow extremely intuitive for analysts working in that context. If notebooks are refactored into modules too early, a lot of analytical context can be lost.
Therefore, notebooks aren't necessarily 'bad software engineering,' but rather a different type of artifact serving a different kind of work.
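The 'everything in one place' property is visible in the notebook file format itself. In the Jupyter notebook format (nbformat 4), an `.ipynb` file is JSON with a `cells` list; each cell has a `cell_type` and a `source`, and code cells also carry their `outputs`. The sketch below walks a tiny hand-written notebook whose contents are hypothetical, and it only collects plain-text results and stream output.

```python
import json

# A minimal, hand-written notebook; the cell contents are invented.
NOTEBOOK_JSON = json.dumps({
    "nbformat": 4,
    "nbformat_minor": 4,
    "metadata": {},
    "cells": [
        {"cell_type": "markdown", "metadata": {},
         "source": ["Churn looks seasonal, so restrict to the last 90 days."]},
        {"cell_type": "code", "metadata": {}, "execution_count": 1,
         "source": ["recent = df[df.days_ago <= 90]\n", "len(recent)"],
         "outputs": [{"output_type": "execute_result", "execution_count": 1,
                      "metadata": {}, "data": {"text/plain": ["1842"]}}]},
    ],
})


def summarize_cells(notebook_text):
    """Return (cell_type, source, text_outputs) for each cell, in order."""
    notebook = json.loads(notebook_text)
    summary = []
    for cell in notebook["cells"]:
        # "source" may be a string or a list of strings; join handles both.
        source = "".join(cell["source"])
        outputs = []
        for out in cell.get("outputs", []):
            if "data" in out:
                outputs.append("".join(out["data"].get("text/plain", [])))
            elif out.get("output_type") == "stream":
                outputs.append("".join(out.get("text", [])))
        summary.append((cell["cell_type"], source, outputs))
    return summary


for kind, source, outputs in summarize_cells(NOTEBOOK_JSON):
    print(kind, "|", source.replace("\n", " ; "), "|", outputs)
```

The reasoning (markdown), the transformation (code), and the evidence (output) sit next to each other in one file. Extracting only the code cells into a module keeps the second and drops the other two.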
AI agents are currently better optimized for software engineering.
This is perhaps the most important conclusion. AI coding agents currently perform best when meaning is embedded in the code structure, dependencies are clear, abstractions are stable, and the architecture is sufficiently clean.
That's almost exactly the environment of modern software engineering. Meanwhile, data science workflows require AI to understand more about real-world data, constantly changing states, variable provenance, the reasoning behind transformations, and the business context.
An agent that's only good at reading code won't automatically become a good agent for data science. The issue isn't just Python syntax. The problem is that the information structures of these two fields are fundamentally different.
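One piece of this, variable provenance across notebook cells, can be approximated statically. The sketch below is a simplification under stated assumptions: cells are assumed to run top to bottom exactly once, only assignments to plain names are tracked, and the cell contents in `CELLS` are hypothetical. Real notebook state also depends on actual execution order and in-place mutation, neither of which is captured here.

```python
import ast


def names_defined_and_used(cell_source):
    """Return (defined, used) sets of plain variable names for one cell."""
    defined, used = set(), set()
    for node in ast.walk(ast.parse(cell_source)):
        if isinstance(node, ast.Name):
            if isinstance(node.ctx, ast.Store):
                defined.add(node.id)
            elif isinstance(node.ctx, ast.Load):
                used.add(node.id)
    return defined, used


def provenance(cells):
    """For each cell, map every name it reads to the latest earlier cell
    (by index) that assigned it. Names never assigned, such as builtins,
    are left out."""
    last_defined_in = {}
    result = []
    for index, source in enumerate(cells):
        defined, used = names_defined_and_used(source)
        result.append({name: last_defined_in[name]
                       for name in used if name in last_defined_in})
        for name in defined:
            last_defined_in[name] = index
    return result


# Hypothetical notebook cells, in top-to-bottom order.
CELLS = [
    "raw = [3, 8, 120, 5]",
    "clean = sorted(raw)[:3]",
    "mean = sum(clean) / len(clean)",
]

print(provenance(CELLS))
# [{}, {'raw': 0}, {'clean': 1}]
```

This recovers where a value came from, but not why cell 1 keeps only three values. That reasoning is still outside the code.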
The biggest gap: from notebook to production.
One of the biggest opportunities for AI agents in the future lies in bridging the gap between exploratory notebooks and production workflows. Currently, many insights are discovered in notebooks, but it is difficult to translate them into stable pipelines, reproducible workflows, or practical deployment systems.
The reason is that notebooks prioritize discovery speed, flexibility, and context, while production systems prioritize structure, reproducibility, monitoring, and maintainability. A truly powerful AI agent for data science in the future will likely need to understand both the analytical context and the technical requirements of the production system. This is far more difficult than simply reading the codebase and generating new functions. But that's the real challenge.
Initially, many people thought data science code was simply a 'less clean' version of software engineering. But upon closer inspection, it becomes clear that these two workflows are actually optimized for two completely different goals.
Software engineering focuses on abstraction, structure, and stable behavior. Data science, on the other hand, focuses on data exploration, reasoning, context, and the continuous process of refining analytical questions.
That's why AI agents are now much more powerful in software engineering: much of the meaning has already been encoded into the structure.
In data science, the most important part sometimes lies outside the code: in the dataset, output, comments, business logic, or even the analyst's own thought process.
Once you understand that, the notebook no longer resembles a 'failed software project.' It is essentially a workspace designed for reasoning about constantly changing data.