Better coding models don't necessarily mean AGI is around the corner
Every time a new coding model is released, the same conclusion spreads quickly: AGI is basically here.
The logic feels straightforward. Software engineering is one of the most cognitively demanding white-collar skills. If AI can write code, surely general intelligence can't be far behind.
That conclusion doesn't follow.
Not because coding models aren't impressive — they are — but because coding is unusually well-suited to partial automation, and the parts being automated today are the most structured layers of the work. Progress in coding tells us far more about advances in text modeling and pattern completion than it does about whether artificial general intelligence is imminent.
One could argue that all cognition might ultimately reduce to sophisticated pattern completion over the right representations. Perhaps. But even if that's true, current coding models operate over a narrow slice of the representations that would be required for general intelligence. The gap between "pattern completion over code" and "pattern completion over everything humans navigate" remains vast.

The Big Bang Was Already Impressive
When ChatGPT launched in late 2022, it felt like a discontinuous leap. Within days, millions of people were experiencing fluent, coherent text generation for the first time. The model could write essays, explain concepts, draft emails, and yes — produce syntactically correct code.
That initial release already demonstrated the core capabilities we're still refining today: grammatical fluency, consistent tone, contextual awareness within a conversation, and the ability to follow instructions across domains. The syntax was clean. The prose was smooth. The code compiled.
What's improved since then is scale, reliability, and specialization — not the fundamental nature of what these systems do. Models have gotten better at handling longer contexts, sustaining more complex reasoning chains, and tackling domain-specific tasks like coding. But the underlying mechanism — pattern completion over text — was already working remarkably well from the start.
This matters because it reframes what "progress in coding" actually means. We're not witnessing AI suddenly acquiring new cognitive capabilities. We're watching text models get better at a task that was always within their wheelhouse. Code is text. Syntax is pattern. The surprise isn't that today's models write better code — it's that we keep mistaking incremental improvement for qualitative transformation.
The big bang wasn't a glimpse of AGI. It was a demonstration of how far text modeling could go — and we're still exploring that same territory.
Coding Is Messy — and Only Parts of It Are Automatable
Real-world software engineering is not clean or well-defined.
It involves vague requirements, shifting priorities, legacy systems, unclear ownership, human disagreement, trade-offs under uncertainty, and constant context switching. Much of the difficulty isn't writing code — it's deciding what should be built, why it matters, and how it fits into a larger system.
Today's coding models are not automating software engineering as a whole. They are automating specific layers of it: writing boilerplate and common patterns, implementing well-specified features, refactoring localized sections of code, translating clear intent into concrete implementations, and solving self-contained problems with known constraints.
These capabilities are extremely valuable, but they represent a shift in abstraction, not the elimination of the role. As lower-level execution becomes cheaper, human engineers move up the stack — toward architecture, system design, integration, prioritization, and judgment under uncertainty.
Automation doesn't remove the job; it raises the level at which humans operate.
Why Parts of Coding Are Easier to Automate Than Most Knowledge Work
Even though software engineering is messy, some aspects of it have properties that make them especially amenable to automation.
Code is textual and formal. At its core, code is structured text governed by explicit grammar and semantics. Large language models are designed to model text, which makes code a natural extension of their strengths. This doesn't make coding trivial — but it does make certain layers of it far more compressible than work dominated by implicit meaning, subtext, and human interpretation.
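As a small illustration of that formality (a hypothetical example, not tied to any particular model): the snippet below uses Python's built-in ast module to parse one line of code into an explicit syntax tree. The point is only that code, unlike most prose, comes with a machine-checkable grammar.

    import ast

    # One line of code parses into a fully explicit syntax tree.
    tree = ast.parse("total = price * quantity + tax")
    print(ast.dump(tree, indent=2))

    # An English paraphrase ("total is price times quantity plus tax")
    # has no parser that is guaranteed to either succeed or fail cleanly.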
Code is highly testable. Software offers unusually strong feedback loops. It compiles or it doesn't. Tests pass or fail. Programs crash or run. This makes it possible to automate iteration and correction at the implementation level. Many white-collar tasks lack anything comparable; feedback is often delayed, subjective, or politically mediated.
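To make that concrete, here is a minimal sketch of the kind of automated iterate-and-correct loop such feedback enables, assuming pytest as the test runner and a hypothetical generate_fix callable standing in for whatever produces the code changes. It illustrates the feedback loop itself, not how any particular tool works.

    import subprocess

    def run_tests() -> tuple[bool, str]:
        # The signal is unambiguous: the test runner's exit code is pass or fail.
        result = subprocess.run(["pytest", "-q"], capture_output=True, text=True)
        return result.returncode == 0, result.stdout + result.stderr

    def repair_loop(generate_fix, max_attempts: int = 3) -> bool:
        # generate_fix is a stand-in for any code-producing model or tool.
        for _ in range(max_attempts):
            passed, log = run_tests()
            if passed:
                return True
            generate_fix(log)  # feed the concrete failure back and try again
        return run_tests()[0]

Most knowledge work has no analogue of that exit code, which is exactly the asymmetry described above.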
Code operates in a bounded environment. A codebase has boundaries. Libraries behave predictably. While systems evolve, the rules don't change arbitrarily mid-task. This boundedness makes portions of coding easier to automate than work that operates in open systems shaped by markets, organizations, and human behavior.
None of this means software engineering is solved. It means some layers are more automatable than others — and those layers happen to be where current AI systems excel.
What Remains Is the Hard Part
As AI takes over more of the execution layer, the center of gravity of engineering work shifts upward. Engineers spend less time translating intent into syntax and more time deciding what should be built, how systems should evolve, and which trade-offs are acceptable.
This residual work is defined by ambiguity and judgment: interpreting incomplete requirements, reconciling conflicting stakeholder goals, designing systems that must adapt over time rather than just function today, managing risk and failure modes rather than optimizing for idealized success paths, and making decisions under uncertainty with limited information and no single correct answer.
Logistics offers a useful parallel. On paper, it resembles a clean optimization problem: move goods from point A to point B efficiently. In practice, it is dominated by disruptions — delays, equipment failures, weather, supplier constraints, human error, coordination breakdowns, and conflicting deadlines under partial information. The job isn't solving a static problem; it's continuously adapting to a shifting one.
Software engineering increasingly operates under similar conditions. Modern systems are long-lived, interconnected, and embedded in organizations. Requirements change midstream. Constraints conflict. Information arrives late or is wrong. The hardest part is no longer producing correct code — it's deciding what correct even means in a shifting environment.
In this regime, programming becomes a medium rather than an identity. The most effective engineers are those who can translate messy real-world conditions into evolving systems while continuously revising their understanding as reality pushes back. Their value comes from knowing which constraints matter, which simplifications are dangerous, and which problems are artifacts of the organization rather than the technology.
This is precisely where current AI systems struggle. They perform best when goals are stable, constraints are explicit, and feedback is immediate. As work moves toward ambiguity, evolving objectives, and judgment under uncertainty, the gap between code generation and general intelligence becomes clearer, not smaller.
Coding Skill Is Not General Intelligence
In humans, being an excellent programmer does not automatically make someone a great manager, strategist, or operator. Coding is a powerful but narrow cognitive skill.
The same applies to AI.
An AI that writes excellent code does not necessarily understand human motivations or incentives, adapt goals as circumstances change, reason robustly across unrelated domains, or operate effectively when success criteria are unclear.
Fluency at one layer of abstraction should not be mistaken for general intelligence.
AGI Requires More Than Better Code Generation
If we take AGI seriously — even purely digital AGI — the required capabilities go far beyond writing code.
Current models lack persistent agency. They respond to prompts but do not maintain goals across time, initiate action independently, or revise their objectives based on accumulated experience. They operate in isolated episodes rather than as continuous participants in ongoing work.
They lack grounded world models. They can describe how systems behave but do not maintain robust internal representations that update correctly as situations evolve. They predict text, not dynamics.
They lack genuine adaptability under ambiguity. When instructions are vague, contradictory, or simply wrong, current systems either fail silently or confabulate. They cannot recognize when to push back, ask for clarification, or abandon a flawed premise.
They lack the ability to manage open-ended goals. Real work involves objectives that shift, compete, and require continuous reprioritization. Current models optimize for the immediate prompt, not for long-term outcomes that require sustained judgment.
Coding models exercise only a fraction of this capability stack.
What Would Actually Convince Me
I'll be more speculative here, but I think it's worth being concrete about what would actually shift my view.
I would be convinced that digital AGI exists not when a system writes great code, but when it can operate as a first-class, autonomous participant in a digital workplace.
I'd expect it to take vague, high-level goals and turn them into concrete, multi-week plans. To ask clarifying questions without being prompted. To revise its approach as new information arrives. To maintain long-term context across many interactions. It shouldn't just follow instructions; it should adapt its goals, recognize its mistakes, and correct itself without explicit feedback.
Crucially, it wouldn't be a system that only works when prompted. An AGI should be persistently present and capable of initiative — surfacing issues, starting conversations, and escalating concerns on its own when context demands it.
Operationally, it should have full parity with humans in digital environments: natural, full-duplex communication; shared context through screen-level awareness or equivalent; and the ability to use the same tools humans use — browsers, documents, terminals, dashboards — without special scaffolding.
In short, AGI has arrived when an AI can function as a reliable, self-directed knowledge worker in an open-ended digital environment — handling ambiguity, taking initiative, and bearing real responsibility for outcomes.
Conclusion
Better coding models represent real and meaningful progress. They will continue to reshape how software is written and push engineers toward higher levels of abstraction.
But they do not imply that AGI is around the corner.
What's being automated is primarily execution under structure, not judgment under uncertainty. The hardest parts of software engineering — like the hardest parts of most knowledge work — live in the messy regime of shifting constraints, incomplete information, and human complexity.
AGI won't arrive because AI learned to write better code.
It will arrive — if it does — when machines learn to operate reliably in the mess.