I finally have an engineer I can summon who never complains

AI 2026-06-19 · Satsuma Creative · 8 min read

Anthropic studied 235,000 people across 400,000 sessions and proved it: whether AI coding succeeds doesn't depend on whether you can code, but on whether you understand what you're actually doing. A PM of thirty years finally has an engineer who never complains—but he recognizes the danger inside the joy: it catches your bugs, but not the human heart; and the one who finally signs off on you is the market.

Anthropic studied 235,000 people across 400,000 sessions and proved it: success doesn't depend on whether you can write code, but on whether you understand what you're actually doing. For someone who has been a PM for thirty years, that conclusion should make me happy—and it does, but inside that happiness is something I recognize, and it's danger.

—A PM's heaven, and the brake in that heaven that no one is pressing

Let me start with the most honest part: I love this.

What I've done for nearly thirty years is, at its core, telling other people to get things done. Estimating timelines, chasing progress, absorbing the eye-rolls behind "well, that's a strange requirement." Most of my working life has been a wrestling match with the fact that the execution end is human—people get tired, get annoyed, have opinions, and take time off right when you're most pressed.

Now, the execution end has a new arrival.

It doesn't ask why. It doesn't sneer at ugly requirements. It doesn't complain at three in the morning. It doesn't lose its temper after ten rounds of revisions. You tell it to do something, and it does it, and it comes back with a progress report it wrote itself, in a tone more polite than mine.

The only thing it "runs out of" is tokens. But tokens are money, and money I can calculate; human patience I can't. This is already the most calculable cost of any vendor I've ever dealt with in my life.

On June 16, Anthropic released a report titled "Agentic coding and persistent returns to expertise." It laid out 235,000 people and nearly 400,000 Claude Code sessions and arrived at a conclusion that's a bit barbed for engineers, but reads like a love letter to someone like me:

Humans decide what to do; AI decides how to do it. On average, humans make 70% of the planning decisions, and Claude makes 80% of the execution decisions.

In my words: I open my mouth, and it runs a whole afternoon. And I don't have to buy it coffee.


The "expert" the report describes happens to be exactly the PM kind of expert

The smartest thing about this report is that it redefines "expertise."

The expertise it describes is not a job title, not whether you can write code. It istask-specific. The report gives an example I love: a senior engineer asked his first-ever Rust question is, on that task, a beginner; conversely, an accountant who has never touched Python—if he can tell the AI precisely which rules a reconciliation must follow, and can spot at a glance the edge case the AI missed during month-end closing—is, on that task, an expert.

The report's numbers are blunt: a beginner issues a command and triggers, on average, five AI actions and 600 words of output; an expert issues a command and triggers more than twice the actions and five times the output.The more you understand the task, the more the AI does for you.

And the success it measured doesn't rely on a coding background. Across sessions that produced code, every one of the top ten occupations had a success rate within seven percentage points of software engineers. The highest weren't engineers—they were management roles.

The report's own explanation: management skills transfer to "directing an agent."

Isn't that just a PM? A PM's core ability was never doing the work, but knowing what should be done, breaking it down correctly, explaining it clearly, and then watching it through to completion. The report measured 235,000 people and, after a long detour, measured exactly this.


Up to here the happiness is real. But those words—"never complains"—I have to stop on.

Thirty years in this trade have given me an instinct that the report's numbers can't rescue.

Let me make one thing clear first, so I'm not mistaken for taking shots at AI:At catching errors and testing bugs in code, Claude is excellent. I have it do this every day. It helps me see which logic will blow up, which boundary isn't guarded, where a 502 will fire—this kind of "verifiable right-or-wrong," it's faster than me, more thorough than me, more patient than me. The verified success in the report rests on exactly this kind of sign-off, and it's well-earned.

That's not where the problem is. The problem is in another kind of sign-off.

I've done advertising for nearly twenty years, and what I've feared most was never "does the program run." It's "when the target audience sees this, what is their real reaction." That reaction won't show up in any test suite. A film, an idea, a slogan, put in front of real people—will they stop, will they share it, will they think "this is about me," or will they think "what's it got to do with me"—this is taste at work, and it requires a hard-to-articulate judgment about people.

Claude doesn't know our target audience's real reaction. It can't know. It can write copy that's grammatically perfect, beautifully structured, logically sound—and then that grammatically perfect thing gets put on the market and no one cares. It can test my bugs, but it can't test the human heart.

And I used to have colleagues who would test the human heart for me.


The report can measure "success," but not "whether it should be done"

This is where I part ways with the report.

Its "success" is defined clearly and honestly: verified success—based on git commits, passing tests, the user explicitly saying "yes, that's it." This is a kind ofverifiable success. If the program runs, it's a success.

But what I've spent thirty years measuring never leaves a git commit.

The reason the "Shaqua" idea was that idea wasn't because it could run, compile, or pass tests. It's because, in some meeting room, it made a gutsy boss's eyes light up for a second, and made another, more conservative person frown and say "is this really okay?" That spark, that frown—that's taste at work. It has no test suite.

The report says "the gap in success rate between intermediate and expert is small; understand the gist and you capture most of the benefit." On its ruler, this is true. But its ruler measures "can it be built." Switch to "should this be made," "does this have guts," "what is the client really afraid of"—that line isn't even in the report's coordinate system.

I'm not saying the report is wrong. I'm saying it honestly measured only the half it could measure. And the skill that puts food on my table happens to live entirely in the other half it can't.

So when the report implies that "the tool pulls the competent close to the masterful, and the marginal value of thirty years of accumulation gets diluted"—I don't panic. Because what gets diluted is the "can do" part. And these thirty years, I never survived on being able to do.


Closing

How happy is it to have an engineer I can summon who never complains?

Very happy. I mean it. I can carry ten projects at once and still sleep, entirely thanks to it. It freed me from the fact that execution is human—a bottleneck, something to be coaxed. That curve the report describes, "task length doubling every few months," I live on it every day, and I benefit.

But past a certain point of happiness, I recognized that familiar shape.

That it doesn't complain means it won't test the human heart for me. It can help me catch bugs, but it can't catch the target audience's indifference. The report calls this "returns to expertise," framing it as good news—for someone with judgment, the tool gives you greater leverage.

Yes. But leverage amplifies not only your judgment, but also your blind spots. A colleague who tests real reactions for you will frown when you misread the market. A tool that only tests code will, with unprecedented efficiency, finish what you misjudged—finish it well, finish it beautifully—and then toss it, along with everything else, into that market where no one cares.

So what this report really means to me isn't "I finally don't have to manage people." It's:

The person who would judge for me "will the target audience really buy this" is now only myself. Claude guards the code end for me; the human-heart end, I have to guard myself.

On this, no model can replace me.

Then again, even I am not the final gate.

I guard the human heart, but what I'm guarding is only my "judgment" of the human heart. However accurate the judgment, it's still just judgment. The one who actually presses right-or-wrong was never me, and never Claude—it's the market. It's whether, after that film goes out, anyone stops, anyone shares, anyone pays. The market doesn't complain, and it doesn't reason; it just tells you, through silence or through conversion rate, whether you guessed right.

So the chain actually goes like this: Claude signs off on the code, I sign off on the human heart, the market signs off on me. What it does, what I judge—all of it goes, in the end, before the same judge from whom there is no appeal. AI made "being able to do" cheap and pushed me toward "judgment"; but in pushing me all the way up, it only made me stand before the market sooner, and more nakedly.

This is probably the most honest version: I finally have an engineer I can summon who never complains, and it lets me work faster and go further—further forward, to face the thing I've actually been facing for thirty years, the thing that will never change.

The market will complain. And its complaint, no one can overrule.


Further reading: - When AI starts building itself, "being able to do" stops being worth money - AI moved into every pocket. Now what? - How Satsuma Creative does advertising - Clients say they want a Big Idea, but what they really want is for nothing to go wrong - Original:Agentic coding and persistent returns to expertise — Anthropic