The day the director couldn't watch the rehearsal

AI 2026-06-11 · Satsuma Creative · 7 min read

After using a Mythos-class model, Ethan Mollick went from wizard to patron: describe, pay, judge — the process entirely out of sight. When AI makes hundreds of judgments inside a nine-and-a-half-hour black box and you voted on none of them, verification degrades into a one-hour spot check — so what gives you the right to put your name on the result?

— A response to Ethan Mollick's "What it feels like to work with Mythos"

Original source: What it feels like to work with Mythos — Ethan Mollick, One Useful Thing

I read Ethan Mollick's new article.

He'd gotten early access to a Mythos-class model — Claude Fable 5. He gave it an ambitious prompt: build a piece of software that the research community had needed for years but that, having no commercial value, no one had ever built.

It worked for nine and a half hours.

And finished.

Mollick is a professor at the Wharton School at the University of Pennsylvania, and over the past few years probably the most prolific scholar writing about AI in the English-speaking world. Last year he described working with AI as working with a wizard: you say the incantation, and things happen.

This year he says the spells have gotten so strong he's no longer sure he's still the wizard.

He used a word: patron.

"I describe what I want, I pay, and I judge the result. The actual spellcasting happens somewhere I can't see, across hundreds of small judgments I never got to vote on."

"I'm no longer driving. I'm commissioning."

I know this feeling

A few days ago I watched Claude run a core update across two of my servers on its own, fixing a service I'd broken myself along the way. My only real job in the whole thing was to say, "Okay, go."

That day I wrote an article arguing that building is getting cheaper, and that three things won't automate away: deciding what to build, taste, and verifying that it's right.

At the time I thought the answer held up.

Deciding is a value judgment, not a computation. Taste is knowing which of a thousand workable options is the right one. Verification is that AI can do the whole thing but doesn't know what it missed — someone has to watch.

Building will depreciate. These three will stay expensive. I took that as solid ground.

Then Mollick's article took away half of the third one.

Verification is degrading into a spot check

Notice how he describes that nine-and-a-half-hour product.

He says that, as an expert in the field, he caught some errors and omissions and had the AI fix them. Then he added something very honest:

He'd only spent an hour looking at the result. He was sure there were still bugs in it he couldn't find.

One hour, against nine and a half.

And within those nine and a half hours, the model dispatched its own swarms of cheaper models to do research, and adversarial agents to check each other's results. Most of the verification had already been internalized. What was left for the human was the outermost ring.

I'd said verification "won't automate away." Technically I wasn't wrong. But Mollick's case showed me what human verification actually looks like at that scale —

Not line-by-line checking. A spot check.

You're not verifying the process. You're verifying the surface within your spot-checking ability.

The process is too long to follow, the judgments too many to go through one by one. All you can do is poke at a few spots you understand; if nothing turns up, you sign.

The director doesn't act, but the director gets to watch the rehearsal

I've spent thirty years in theater.

The director's role is a bit like the patron Mollick describes: doesn't act, doesn't build the set, but is accountable for the final result.

So at first I thought there was nothing to panic about. Directors have long lived in a "don't do it yourself" mode of working; we know perfectly well that not doing it yourself doesn't mean having no authorship.

But halfway through that thought, I stopped.

The director gets to watch the rehearsal.

Every blocking choice, every reading of a line, the moment in week three when an actor suddenly clicks — the director is there for all of it. He doesn't act, but he watches the whole way through. His judgments aren't made on opening night; they're made in the rehearsal room, a thousand hours, one at a time.

Every choice in the final product, he voted on.

The patron does not. The patron doesn't even get to see the rehearsal. He walks into the theater on opening night, watches, and applauds or doesn't.

This is exactly what Mollick describes. The model made hundreds of judgments, and he voted on none of them. All he can do is watch the finished product and pick out the few flaws he's able to see.

The difference between a director and a patron isn't whether they do the work themselves. It'swhether you're in the process。

The question of the credit

This brings me back to a question I've asked before, only now it's sharper.

I once wrote: the question is mine, the answer is ours — so whose is the article? My answer then was that I decided the direction of the question, I caught what was wrong — that judgment was still mine. So I couldn't say it wasn't mine at all.

That was an answer for the age of conversation. Back and forth, and I was present for every exchange.

But a nine-and-a-half-hour black box isn't a conversation. It's a commission.

The question of the commissioning age has to be asked again: a finished product whose process I never saw, with hundreds of judgments I took no part in, that I spot-checked for only an hour —

what gives me the right to put my name on it?

The honest answer might be: the two ends.

The front end — the brief is mine. Mollick's software got built because he knew the research community lacked this tool, because he could write a nineteen-page design document, because he knew what "right" meant. Someone who didn't understand research methods, given the same model, would not produce the same thing. The brief carries thirty years of his accumulation.

The back end — the acceptance criteria are mine. Where I spot-check, what standard I use to judge pass or fail, which flaws I let through and which I don't — those are mine.

The middle stretch is not mine. Admit it.

You can still sign your name. But know what you're signing: you're not signing "I made this," you're signing "I held these two ends, and chose to trust the middle."

Trust can only be given once it's been verified

Writing this, I realized this already has a counterpart in my own work.

Before my customer-service system went live, what I did was not read its code line by line — I couldn't finish it, and I couldn't understand all of it. What I did was design the acceptance test: I hit it with real complaint scenarios and watched whether it would make things up in the places I feared most (prices, dates, policy). It got through a round without making anything up, and only then did I let it go live.

That wasn't watching the rehearsal. That was designing an exam.

Maybe this is what verification looks like in the commissioning age: you can't be in the process, so you push your judgment forward into the brief and backward into the design of acceptance. For the middle you can't see, you trade "it passed the exam I designed" for trust.

This trust isn't blind. It's been verified — only the object of verification has shifted from "the process" to "its performance on my exam."

Is it enough?

I don't know. Neither does Mollick. He guesses the black box may simply be the price of power: the stronger the model, the less is left for the human. He may be right.

But I noticed one thing: in his article he writes that the more ambitious the prompt, the better the result.

The brief still makes a difference. The exam still makes a difference. Those two ends, for now, are still human.

In closing

In the age when the director got to watch the rehearsal, authorship was self-evident — you were there, so it was yours.

What we may have to learn now is another, more uncomfortable kind of authorship: you weren't there, but the two ends are yours. You set the question, you set the test. That long dark stretch in the middle — you admit it's dark, and then decide whether to put your name on what walks out of it.

Some will feel this kind of credit is a dilution.

Others will say architects never laid a single brick themselves either.

I'm still thinking about it.

But there's at least one thing I'm now sure of: the next time someone asks me "did you make this?", I won't rush to answer. I'll think for a moment about which two ends I held this time.

Further reading: - When AI starts building itself, "being able to do it" is no longer what's valuable - The person who talked to AI too much - Asking an LLM how it works itself: the question is mine, the answer is ours — so whose is the article? - I built myself a knowledge base, then refused to let it speak for me