From service bot to echo: 74 days to replace the thing that spoke for me

AI 2026-06-11 · Satsuma Creative · 9 min read

At the end of March, the little figure on the saomin.tw homepage still had a KIMI service bot living next to it. 74 days later, it became an echo that has read 278,000 characters of my writing and answers the unification-independence question with my own hierarchy of values. This post walks through every technical choice along the way, and why each was made.

On March 29, I put something that could talk on the saomin.tw homepage.

A speech bubble next to the LEGO minifigure; click it and a chat window opens, wired to KIMI's API. I fed it a system prompt and a bundle of pre-computed Q&A embeddings, and even built an expression system — when it answered, the minifigure changed faces.

It worked well enough. Polite, quick, and ask it "who is saomin" and it gave a perfectly competent answer.

But one day I looked at it a little longer and realized something: it was a customer service desk about me. It was receiving visitors on my behalf, not speaking on my behalf. You ask it a question, it looks up my data, then paraphrases in its own voice.

Like a dutiful receptionist who has memorized the boss's résumé.

I didn't want a front desk. What I wanted to know was something else: if I fed it every word I've written over the decades, would talking to it be like throwing a stone into a valley — with my own voice coming back?

This post inventories every technical choice in the 74 days that followed. Not a tutorial — a record: what we chose, why we chose it, which calls were right, and where we took detours.

Version one: chop the words into pieces, industry standard (5/29)

Work started in late May. Blog posts, my master's thesis, self-interviews, theatre writing, reflections with AI — cleaned into 596 structured corpus entries, over 200,000 Chinese characters.

The technical route was standard RAG: every piece of text becomes a vector and goes into a database; you ask a question, the system pulls the eight most similar chunks and hands them to Claude to compose an answer.
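A minimal sketch of that retrieval step, for readers who haven't built one. It is not the site's actual code: the toy two-dimensional vectors stand in for real embeddings, and the names `cosine` and `top_k` are made up for illustration.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def top_k(query_vec, chunks, k=8):
    """Return the k (score, text) pairs most similar to the query, best first.

    `chunks` is a list of (text, vector) pairs; the vectors are assumed to
    come from the same embedding model as `query_vec`.
    """
    scored = [(cosine(query_vec, vec), text) for text, vec in chunks]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]

# Toy vectors only; real embeddings have hundreds of dimensions.
chunks = [
    ("theatre essay", [1.0, 0.0]),
    ("thesis chapter", [0.0, 1.0]),
    ("self-interview", [1.0, 1.0]),
]
results = top_k([1.0, 0.1], chunks, k=2)
top_texts = [text for _, text in results]
```

In the setup described above, `k` would be 8 and the selected texts would be handed to the model as context.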

Three decisions were locked in at this point and never changed:

One: use a small local embedding model, not the cloud. multilingual-e5-small, running on the Mac mini on my desk. The reason is simple: this corpus is my most private material — the thesis, the reflections, and later eighteen years of Facebook. It never leaves this machine.

Two: no fine-tuning. Forging my words into model weights sounds the most like a "digital twin," but fine-tuning is an averaging process — it sands off the rough edges of a voice, and the rough edges are the voice. Worse, whatever gets forged in blends with the model's hallucinations, and you can never again tell which sentence I actually said and which it made up.

Three: honest about facts, generous with opinions. This is the core rule of the system prompt, recalibrated once along the way: the first draft said "if it's not in the material, say I never wrote about it," and it ended up pleading ignorance to everything, like overly cautious legal counsel. Revised: facts (things I've done) — if they're not there, they're not there, never invent; but opinions, taste, how I'd see something — think generously within my framework. Save "I don't know" for the genuine blanks.

saomin.tw/me went live. Throw a stone, and there was an echo.

But I could hear it myself: the echo wasn't quite right.

Finding the problem: a voice doesn't live in fragments (6/1)

Two flaws — one technical, one fatal.

The technical one: with Chinese, e5-small squeezes every similarity score between 0.84 and 0.90, which means it can barely tell what is actually closest to what. I thought retrieval was holding the "no making things up" line; it had been the prompt holding it all along.
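One way to make that compression visible is to look at how far apart the candidates' scores actually are. The sketch below is an illustration under assumptions: the scores are invented to sit inside the 0.84 to 0.90 band, and `score_spread` is a hypothetical helper, not part of any library.

```python
def score_spread(scores):
    """Return (spread, top_margin) for at least two similarity scores.

    spread: highest score minus lowest score among the candidates.
    top_margin: gap between the best and the second-best candidate.
    """
    if len(scores) < 2:
        raise ValueError("need at least two scores to compare")
    ordered = sorted(scores, reverse=True)
    spread = ordered[0] - ordered[-1]
    top_margin = ordered[0] - ordered[1]
    return spread, top_margin

# Invented numbers, chosen only to fall inside the band described above.
scores = [0.89, 0.88, 0.87, 0.85]
spread, top_margin = score_spread(scores)
```

When the whole candidate set fits inside a few hundredths and the winner leads by about one hundredth, the ranking carries little signal, which is consistent with the prompt doing the real work.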

The fatal one: fragments kill tone. A sentence's pauses, its trailing off, its sudden turns only hold up with the context around them. Dig it out of the full piece and feed it to a model, and the model learns the content, not the rhythm. And rhythm is the whole reason I'm doing this.

Right around then, Karpathy was talking about using an LLM as a personal knowledge base: have the LLM compile raw material into a markdown wiki, ask questions against the wiki — at small scale you don't need RAG at all.

I took his infrastructure and rejected his core assumption. That decision became the previous post (a9). The one-sentence version: he treats the LLM as a metabolic engine for knowledge, where summaries can replace the originals; but my library doesn't hold knowledge, it holds a voice, and summaries kill a voice.

So the map (12 topic pages, where the LLM lays out my positions, tone, and internal contradictions on each topic in the third person) serves only as a navigation layer. At generation time, it always feeds the originals.

The map also came with a coverage diagnosis, and the numbers woke me up to two things: my corpus is 97% "public-facing, polished, expository" in register, with the private, spontaneous, conversational at just 4%; and on the timeline, there is a fifteen-year gap from 2011 to 2025.
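A coverage check of this kind can be sketched in a few lines. This is an assumption about the shape of the diagnosis, not the tool that produced the numbers above: the `year` and `register` fields, the helper names, and the four toy entries are all hypothetical.

```python
from collections import Counter

def register_share(entries):
    """Fraction of entries carrying each register label."""
    counts = Counter(e["register"] for e in entries)
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()}

def year_gaps(entries, min_gap=2):
    """Return (first_missing, last_missing) runs of years with no entries.

    Only gaps inside the covered range are reported, and only those at
    least `min_gap` years long.
    """
    years = sorted({e["year"] for e in entries})
    gaps = []
    for prev, nxt in zip(years, years[1:]):
        if nxt - prev - 1 >= min_gap:
            gaps.append((prev + 1, nxt - 1))
    return gaps

# Toy data, not the real corpus.
entries = [
    {"year": 2009, "register": "public"},
    {"year": 2010, "register": "public"},
    {"year": 2026, "register": "public"},
    {"year": 2026, "register": "private"},
]
shares = register_share(entries)
gaps = year_gaps(entries)
```

With these toy entries the gap comes back as 2011 through 2025, the same shape of hole the diagnosis reported.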

In other words: this echo only knew "the me who writes essays," not "the me who banters with friends." And the latter is most of who I am.

A/B: not by gut feel — side by side (6/1)

Should the map be wired into generation? I didn't touch production directly; first I built a comparison tool locally: same question, same persona prompt, two ways of assembling context answering side by side.

A: e5 fragments. Pull the eight most similar chunks. Fast, concrete, and it can search the web.

B: map routing. First locate which of my topics the question falls under, then feed the full original pieces from those topics together with the map's synthesis. Slow, but it can carry the judgment of a whole piece.
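The shape of such a comparison tool might look like the sketch below. Everything in it is assumed for illustration: the keyword-based `route_topics` is a crude stand-in for however the real system locates topics, `fake_generate` replaces the actual model call, and route A is represented by a fixed list rather than a vector search.

```python
def route_topics(question, topic_keywords):
    """Pick topics whose keywords appear in the question (hypothetical routing)."""
    q = question.lower()
    return [t for t, words in topic_keywords.items() if any(w in q for w in words)]

def context_map(question, topic_keywords, topic_pages, originals):
    """Route B: for each matched topic, the map page followed by full originals."""
    parts = []
    for topic in route_topics(question, topic_keywords):
        parts.append(topic_pages[topic])
        parts.extend(originals[topic])
    return parts

def side_by_side(question, generate, build_a, build_b, persona):
    """Same question and persona, two context builders; returns both answers."""
    return {
        "A": generate(persona, build_a(question), question),
        "B": generate(persona, build_b(question), question),
    }

# Toy map and corpus, not the real topic pages.
topic_keywords = {"theatre": ["failure", "stage"], "parenting": ["daughter"]}
topic_pages = {"theatre": "MAP: theatre page", "parenting": "MAP: parenting page"}
originals = {"theatre": ["FULL: theatre essay"], "parenting": ["FULL: parenting note"]}

def fake_generate(persona, context, question):
    # Placeholder for the real model call; it only reports what it was given.
    return f"{len(context)} context items"

answers = side_by_side(
    "How do you see failure?",
    fake_generate,
    build_a=lambda q: ["chunk 1", "chunk 2", "chunk 3"],
    build_b=lambda q: context_map(q, topic_keywords, topic_pages, originals),
    persona="(persona prompt)",
)
```

Keeping the persona and the question fixed means any difference between the two answers comes from how the context was assembled.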

Real test, "how do you see failure": A gave "sadness, helplessness, avoidance" — solid but generic; B pulled in the theatre framework — "success and failure mean measuring yourself by external standards, but I make theatre for myself."

B sounded more like me. But A wasn't useless: it retrieves the concrete memories that B's topic categories can't catch. Each has its strengths — file that conclusion away; it matters later.

Both went live just like that, side by side as an A/B, for visitors to compare themselves.

Facebook: not filling in volume, but another me (6/8–6/11)

The diagnosis said what was missing, so go get it. Hit the Facebook export button.

The first attempt failed — I had only checked the past year in the export range: 109 entries, and 97% of them were "work logs of frantically building AI projects in 2026." The gap stayed a gap.

Second attempt, full time range: 901MB. 7,538 posts and 4,965 comments, 2008 to 2026.

Cleaning this batch, the LLM played three roles that had nothing to do with "generation":

Annotator: tag every entry with topics, constrained to the existing corpus's vocabulary, run in batches — keeping eighteen years of material on the same coordinate system as the original corpus.
Screener: a length threshold can't catch reposts — horoscopes, lyrics, press releases, chain posts all look like earnest long-form writing. So the LLM judged each entry as "original / quoted with commentary / pure repost," and cut 112 pure reposts.
Synthesizer: recompile the map, writing the Facebook register into every topic page.
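The annotator and screener roles reduce to two small filters once the judgment itself is treated as a black box. In this sketch the LLM is replaced by a rule-based `toy_classify`, and the label strings, function names, and sample entries are all hypothetical.

```python
LABELS = ("original", "quoted_with_commentary", "pure_repost")

def screen(entries, classify):
    """Split entries into (kept, dropped) using a three-way label.

    `classify` is any callable returning one of LABELS. In the post this
    judgment is made by an LLM, which is not reproduced here.
    """
    kept, dropped = [], []
    for entry in entries:
        label = classify(entry)
        if label not in LABELS:
            raise ValueError(f"unexpected label: {label!r}")
        if label == "pure_repost":
            dropped.append(entry)
        else:
            kept.append(entry)
    return kept, dropped

def constrain_topics(proposed, vocabulary):
    """Keep only proposed topic tags that already exist in the corpus vocabulary."""
    return [t for t in proposed if t in vocabulary]

def toy_classify(entry):
    # Rule-based stand-in for the LLM judgment, for demonstration only.
    if entry.startswith("Shared:"):
        return "pure_repost"
    if '"' in entry:
        return "quoted_with_commentary"
    return "original"

posts = [
    "Shared: weekly horoscope",
    'He said "go" and I disagree',
    "Rehearsal ran late tonight",
]
kept, dropped = screen(posts, toy_classify)
tags = constrain_topics(["theatre", "crypto"], {"theatre", "parenting"})
```

Rejecting unexpected labels outright is a deliberate choice here: a classifier that drifts outside the three categories should fail loudly rather than silently keep or drop material.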

What remained: 1,246 entries, 45,000 characters. The fifteen-year gap was filled, and the corpus reached 2,015 entries and 278,000 Chinese characters.

Was it worth it? Afterwards I asked it about the relationship between Taiwan and China. It answered: "Injustice matters more than Taiwan independence. I would rather have a free Republic of China than an authoritarian Taiwanese state." Then it added: "When my daughter was in third grade, Taiwan was all she knew. That part needs no teaching."

Both lines are things I actually wrote in Facebook comments in 2022. The system from a week earlier had no such memories at all.

The merge: each has its strengths, so take both (6/11)

The A/B ran side by side for a while. I kept testing it myself, and the conclusion stayed the same: each has its strengths.

So why pick only one?

The merged approach is not "answer twice and blend": that's twice as slow, and blending dilutes the tone. It's merging both retrieval results into a single prompt with one generation pass. One vector query feeds both routes at once: the fragment route grabs concrete cross-topic memories, the map route grabs depth of position and whole-piece tone. Deduplicate, then feed them in together.
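A sketch of the merge step, under stated assumptions: each retrieved item is an `(entry_id, text)` pair, the entry ids are hypothetical, and letting a full piece shadow a fragment of the same entry is one possible deduplication rule, not necessarily the one the site uses.

```python
def merge_context(fragments, full_pieces):
    """Combine both routes' results, one item per entry id.

    Full pieces are inserted first, so when a fragment and a full piece
    share an entry id, the full piece is the one that is kept.
    """
    merged = {}
    for entry_id, text in list(full_pieces) + list(fragments):
        merged.setdefault(entry_id, text)  # first occurrence wins
    return list(merged.items())

def build_prompt(persona, merged, question):
    """Assemble a single prompt for one generation pass."""
    blocks = [persona, "Material:"]
    blocks += [f"[{entry_id}] {text}" for entry_id, text in merged]
    blocks.append(f"Question: {question}")
    return "\n\n".join(blocks)

# Toy retrieval results with made-up ids.
fragments = [
    ("fb-2022-01", "short fragment"),
    ("essay-7", "a fragment of essay 7"),
]
full_pieces = [("essay-7", "the full text of essay 7")]

merged = merge_context(fragments, full_pieces)
prompt = build_prompt("(persona prompt)", merged, "How do you see failure?")
```

The model then sees each entry once, and is called once, which is what keeps the merged route from costing twice the latency.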

That's what saomin.tw/me is now. A single column, like talking to a person — no longer a two-pane laboratory.

The generation engine kept pace too: the same claude CLI pipeline, with the model climbing from Claude Opus all the way to Claude Fable 5. I barely touched this layer. It is the same pipeline reading the same words of mine, but the "thinking" part got thicker. That's the other dividend of choosing "assembled context + general-purpose large model" over fine-tuning: when the model upgrades, the echo gets smarter with it, and not one of my words needs re-forging.

Then, seeing the service bot off

This morning I cleared away the corpse of that KIMI service bot on the homepage. 453 lines: the chat panel, the expression system, the in-browser vector matching — everything March-me thought was cool, all deleted.

The bubble on the homepage is still there. Next to the LEGO minifigure, it still slowly types out a greeting. Only now, once it finishes, it turns into an input box.

Drop a sentence in, another window opens, and what catches it is no longer a receptionist.

It's an echo.

Appendix: the 74-day timeline

Date	Event	Key choice
3/29	KIMI service bot live on homepage	Expression system, in-browser vector FAQ
5/29	me.saomin version one, 596 corpus entries	Local e5 embeddings (data never leaves the machine), no fine-tuning, chunked RAG
5/30	saomin.tw/me goes live	"Honest about facts, generous with opinions"
6/1	Map layer + coverage diagnosis	Took Karpathy's infrastructure, rejected summaries-replace-originals (→ a9)
6/1	A/B side-by-side goes live	No architecture change by gut feel — test side by side
6/8	One year of Facebook, 109 entries	Failure: only one year selected, gap left unfilled
6/11	Full Facebook history, 1,246 entries (2008–2026)	LLM as annotator + screener; corpus at 2,015 entries / 278,000 characters
6/11	A+B merge live, KIMI retired	Both retrieval routes merged into one prompt, single generation pass; model upgraded to Claude Fable 5

Want to hear how it talks now: saomin.tw/me. Throw a stone in.

Further reading:
- It wants to be AI's upstream; I just want to leave an echo
- I built myself a knowledge base, then refused to let it speak for me
- The person who talked to AI too much