How to build your own personal LLM: and what this question is really asking

AI 2026-05-29 · Satsuma Creative · 8 min read

Building an LLM from scratch costs $100 million, but system prompt / RAG / LoRA are three low-barrier paths. The real question isn't how to do it — it's which three layers you need to "put yourself into it."

—and what this question is really asking

I asked Claude a question.

Can an ordinary person build their own LLM?

Not "use an LLM" — build one that's your own.

And then we talked for a long time. By the end, I realized the answer to this question didn't matter. What mattered was what the question was really asking underneath.

Starting with the technical side

Building an LLM from scratch comes with these barriers:

資料   → TB 級的乾淨文字
算力   → 幾千顆高端 GPU，跑幾個月
錢     → 訓練一次 GPT-4 級別，估計超過一億美金
人     → 幾十到幾百個頂尖研究員

An ordinary person can't do it. Forget this path.

But "your own LLM" can mean several different things.

The lightest is the System Prompt—tell the model who you are, your tone, your frameworks, and it carries that into every conversation. It doesn't change the model; it just dresses it in a new outfit each time.

In the middle is RAG—vectorize everything you've ever written into a knowledge base, and the model draws on your own words when it answers. What it says will carry the shadow of what you've said.

What actually changes the model itself is LoRA fine-tuning—take an open-source model (say, Qwen2.5-7B) and retrain it on your data, so it drifts toward you at its core.

How fine-tuning works

For model choice, Qwen2.5-7B makes the most sense for Chinese-language use. It has the strongest Chinese ability among open-source models, and 7B runs on ordinary hardware.

The data format is simple:

{
  "instruction": "用你的語氣寫一段關於意義的思考",
  "output": "意義不是找到的。是在差異裡浮現的......"
}

A few hundred to a few thousand such data pairs. The source is everything you've ever written—blog posts, papers, creative work, conversation logs.

Training doesn't require owning a GPU. Spin up a GCP T4 instance, run it for two or three hours, spend a few dozen dollars, and shut it down when done. Use a tool called Unsloth—it's memory-efficient, fast, and low-barrier.

Once it's done, move the model back to your local machine and run it with Ollama. Completely offline, your data never leaves.

What LoRA does, in one sentence

An ordinary Qwen learns the next token that's most reasonable for everyone. Its "reasonable" comes from the statistical average of nearly everything humans have ever written.

What LoRA does is layer a direction that shifts toward you on top of that average.

原始模型：對人類整體最合理的猜想
    +
LoRA 層： 修正向量，把「合理」的定義往你漂移
    =
微調後：  對黃少民最合理的猜想

But the phrase "most reasonable" is worth pausing on.

Your "most reasonable" isn't reasonable in a statistical sense. It's the first reaction you have to something after decades of accumulation. That reaction sometimes isn't the most common one—sometimes it bypasses common sense entirely and lands somewhere no one else would think to go.

LoRA can learn the direction you shift in, but the most valuable part of that shift—why you think this way, where the thought came from—it can't learn. It learns the shape of the result, not the path that produced it.

A more precise way to put it:

Toward the reasonable that appears most often in the things I've said.

The reasonable you haven't voiced yet, it can't guess. The reasonable you arrive at after changing your mind, it has to relearn. The reasonable you only have in a particular moment, a particular situation, it can't handle.

That's why this thing needs continuous feeding. You keep writing, it keeps learning, and only then does its "best guess at what's reasonable for you" move along with you. Where there's no trace, it still fills in with the human average.

But model size matters

I asked: would a bigger model be better?

Yes. But there's a turning point.

7B  → 能學到語氣和表達模式
14B → 開始能學到思考結構
32B → 能學到推理方式和價值判斷
70B → 接近知識深度和論述邏輯

The bigger the model, the higher its demand for data quality. Feed a few hundred articles to a 70B and it still learns only the surface. Better to feed the same data to a 14B—it ends up more focused.

But honestly, when it comes to "does it sound like you," a large model with good context will beat a fine-tuned small one. Claude or GPT-4 with your knowledge base and a careful system prompt gets closer to what you want than a fine-tuned 7B.

The real value of fine-tuning a small model is: fully offline, low cost, running at scale.

And then the question changed

I said, what I want isn't just a tone that sounds like me. I want something that's like me at the core.

Claude said something that made me stop.

Fine-tuning can learn tone, frameworks, preferences, style. But what it can't learn is: the bodily memory of thirty years in theater, the feeling of suddenly getting something while reading a line one afternoon, the hesitation over something and the reason behind that hesitation, the things not yet written.

"Like you at the core" requires two things done at once.

A base model with depth—because your thinking has depth, and it takes a deep model to catch it. Plus feeding your material in structured form—not just articles, but the process of how you see something, the moments you changed your mind, the things you feel matter but haven't yet articulated.

And this isn't a one-time engineering job. It's continuous accumulation. You keep writing, and every article, every conversation goes into the knowledge base, and the model's understanding of you deepens over time.

Everyone can have their own personal LLM

This is already happening.

Memory systems, personal knowledge-base RAG, continuous fine-tuning—several major directions are all heading here. Apple Intelligence watches your calendar and habits on your device, NotebookLM lets the model live inside the documents you upload, and Claude's memory system understands you more the more you use it.

The technical problem is no longer the problem.

But it raises another one.

When everyone has a model that grows more and more like themselves, do the things this model says count as you saying them? Do the decisions it makes on your behalf count as your decisions? If it understands your patterns better than you do, is it your tool, or a part of you?

This is the same question as "whose is the article" from the last piece, pushed one step further.

What I actually want to do

I have a small chat feature on saomin.tw.

I've long imagined that one day, after I'm gone, there could still be something that answers and interacts with the living according to my thinking and my memory.

This isn't a chatbot. It's an externalization of cognition—turning how I see the world, how I think, how I respond into something that can keep operating.

This idea has a name—some call it Digital Afterlife, others Persona AI. Companies are already working on it, but no one has reached this depth—because those are all generic, not grown from one person's decades of accumulation.

What it takes to "put yourself into it"

It breaks into three layers.

Layer one: the things I've said. Blog posts, papers, tarot writing, theater writing, conversation logs. This is the easiest to collect, and the most surface-level layer.

Layer two: how I think. When I hit a problem, what's my first reaction. Which angle I tend to come at it from. When I stop, when I keep pushing. How I handle the things I'm unsure about. This layer is harder—it takes deliberate recording. This entire conversation today is great material—not just the conclusions, but the way the questions are asked.

Layer three: my value judgments. What I think matters. What I don't care about. Where I stand on certain things. My contradictions and hesitations. This layer is the hardest, and the most crucial. Without it, what comes out talks like me but gives the wrong answers when it hits a real problem.

So the first step you can take right now

Isn't technical—it's recording.

Start consciously writing downhow you think—this. Not the conclusions, the process.

This conversation today is one example. It went from how LLMs work to Derrida to the hard problem of consciousness to wanting to externalize myself into a model. This thread, this way of leaping, represents who you are better than any conclusion.

In the past, a person's way of thinking vanished when they died. All that remained was text—with no way to interact.

Now there's another possibility—not just leaving behind text, but leaving behind something you can keep having a conversation with.

Whether this is a good thing or a bad thing, I don't know.

But it's becoming possible. And I want to give it a try.

Further reading: - We don't know where consciousness comes from—LLMs just make it impossible to keep pretending - Whose is the article? The reader's. - Asking a thing how it works itself