What is embedding? A plain-language explanation of how AI "reads" your knowledge base

AI 2026-06-02 · Satsuma Creative · 11 min read

Embedding turns text into coordinates, placing similar meanings close together. That's why when a customer asks "I want to return this," the AI can find "Return and Exchange Policy"—even when the two phrases share no words in common.

If you read the RAG article, you probably know this: AI customer service knowledge bases don't search for answers by matching keywords.

So what do they use?

Embedding.

The term sounds technical. But the logic behind it is more intuitive than you'd think.


TL;DR

  • Embedding is the technology that turns text into "numerical coordinates"
  • Text with similar meanings clusters together in coordinate space
  • RAG uses embedding to find data with "the closest meaning," not data with "matching keywords"
  • That's why when a customer asks "I want to return this," the AI can find "Return and Exchange Policy"—even when the two phrases share no words in common
  • The quality of your embeddings directly determines whether your AI customer service can find the right answers
  • Satsuma uses HuggingFace's open-source multilingual model, running locally with zero API costs, and data never leaves the machine
  • Claude Sonnet 4.6 and GPT-5.5 themselves are not embedding models—they handle the final step in a RAG architecture (generating the answer), not the step that finds the data
  • Choosing Claude or GPT affects the tone of the response and how reliably it follows instructions—not the quality of the embedding

Starting with something you've already used

When you use Spotify or YouTube, it recommends things "you might also like."

That recommendation isn't based on keywords. It's not saying "this song is also called pop music, so here you go."

It's based on a judgment that "this song and the one you've been listening to are close together in some kind of space."

That's exactly what embedding does. The only difference is the subject—text instead of songs.


How does text become coordinates?

Picture a space. Not the three-dimensional space you can see, but an abstract space with many dimensions.

Every piece of text can be converted into a point in this space—a set of numbers representing its location.

For example:

  • "How do I return this?" lands at one location after conversion
  • "What's the exchange process?" lands at a nearby location
  • "Do you have it in black?" lands at a far-away location

The AI doesn't need to see the exact same words—it only needs to see "points close enough together."

That's the core logic of embedding: turning semantics into distance.
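To make "distance" concrete, here is a minimal sketch that scores how close two points are using cosine similarity. The three-dimensional vectors are hand-made for illustration; a real embedding model produces them for you, with hundreds of dimensions.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hand-made 3-dimensional "coordinates", for illustration only.
# These are NOT real model output.
toy_vectors = {
    "How do I return this?": [0.9, 0.1, 0.0],
    "What's the exchange process?": [0.8, 0.3, 0.1],
    "Do you have it in black?": [0.1, 0.2, 0.9],
}

query = toy_vectors["How do I return this?"]
for text, vector in toy_vectors.items():
    # Higher score = closer in meaning (under these toy coordinates).
    print(f"{cosine_similarity(query, vector):.2f}  {text}")
```

With these toy numbers, the exchange question scores far higher against the return question than the color question does, which is the behavior described above.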


What does this have to do with RAG?

The previous article explained that RAG is an architecture that "finds the data first, then lets the AI answer."

The "finding the data" step relies on embedding.

Here's the flow:

  1. Your knowledge base (return policies, FAQs, product descriptions) is converted into embeddings in advance and stored in a vector database
  2. A customer asks a question; that question is also converted into an embedding
  3. The system finds the few pieces of content in the knowledge base that are "closest" in distance
  4. These pieces are passed to the AI, which generates an answer based on them

The key is step three. It's not looking for "the same words"—it's looking for "similar meanings."
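The four steps can be sketched in a few lines. This is an illustration under assumptions, not any product's implementation: `embed` is a hypothetical stand-in that looks up hand-made vectors, where a real system would call an embedding model, and the prompt wording is invented.

```python
import math

# Hand-made vectors standing in for a real embedding model (illustration only).
TOY_VECTORS = {
    "Return and Exchange Policy": [0.9, 0.2, 0.1],
    "Shipping Fees": [0.1, 0.9, 0.2],
    "Available Colors": [0.1, 0.1, 0.9],
    "I want to return this": [0.8, 0.3, 0.1],
}

def embed(text):
    """Hypothetical stand-in: a real system would call an embedding model here."""
    return TOY_VECTORS[text]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def build_index(chunks):
    """Step 1: embed every knowledge base chunk ahead of time."""
    return [(chunk, embed(chunk)) for chunk in chunks]

def retrieve(question, index, top_k=2):
    """Steps 2 and 3: embed the question, then keep the closest chunks."""
    question_vector = embed(question)
    ranked = sorted(index, key=lambda entry: cosine(question_vector, entry[1]), reverse=True)
    return [chunk for chunk, _ in ranked[:top_k]]

def build_prompt(question, passages):
    """Step 4: hand the retrieved passages to the language model as context."""
    context = "\n".join(f"- {p}" for p in passages)
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"

index = build_index(["Return and Exchange Policy", "Shipping Fees", "Available Colors"])
passages = retrieve("I want to return this", index, top_k=1)
print(build_prompt("I want to return this", passages))
```

Note that `retrieve` never compares words. It only compares vectors, which is why the quality of those vectors matters so much.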


Why does this matter?

Traditional search relies on keyword matching.

A customer says "I want to return this," and the system looks for the word "return." Found it.

But what if the customer says "I don't want it anymore, can I send it back?"

  • Traditional search: nothing found (no "return" keyword).
  • Embedding search: found it (because "don't want it anymore, send it back" sits very close to "return" in semantic space).
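The keyword side of that comparison is easy to reproduce. The sketch below is a deliberately naive matcher with a made-up keyword table; it is only meant to show where exact-word matching gives up.

```python
# Hypothetical keyword table, for illustration only.
FAQ_KEYWORDS = {
    "Return and Exchange Policy": ["return", "exchange"],
    "Shipping Fees": ["shipping"],
}

def keyword_search(question):
    """Return every FAQ title whose keywords literally appear in the question."""
    text = question.lower()
    return [
        title
        for title, keywords in FAQ_KEYWORDS.items()
        if any(keyword in text for keyword in keywords)
    ]

print(keyword_search("I want to return this"))
# ['Return and Exchange Policy']
print(keyword_search("I don't want it anymore, can I send it back?"))
# []
```

The second question means the same thing as the first, but none of the listed keywords appear in it, so the matcher comes back empty.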

That's why a well-built RAG-based AI customer service is far smarter than traditional search-based customer service.

It's not that the AI itself is smarter—it's that the way it finds answers is closer to "how humans actually understand questions."


What does embedding quality affect?

Not all embeddings are the same.

How well "I want to return this" gets converted into coordinates depends on which embedding model you use.

A good embedding model: accurately understands Chinese semantics, places "want to return" close to "want to exchange," and groups "asking about color" near "asking about size."

A bad embedding model: makes a mess of Chinese semantics. "I'm satisfied" might end up close to "I'm not satisfied" (because the two phrases share almost all of their characters), and "refund" might be treated as completely unrelated to "return."

When Taiwanese brands build AI customer service, this is the most overlooked piece.

Everyone compares feature lists and pricing, but nobody tests whether the embedding model behind the service actually understands Traditional Chinese semantics correctly.

Then things go wrong after launch, and they assume the AI isn't strong enough.

In reality, the foundation is already crooked.


How does Satsuma handle embedding?

It's more honest to be specific about what we use ourselves.

Right now, the embedding layer of Satsuma's own AI customer service (the one you see in the bottom-right corner of this site) is paraphrase-multilingual-MiniLM-L12-v2, an open-source multilingual model hosted on HuggingFace, running locally on a Mac Mini.

Not OpenAI, not the cloud—open-source, local.

Why this choice?

Data stays on the machine. Knowledge base content and customer questions are processed locally and never sent out. For clients sensitive about data, this is a very practical consideration.

Multilingual coverage is good enough. The model is designed to be multilingual—mixing Traditional Chinese, Simplified Chinese, and English in a single question still produces correct semantic matching. For the everyday questions in Taiwanese customer service scenarios, recall is solid.

Vectors are stored as .npy files (numpy arrays), kept locally. We don't run a separate vector database; queries use cosine similarity for matching. When the knowledge base isn't huge, the speed is more than enough.
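As a rough sketch of what "vectors in a .npy file plus cosine similarity" can look like: the file name, helper names, and toy vectors below are assumptions for illustration, not Satsuma's actual code.

```python
import os
import tempfile

import numpy as np

def save_vectors(path, vectors):
    """Persist the knowledge base vectors as a single .npy file."""
    np.save(path, np.asarray(vectors, dtype=np.float32))

def load_vectors(path):
    return np.load(path)

def top_k_cosine(query_vector, matrix, k=3):
    """Return (indices, scores) of the k rows most similar to the query."""
    query = query_vector / np.linalg.norm(query_vector)
    rows = matrix / np.linalg.norm(matrix, axis=1, keepdims=True)
    scores = rows @ query              # cosine similarity against every row
    order = np.argsort(-scores)[:k]    # highest score first
    return order, scores[order]

# Toy 3-dimensional vectors standing in for real model output.
chunks = ["Return and Exchange Policy", "Shipping Fees", "Available Colors"]
vectors = np.array([[0.9, 0.2, 0.1], [0.1, 0.9, 0.2], [0.1, 0.1, 0.9]])

path = os.path.join(tempfile.mkdtemp(), "kb_vectors.npy")  # hypothetical file name
save_vectors(path, vectors)
matrix = load_vectors(path)

query = np.array([0.8, 0.3, 0.1])  # pretend this is the embedded customer question
indices, scores = top_k_cosine(query, matrix, k=2)
for i, score in zip(indices, scores):
    print(f"{score:.2f}  {chunks[i]}")
```

Every query compares against every row, so the cost grows with the size of the knowledge base. That is fine at small scale and is the point where a vector database starts to pay off at large scale.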

When the knowledge base is updated, clicking "Rebuild" in the admin panel triggers automatic re-computation of the vectors and a hot reload—no service restart required.
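One common way to get a reload without a restart is to build the new index off to the side and then swap it in. The class below is a hypothetical sketch of that pattern; `HotReloadIndex` and `toy_embed` are invented names, and this is not a description of Satsuma's code.

```python
import threading

class HotReloadIndex:
    """In-memory index whose contents can be replaced while queries keep running."""

    def __init__(self, embed):
        self._embed = embed            # any function mapping text -> vector
        self._lock = threading.Lock()
        self._entries = []             # list of (chunk, vector) pairs

    def rebuild(self, chunks):
        # Do the slow embedding work first, outside the lock,
        # so readers keep using the old entries in the meantime.
        new_entries = [(chunk, self._embed(chunk)) for chunk in chunks]
        with self._lock:
            self._entries = new_entries  # swap in one step

    def entries(self):
        with self._lock:
            return list(self._entries)

def toy_embed(text):
    """Stand-in embedding: NOT meaningful, just enough to run the sketch."""
    return [float(len(text))]

index = HotReloadIndex(toy_embed)
index.rebuild(["Return and Exchange Policy"])
index.rebuild(["Return and Exchange Policy", "Shipping Fees"])  # the "Rebuild" click
print([chunk for chunk, _ in index.entries()])
```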


This architecture isn't "the most powerful." It's "good enough and low-maintenance."

If your knowledge base has thousands of documents, high query volume, and needs more precise recall, then it's worth considering an OpenAI embedding + pgvector combination. That's a different set of requirements, not a sign that this architecture is insufficient.


How do Claude Sonnet 4.6 and GPT-5.5 handle embedding?

Here's something that often gets confused.

Claude Sonnet 4.6 and GPT-5.5 themselves are not embedding models.

These are "large language models": you ask them a question, they answer. Their role in a RAG architecture is the final step: read the retrieved data and generate an answer.

Embedding—converting text into vectors and finding the nearest data—is a separate layer, handled by an independent embedding model.

As a flow:

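Customer question
  ↓
[Embedding model] ← open-source multilingual model (local) or text-embedding-3-large (OpenAI)
  ↓
Vector similarity search
  ↓
Most relevant knowledge base passages found
  ↓
[Large language model] ← Claude Sonnet 4.6 or GPT-5.5 goes here
  ↓
Generated answer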

So when you're asking "should I use Claude or GPT for AI customer service," what you're actually asking is who handles the final step. The embedding layer is a separate choice.
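In code, that separation shows up as two independent plug-in points. The sketch below uses hypothetical stub functions in place of real models (no API is called); the point is only that swapping the generator leaves the retrieval step untouched.

```python
def answer_question(question, embed, search, generate):
    """Run the flow: embed the question, retrieve passages, generate an answer."""
    query_vector = embed(question)
    passages = search(query_vector)
    return generate(question, passages)

# Hypothetical stubs so the sketch runs without any model or API key.
def local_embed(text):
    return [float(len(text))]  # stand-in for a real embedding model

def search(query_vector):
    return ["Return and Exchange Policy"]  # stand-in for vector similarity search

def generate_with_model_a(question, passages):
    return f"[model A] Based on {passages[0]}: ..."

def generate_with_model_b(question, passages):
    return f"[model B] Based on {passages[0]}: ..."

question = "I want to return this"
print(answer_question(question, local_embed, search, generate_with_model_a))
print(answer_question(question, local_embed, search, generate_with_model_b))
```

Both calls retrieve the same passage. Only the wording of the final answer depends on which language model is plugged in.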


So what's the difference between Claude and GPT at this step?

Context window size

Claude Sonnet 4.6 supports a 1 million token context window, and GPT-5.5 is in a similar range.

What does this mean for a knowledge base? In theory, you could shove more data directly into the AI and find answers without embedding.

But in practice, dumping too much data into the AI scatters its "attention," and accuracy drops when finding answers in long documents. On top of that, token usage is billed by volume, so every Q&A costs a lot.

So even with a large context window, RAG is still cheaper and more accurate than "stuffing everything in."
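A back-of-the-envelope calculation shows why. Every number below is a made-up placeholder (knowledge base size, chunk size, price), not a real measurement or a real price list; plug in your own figures.

```python
def full_context_tokens(kb_tokens, question_tokens):
    """Input tokens per question when the whole knowledge base is sent."""
    return kb_tokens + question_tokens

def rag_tokens(chunk_tokens, top_k, question_tokens):
    """Input tokens per question when only the top-k retrieved chunks are sent."""
    return chunk_tokens * top_k + question_tokens

def input_cost(tokens, price_per_million):
    return tokens / 1_000_000 * price_per_million

# All values are hypothetical placeholders.
KB_TOKENS = 200_000        # size of the whole knowledge base
CHUNK_TOKENS = 500         # size of one retrieved passage
TOP_K = 4                  # passages passed to the model per question
QUESTION_TOKENS = 50
PRICE_PER_MILLION = 3.0    # made-up price per million input tokens

full = full_context_tokens(KB_TOKENS, QUESTION_TOKENS)
rag = rag_tokens(CHUNK_TOKENS, TOP_K, QUESTION_TOKENS)
print(f"full context: {full} tokens, cost {input_cost(full, PRICE_PER_MILLION):.4f}")
print(f"RAG:          {rag} tokens, cost {input_cost(rag, PRICE_PER_MILLION):.4f}")
```

Under these placeholder numbers the full-context approach sends roughly a hundred times more input per question, and that gap repeats on every single Q&A.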

Traditional Chinese response quality

Both Claude and GPT-5.5 perform well in Traditional Chinese, but their tones differ.

Claude's Chinese is more formal and slightly translation-flavored (because its Chinese is built on top of an English foundation, which I covered in another article). GPT-5.5's Chinese is more conversational, closer to the way people in Taiwan actually speak.

For customer service work, you can compensate for this difference through prompt design, but the raw feel of the language really is different.

Reliability in following instructions

You tell the AI "only answer based on the knowledge base; if you don't know, say you don't know." Will it actually do that?

Every version and every model is improving on this—there's no permanent best answer. Satsuma's approach is to test both, run them against the same set of questions, and see which one is more stable on the current version.


An easy way to test

If you want to quickly test the embedding quality of your AI customer service, try this:

Prepare a few groups of "same question, phrased differently":

  • "How do I return this?" vs "I don't want it anymore" vs "Can I send it back?"
  • "How much is shipping?" vs "How much to ship to Taipei?" vs "What's the free shipping threshold?"
  • "Do you have black?" vs "Color options" vs "Dark-color styles"

All three questions in a group should find the same answer.

If they don't, the problem is most likely the embedding, not the AI itself.
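This check is easy to automate. In the sketch below, `retrieve_top` is whatever function returns the top retrieved passage for a question in your system; here it is replaced by a hypothetical table of canned results, with one deliberate miss to show what a failure looks like.

```python
def find_inconsistent_groups(groups, retrieve_top):
    """Return the groups whose paraphrases did not all retrieve the same passage."""
    failures = []
    for group in groups:
        results = {question: retrieve_top(question) for question in group}
        if len(set(results.values())) > 1:
            failures.append(results)
    return failures

groups = [
    ["How do I return this?", "I don't want it anymore", "Can I send it back?"],
    ["Do you have black?", "Color options", "Dark-color styles"],
]

# Hypothetical canned results standing in for a real retrieval call.
CANNED_RESULTS = {
    "How do I return this?": "Return and Exchange Policy",
    "I don't want it anymore": "Return and Exchange Policy",
    "Can I send it back?": "Shipping Fees",  # a deliberate miss
    "Do you have black?": "Available Colors",
    "Color options": "Available Colors",
    "Dark-color styles": "Available Colors",
}

failures = find_inconsistent_groups(groups, CANNED_RESULTS.get)
for failure in failures:
    print(failure)
```

Running the same question groups after every knowledge base or model change gives you a quick regression check on retrieval.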


Summary

Traditional keyword search            | Embedding-based semantic search
Finds data with "matching words"      | Finds data with "similar meanings"
Customer phrasing must be precise     | Works no matter how the customer phrases it
Simple to set up                      | Requires choosing the right model
Traditional Chinese is usually fine   | Model quality varies significantly

RAG + good embedding = the foundation of AI customer service.

Once these two are right, everything else becomes possible.

Get them wrong, and any feature you add on top is a house built on sand.


This article is part of the "AI Customer Service Tech Primer" series:

  • What is RAG?
  • What is embedding? ← You are here
  • What is AI memory?
  • GPT vs Claude vs Gemini as the customer service backbone


FAQ

Q: Are embedding and "AI training" the same thing?

No. Training is teaching the AI to understand language and answer questions. Embedding uses an already-trained model to convert text into numbers. You don't need to train an embedding model yourself—you can use existing ones, but you have to choose the right one.

Q: What is a vector database? How is it different from a regular database?

Regular databases store structured data (names, order numbers, amounts). Vector databases store the vectors (lists of numbers) produced by embedding, and they're specifically optimized for the operation of "finding the nearest point." Common ones include Pinecone, Qdrant, and pgvector (a PostgreSQL extension).

At small scale, you don't necessarily need a vector database—storing as files and comparing directly during queries works fine. At larger scale, you need a vector database to keep things fast.

Q: When the knowledge base is updated, do you have to re-run embedding?

Yes, for the updated portions. How it's triggered depends on the system design—some use a manual button in the admin panel, others detect file changes and run automatically. Either way, after re-running you have to confirm the new content actually made it into the vector index, otherwise the AI is still answering from the old data.
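One simple way to do that confirmation is to compare content fingerprints. The sketch below assumes the index keeps a hash of each chunk it embedded, which is a design choice made for this example, not something every system does.

```python
import hashlib

def fingerprint(text):
    """Stable hash of a chunk's content."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

def missing_from_index(current_chunks, indexed_fingerprints):
    """Return chunks in the knowledge base that the vector index has never seen."""
    return [chunk for chunk in current_chunks if fingerprint(chunk) not in indexed_fingerprints]

# What the index was built from last time (hypothetical content).
old_chunks = ["Return and Exchange Policy (old wording)", "Shipping Fees"]
indexed = {fingerprint(chunk) for chunk in old_chunks}

# What the knowledge base contains now.
current = ["Return and Exchange Policy (new wording)", "Shipping Fees"]
print(missing_from_index(current, indexed))
```

An empty list after a rebuild means every current chunk is in the index. Anything left over is content the AI cannot find yet.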