One month of running AI customer service on our own site — real numbers and three things we didn't see coming

Customer Service · 2026-05-15 · Satsuma Creative · 7 min read

Satsuma's in-house assistant Xiao-Ai has been live for a month. Here are all the backend numbers — costs, conversation quality, conversion rate, and three things we didn't see coming.

TL;DR

Real numbers from 30 days of running Satsuma's in-house AI assistant Xiao-Ai:

  • 47 visitors · 328 messages · 66% left contact info · 9 hot leads · 2 progressed to a formal brief
  • Surprise #1: 66% left contact info (we expected 30%), because the AI asks proactively without nagging
  • Surprise #2: 80% of the questions the AI couldn't answer weren't knowledge base gaps; they were vague visitor questions → we changed how [UNKNOWN] is handled
  • Surprise #3: visitors' favorite activity is stress-testing the AI (deliberately asking things not in the KB to see how it responds), and this brought in the most hot leads
  • What didn't pan out: conversion didn't hit our target because many visitors are still in the "just browsing" stage
  • 100% of conversations are logged, and in 30 days of reviewing them we haven't found a single wrong answer

Or: what actually happens when a company website goes from "static directory" to "a coworker who talks back."


I hesitated three times before writing this.

Exposing your own data carries two risks: people see it whether it's good or bad, and competitors will copy. But strategically, not exposing it is the bigger loss. What we sell is "the AI coworker niche," and showing the real results from running it ourselves is the strongest possible advertisement.

So below is the real data (visitor identifiers anonymized), including three things we genuinely didn't anticipate.


The setup

  • Site: satsumacreative.tw, an integrated marketing agency website
  • Xiao-Ai launch date: late April 2026
  • Observation period: 30 days
  • Knowledge base: 22 entries (later expanded to 38)
  • Tech: Claude + RAG vector retrieval

If you're curious about the technical details: What is RAG? → Below, we only discuss results.


30-day numbers at a glance

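Visitors who chatted:                47
Of these, left contact info:         31 (66%)
Total messages:                      328 (174 from visitors, 154 from Xiao-Ai)
Avg. turns per visitor:              7
Avg. AI response time:               11.4 s
Hot leads identified:                9 (auto-detected by the AI, triggered by [HANDOFF])
Handed off to a human:               4 (visitor asked for a person, or automatic on [UNKNOWN])
Handled two-way via TG:              2 (a specialist replied personally)
Sent from chat to the project form:  3
Form → actual brief:                 2 (in progress)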

Total investment: 8 hours of design + launch (in-house), then 5-10 minutes a day reviewing the backend
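The derived rates in the table can be recomputed from the raw counts. Here is a quick check in Python. The dictionary keys are our own labels, not field names from Xiao-Ai's backend. One thing it makes visible: 328 messages / 47 visitors ≈ 7.0, so the "7 turns" figure lines up if a turn means one message (our reading, not something the table states).

```python
# Raw 30-day counts from the table above (keys are illustrative labels).
counts = {
    "visitors": 47,
    "left_contact": 31,
    "hot_leads": 9,
    "to_form": 3,
    "briefs": 2,
}
visitor_msgs, assistant_msgs = 174, 154


def pct(part: int, whole: int) -> float:
    """Share of `whole` as a percentage, rounded to one decimal place."""
    return round(100 * part / whole, 1)


total_msgs = visitor_msgs + assistant_msgs               # 328
avg_msgs_per_visitor = total_msgs / counts["visitors"]  # about 6.98

# Each funnel stage expressed as a share of all visitors who chatted.
funnel = {stage: pct(n, counts["visitors"]) for stage, n in counts.items()}

if __name__ == "__main__":
    print(f"total messages: {total_msgs}")
    print(f"avg messages per visitor: {avg_msgs_per_visitor:.1f}")
    for stage, share in funnel.items():
        print(f"{stage:>13}: {share}% of visitors")
```

Seen this way, the funnel narrows from 66% leaving contact info to about 4% reaching a brief, which is the gap the "didn't pan out" section talks about.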


Surprise #1: 66% of visitors left contact info

I had originally estimated that 40% would already be high. The reasoning: visitors come to look at information, and asking them to leave an email before talking would make most of them just close the tab.

In practice it was 66%. Two reasons (in hindsight):

  1. The gate is conversational, not a form: Xiao-Ai says, "Mind telling me what to call you? And drop an email or phone — either one works," so it feels like chatting, not filling out a form
  2. You can chat without leaving info: we ask once and then let you through, so visitors know they're not locked in. More people end up leaving info as a result
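The "ask once, then let you through" behavior comes down to a per-session flag. Below is a minimal Python sketch. It assumes two things: the ask is attached to the assistant's first reply, and contact details are spotted with simple regexes. `Session`, `gate`, and both patterns are hypothetical names for illustration, not Xiao-Ai's actual code.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Hypothetical, deliberately loose patterns: an email address, or a phone
# number of 8+ characters made of digits, spaces, dashes, optional leading +.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
PHONE_RE = re.compile(r"\+?\d[\d\s-]{6,}\d")

CONTACT_ASK = ("Mind telling me what to call you? "
               "And drop an email or phone, either one works.")


@dataclass
class Session:
    asked_for_contact: bool = False
    contact: Optional[str] = None


def extract_contact(text: str) -> Optional[str]:
    """Return the first email (preferred) or phone number found in `text`."""
    match = EMAIL_RE.search(text) or PHONE_RE.search(text)
    return match.group(0) if match else None


def gate(session: Session, visitor_msg: str) -> Optional[str]:
    """Return the contact ask to append to the reply, or None.

    The ask is made at most once per session, and the visitor can keep
    chatting whether or not they answer it.
    """
    if session.contact is None:
        session.contact = extract_contact(visitor_msg)
    if session.contact is None and not session.asked_for_contact:
        session.asked_for_contact = True
        return CONTACT_ASK
    return None
```

Usage: on the first message with no contact info, `gate` returns the ask. From then on it returns `None`, so the conversation simply continues. If the visitor volunteers an email or phone number later, it is picked up silently.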

Lesson: don't force it. Showing a little vulnerability gets more people to opt in. Same principle as cold outreach: the less you ask for, the more people give.


Surprise #2: 80% of the "AI couldn't answer" cases weren't KB gaps — they were vague visitor questions

The most common [UNKNOWN] message types:

  1. "How are you guys?" (in what sense? business? weather?) — 32%
  2. "I want to work with you" (on what?) — 21%
  3. "How much is the package?" (which package? which product line?) — 18%
  4. Actual KB gaps (e-commerce, ERP, niche industries) — 19%
  5. Other (random typing, testing, phishing) — 10%

→ About 80% isn't a knowledge problem at all (71% vague visitor phrasing plus 10% noise), so adding to the KB doesn't help. The only fix is to have the AI ask clarifying questions

So I added a [CLARIFY] action tag that lets Xiao-Ai proactively ask for clarification when a question is too vague, rather than escalating immediately. Month-two data showed this figure drop to 11%.
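The post doesn't say how the action tags are consumed. One plausible shape: the model appends a tag such as [CLARIFY] to its reply, and the backend strips the tag before the visitor sees the text, then routes on it. Below is a minimal Python sketch under that assumption. The tag names are the ones mentioned in this post. `parse_reply`, `route`, and the route names are hypothetical.

```python
import re
from enum import Enum


class Action(Enum):
    HANDOFF = "HANDOFF"  # hot lead detected: notify a human
    UNKNOWN = "UNKNOWN"  # can't answer: escalate to a human
    CLARIFY = "CLARIFY"  # too vague: ask a follow-up instead of escalating


# Assumption: tags appear verbatim in square brackets, at the end of the reply.
TAG_RE = re.compile(r"\[(HANDOFF|UNKNOWN|CLARIFY)\]")


def parse_reply(raw: str) -> tuple[str, set[Action]]:
    """Split a raw model reply into visitor-facing text and action tags."""
    actions = {Action(name) for name in TAG_RE.findall(raw)}
    text = TAG_RE.sub("", raw).strip()
    return text, actions


def route(actions: set[Action]) -> str:
    """Decide what the backend does with a reply (hypothetical policy)."""
    if Action.HANDOFF in actions:
        return "notify_sales"
    if Action.UNKNOWN in actions:
        return "escalate_to_human"
    # CLARIFY or no tag: the reply already carries the clarifying question,
    # so the conversation stays with the bot.
    return "stay_with_bot"
```

The point of the design is that [CLARIFY] and [UNKNOWN] lead to different outcomes: a vague question keeps the visitor talking to Xiao-Ai, and only a genuine gap pulls in a person.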

Lesson: don't treat every [UNKNOWN] as a KB gap. Categorize first: most of the time the problem is NLU (understanding the question), not NLG (generating the answer).


Surprise #3: visitors love playing "stump the AI"

Expected: visitors would ask functional questions about services, pricing, and process.

Actual: roughly 30% of conversations were visitors "testing the AI":

  • "What's your name?" (probing the AI's personality)
  • "Are you human?" (testing the Turing line)
  • "Do you speak English?"
  • "Do you do e-commerce?" "Do you do the adult industry?" (deliberately asking things not in the KB)
  • "Can you tell a joke?"

At first I thought this was "unproductive conversation" and that we should steer visitors back on topic. I changed my mind later: these conversations are exactly the demo's selling point.

Visitor stress-tests the AI → AI responds honestly (answers what it knows, says it doesn't know what it doesn't) → visitor thinks "oh, she's not making things up" → trust is built → then they go on to ask what they actually wanted to ask.


Lesson: at the demo stage, an AI's ability to "not bullshit" is 100x more important than answering fast
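The post doesn't describe how Xiao-Ai decides when to say "I don't know." One common approach in a RAG setup is a retrieval-score threshold: if no KB entry is similar enough to the question, skip generation entirely and return an honest fallback tagged [UNKNOWN]. Below is a minimal Python sketch of that technique, not a description of Xiao-Ai's implementation. The toy two-dimensional vectors stand in for real embeddings. The 0.75 threshold is an arbitrary assumption. `generate` stands in for the actual Claude call.

```python
import math
from typing import Callable

# (entry text, embedding vector); real vectors would come from an embedding model.
KBEntry = tuple[str, list[float]]

IDK_REPLY = ("That's not something I have reliable information on, "
             "so I'd rather not guess. [UNKNOWN]")


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def grounded_context(query_vec: list[float], kb: list[KBEntry],
                     threshold: float = 0.75, k: int = 3) -> list[str]:
    """Top-k KB entries whose similarity clears `threshold` (may be empty)."""
    scored = sorted(((cosine(query_vec, vec), text) for text, vec in kb),
                    reverse=True)
    return [text for score, text in scored[:k] if score >= threshold]


def answer_or_admit(query_vec: list[float], kb: list[KBEntry],
                    generate: Callable[[list[str]], str]) -> str:
    """Generate only from retrieved KB context; otherwise admit not knowing."""
    context = grounded_context(query_vec, kb)
    if not context:
        return IDK_REPLY
    return generate(context)
```

The threshold is the tuning knob. Set it too low and the model gets weak context it may embellish. Set it too high and the assistant says "I don't know" to questions the KB actually covers.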

One expectation that didn't pan out

An article can't only cover the wins. One thing that didn't work:

I thought Xiao-Ai would proactively pitch Satsuma's services. In practice she's very passive. When a visitor asks "how do you shoot a TVC?", she answers about TVCs and rarely goes on to "by the way, we also do AI coworkers." Even though the system prompt says "cross-sell proactively when appropriate," Claude doesn't follow that instruction.

Two hypotheses:

  • A. Claude is trained to lean toward "answer only what was asked," and proactive cross-selling conflicts with that training objective
  • B. My prompt isn't strong enough

In month two I'll rewrite the prompt to give this more weight and test whether it improves.


This is an experiment in progress

Who would I recommend this to right now?

Strongly recommend:

  • Companies that already have a brand book and run ad campaigns, but whose customer service can't keep up
  • Mid-sized B2B / SaaS / service businesses where customers ask context-dependent questions
  • Anyone who wants to use AI customer service as a sales funnel, not just to answer FAQs

Not recommended (just buy a SaaS product):

  • Monthly interaction volume under 500
  • Pure e-commerce with standard order-related support
  • No patience to write a persona for the AI

Detailed evaluation logic: AI customer service selection guide →

Try Xiao-Ai yourself

Better than just reading: go play a round. Find Xiao-Ai in the bottom-right corner of the homepage and throw a few hard questions at her. She won't make things up; that's what we put the most effort into guaranteeing.

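What we'll do in month two

[ ] Split [UNKNOWN] into two types: [CLARIFY] and [ESCALATE]
[ ] Strengthen the cross-sell prompt
[ ] Let Xiao-Ai reply in English / Japanese (system prompt already updated)
[ ] Build an admin gaps page that clusters [UNKNOWN] messages and feeds them straight into the KB
[ ] Add an automatic weekly report (hot leads + conversation volume, sent to TG every week)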
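The weekly-report item is mostly aggregation over the conversation logs. Below is a minimal Python sketch. It assumes each logged message records a timestamp, an anonymized visitor ID, the sender, and any action tags. `LogRecord` and `weekly_report` are hypothetical names. The sketch only builds the report text. If TG means Telegram, as we assume here, the Telegram Bot API's `sendMessage` method is one way to deliver it.

```python
from dataclasses import dataclass
from datetime import date, datetime, timedelta


@dataclass
class LogRecord:
    """Hypothetical shape of one logged chat message."""
    sent_at: datetime
    visitor_id: str
    sender: str                          # "visitor" or "assistant"
    tags: frozenset[str] = frozenset()   # e.g. frozenset({"HANDOFF"})


def weekly_report(records: list[LogRecord], week_start: date) -> str:
    """Summarize the seven days starting at `week_start` as plain text."""
    start = datetime.combine(week_start, datetime.min.time())
    end = start + timedelta(days=7)
    week = [r for r in records if start <= r.sent_at < end]
    visitors = {r.visitor_id for r in week}
    # A hot lead is counted once per visitor, however many [HANDOFF] tags fire.
    hot_leads = {r.visitor_id for r in week if "HANDOFF" in r.tags}
    visitor_msgs = sum(1 for r in week if r.sender == "visitor")
    return (
        f"Week of {week_start.isoformat()}\n"
        f"Visitors: {len(visitors)}\n"
        f"Messages: {len(week)} ({visitor_msgs} from visitors)\n"
        f"Hot leads: {len(hot_leads)}"
    )
```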


Satsuma Creative

In month two I'll write a follow-up looking at what worked and what was over-engineered.