14 Claude Code Sub-agents in a 60-Day Survival Experiment: 4 Survived, 10 Burned Through Tokens

Mnimiy ran a 60-day experiment on a real 47K-LOC TypeScript repository and built 14 Claude Code sub-agents. 4 survived; the other 10 were deleted within a week.

The key number: 20K tokens of spawn overhead

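Every Claude Code sub-agent consumes roughly 20K tokens at startup, before it does any actual work. The number is buried in community write-ups, not on the marketing pages. Most builders never see it.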

Method

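60 days, one real repository (a mid-sized TypeScript backend, 47K LOC)
The same set of tasks every week
An HTTP proxy logging every Claude Code session
For each candidate sub-agent, I tracked:
- Spawn cost: tokens consumed in the first turn (before any actual work)
- Round-trip latency: wall-clock time from mention to the returned summary
- Output usefulness: whether I acted on the summary or redid the work myself in the main session
- Survival: still in .claude/agents/ after 60 days, and used at least 4 times in the past 7 days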

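The 7-day window matters. I first tried 14 days, and 3 more agents passed, but they had already slid into "I remember it being useful, I just haven't used it lately" territory. Seven days forces the question: is this agent earning its place in your active workflow, or is it just furniture? Furniture sub-agents still carry the 20K overhead. They only feel free because they sit idle.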
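The survival rule from the last bullet, and the effect of the window length described above, can be sketched as a check over per-agent usage records. This is a minimal sketch, assuming a hypothetical `AgentRecord` shape and a hypothetical `survives` function; the author's actual proxy logs and analysis code are not part of the source.

```typescript
// Hypothetical per-agent record; field names are illustrative only.
interface AgentRecord {
  agent: string;
  spawnTokens: number; // tokens consumed in the first turn, before any work
  roundTripMs: number; // wall-clock time from mention to returned summary
  actedOn: boolean; // did I act on the summary, or redo it in the main session?
  stillInAgentsDir: boolean; // file still present in .claude/agents/ at day 60
  invocations: Date[]; // timestamp of every invocation
}

const DAY_MS = 24 * 60 * 60 * 1000;

// Survival rule: still present, and used at least `minUses` times within
// the last `windowDays` days before `now`.
function survives(
  rec: AgentRecord,
  now: Date,
  windowDays = 7,
  minUses = 4
): boolean {
  if (!rec.stillInAgentsDir) return false;
  const cutoff = now.getTime() - windowDays * DAY_MS;
  const recentUses = rec.invocations.filter(
    (t) => t.getTime() >= cutoff && t.getTime() <= now.getTime()
  ).length;
  return recentUses >= minUses;
}
```

An agent last used 9 to 12 days ago fails with the default 7-day window but passes if `windowDays` is widened to 14, which is the "furniture" effect the 14-day trial let through.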

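The repository is real production code, not a sandbox. That matters, because half of the failure modes below only appear when the codebase is messy enough to confuse a vague prompt. A clean toy project makes every sub-agent look like it works.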

The 4 survivors

Survivor 1: code-reviewer

Reads the current branch's diff against main. Uses only Read + Grep + Glob. Returns a markdown checklist scored pass/fail against the team style guide.

Why it survived: it runs before every push and catches three things the main session kept missing: unwrapped errors at callsites, missing JSDoc on public exports, and leftover console.log calls in library code.

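Concrete win: on day 11, a PR that both the main session and I considered clean. code-reviewer flagged an unhandled rejection three frames deep in an async chain, one that would have surfaced in production at 4 a.m. That single catch paid for the next 50 invocations.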

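Pattern: read-only tools, a single bounded input (the diff), a single explicit output shape (the checklist). No write access means I never have to check what it touched.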
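The review itself is done by the model, but the two boundaries this pattern relies on, a diff as the only input and a pass/fail checklist with file:line as the only output, can be made concrete. A minimal sketch, assuming a unified diff such as `git diff main...HEAD` produces; the single console.log rule, the `src/lib/` path, and all function names are hypothetical stand-ins for the team style guide, not the agent's actual logic.

```typescript
// One added line from the diff, located in the new version of the file.
interface AddedLine {
  file: string;
  line: number;
  text: string;
}

// One style-guide rule, scored pass/fail with the offending locations.
interface ChecklistItem {
  rule: string;
  pass: boolean;
  where: string[];
}

// Collect added lines from a unified diff (the only input the agent sees).
function addedLines(diff: string): AddedLine[] {
  const out: AddedLine[] = [];
  let file = "";
  let lineNo = 0;
  for (const raw of diff.split("\n")) {
    if (raw.indexOf("+++ b/") === 0) {
      file = raw.slice("+++ b/".length);
    } else if (raw.indexOf("@@") === 0) {
      // Hunk header "@@ -a,b +c,d @@": new-file numbering starts at c.
      const m = /\+(\d+)/.exec(raw);
      lineNo = m ? parseInt(m[1], 10) : 0;
    } else if (raw.indexOf("+") === 0 && raw.indexOf("+++") !== 0) {
      out.push({ file: file, line: lineNo, text: raw.slice(1) });
      lineNo++;
    } else if (raw.indexOf(" ") === 0) {
      lineNo++; // context line; removed ("-") lines do not advance the counter
    }
  }
  return out;
}

// Hypothetical stand-in for one style-guide rule.
function reviewDiff(diff: string): ChecklistItem[] {
  const stray = addedLines(diff).filter(
    (a) => a.file.indexOf("src/lib/") === 0 && /console\.log\(/.test(a.text)
  );
  return [
    {
      rule: "No console.log in library code",
      pass: stray.length === 0,
      where: stray.map((a) => `${a.file}:${a.line}`),
    },
  ];
}

// Render as a markdown checklist: "[x]" = pass, "[ ]" = fail.
function renderChecklist(items: ChecklistItem[]): string {
  return items
    .map((it) => {
      const box = it.pass ? "[x]" : "[ ]";
      const locs = it.pass ? "" : ` (${it.where.join(", ")})`;
      return `- ${box} ${it.rule}${locs}`;
    })
    .join("\n");
}
```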

Survivor 2: doc-maintainer

Scans README.md, docs/, and inline doc comments. Flags drift between code and docs. Returns a list of stale sentences with line numbers.

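Why it survived: docs rot in any codebase older than a quarter. doc-maintainer runs once a week and catches what humans never will, such as a function signature that changed in March while the README still shows the old call shape.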

Pattern: runs on Haiku (cheap), a tightly scoped tool list, bounded input (the docs directory), and a persistent artifact (written to docs-drift.md).
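One concrete drift case from above, a call shape in the docs that no longer matches the function's current signature, can be approximated with a crude arity check. This sketch assumes the function name is a plain identifier and that documented calls contain no nested parentheses; `findArityDrift`, `renderDrift`, and `createClient` are hypothetical, and the agent's real check is semantic rather than regex-based.

```typescript
// One stale line in a doc file, with the reason it was flagged.
interface DriftEntry {
  line: number;
  text: string;
  reason: string;
}

// Flag doc lines that call `fnName` with a different argument count than
// `currentArity`. Assumes `fnName` is a plain identifier and calls contain
// no nested parentheses.
function findArityDrift(docText: string, fnName: string, currentArity: number): DriftEntry[] {
  const callRe = new RegExp(`\\b${fnName}\\(([^)]*)\\)`);
  const entries: DriftEntry[] = [];
  docText.split("\n").forEach((text, i) => {
    const m = callRe.exec(text);
    if (!m) return;
    const args = m[1].trim() === "" ? 0 : m[1].split(",").length;
    if (args !== currentArity) {
      entries.push({
        line: i + 1,
        text: text.trim(),
        reason: `${fnName} takes ${currentArity} args, doc shows ${args}`,
      });
    }
  });
  return entries;
}

// Render entries in the shape that would be written to docs-drift.md.
function renderDrift(file: string, entries: DriftEntry[]): string {
  return entries.map((e) => `- ${file}:${e.line}: ${e.reason}: ${e.text}`).join("\n");
}
```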

Survivor 3: security-auditor

Scans the codebase for hardcoded secrets, unsafe SQL construction, and known dependency CVEs. Read + Grep + Glob only. It cannot modify anything.

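Why it survived: it catches exactly the security checks I skip when I'm in a hurry. It runs once a week, Friday at noon, before deploys go quiet for the weekend.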

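Over 60 days it caught three things before merge: secrets on a branch, an unsafe query pattern in a new endpoint, and an outdated dependency with an open CVE. The first two would eventually have been caught in PR review. The third would not.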

Pattern: the Opus model (subtle pattern recognition is where Opus beats Sonnet), read-only, and scheduled invocation rather than ad-hoc.
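The read-only, grep-style half of this job looks roughly like the sketch below: patterns applied line by line, results returned as a numbered findings list, input never modified. The three patterns are illustrative assumptions, far from a complete secret or injection ruleset, and dependency CVEs are not covered because they need advisory data this sketch does not have. `Rule`, `RULES`, and `scan` are hypothetical names.

```typescript
// A grep-style rule: a label plus a single-line pattern.
interface Rule {
  label: string;
  pattern: RegExp;
}

// Illustrative patterns only; nowhere near a complete ruleset.
const RULES: Rule[] = [
  { label: "possible AWS access key id", pattern: /AKIA[0-9A-Z]{16}/ },
  {
    label: "hardcoded secret assignment",
    pattern: /(api[_-]?key|secret|password)\s*[:=]\s*["'][^"']{8,}["']/i,
  },
  {
    label: "SQL built with string interpolation",
    pattern: /\b(SELECT|INSERT|UPDATE|DELETE)\b[^`]*\$\{/i,
  },
];

// Read-only scan: returns a numbered findings list, never modifies input.
function scan(files: { path: string; content: string }[]): string {
  const findings: string[] = [];
  for (const f of files) {
    f.content.split("\n").forEach((text, i) => {
      for (const r of RULES) {
        if (r.pattern.test(text)) findings.push(`${f.path}:${i + 1} ${r.label}`);
      }
    });
  }
  return findings.map((x, n) => `${n + 1}. ${x}`).join("\n");
}
```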

Survivor 4: test-runner

Runs the test suite, parses the failures, and returns a structured summary with file:line and the failing assertion.

Why it survived: whenever the main session and the test suite disagree, the main session burns context arguing with stack traces. Raw test output is 6,000 tokens of garbage. test-runner returns a clean 200-token summary.

The math is simple: 200 tokens in, 6,000 tokens saved per test run. After ~10 test runs, the 20K spawn overhead has paid for itself. After 50, it is pure profit.
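The break-even arithmetic can be written down under a deliberately simple model, stated here as an assumption: one spawn costs 20,000 tokens once, and each test run then puts a 200-token summary into the main session instead of 6,000 tokens of raw output. `breakEvenRuns` is a hypothetical helper.

```typescript
// Simplified cost model (an assumption): one spawn costs `spawnOverhead`
// tokens once; each test run then puts `summaryTokens` into the main session
// instead of `rawTokens` of raw output.
function breakEvenRuns(spawnOverhead: number, rawTokens: number, summaryTokens: number): number {
  const savedPerRun = rawTokens - summaryTokens;
  if (savedPerRun <= 0) return Infinity; // the summary never pays the spawn back
  return Math.ceil(spawnOverhead / savedPerRun);
}

// 20000 / (6000 - 200) is about 3.45, rounded up to whole runs.
console.log(breakEvenRuns(20000, 6000, 200)); // 4
```

This model breaks even after 4 runs, sooner than the ~10 quoted above, so the text's figure is the more conservative of the two; the sketch does not try to reproduce it.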

Pattern: a narrow tool list (Bash + Read), a single task, and output kept deliberately small. The whole point is compressing 6,000 tokens of test garbage into 200 tokens of signal.
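The compression step, raw runner output in and one line per failure with file:line and the failing assertion out, can be sketched as a small parser. The input layout here is a simplified, Jest-like format assumed for illustration, not the exact output of any particular runner; `summarizeFailures` and `renderSummary` are hypothetical names.

```typescript
// One failing test, reduced to what the main session needs.
interface TestFailure {
  test: string;
  where: string; // "file:line" from the first stack frame with a location
  assertion: string;
}

// Assumes a simplified, Jest-like layout (illustrative, not exact):
//   ● <suite> › <test name>
//     Expected: <value>
//     Received: <value>
//       at ... (<file>:<line>:<col>)
function summarizeFailures(raw: string): TestFailure[] {
  const failures: TestFailure[] = [];
  let current: TestFailure | null = null;
  let expected = "";
  for (const lineText of raw.split("\n")) {
    const t = lineText.trim();
    if (t.indexOf("● ") === 0) {
      current = { test: t.slice(2), where: "?", assertion: "" };
      failures.push(current);
      expected = "";
    } else if (current && t.indexOf("Expected:") === 0) {
      expected = t.slice("Expected:".length).trim();
    } else if (current && t.indexOf("Received:") === 0) {
      const received = t.slice("Received:".length).trim();
      current.assertion = `expected ${expected}, received ${received}`;
    } else if (current && current.where === "?") {
      const m = /\(([^()]+):(\d+):\d+\)/.exec(t);
      if (m) current.where = `${m[1]}:${m[2]}`;
    }
  }
  return failures;
}

// One line per failure: "file:line test: assertion".
function renderSummary(failures: TestFailure[]): string {
  return failures.map((f) => `${f.where} ${f.test}: ${f.assertion}`).join("\n");
}
```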

The 10 that died, and what killed them

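Agent	Cause of death
migration-planner	Migrations are sequential. Step 4 depends on step 3, which depends on step 2. A sub-agent runs once and returns a summary; it can't keep planning as new information appears. Every "plan" was missing constraints that only became obvious mid-execution. Lesson: sub-agents lose context, and sequential work needs context retention. Don't hand sequential work to a sub-agent.
dep-auditor	Tools already do this (npm audit, pip-audit, etc.). My sub-agent was a 20K-token wrapper around a 0.4-second command. The 20K overhead alone cost more than running npm audit directly 47 times. Lesson: if a deterministic CLI can do it, a sub-agent is pure waste.
type-checker	tsc --noEmit does it in 2 seconds, with the same failure shape. The sub-agent spent 22 seconds and 23K tokens telling me what the compiler tells me for free. Lesson: wrapping a fast CLI in an LLM is theater, not productivity.
perf-profiler	Profiling needs real runtime data. A sub-agent can read code but can't run a profiler against a live process. Every "performance review" was speculation dressed up as analysis: "this loop might be slow if the input is large" instead of "we measured this loop and it is slow." Lesson: a sub-agent that needs data it can't access produces storytelling, not work.
commit-formatter	A trivial task. Reformatting "wip: fix" into "fix(auth): handle null token in refresh path" takes the main session one second. Spawning a sub-agent costs 3-5 seconds of overhead plus a few hundred tokens. Negative ROI on every call. Lesson: trivial tasks are not sub-agent work. They are inline work.
pr-summarizer	The main session already has the full PR context; it just wrote the code. Asking a sub-agent to "summarize" what the main session already knows pays the context cost twice for zero new signal. The sub-agent's summary was, predictably, worse than what the main session wrote directly. Lesson: if the main session has the context, a sub-agent has nothing to add.
branch-renamer	A 4-line shell script does the job: git branch -m old new, then push to the remote. Building the sub-agent took longer than doing it by hand 200 times would have. Lesson: sometimes the urge to build a sub-agent is the urge to build, not the urge to solve. Catch yourself.
readme-updater	Confused with doc-maintainer and never had a clear boundary. Both agents touched the README and sometimes overwrote each other's suggestions. I built it before I understood that two sub-agents writing the same file is a coordination problem sub-agents structurally cannot solve. Lesson: if two sub-agents need the same output, you designed it wrong. Merge them, or scope them so they don't overlap.
env-validator	The parent session already knows the env. The sub-agent reloaded everything, validated nothing new, and returned "looks fine." 23K tokens of overhead for one sentence I could have written myself. Lesson: if a sub-agent's job is confirming what you already know, it isn't doing work.
deploy-checker	Ran pre-deploy validation. Useful idea, wrong implementation. The sub-agent couldn't actually run the deploy in a sandbox, so it inspected the scripts and returned a probabilistic "this should work." I needed a yes/no with proof; what I got was a vibes-based summary. Lesson: sub-agents are bad at high-stakes binary decisions when they have no runtime to verify against.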

The 3 survival traits

Trait 1 — Single responsibility

Each survivor does one thing. code-reviewer reviews. test-runner runs tests. doc-maintainer audits docs. security-auditor scans for security issues.

The 10 that died tried to be helpers spanning multiple concerns. deploy-checker validated, scanned, and predicted. migration-planner planned, sequenced, and adjusted. Every extra responsibility is one more failure mode.

On day 22 I tried building a pr-flow agent that bundled code-reviewer, test-runner, and pr-summarizer into one. It looked elegant. It died within 4 days, because each individual job got worse once bundled.

Rule: if you can't describe a sub-agent's job in one sentence with one verb, you've already lost.

Trait 2 — Bounded context

Each survivor reads a specific, finite set of inputs. code-reviewer reads the diff. doc-maintainer reads docs/. test-runner reads the test output buffer. security-auditor reads the codebase, but only through grep-style tools.

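The 10 that died either needed context from the whole project (which they couldn't get) or were handed vague input ("the codebase"), which inflated token consumption until the spawn cost outweighed any savings.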

Rule: if a sub-agent's input is "look at the project", delete it now.

Trait 3 — Observable output

Each survivor returns structured, bounded output. code-reviewer returns a markdown checklist. doc-maintainer writes docs-drift.md. test-runner returns a 200-token failure summary. security-auditor returns a numbered findings list.

The 10 that died returned narratives. "I checked the deploy and it looks like it should work." "The performance profile suggests there might be a bottleneck around..." Narratives are unverifiable. Narratives waste your time.

Rule: if you can't write the sub-agent's output schema before building it, that sub-agent has no business existing.
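Writing the schema first also means the parent can reject a narrative mechanically instead of reading it. A minimal sketch, assuming the sub-agent is asked to return its findings as a JSON array; the `Finding` shape and `parseFindings` are hypothetical.

```typescript
// Hypothetical schema for one entry in a numbered findings list.
interface Finding {
  id: number;
  file: string;
  line: number;
  issue: string;
}

// Accept only a JSON array of well-formed findings. Anything else, including
// a narrative such as "looks like it should work", is rejected as null.
function parseFindings(raw: string): Finding[] | null {
  let data: unknown;
  try {
    data = JSON.parse(raw);
  } catch {
    return null;
  }
  if (!Array.isArray(data)) return null;
  const wellFormed = data.every(
    (f: any) =>
      f !== null &&
      typeof f === "object" &&
      typeof f.id === "number" &&
      typeof f.file === "string" &&
      typeof f.line === "number" &&
      f.line > 0 &&
      typeof f.issue === "string" &&
      f.issue.length > 0
  );
  return wellFormed ? (data as Finding[]) : null;
}
```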

Sub-agent YAML template

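This is the frontmatter template I now use for every new sub-agent. Drop it into .claude/agents/<name>.md, fill in the four fields, and write the body. If you can't fill all four cleanly, the sub-agent shouldn't exist.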

---
name: <one-word, hyphenated if needed>
description: Use this agent when <one sentence, action verb, specific trigger>.
model: <haiku | sonnet | opus | inherit>
tools: <comma-separated, scoped tight>
---

# Role

You <one sentence — what you do, nothing more>.

# When invoked

1. <step 1, concrete, verb-led>
2. <step 2>
3. <step 3>

# Input

You receive: <one sentence describing exact input shape>.

# Output

Return: <exact output schema — markdown table, JSON, checklist, etc.>.
Length: <hard cap on output tokens>.
Format rules: <specific formatting constraints>.

# Never

- <thing this agent should never do>
- <thing this agent should never do>

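The template enforces the three survival traits before you write any agent logic. A description with two verbs already violates trait 1. A tool list with more than four entries already violates trait 2. An Output section that says "report findings" with no schema violates trait 3.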
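Two of these checks are mechanical enough to lint before anything is spawned: all four frontmatter fields present, and a tool list of at most four entries. The sketch below assumes simple `key: value` frontmatter lines; `lintAgent` and `parseFrontmatter` are hypothetical helpers, and the one-verb rule for the description is left to human judgment because pattern matching cannot check it reliably.

```typescript
interface LintResult {
  ok: boolean;
  problems: string[];
}

const MODELS = ["haiku", "sonnet", "opus", "inherit"];
const MAX_TOOLS = 4;
const REQUIRED = ["name", "description", "model", "tools"];

// Parse "key: value" lines between the opening and closing "---" lines.
function parseFrontmatter(md: string): { [key: string]: string } | null {
  const lines = md.split("\n");
  if (lines[0].trim() !== "---") return null;
  const fields: { [key: string]: string } = {};
  for (let i = 1; i < lines.length; i++) {
    if (lines[i].trim() === "---") return fields;
    const idx = lines[i].indexOf(":");
    if (idx > 0) fields[lines[i].slice(0, idx).trim()] = lines[i].slice(idx + 1).trim();
  }
  return null; // frontmatter never closed
}

function lintAgent(md: string): LintResult {
  const fm = parseFrontmatter(md);
  if (fm === null) return { ok: false, problems: ["missing or unterminated frontmatter"] };
  const problems: string[] = [];
  for (const key of REQUIRED) {
    if (!fm[key]) problems.push(`missing field: ${key}`);
  }
  const model = fm["model"];
  if (model && MODELS.indexOf(model) < 0) problems.push(`unknown model: ${model}`);
  const toolList = fm["tools"];
  if (toolList) {
    const tools = toolList.split(",").map((t) => t.trim()).filter((t) => t.length > 0);
    if (tools.length > MAX_TOOLS) problems.push(`trait 2: ${tools.length} tools (max ${MAX_TOOLS})`);
  }
  return { ok: problems.length === 0, problems: problems };
}
```

An unfilled template line such as `model: <haiku | sonnet | opus | inherit>` is reported as an unknown model, so a half-filled copy of the template does not pass.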

Pre-build Checklist

Before building a new sub-agent, walk through these seven questions. If any answer is uncertain, don't build it yet.

## Sub-agent pre-build checklist
- [ ] Can I describe the job in one sentence with one verb?
- [ ] Is the input bounded (a file, a diff, a directory — not "the project")?
- [ ] Can I write the output schema before building?
- [ ] Is there a deterministic CLI that already does 80% of this?
- [ ] Does the main session already have the context this needs?
- [ ] Is this task sequential (each step depends on the previous)?
- [ ] Would I still build this if it cost me 20K tokens every spawn?

If #4 is yes, use the CLI. If #5 is yes, do it inline. If #6 is yes, don't make it a sub-agent. If #7 is "no", delete the draft.
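The checklist's routing can be encoded as a small decision function. The field names are hypothetical, the order in which the rules are checked is an assumption (the text does not rank them), and treating a "no" on #1 to #3 as "don't build yet" is also an assumption, since the text only states actions for #4 to #7.

```typescript
type Answer = "yes" | "no" | "unsure";

// Hypothetical field names for the seven questions, in checklist order.
interface Answers {
  oneVerb: Answer; // #1 one sentence, one verb
  boundedInput: Answer; // #2 bounded input
  schemaFirst: Answer; // #3 output schema writable up front
  cliDoes80: Answer; // #4 a deterministic CLI already does 80% of it
  mainHasContext: Answer; // #5 main session already has the context
  sequential: Answer; // #6 each step depends on the previous one
  worth20k: Answer; // #7 still worth it at 20K tokens per spawn
}

type Verdict =
  | "don't build yet"
  | "use the CLI"
  | "do it inline"
  | "not a sub-agent"
  | "delete the draft"
  | "build";

function decide(a: Answers): Verdict {
  const all: Answer[] = [a.oneVerb, a.boundedInput, a.schemaFirst, a.cliDoes80, a.mainHasContext, a.sequential, a.worth20k];
  if (all.indexOf("unsure") >= 0) return "don't build yet"; // any uncertain answer
  if (a.cliDoes80 === "yes") return "use the CLI";
  if (a.mainHasContext === "yes") return "do it inline";
  if (a.sequential === "yes") return "not a sub-agent";
  if (a.worth20k === "no") return "delete the draft";
  // Assumption: a "no" on #1-#3 also blocks the build; the text gives no action.
  if (a.oneVerb === "no" || a.boundedInput === "no" || a.schemaFirst === "no") return "don't build yet";
  return "build";
}
```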

Left out of the article, but worth naming

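Multi-agent orchestration: I tried having sub-agents call other sub-agents. Anthropic explicitly doesn't support this, and the workarounds (routing through the main session) added more overhead than the original task. Wait for agent teams (a separate product) instead of forcing it.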

Background sub-agents: background spawning sounds great. In practice, by the time the summary came back I had lost the thread of what I'd asked. Background work needs persistent state, which sub-agents don't have. Stick to synchronous spawning.

Cross-project agents: putting agents in ~/.claude/agents/ so they work in every project sounds clean. In practice, code-reviewer needs the team's style guide, which is project-specific. User-level agents drift toward generic and lose value. Project-level wins for anything domain-specific.

Self-improving agents: I tried having an agent read its own past summaries and refine its prompt. The improvements weren't real, and the meta-loop ate tokens. What works instead: once a week, read the bad outputs and patch the system prompt by hand.

Conclusion

Sub-agents are not the agent-stack revolution that X discourse is selling. They are a narrow tool with a narrow fit.

The 4 survivors earn their place because they fit what sub-agents are structurally good at: isolated, single-responsibility work with bounded inputs and observable outputs. The 10 that died tried to be things sub-agents can't be: coordinators, planners, runtime verifiers, context-sharing partners.

Meta-lesson: every sub-agent you build is a bet that the 20K-token overhead is worth it for the work that follows. Most of the time, the honest answer is no. The bar for "yes" is higher than X discourse tells you.

If you have 8 sub-agents in .claude/agents/ right now, run each one through the pre-build checklist. Teams I've audited deleted 60-70% of theirs on the first pass. The ones that survive genuinely earn their keep.

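The first sub-agent worth building is almost always code-reviewer. It naturally hits all three traits, its cost-benefit is obvious from day one, and the read-only constraint makes it impossible for it to break anything. Start there. Get a feel for what a well-shaped sub-agent looks like before building anything more ambitious. Most of the 10 that died exist because I built them before I understood what shape I was looking for.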