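Agents need runbooks, not longer prompts

Rohit Ghumare posted a thread on production-grade agents whose core claim is blunt: if your agent workflow depends on phrases like "IMPORTANT / DO NOT SKIP / MAKE SURE YOU VERIFY", you are not building an agent system, you are negotiating with a random intern.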

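The problem is not that agents aren't smart enough

Real work is not a single prompt. Real work looks like this:

read the issue → inspect the codebase → understand existing conventions → make changes → run tests → debug failures → update the plan → review the diff → request approval → create a PR → monitor CI → respond to review → roll back if necessary

This is not "chat". It is an operating loop. And an operating loop needs a runbook, not vibes.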

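We already learned this lesson in DevOps

Before DevOps matured, infrastructure work lived in people's heads:

Deployment was tribal knowledge
Incident response was Slack chaos
Debugging production meant finding the senior engineer who knew which server to SSH into

Then we slowly moved the work into systems: runbooks, CI/CD pipelines, health checks, alerts, dashboards, logs, rollback scripts, infrastructure as code, postmortems.

The lesson is simple: if a process matters, encode it. Don't count on someone remembering it at 2 a.m.

Now we are repeating the same mistake with agents: cramming all the operational knowledge into one giant prompt, then being surprised when the agent forgets something.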

A prompt is not a runbook

A prompt can describe what should happen. A runbook controls what can happen. That difference matters.

The prompt says	The runbook says
"Run tests before finalizing"	`step: run_tests, command: pytest, required: true, on_failure: stop_and_debug`
"Be careful with production data"	`permissions: read: staging_db, write: none, production: denied`
"Create a good PR"	`before_pr: [check_diff_size, run_linter, run_tests, verify_no_secrets, summarize_risk, request_human_review]`

One is a suggestion; the other is infrastructure.
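As a rough illustration of the runbook column, here is a minimal Python sketch of steps enforced by code rather than by wording. The `Step` class, the `run_runbook` function, and the demo commands are hypothetical names chosen for this example; the thread itself does not prescribe an implementation.

```python
import subprocess
import sys
from dataclasses import dataclass


@dataclass
class Step:
    name: str
    command: list[str]
    required: bool = True
    on_failure: str = "stop_and_debug"  # hypothetical policy label, echoing the table


def run_runbook(steps):
    """Run steps in order. A failed required step halts the run.

    Returns (status, log), where log holds (step name, exit code) pairs.
    """
    log = []
    for step in steps:
        result = subprocess.run(step.command, capture_output=True, text=True)
        log.append((step.name, result.returncode))
        if result.returncode != 0 and step.required:
            return step.on_failure, log
    return "completed", log


# Demo commands stand in for a real linter and test runner (assumption).
steps = [
    Step("run_linter", [sys.executable, "-c", "print('lint ok')"]),
    Step("run_tests", [sys.executable, "-c", "import sys; sys.exit(1)"]),
    Step("create_pr", [sys.executable, "-c", "print('pr created')"]),
]
status, log = run_runbook(steps)
print(status, log)
```

Because `run_tests` exits non-zero and is marked required, the run halts before `create_pr` is ever attempted, no matter what the model would have preferred to do.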

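Verification must be external and programmatic

The most common failure mode is letting the same model do the work and verify the work. That is like asking a deploy script "is production healthy?" without checking the metrics. The answer may be right, but that is not a control system.

Verification has to become external and programmatic: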

tests must actually run
screenshots must actually be captured
API responses must actually be checked
files must actually exist
diffs must actually be inspected
links must actually resolve
commands must return real exit codes
generated claims must cite real sources

If the agent says "done" but no system has verified it, the work is not finished. It has only been described with confidence.
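A minimal sketch of what "external and programmatic" could mean in practice, covering two of the checks listed above (real exit codes and files that actually exist). The function name `verify_done` and its parameters are assumptions made for this example.

```python
import subprocess
import sys
from pathlib import Path


def verify_done(claimed_done: bool, test_command: list[str], expected_files: list[Path]) -> dict:
    """Decide completion from observable evidence, not from the agent's claim."""
    proc = subprocess.run(test_command, capture_output=True, text=True)
    missing = [str(p) for p in expected_files if not p.is_file()]
    return {
        "claimed": claimed_done,          # recorded, but never used in the decision
        "exit_code": proc.returncode,     # the real exit code of the command
        "missing_files": missing,
        "verified": proc.returncode == 0 and not missing,
    }


# The agent claims success, but the (stand-in) test command exits with 1.
result = verify_done(
    claimed_done=True,
    test_command=[sys.executable, "-c", "raise SystemExit(1)"],
    expected_files=[Path("does_not_exist.txt")],
)
print(result["verified"])  # False: the claim is ignored, the exit code is not
```

The design choice worth noticing is that `claimed_done` never enters the `verified` expression; it is kept only so the gap between claim and evidence can be logged.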

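The future agent stack looks boring

That is the point. The next generation of useful agent infrastructure will not look like magic. It will look more like DevOps: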

task queues
worker boundaries
tool registries
memory scopes
approval gates
logs
traces
policy files
evals
CI checks
rollback hooks

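The agent will not be a giant autonomous brain. It will be a worker inside a controlled runtime.

The system decides:

what the next step is
which tools are available
what counts as success
what to do on failure
when human approval is required

Reliability comes from here, not from adding another paragraph to the system prompt.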
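One way to picture "the system decides" is a small state machine that owns the transitions and the per-state tool list. The sketch below is illustrative only: the states, outcomes, and tool names are assumptions, not something the thread specifies.

```python
from enum import Enum


class State(Enum):
    PLAN = "plan"
    EDIT = "edit"
    TEST = "test"
    AWAIT_APPROVAL = "await_approval"
    OPEN_PR = "open_pr"
    DONE = "done"


# (current state, outcome) -> next state. The agent reports an outcome;
# the runtime, not the agent, picks what happens next.
TRANSITIONS = {
    (State.PLAN, "ok"): State.EDIT,
    (State.EDIT, "ok"): State.TEST,
    (State.TEST, "ok"): State.AWAIT_APPROVAL,
    (State.TEST, "fail"): State.EDIT,                  # what to do on failure
    (State.AWAIT_APPROVAL, "approved"): State.OPEN_PR,  # human approval gate
    (State.AWAIT_APPROVAL, "rejected"): State.PLAN,
    (State.OPEN_PR, "ok"): State.DONE,
}

# Which (hypothetical) tools are available in each state.
ALLOWED_TOOLS = {
    State.PLAN: {"read_file", "search_code"},
    State.EDIT: {"read_file", "write_file"},
    State.TEST: {"run_tests"},
    State.AWAIT_APPROVAL: set(),
    State.OPEN_PR: {"create_pr"},
    State.DONE: set(),
}


def next_state(state: State, outcome: str) -> State:
    """Return the next state, or raise if the transition is not defined."""
    try:
        return TRANSITIONS[(state, outcome)]
    except KeyError:
        raise ValueError(f"no transition from {state.name} on {outcome!r}") from None


def tool_allowed(state: State, tool: str) -> bool:
    return tool in ALLOWED_TOOLS[state]


print(next_state(State.TEST, "fail").name)       # EDIT
print(tool_allowed(State.EDIT, "create_pr"))     # False
```

With this shape, skipping tests is not a matter of the agent forgetting an instruction: there is simply no transition from `EDIT` to `OPEN_PR`.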

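The wrong abstraction: "agent as employee"

Many people describe the agent as an employee: give it goals, give it tools, and let it figure things out. That is tempting, and dangerous.

A better abstraction is the agent as a production worker. A worker has: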

a job
a queue
permissions
inputs
outputs
logs
retries
failure states
escalation paths

You don't tell a worker to "be careful". You constrain the environment so that carelessness cannot destroy the system.
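To make a few of those worker properties concrete (retries, failure states, logs, an escalation path), here is a small hedged sketch. `JobResult`, `run_job`, and the status strings are hypothetical names for this example, and a real runtime would add queues, timeouts, and backoff.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class JobResult:
    status: str                      # "succeeded" or "escalated"
    attempts: int
    log: list[str] = field(default_factory=list)


def run_job(job: Callable[[], str], max_retries: int = 2) -> JobResult:
    """Run a job with bounded retries; escalate instead of looping forever."""
    log: list[str] = []
    total_attempts = max_retries + 1
    for attempt in range(1, total_attempts + 1):
        try:
            output = job()
        except Exception as exc:
            log.append(f"attempt {attempt} failed: {exc}")
            continue
        log.append(f"attempt {attempt} succeeded: {output}")
        return JobResult("succeeded", attempt, log)
    log.append("retries exhausted; escalating to a human")
    return JobResult("escalated", total_attempts, log)


def always_fails() -> str:
    raise RuntimeError("tests are red")


result = run_job(always_fails, max_retries=2)
print(result.status, result.attempts)  # escalated 3
```

The failure path is part of the worker's contract: a job that keeps failing ends in an explicit `escalated` state with a log, rather than in an agent quietly declaring success.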

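CLAUDE.md is not enough

Project instructions are useful, skills are useful, memory is useful, but on their own they are not enough.

CLAUDE.md can tell the agent how the project works; it cannot guarantee the agent follows the release process
A skill can teach the agent a workflow; it cannot guarantee the workflow is executed correctly
Memory can preserve context; it cannot replace verification

A real agent system needs two layers:

Knowledge layer: docs, skills, memory, conventions
Control layer: state machine, tool policy, checks, approvals, rollback

Most teams over-invest in the first layer and under-build the second.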

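The agent platform is the new platform-team problem

Agents are becoming another kind of production actor, one that will touch:

repos, CI/CD, cloud accounts, tickets, dashboards, incidents, docs, customer data, internal tools, deployment workflows

If you wouldn't give a junior engineer direct production access without guardrails, you shouldn't give it to an agent either. And for an agent to operate inside your engineering systems, it needs the same platform primitives as everything else: identity, access control, audit logs, environment boundaries, policy, reproducibility, recovery.

Agents are not a replacement for the platform. Agents are customers of the platform.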
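Two of those primitives, identity and audit logs, can be sketched as a thin wrapper around every tool call. This is a minimal example under stated assumptions: `audited_call`, the entry fields, and the in-memory list used as a log are all hypothetical, and a real platform would write to durable, append-only storage.

```python
import json
from datetime import datetime, timezone


def audited_call(agent_id, tool_name, tool, audit_log, **kwargs):
    """Call a tool and always record who called what, with which arguments."""
    entry = {
        "ts": datetime.now(timezone.utc).isoformat(),
        "agent": agent_id,
        "tool": tool_name,
        "args": kwargs,
    }
    try:
        result = tool(**kwargs)
        entry["outcome"] = "ok"
        return result
    except Exception as exc:
        entry["outcome"] = f"error: {exc}"
        raise
    finally:
        # Runs on success and on failure, so no call goes unrecorded.
        audit_log.append(json.dumps(entry, sort_keys=True))


def add_label(issue: int, label: str) -> str:
    # Stand-in for a real ticketing tool (assumption).
    return f"issue {issue} labeled {label}"


audit_log = []
print(audited_call("agent-7", "add_label", add_label, audit_log, issue=42, label="bug"))
print(len(audit_log))  # 1
```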