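The /goal command from a product manager's perspective: from vague requirements to verifiable goals

Core insight

/goal is not "a smarter model"; it is a productized wrapper around the Ralph Wiggum loop.

The key is not that the model got smarter, but that every run starts by reloading the persistent files: spec, plan, task list, test suite, status notes.

The conversation can rot, but the source of truth always lives outside it.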

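What /goal really is

How it works

User sets a goal → Agent keeps working
                ↓
    A separate evaluator / long-running harness checks whether the goal is met
                ↓
    Agent plans, edits, runs tests, fixes failures, updates status, continues
                ↓
    Until the goal is met or the loop hits its boundary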
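
The flow above can be sketched in a few lines of Python. This is only an illustration under assumptions, not how any real /goal implementation is built: `agent_turn` and `goal_is_met` are hypothetical callables standing in for the agent and for the separate evaluator.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class LoopResult:
    done: bool           # True if the evaluator accepted the goal
    turns_used: int      # how many agent turns were spent
    log: List[str] = field(default_factory=list)


def run_goal_loop(
    agent_turn: Callable[[int], str],   # hypothetical: one agent turn, returns a status note
    goal_is_met: Callable[[], bool],    # hypothetical: the separate evaluator / harness check
    max_turns: int,                     # the loop boundary
) -> LoopResult:
    """Keep the agent working until the evaluator passes or the boundary is hit."""
    log: List[str] = []
    for turn in range(1, max_turns + 1):
        log.append(agent_turn(turn))    # plan, edit, run tests, fix, update status
        if goal_is_met():               # the check lives outside the agent
            return LoopResult(done=True, turns_used=turn, log=log)
    return LoopResult(done=False, turns_used=max_turns, log=log)
```

The point of the sketch is the separation: the agent never decides for itself that it is finished, and the loop always has a hard stop.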

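A crude version of the Ralph Wiggum loop

loop:
  give the Agent the same repo context
  read the spec and implementation plan
  pick the next unchecked task
  complete the task
  write or run tests
  tests pass → mark the task done
  start over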

The useful part is not a smarter model; it is the fresh context at the start of every run.
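
A minimal Python sketch of that loop, assuming the plan is a checklist file whose open tasks start with `- [ ] ` (that file format, `do_task`, and `tests_pass` are all assumptions for illustration). What matters is that the plan is re-read from disk at the top of every iteration instead of being remembered from the previous one.

```python
from pathlib import Path
from typing import Callable, Optional

UNCHECKED = "- [ ] "
CHECKED = "- [x] "


def next_unchecked(plan_text: str) -> Optional[str]:
    """Return the first task that is still unchecked, or None."""
    for line in plan_text.splitlines():
        if line.startswith(UNCHECKED):
            return line[len(UNCHECKED):]
    return None


def mark_done(plan_text: str, task: str) -> str:
    """Tick off the first line that exactly matches the given open task."""
    lines = plan_text.splitlines()
    for i, line in enumerate(lines):
        if line == UNCHECKED + task:
            lines[i] = CHECKED + task
            break
    return "\n".join(lines) + "\n"


def ralph_loop(
    plan_path: Path,
    do_task: Callable[[str], None],   # hypothetical agent call
    tests_pass: Callable[[], bool],   # hypothetical test runner
    max_iterations: int = 20,
) -> int:
    """Run until no unchecked task remains or the cap is hit; return iterations used."""
    for iteration in range(max_iterations):
        plan = plan_path.read_text(encoding="utf-8")  # fresh context: reload every time
        task = next_unchecked(plan)
        if task is None:
            return iteration
        do_task(task)
        if tests_pass():
            plan_path.write_text(mark_done(plan, task), encoding="utf-8")
    return max_iterations
```

If the tests fail, the task stays unchecked and the next iteration picks it up again, which is why the iteration cap is needed.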

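Core principle

A loop is only as good as the plan, tests, and acceptance criteria it reloads, and the evidence it leaves behind.

Impact on PM work

The shift in the requirements paradigm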

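Old paradigm	New paradigm
"Write enough detail for engineers to understand the intent"	"Define done clearly enough that an Agent can keep trying"
Goal: human understanding	Goal: evidence a harness can check
Output: prose	Output: target state

This is a much higher bar than an ordinary ticket.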

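Leverage

The further you step away from the loop (letting the Agent run autonomously), the greater the leverage.

But setup and planning matter correspondingly more.

Weak vs. strong versions of /goal

Weak version: reads like a wish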

/goal improve onboarding

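What the Agent can do:

Rename buttons
Add a checklist
Change copy
Simplify screens
Create tests
Produce a PR that looks reasonable

Problem: the goal itself gives the Agent no way to know whether onboarding actually improved.

Result: the Agent optimizes for whatever is easiest to prove:

A cleaner UI (because the screenshot looks cleaner)
Passing tests (because the tests pass)
Fewer steps (because fewer steps sounds better)
A satisfied evaluator (because the transcript contains evidence that work happened)

But the product is not necessarily better.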

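The same failure shows up in PM requirements

Weak requirement	Problem
"make activation easier"	Vague
"reduce friction"	Cannot be verified
"improve the dashboard"	No completion criteria
"clean up the settings experience"	Unclear scope
"make search smarter"	Not testable

Humans rescue these tickets through conversation. An unattended Agent does worse: it turns the ambiguity into implementation.

Strong version: give the loop a finish line, a proof method, and boundaries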

/goal implement the new onboarding checklist from docs/onboarding-spec.md
  All acceptance criteria in the spec must pass
  npm test -- onboarding exits 0
  npm run lint exits 0
  No files outside app/onboarding, components/onboarding, or test/onboarding are changed
  Stop after 20 turns with a status report if any criterion remains blocked

It is not a perfect product requirement, but it gives the Agent something close to evaluable conditions.
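
The file-scope line in that goal is the kind of condition a harness can check mechanically. A hedged Python sketch: the directory list is copied from the goal above, while the function name and the idea of feeding it the output of `git diff --name-only` are assumptions for illustration.

```python
from pathlib import PurePosixPath
from typing import Iterable, List

# Taken from the goal text: the only directories the agent may touch.
ALLOWED_DIRS = ("app/onboarding", "components/onboarding", "test/onboarding")


def out_of_scope(changed_files: Iterable[str], allowed_dirs=ALLOWED_DIRS) -> List[str]:
    """Return the changed files that fall outside every allowed directory."""
    allowed = [PurePosixPath(d) for d in allowed_dirs]
    violations = []
    for name in changed_files:
        path = PurePosixPath(name)
        # Compare whole path components, so "app/onboarding-old" does not
        # slip through just because it shares a string prefix.
        if not any(root in path.parents for root in allowed):
            violations.append(name)
    return violations
```

An empty result means the boundary held; anything else is evidence to put in the status report.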

Sharpening product intent inside the spec

Example acceptance criteria:

The checklist is done when:
- first-time users see three setup steps after account creation
- each step has a visible complete/incomplete state
- completing a step persists after refresh
- users can skip the checklist and return to it later
- existing users do not see the checklist unless they have no completed setup steps
- analytics emits onboarding_checklist_viewed, onboarding_step_completed, and onboarding_checklist_dismissed
- the empty state links to the checklist when no setup steps are complete

Matching validation evidence:

Validation evidence required:
- unit tests for persisted step state
- integration test for first-time user visibility
- regression test for existing users
- browser smoke test for completing, refreshing, and dismissing the checklist
- screenshot or DOM evidence for the empty state link
- event capture test or mocked analytics assertion for each event

Now /goal has something to work with.

The loop can:

Run the tests
Show the output
Check which files changed
Report which criteria are still blocked

The evaluator can judge the transcript because the transcript contains proof rather than vibes.
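
One way to picture "proof rather than vibes" is to pair every acceptance criterion with the evidence recorded for it and to treat a missing entry as blocked. The sketch below is an assumption about how such a report could be modeled, not a description of any tool's internals; `Criterion`, `blocked`, and `is_done` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Criterion:
    name: str
    evidence: Optional[str] = None   # e.g. a test name or log excerpt; None means no proof yet


def blocked(criteria: List[Criterion]) -> List[str]:
    """Names of criteria that still have no recorded evidence."""
    return [c.name for c in criteria if not c.evidence]


def is_done(criteria: List[Criterion]) -> bool:
    """Done only when at least one criterion exists and every one carries evidence."""
    return len(criteria) > 0 and not blocked(criteria)
```

For example, a criterion "completing a step persists after refresh" would stay blocked until something like a passing unit test for persisted step state is attached to it.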

The difference PMs must internalize

Type	Essence	Result
Prompt	Asks for effort	Agent keeps working
Contract	Defines the condition under which effort stops	Agent knows when it is done

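Watch the first few loops

Best advice: watch the first iterations.

Don't:

Write a huge spec
Start the loop
Close the laptop
Hope the robot does product development

Do:

Start the loop
Watch how the Agent handles the instructions
It misreads the goal → stop, edit the spec, restart
It writes bad tests → stop, fix the validation protocol, restart
It touches unrelated files → stop, add scope boundaries, restart
It keeps asking the same question → stop (the spec is ambiguous, and a human has been quietly resolving it)

The first few loops are calibration. They teach you how the Agent interprets the plan.

This may look like distrust of the Agent, but it is the opposite: watching the first iterations is how you make the later unattended work less stupid.

The loop only becomes useful once the spec has survived contact with the model.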

Where /goal really shines

1. Migration work (a natural fit)

/goal migrate all imports from legacyAuth to authClient in app/auth
  No legacyAuth imports remain in app/auth
  npm test -- auth exits 0
  npm run typecheck exits 0
  Stop after 15 turns if any usage is ambiguous
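
The "No legacyAuth imports remain" line is checkable without the agent's say-so. A rough Python sketch, assuming imports are written as `from "…/legacyAuth"` or `require("…/legacyAuth")` and that only `.ts`, `.tsx`, `.js`, and `.jsx` files matter; both assumptions would need adjusting for a real codebase.

```python
import re
from pathlib import Path
from typing import List

# Assumption: the module specifier ends in "legacyAuth" right before the closing quote.
LEGACY_IMPORT = re.compile(r"""(?:from\s+|require\(\s*)['"][^'"]*legacyAuth['"]""")
SOURCE_SUFFIXES = {".ts", ".tsx", ".js", ".jsx"}


def remaining_legacy_imports(root: Path) -> List[str]:
    """List 'relative/path:line' for every legacyAuth import still under root."""
    hits = []
    for path in sorted(root.rglob("*")):
        if not path.is_file() or path.suffix not in SOURCE_SUFFIXES:
            continue
        text = path.read_text(encoding="utf-8")
        for number, line in enumerate(text.splitlines(), start=1):
            if LEGACY_IMPORT.search(line):
                hits.append(f"{path.relative_to(root).as_posix()}:{number}")
    return hits
```

An empty list is the evidence; a non-empty one tells the next iteration exactly where to look.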

2. Backlog cleanup (a natural fit)

/goal resolve every failing test marked @auth-regression
  Each fix must include the smallest relevant production change
  Do not delete tests
  After each fix, update docs/status-auth-regressions.md with cause, files changed, and validation output

3. File splitting (a natural fit)

/goal split app/components/AccountSettings.tsx into modules under 250 lines
  Behavior must stay the same
  Existing tests must pass
  No new component should mix billing, profile, and notification concerns
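
The 250-line limit is another condition that can be verified mechanically. A small Python sketch, assuming the split modules are `.tsx` files under one directory (the function name and the directory layout are hypothetical):

```python
from pathlib import Path
from typing import Dict

MAX_LINES = 250  # from the goal: "modules under 250 lines"


def oversized_modules(directory: Path, max_lines: int = MAX_LINES) -> Dict[str, int]:
    """Map each .tsx file with max_lines or more lines to its line count."""
    result = {}
    for path in sorted(directory.rglob("*.tsx")):
        count = len(path.read_text(encoding="utf-8").splitlines())
        if count >= max_lines:  # "under 250" means 250 itself is already too long
            result[path.relative_to(directory).as_posix()] = count
    return result
```

This covers only the size criterion; "behavior must stay the same" still depends on the existing tests.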

4. Brute-force testing (another good choice)

Ralph is strong at working through a queue of cases until it is empty:

attack vectors
checkout paths
login flows
search cases
forms
permissions
edge states
unhappy paths
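
The "until the queue is empty" shape is simple enough to show in Python. This is an illustrative sketch only: `passes` is a hypothetical check supplied by the caller, and the case strings are made up.

```python
from collections import deque
from typing import Callable, Iterable, List


def drain(cases: Iterable[str], passes: Callable[[str], bool]) -> List[str]:
    """Try every queued case once and return the ones that failed."""
    queue = deque(cases)
    failures = []
    while queue:                 # keep going until the queue is empty
        case = queue.popleft()
        if not passes(case):
            failures.append(case)
    return failures
```

The returned failures are what belongs in the status file, so the next run starts from recorded evidence rather than from memory.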

5. Exploration mode (can work, but expectations should differ)

An exploration goal should produce learning, not production code:

/goal explore three viable approaches for making project search faster
  For each approach, create a short note with expected complexity, risks, files touched, and how we would validate it
  Do not edit production code
  Stop after producing docs/search-speed-options.md

The dangerous version: letting an exploration loop quietly turn into a production loop.

Instruction shapes PMs should stop giving Agents

❌ Adjectives

Don't	Replace with
make it better	Reduce empty-state decision path from four visible actions to two
make it cleaner	Keep "Import CSV" and "Create manually", move advanced settings behind disclosure
make it easier	Add regression test that empty state still exposes both setup paths
make it smarter	(define concretely what "smart" means)
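
A crude lint can catch adjective-shaped goals before a loop is started. The Python sketch below is an assumption-heavy illustration: the word list is hand-picked from the examples in these notes, and a clean result does not prove a goal is verifiable.

```python
import re
from typing import List

# Assumption: a small word list drawn from the weak examples in these notes.
VAGUE_WORDS = ("better", "cleaner", "easier", "smarter", "improve", "polish")


def vague_words_in(goal: str) -> List[str]:
    """Return the vague words found in a goal, in the order of VAGUE_WORDS."""
    lowered = goal.lower()
    return [w for w in VAGUE_WORDS if re.search(rf"\b{w}\b", lowered)]
```

A hit is a prompt for the PM to rewrite the goal as an observable target state, as in the right-hand column above.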

❌ Vibe

Don't	Replace with
polish the onboarding flow	After signup, users land on setup checklist, first incomplete step expanded, completing workspace-profile marks complete without full-page refresh, API failure returns to incomplete with existing toast error pattern

❌ Implementation preferences disguised as product goals

Don't	Replace with
Refactor settings into cleaner architecture	Split settings so billing, profile, and notification changes can be tested independently, each module owns form state/validation/save action, existing behavior must not change, done when each module has its own test file and regression suite passes

A practical Goal template

/goal [specific target state]

Source of truth:
- read [spec file]
- follow [implementation plan]
- update [status file]

Acceptance criteria:
- [observable behavior 1]
- [observable behavior 2]
- [negative case]
- [non-regression condition]

Validation:
- [test command]
- [lint/typecheck/build command]
- [browser/visual/manual evidence if needed]

Boundaries:
- only edit [paths]
- do not change [systems]
- preserve [contract/data/API behavior]

Loop behavior:
- after each meaningful change, run relevant validation
- update status file with changed files, result, and remaining risk
- stop after [N turns/time] if blocked and report blocker
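
The template can also be treated as data, which makes it easy to spot an empty section before launching a loop. A minimal Python sketch under assumptions: the `Goal` class and its method names are hypothetical, and the section titles are copied from the template above.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Goal:
    target_state: str
    source_of_truth: List[str] = field(default_factory=list)
    acceptance_criteria: List[str] = field(default_factory=list)
    validation: List[str] = field(default_factory=list)
    boundaries: List[str] = field(default_factory=list)
    loop_behavior: List[str] = field(default_factory=list)

    def sections(self) -> List[Tuple[str, List[str]]]:
        """Section titles paired with their items, in template order."""
        return [
            ("Source of truth", self.source_of_truth),
            ("Acceptance criteria", self.acceptance_criteria),
            ("Validation", self.validation),
            ("Boundaries", self.boundaries),
            ("Loop behavior", self.loop_behavior),
        ]

    def missing_sections(self) -> List[str]:
        """Titles of sections that are still empty, i.e. holes in the contract."""
        return [title for title, items in self.sections() if not items]

    def render(self) -> str:
        """Produce the plain-text goal in the template's layout."""
        parts = [f"/goal {self.target_state}"]
        for title, items in self.sections():
            parts += ["", f"{title}:"] + [f"- {item}" for item in items]
        return "\n".join(parts)
```

With this shape, `Goal("improve onboarding").missing_sections()` lists all five sections, which is the weak version from earlier expressed as a check.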

The status file: the persistent memory layer

In essence: the JIRA epic, rebuilt.

Contents:

what changed
which checks passed
which checks failed
what decision the agent made
what remains risky
what a human should inspect next

Purpose:

Avoids context rot
Every fresh turn reloads the spec and status
No need to reconstruct the project from a rotted conversation
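
A status file can be as plain as an append-only text log with one entry per turn. The Python sketch below mirrors the six content items listed above; the `StatusEntry` type, the entry layout, and the function names are assumptions for illustration, not a format any tool prescribes.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import List


@dataclass
class StatusEntry:
    changed: List[str]     # what changed
    passed: List[str]      # which checks passed
    failed: List[str]      # which checks failed
    decision: str          # what decision the agent made
    risk: str              # what remains risky
    inspect_next: str      # what a human should inspect next


def format_entry(turn: int, e: StatusEntry) -> str:
    """Render one turn as a plain-text block ending in a blank line."""
    def joined(items: List[str]) -> str:
        return ", ".join(items) if items else "none"

    return "\n".join([
        f"Turn {turn}",
        f"  what changed: {joined(e.changed)}",
        f"  checks passed: {joined(e.passed)}",
        f"  checks failed: {joined(e.failed)}",
        f"  decision: {e.decision}",
        f"  remaining risk: {e.risk}",
        f"  inspect next: {e.inspect_next}",
    ]) + "\n\n"


def append_status(path: Path, turn: int, e: StatusEntry) -> None:
    """Append, never overwrite, so earlier turns stay on record."""
    with path.open("a", encoding="utf-8") as f:
        f.write(format_entry(turn, e))
```

Because the file is append-only, a fresh turn can read it top to bottom and see every earlier decision without needing the conversation.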

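Core conclusion

The tools are new, but the standard is old.

Define done, prove done, and keep the proof outside the chat.

Operating rule:

Teams with good tests and clear acceptance criteria → get more useful loops
Teams with vague requirements → get longer, faster, more token-wasting mush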

Resources

Author: George from 🕹prodmgmt.world (@nurijanian)
Original: https://x.com/nurijanian/status/2055927283991654775
Claude Code /goal: https://code.claude.com/docs/en/goal
Codex /goal: https://developers.openai.com/cookbook/examples/codex/using_goals_in_codex
PM OS: https://prodmgmt.world/ai-pm-os
Ryan Singer Shaping Skills (formerly product at Basecamp)
Matt Pocock /grill-me