Agent Skills: The Critical Leap from "Can Think" to "Can Execute"

As large language models continue to improve, I've noticed something interesting. The conversation is no longer about what models can do. It's about what they can reliably execute in the real world.

From General Understanding to Operational Capability

On paper, today's AI systems look incredibly capable. They can reason, write, analyze, and even interact with tools. But when you actually try to use them in real workflows, a gap becomes obvious. Not a gap in intelligence. A gap in execution.

That gap comes down to three things: structure, context, and procedure.

Real-world work requires:

Clear procedures, not just understanding
Defined workflows, not improvisation
Repeatability, not one-off success
Context that evolves beyond a single prompt

What Are Agent Skills?

A skill is a packaged capability. Not just instructions. Not just prompts. But a combination of three things:

Guidance: what to do
Context: when and why
Execution: how to actually get it done

Technically, it's just a structured folder with a SKILL.md file, supporting documents, and sometimes executable code. But conceptually, it's much more powerful. It's a way to take something you've figured out (a workflow, a process, some expertise) and turn it into something an agent can reuse. Instead of rebuilding behavior every time, you install it.
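
To make the folder idea concrete, here is a minimal Python sketch that writes a tiny skill to a temporary directory and lists its files. Only SKILL.md comes from the description above; the folder name `csv-report`, the files `reference.md` and `scripts/sort_rows.py`, and the `name`/`description` header fields are assumptions made for illustration.

```python
import tempfile
from pathlib import Path

# Hypothetical skill layout (every name except SKILL.md is an assumption):
#   csv-report/
#     SKILL.md              guidance plus a short metadata header
#     reference.md          supporting document
#     scripts/sort_rows.py  optional executable code

SKILL_MD = """---
name: csv-report
description: Summarize a CSV file into a short report.
---
1. Read the CSV file.
2. Run scripts/sort_rows.py to order the rows.
3. Write the summary.
"""


def create_skill(root: Path) -> Path:
    """Write the example skill folder under root and return its path."""
    skill_dir = root / "csv-report"
    (skill_dir / "scripts").mkdir(parents=True)
    (skill_dir / "SKILL.md").write_text(SKILL_MD, encoding="utf-8")
    (skill_dir / "reference.md").write_text("Column definitions go here.\n", encoding="utf-8")
    (skill_dir / "scripts" / "sort_rows.py").write_text("print('sorting')\n", encoding="utf-8")
    return skill_dir


def list_skill_files(skill_dir: Path) -> list[str]:
    """Return the skill's files as sorted POSIX-style relative paths."""
    return sorted(
        p.relative_to(skill_dir).as_posix()
        for p in skill_dir.rglob("*")
        if p.is_file()
    )


with tempfile.TemporaryDirectory() as tmp:
    files = list_skill_files(create_skill(Path(tmp)))
    print(files)
```

"Installing" a skill, in this picture, amounts to copying such a folder to wherever the agent looks for skills. The exact location and header format depend on the agent runtime you use.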

Progressive Disclosure: Rethinking Context

Before this, I used to think the solution to better AI outputs was "more context." More prompts. More examples. More instructions. But that quickly hits limits.

What Agent Skills introduce is a different approach: Don't load everything. Load only what's needed.

The agent starts with awareness (metadata), then pulls in deeper context only when relevant. This is exactly how we work as humans. We don't memorize entire manuals; we navigate them.

And suddenly, the context window stops being a limitation. It becomes a navigation problem.
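
A minimal sketch of that two-stage loading, in Python. It assumes each skill file starts with a `---` delimited header carrying `name` and `description`, and it uses a naive word-overlap match as a stand-in for the agent's own judgment about relevance. The `Skill` class, the function names, and both sample skills are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Skill:
    name: str
    description: str  # lightweight metadata, always in context
    body: str         # full instructions, loaded only when relevant


def parse_skill(text):
    """Split a skill file into metadata and body (assumes a '---' delimited header)."""
    _, header, body = text.split("---", 2)
    meta = {}
    for line in header.strip().splitlines():
        key, _, value = line.partition(":")
        meta[key.strip()] = value.strip()
    return Skill(meta["name"], meta["description"], body.strip())


def pick_skill(task, skills):
    """Naive word-overlap match, a stand-in for the agent's relevance judgment."""
    task_words = set(task.lower().split())

    def overlap(skill):
        return len(task_words & set(skill.description.lower().split()))

    best = max(skills, key=overlap)
    return best if overlap(best) > 0 else None


def build_context(task, skills):
    """Always include the metadata index; add one skill body only if it matches."""
    parts = ["Available skills:"]
    parts += [f"- {s.name}: {s.description}" for s in skills]
    chosen = pick_skill(task, skills)
    if chosen is not None:
        parts += [f"Instructions for {chosen.name}:", chosen.body]
    return "\n".join(parts)


EMAIL = "---\nname: email-reply\ndescription: Draft a polite reply to a customer email.\n---\nKeep replies under 120 words.\n"
CSV = "---\nname: csv-report\ndescription: Summarize a CSV file into a short report.\n---\nSort rows by date before summarizing.\n"

skills = [parse_skill(EMAIL), parse_skill(CSV)]
context = build_context("summarize this csv file", skills)
print(context)
```

Both descriptions appear in the context, but only the matching skill's instructions are pulled in. With many skills, the always-loaded part stays small while the detailed part is fetched on demand.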

Why Code Inside Skills Matters

Language models are great at reasoning. But they're not always the best tool for doing. Sorting, parsing, structured extraction: these are deterministic problems. So instead of forcing the model to simulate them, you let code handle it. The agent decides when to use code. The code determines how it gets done.

That combination, reasoning plus execution, is what makes systems reliable.
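
As a small illustration of handing a deterministic step to code, the Python sketch below parses CSV text and returns the largest amounts first. The function name `top_expenses`, the `item` and `amount` columns, and the sample data are assumptions for this example, not part of any particular skill.

```python
import csv
import io


def top_expenses(csv_text: str, limit: int = 3) -> list[tuple[str, float]]:
    """Parse 'item,amount' rows and return the largest amounts first."""
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = [(row["item"], float(row["amount"])) for row in reader]
    # Sorting in code gives the same answer every time, unlike asking
    # the model to order the rows in its head.
    return sorted(rows, key=lambda r: r[1], reverse=True)[:limit]


RAW = "item,amount\ncoffee,4.50\nlaptop,1200\nlunch,12.25\ntaxi,30\n"
result = top_expenses(RAW)
print(result)  # [('laptop', 1200.0), ('taxi', 30.0), ('lunch', 12.25)]
```

The agent's job is to notice that a task calls for this script and to interpret the result. The parsing and ordering themselves never vary between runs.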

How to Build Skills

If I were building with this seriously, I wouldn't start by designing skills upfront. I'd start by watching where things break:

Where does the agent struggle?
Where do I repeat myself?
Where does output become inconsistent?

Those are signals. From there: capture the working process → turn it into structured instructions → separate reusable parts → add code where precision matters.

Over time, you don't just improve outputs. You build a library of capabilities.

The Easily Overlooked Risk Surface

Skills don't just add capability; they add power. And power without control becomes risk.

A poorly designed skill could:

Leak data
Call unsafe APIs
Execute unintended actions

So the mindset has to shift from "prompt engineering" to something closer to system design and security. You're no longer just talking to AI. You're extending it.
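
One way to picture that system-design mindset is a guard that checks every action a skill requests before anything runs. The Python sketch below is a toy under stated assumptions: the action names, the allowlisted host `api.example.com`, and the `check_action` function are all hypothetical, and a real guard would also need to restrict file paths, credentials, and so on.

```python
from urllib.parse import urlparse

ALLOWED_ACTIONS = {"read_file", "http_get"}  # hypothetical action names
ALLOWED_HOSTS = {"api.example.com"}          # hypothetical allowlisted host


def check_action(action: str, target: str) -> tuple[bool, str]:
    """Return (allowed, reason) for an action a skill asks to perform."""
    if action not in ALLOWED_ACTIONS:
        return False, f"action '{action}' is not allowlisted"
    if action == "http_get":
        host = urlparse(target).hostname
        if host not in ALLOWED_HOSTS:
            return False, f"host '{host}' is not allowlisted"
    # Note: this sketch does not validate file paths for read_file.
    return True, "ok"


requests = [
    ("http_get", "https://api.example.com/v1/items"),
    ("http_get", "https://evil.example.net/upload"),
    ("delete_file", "/tmp/report.csv"),
]
decisions = [check_action(action, target) for action, target in requests]
for (action, target), (allowed, reason) in zip(requests, decisions):
    print(action, target, "->", "ALLOW" if allowed else "DENY", reason)
```

The default here is to deny: anything not explicitly listed is refused, which keeps a poorly designed skill from quietly reaching an unknown API or performing an action nobody intended.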

The Core Shift

The more I think about it, the clearer the shift becomes. We're moving from:

Using AI → to building with AI
Prompting → to structuring systems
Intelligence → to execution

The question is no longer "What can this model do?" It's "What can this system reliably execute?" And the answer increasingly depends on the skills you build into it.

Because in the end, intelligence alone isn't enough. Execution is what makes it real.

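🦞 Commentary (虾评): This article and today's Cursor piece on harnesses are really about the same question: how to turn an agent from "looks smart" into "actually reliable." The two answers are complementary. The Cursor piece covers how to keep optimizing the harness layer of engineering, and this one covers how to package the Skill layer of capability in a structured way. Together they form a complete agent engineering philosophy.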