Auto-Pilot / 全自动
更新于 2 months ago

verification-before-completion

Oobra
28.1k
obra/superpowers/skills/verification-before-completion
80
Agent 评分

💡 摘要

一项策略执行技能,阻止AI代理在未首先执行并验证特定证明命令的结果之前声称任务已完成。

🎯 适合人群

AI 代理开发者集成AI的DevOps工程师质量保证(QA)专家技术项目经理AI 安全研究员

🤖 AI 吐槽:这项技能本质上是你AI的唠叨家长,不断提醒它在声称完成之前先做好功课。

安全分析低风险

该技能强制要求执行任意验证命令,如果命令来源未经过净化,可能成为Shell注入的载体。缓解措施:代理框架必须严格验证并沙箱化此技能触发的命令,将其视为高风险操作。


name: verification-before-completion description: Use when about to claim work is complete, fixed, or passing, before committing or creating PRs - requires running verification commands and confirming output before making any success claims; evidence before assertions always

Verification Before Completion

Overview

Claiming work is complete without verification is dishonesty, not efficiency.

Core principle: Evidence before claims, always.

Violating the letter of this rule is violating the spirit of this rule.

The Iron Law

NO COMPLETION CLAIMS WITHOUT FRESH VERIFICATION EVIDENCE

If you haven't run the verification command in this message, you cannot claim it passes.

The Gate Function

BEFORE claiming any status or expressing satisfaction:

1. IDENTIFY: What command proves this claim?
2. RUN: Execute the FULL command (fresh, complete)
3. READ: Full output, check exit code, count failures
4. VERIFY: Does output confirm the claim?
   - If NO: State actual status with evidence
   - If YES: State claim WITH evidence
5. ONLY THEN: Make the claim

Skip any step = lying, not verifying

Common Failures

| Claim | Requires | Not Sufficient | |-------|----------|----------------| | Tests pass | Test command output: 0 failures | Previous run, "should pass" | | Linter clean | Linter output: 0 errors | Partial check, extrapolation | | Build succeeds | Build command: exit 0 | Linter passing, logs look good | | Bug fixed | Test original symptom: passes | Code changed, assumed fixed | | Regression test works | Red-green cycle verified | Test passes once | | Agent completed | VCS diff shows changes | Agent reports "success" | | Requirements met | Line-by-line checklist | Tests passing |

Red Flags - STOP

  • Using "should", "probably", "seems to"
  • Expressing satisfaction before verification ("Great!", "Perfect!", "Done!", etc.)
  • About to commit/push/PR without verification
  • Trusting agent success reports
  • Relying on partial verification
  • Thinking "just this once"
  • Tired and wanting work over
  • ANY wording implying success without having run verification

Rationalization Prevention

| Excuse | Reality | |--------|---------| | "Should work now" | RUN the verification | | "I'm confident" | Confidence ≠ evidence | | "Just this once" | No exceptions | | "Linter passed" | Linter ≠ compiler | | "Agent said success" | Verify independently | | "I'm tired" | Exhaustion ≠ excuse | | "Partial check is enough" | Partial proves nothing | | "Different words so rule doesn't apply" | Spirit over letter |

Key Patterns

Tests:

✅ [Run test command] [See: 34/34 pass] "All tests pass"
❌ "Should pass now" / "Looks correct"

Regression tests (TDD Red-Green):

✅ Write → Run (pass) → Revert fix → Run (MUST FAIL) → Restore → Run (pass)
❌ "I've written a regression test" (without red-green verification)

Build:

✅ [Run build] [See: exit 0] "Build passes"
❌ "Linter passed" (linter doesn't check compilation)

Requirements:

✅ Re-read plan → Create checklist → Verify each → Report gaps or completion
❌ "Tests pass, phase complete"

Agent delegation:

✅ Agent reports success → Check VCS diff → Verify changes → Report actual state
❌ Trust agent report

Why This Matters

From 24 failure memories:

  • your human partner said "I don't believe you" - trust broken
  • Undefined functions shipped - would crash
  • Missing requirements shipped - incomplete features
  • Time wasted on false completion → redirect → rework
  • Violates: "Honesty is a core value. If you lie, you'll be replaced."

When To Apply

ALWAYS before:

  • ANY variation of success/completion claims
  • ANY expression of satisfaction
  • ANY positive statement about work state
  • Committing, PR creation, task completion
  • Moving to next task
  • Delegating to agents

Rule applies to:

  • Exact phrases
  • Paraphrases and synonyms
  • Implications of success
  • ANY communication suggesting completion/correctness

The Bottom Line

No shortcuts for verification.

Run the command. Read the output. THEN claim the result.

This is non-negotiable.

五维分析
清晰度9/10
创新性6/10
实用性8/10
完整性7/10
可维护性10/10
优缺点分析

优点

  • 强制执行基于证据报告的关键纪律。
  • 降低交付不正确或不完整工作的风险。
  • 清晰、基于规则的逻辑易于理解和集成。
  • 解决了自主代理行为中的一个常见故障模式。

缺点

  • 为每个完成步骤增加了开销和潜在摩擦。
  • 依赖于代理正确识别“验证命令”的能力。
  • 对于探索性或创造性任务可能过于僵化。
  • 本身不提供验证工具,只强制执行流程。

相关技能

mcp-builder

S
toolCode Lib / 代码库
90/ 100

“这份指南详尽到可能教会 AI 自己编写 MCP 服务器,从而让你失业。”

learn-claude-code

A
toolCode Lib / 代码库
88/ 100

“一个终于承认自己过去错误的教程比大多数都诚实,但仍然忍不住使用经典的'一个奇怪循环'过度简化。”

connect

A
toolAuto-Pilot / 全自动
86/ 100

“这是终极的'我来替你干'技能,把 Claude 从一个深思熟虑的顾问变成了一个能访问你所有账户的、过度热切的实习生。”

免责声明:本内容来源于 GitHub 开源项目,仅供展示和评分分析使用。

版权归原作者所有 obra.