Auto-Pilot / 全自动

更新于 6 months ago

verification-before-completion

Name: verification-before-completion
Rating: 4.0 (28144 reviews)
Author: obra

Oobra

28.1k

obra/superpowers/skills/verification-before-completion

Agent 评分

💡 摘要

一项策略执行技能，阻止AI代理在未首先执行并验证特定证明命令的结果之前声称任务已完成。

🎯 适合人群

AI 代理开发者集成AI的DevOps工程师质量保证（QA）专家技术项目经理AI 安全研究员

🤖 AI 吐槽: “这项技能本质上是你AI的唠叨家长，不断提醒它在声称完成之前先做好功课。”

安全分析低风险

该技能强制要求执行任意验证命令，如果命令来源未经过净化，可能成为Shell注入的载体。缓解措施：代理框架必须严格验证并沙箱化此技能触发的命令，将其视为高风险操作。

name: verification-before-completion description: Use when about to claim work is complete, fixed, or passing, before committing or creating PRs - requires running verification commands and confirming output before making any success claims; evidence before assertions always

Verification Before Completion

Overview

Claiming work is complete without verification is dishonesty, not efficiency.

Core principle: Evidence before claims, always.

Violating the letter of this rule is violating the spirit of this rule.

The Iron Law

NO COMPLETION CLAIMS WITHOUT FRESH VERIFICATION EVIDENCE

If you haven't run the verification command in this message, you cannot claim it passes.

The Gate Function

BEFORE claiming any status or expressing satisfaction:

1. IDENTIFY: What command proves this claim?
2. RUN: Execute the FULL command (fresh, complete)
3. READ: Full output, check exit code, count failures
4. VERIFY: Does output confirm the claim?
   - If NO: State actual status with evidence
   - If YES: State claim WITH evidence
5. ONLY THEN: Make the claim

Skip any step = lying, not verifying

Common Failures

| Claim | Requires | Not Sufficient | |-------|----------|----------------| | Tests pass | Test command output: 0 failures | Previous run, "should pass" | | Linter clean | Linter output: 0 errors | Partial check, extrapolation | | Build succeeds | Build command: exit 0 | Linter passing, logs look good | | Bug fixed | Test original symptom: passes | Code changed, assumed fixed | | Regression test works | Red-green cycle verified | Test passes once | | Agent completed | VCS diff shows changes | Agent reports "success" | | Requirements met | Line-by-line checklist | Tests passing |

Red Flags - STOP

Using "should", "probably", "seems to"
Expressing satisfaction before verification ("Great!", "Perfect!", "Done!", etc.)
About to commit/push/PR without verification
Trusting agent success reports
Relying on partial verification
Thinking "just this once"
Tired and wanting work over
ANY wording implying success without having run verification

Rationalization Prevention

| Excuse | Reality | |--------|---------| | "Should work now" | RUN the verification | | "I'm confident" | Confidence ≠ evidence | | "Just this once" | No exceptions | | "Linter passed" | Linter ≠ compiler | | "Agent said success" | Verify independently | | "I'm tired" | Exhaustion ≠ excuse | | "Partial check is enough" | Partial proves nothing | | "Different words so rule doesn't apply" | Spirit over letter |

Key Patterns

Tests:

✅ [Run test command] [See: 34/34 pass] "All tests pass"
❌ "Should pass now" / "Looks correct"

Regression tests (TDD Red-Green):

✅ Write → Run (pass) → Revert fix → Run (MUST FAIL) → Restore → Run (pass)
❌ "I've written a regression test" (without red-green verification)

Build:

✅ [Run build] [See: exit 0] "Build passes"
❌ "Linter passed" (linter doesn't check compilation)

Requirements:

✅ Re-read plan → Create checklist → Verify each → Report gaps or completion
❌ "Tests pass, phase complete"

Agent delegation:

✅ Agent reports success → Check VCS diff → Verify changes → Report actual state
❌ Trust agent report

Why This Matters

From 24 failure memories:

your human partner said "I don't believe you" - trust broken
Undefined functions shipped - would crash
Missing requirements shipped - incomplete features
Time wasted on false completion → redirect → rework
Violates: "Honesty is a core value. If you lie, you'll be replaced."

When To Apply

ALWAYS before:

ANY variation of success/completion claims
ANY expression of satisfaction
ANY positive statement about work state
Committing, PR creation, task completion
Moving to next task
Delegating to agents

Rule applies to:

Exact phrases
Paraphrases and synonyms
Implications of success
ANY communication suggesting completion/correctness

The Bottom Line

No shortcuts for verification.

Run the command. Read the output. THEN claim the result.

This is non-negotiable.

五维分析

清晰度9/10

创新性6/10

实用性8/10

完整性7/10

可维护性10/10

优缺点分析

优点

强制执行基于证据报告的关键纪律。
降低交付不正确或不完整工作的风险。
清晰、基于规则的逻辑易于理解和集成。
解决了自主代理行为中的一个常见故障模式。

缺点

为每个完成步骤增加了开销和潜在摩擦。
依赖于代理正确识别“验证命令”的能力。
对于探索性或创造性任务可能过于僵化。
本身不提供验证工具，只强制执行流程。