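💡 Summary
A comprehensive guide to Terraform and OpenTofu best practices, covering testing strategy, module architecture, CI/CD, and production patterns.
🎯 Who It's For
🤖 AI Roast: "This skill is like a seasoned DevOps engineer who never stops talking about best practices, but at least they're all correct."
The skill recommends security scanning tools (trivy, checkov) but does not enforce them. The main risk is that users follow the architecture patterns without implementing the recommended security scanning, resulting in insecure IaC deployments. Mitigation: always integrate security scanning into the CI/CD pipeline as a mandatory gate.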
name: terraform-skill
description: Use when working with Terraform or OpenTofu - creating modules, writing tests (native test framework, Terratest), setting up CI/CD pipelines, reviewing configurations, choosing between testing approaches, debugging state issues, implementing security scanning (trivy, checkov), or making infrastructure-as-code architecture decisions
license: Apache-2.0
metadata:
  author: Anton Babenko
  version: 1.5.0
Terraform Skill for Claude
Comprehensive Terraform and OpenTofu guidance covering testing, modules, CI/CD, and production patterns. Based on terraform-best-practices.com and enterprise experience.
When to Use This Skill
Activate this skill when:
- Creating new Terraform or OpenTofu configurations or modules
- Setting up testing infrastructure for IaC code
- Deciding between testing approaches (validate, plan, frameworks)
- Structuring multi-environment deployments
- Implementing CI/CD for infrastructure-as-code
- Reviewing or refactoring existing Terraform/OpenTofu projects
- Choosing between module patterns or state management approaches
Don't use this skill for:
- Basic Terraform/OpenTofu syntax questions (Claude knows this)
- Provider-specific API reference (link to docs instead)
- Cloud platform questions unrelated to Terraform/OpenTofu
Core Principles
1. Code Structure Philosophy
Module Hierarchy:
| Type | When to Use | Scope |
|------|-------------|-------|
| Resource Module | Single logical group of connected resources | VPC + subnets, Security group + rules |
| Infrastructure Module | Collection of resource modules for a purpose | Multiple resource modules in one region/account |
| Composition | Complete infrastructure | Spans multiple regions/accounts |
Hierarchy: Resource → Resource Module → Infrastructure Module → Composition
Directory Structure:
environments/ # Environment-specific configurations
├── prod/
├── staging/
└── dev/
modules/ # Reusable modules
├── networking/
├── compute/
└── data/
examples/ # Module usage examples (also serve as tests)
├── complete/
└── minimal/
Key principles from terraform-best-practices.com:
- Separate environments (prod, staging) from modules (reusable components)
- Use examples/ as both documentation and integration test fixtures
- Keep modules small and focused (single responsibility)
For detailed module architecture, see: Code Patterns: Module Types & Hierarchy
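As a rough illustration of how the directory layout above fits together, an environment can act as a composition that wires reusable modules to each other. This is only a sketch: the module inputs (`name`, `vpc_cidr_block`, `vpc_id`, `subnet_ids`) and outputs (`vpc_id`, `private_subnet_ids`) are hypothetical names, not something the source defines.

```hcl
# environments/prod/main.tf (hypothetical composition for the prod environment)

module "networking" {
  source = "../../modules/networking"

  # Assumed input names; a real module defines its own variables.
  name           = "prod"
  vpc_cidr_block = "10.0.0.0/16"
}

module "compute" {
  source = "../../modules/compute"

  name = "prod"

  # Assumed outputs of the networking module, passed in as inputs here.
  vpc_id     = module.networking.vpc_id
  subnet_ids = module.networking.private_subnet_ids
}
```

Keeping environment-specific values in `environments/` and all resource logic in `modules/` means staging and dev can reuse the same modules with different inputs.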
2. Naming Conventions
Resources:
# Good: Descriptive, contextual
resource "aws_instance" "web_server" { }
resource "aws_s3_bucket" "application_logs" { }

# Good: "this" for singleton resources (only one of that type)
resource "aws_vpc" "this" { }
resource "aws_security_group" "this" { }

# Avoid: Generic names for non-singletons
resource "aws_instance" "main" { }
resource "aws_s3_bucket" "bucket" { }
Singleton Resources:
Use "this" when your module creates only one resource of that type:
✅ DO:
resource "aws_vpc" "this" {}            # Module creates one VPC
resource "aws_security_group" "this" {} # Module creates one SG
❌ DON'T use "this" for multiple resources:
resource "aws_subnet" "this" {} # If creating multiple subnets
Use descriptive names when creating multiple resources of the same type.
Variables:
# Prefix with context when needed
var.vpc_cidr_block          # Not just "cidr"
var.database_instance_class # Not just "instance_class"
Files:
- main.tf - Primary resources
- variables.tf - Input variables
- outputs.tf - Output values
- versions.tf - Provider versions
- data.tf - Data sources (optional)
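For the versions.tf entry in the file list above, a minimal sketch might look like the following. The specific version constraints are assumptions chosen for illustration (1.6 is picked only because the native test framework discussed later requires it); pin whatever your project actually supports.

```hcl
# versions.tf (illustrative constraints, not prescribed by the source)
terraform {
  # Assumed minimum CLI version.
  required_version = ">= 1.6.0"

  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = ">= 5.0" # Assumed provider constraint.
    }
  }
}
```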
Testing Strategy Framework
Decision Matrix: Which Testing Approach?
| Your Situation | Recommended Approach | Tools | Cost |
|----------------|---------------------|-------|------|
| Quick syntax check | Static analysis | terraform validate, fmt | Free |
| Pre-commit validation | Static + lint | validate, tflint, trivy, checkov | Free |
| Terraform 1.6+, simple logic | Native test framework | Built-in terraform test | Free-Low |
| Pre-1.6, or Go expertise | Integration testing | Terratest | Low-Med |
| Security/compliance focus | Policy as code | OPA, Sentinel | Free |
| Cost-sensitive workflow | Mock providers (1.7+) | Native tests + mocking | Free |
| Multi-cloud, complex | Full integration | Terratest + real infra | Med-High |
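To make the "Mock providers (1.7+)" row concrete, here is a small sketch of a native test that plans against a mocked AWS provider, so no credentials or real resources are involved. The file name is hypothetical, and the test assumes the module under test declares a `create_nat_gateway` variable and an `aws_nat_gateway.this` resource using `count`, as in the examples later in this document.

```hcl
# tests/unit.tftest.hcl (hypothetical file name)

# Replace the real AWS provider with a mock for every run in this file.
mock_provider "aws" {}

run "nat_gateway_disabled" {
  command = plan

  variables {
    create_nat_gateway = false
  }

  assert {
    # With count = 0 the resource has no instances, which is known at plan time.
    condition     = length(aws_nat_gateway.this) == 0
    error_message = "No NAT gateway should be planned when create_nat_gateway is false."
  }
}
```

Run it with `terraform test` from the module directory. Any other required variables of the module would also need values in the `variables` block.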
Testing Pyramid for Infrastructure
/\
/ \ End-to-End Tests (Expensive)
/____\ - Full environment deployment
/ \ - Production-like setup
/________\
/ \ Integration Tests (Moderate)
/____________\ - Module testing in isolation
/ \ - Real resources in test account
/________________\ Static Analysis (Cheap)
- validate, fmt, lint
- Security scanning
Native Test Best Practices (1.6+)
Before generating test code:
1. Validate schemas with Terraform MCP: Search provider docs → Get resource schema → Identify block types
2. Choose correct command mode:
   - command = plan - Fast, for input validation
   - command = apply - Required for computed values and set-type blocks
3. Handle set-type blocks correctly:
   - Cannot index with [0]
   - Use for expressions to iterate
   - Or use command = apply to materialize
Common patterns:
- S3 encryption rules: set (use for expressions)
- Lifecycle transitions: set (use for expressions)
- IAM policy statements: set (use for expressions)
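The set-type guidance above can be sketched as a native test that iterates over S3 encryption rules with a `for` expression instead of indexing the set. The file name is hypothetical, and the test assumes the module under test declares `aws_s3_bucket_server_side_encryption_configuration.this`; because it uses `command = apply`, it creates real resources in whatever account the test runs against.

```hcl
# tests/encryption.tftest.hcl (hypothetical file name)

run "bucket_uses_kms_encryption" {
  # apply materializes the set so its elements can be inspected.
  command = apply

  assert {
    # "rule" is a set of blocks, so iterate rather than writing rule[0].
    # The nested default-encryption block is a single-item list, so [0] is valid there.
    condition = alltrue([
      for rule in aws_s3_bucket_server_side_encryption_configuration.this.rule :
      rule.apply_server_side_encryption_by_default[0].sse_algorithm == "aws:kms"
    ])
    error_message = "Every encryption rule must use aws:kms."
  }
}
```

Note that `alltrue` returns true for an empty list, so a separate assertion on the number of rules may be worth adding if an empty rule set should fail the test.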
For detailed testing guides, see:
- Testing Frameworks Guide - Deep dive into static analysis, native tests, and Terratest
- Quick Reference - Decision flowchart and command cheat sheet
Code Structure Standards
Resource Block Ordering
Strict ordering for consistency:
1. count or for_each FIRST (blank line after)
2. Other arguments
3. tags as last real argument
4. depends_on after tags (if needed)
5. lifecycle at the very end (if needed)
# ✅ GOOD - Correct ordering
resource "aws_nat_gateway" "this" {
  count = var.create_nat_gateway ? 1 : 0

  allocation_id = aws_eip.this[0].id
  subnet_id     = aws_subnet.public[0].id

  tags = {
    Name = "${var.name}-nat"
  }

  depends_on = [aws_internet_gateway.this]

  lifecycle {
    create_before_destroy = true
  }
}
Variable Block Ordering
1. description (ALWAYS required)
2. type
3. default
4. validation
5. nullable (when setting to false)
variable "environment" {
  description = "Environment name for resource tagging"
  type        = string
  default     = "dev"

  validation {
    condition     = contains(["dev", "staging", "prod"], var.environment)
    error_message = "Environment must be one of: dev, staging, prod."
  }

  nullable = false
}
For complete structure guidelines, see: Code Patterns: Block Ordering & Structure
Count vs For_Each: When to Use Each
Quick Decision Guide
| Scenario | Use | Why |
|----------|-----|-----|
| Boolean condition (create or don't) | count = condition ? 1 : 0 | Simple on/off toggle |
| Simple numeric replication | count = 3 | Fixed number of identical resources |
| Items may be reordered/removed | for_each = toset(list) | Stable resource addresses |
| Reference by key | for_each = map | Named access to resources |
| Multiple named resources | for_each | Better maintainability |
Common Patterns
Boolean conditions:
# ✅ GOOD - Boolean condition
resource "aws_nat_gateway" "this" {
  count = var.create_nat_gateway ? 1 : 0
  # ...
}
Stable addressing with for_each:
# ✅ GOOD - Removing "us-east-1b" only affects that subnet
resource "aws_subnet" "private" {
  for_each = toset(var.availability_zones)

  availability_zone = each.key
  # ...
}

# ❌ BAD - Removing middle AZ recreates all subsequent subnets
resource "aws_subnet" "private" {
  count = length(var.availability_zones)

  availability_zone = var.availability_zones[count.index]
  # ...
}
For migration guides and detailed examples, see: Code Patterns: Count vs For_Each
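The "Reference by key" row of the decision guide can be illustrated with a map-driven `for_each`, plus a `moved` block as one possible way to migrate an existing count-based resource without recreating it. Everything here is a sketch under assumptions: the `private_subnets` variable, its CIDR values, and the mapping of index 0 to "us-east-1a" are hypothetical, and `aws_vpc.this` is assumed to exist in the same module.

```hcl
variable "private_subnets" {
  description = "Private subnet CIDR blocks keyed by availability zone"
  type        = map(string)
  default = {
    "us-east-1a" = "10.0.1.0/24"
    "us-east-1b" = "10.0.2.0/24"
  }
}

resource "aws_subnet" "private" {
  for_each = var.private_subnets

  vpc_id            = aws_vpc.this.id
  availability_zone = each.key   # Map key: the AZ name
  cidr_block        = each.value # Map value: the CIDR block
}

# Named access: address one instance by its key instead of a numeric index.
output "private_subnet_id_us_east_1a" {
  description = "ID of the private subnet in us-east-1a"
  value       = aws_subnet.private["us-east-1a"].id
}

# Assumed migration step: tell Terraform that the old count-indexed instance
# is the same object as the new keyed instance, so it is not destroyed.
moved {
  from = aws_subnet.private[0]
  to   = aws_subnet.private["us-east-1a"]
}
```

One `moved` block is needed per existing instance; review the plan to confirm it reports moves rather than destroy-and-create actions.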
Locals for Dependency Management
Use locals to ensure correct resource deletion order:
# Problem: Subnets might be deleted after CIDR blocks, causing errors
# Solution: Use try() in locals to hint deletion order
locals {
  # References secondary CIDR first, falling back to VPC
  # Forces Terraform to delete subnets before CIDR association
  vpc_id = try(
    aws_vpc_ipv4_cidr_block_association.this[0].vpc_id,
    aws_vpc.this.id,
    ""
  )
}

resource "aws_vpc" "this" {
  cidr_block = "10.0.0.0/16"
}

resource "aws_vpc_ipv4_cidr_block_association" "this" {
  count = var.add_secondary_cidr ? 1 : 0

  vpc_id     = aws_vpc.this.id
  cidr_block = "10.1.0.0/16"
}

resource "aws_subnet" "public" {
  vpc_id     = local.vpc_id # Uses local, not direct reference
  cidr_block = "10.1.0.0/24"
}
Why this matters:
- Prevents deletion errors when destroying infrastructure
- Ensures correct dependency order without explicit depends_on
- Particularly useful for VPC configurations with secondary CIDR blocks
For detailed examples, see: Code Patterns: Locals for Dependency Management
Module Development
Standard Module Structure
my-module/
├── README.md # Usage documentation
├── main.tf # Primary resources
├── variables.tf # Input variables with descriptions
├── outputs.tf # Output values
├── versions.tf
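Pros
- Comprehensive coverage of real-world IaC challenges
- Clear decision frameworks for testing and architecture
- Proven best practices based on terraform-best-practices.com
- Practical examples with anti-patterns highlighted
Cons
- Assumes intermediate-to-advanced Terraform knowledge
- May be too complex for beginners
- Relies heavily on AWS examples
- Requires navigating multiple reference documents
Related Skills
Disclaimer: This content comes from an open-source GitHub project and is provided for display and rating analysis only.
Copyright belongs to the original author, antonbabenko.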
