Terraform IaC from Beginner to Expert: Documentation Series

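A hands-on Infrastructure as Code (IaC) guide for DevOps engineers, SREs, and cloud architects. Based on Terraform v1.5+ and covering multi-cloud scenarios across AWS / Azure / GCP / Alibaba Cloud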


01-terraform-intro

What is Infrastructure as Code (IaC)?

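IaC (Infrastructure as Code) is the practice of defining and managing infrastructure through code. Servers, networks, storage, and other resources are declared in configuration files, which brings:

  • Version control (changes tracked in Git)
  • Repeatable deployments (no more “snowflake servers”)
  • Automation (CI/CD integration)
  • Auditing and compliance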

Why Terraform?

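| Tool | Pros | Cons |
|---|---|---|
| Terraform | Multi-cloud, declarative, state management, modular | Moderate learning curve |
| CloudFormation | AWS-native, deep integration | AWS only |
| Pulumi | General-purpose languages (Python/Go) | Complex runtime dependencies |
| Ansible | Agentless, well suited to configuration management | Not declarative; state is hard to manage |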

💡 Terraform's core strength: its provider ecosystem (3,000+ supported clouds and services)

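Core concepts

  • Provider: a plugin for a cloud or service (e.g. aws, azurerm)
  • Resource: a unit of infrastructure (e.g. aws_instance)
  • State: metadata recording the resources Terraform has created (.tfstate)
  • Plan/Apply: preview changes → execute them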
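To make the plan step concrete, here is a minimal Go sketch of what a plan computes. It compares the desired configuration with what state records and classifies each resource address as create, update, or delete. This is a deliberate simplification, not Terraform's algorithm: the real planner also refreshes state against provider APIs, diffs individual attributes, and can propose replacements. The Plan function and the one-string-per-resource model are hypothetical.

```go
package main

import (
	"fmt"
	"sort"
)

// Action is a simplified plan action (real plans also show replace and read).
type Action string

const (
	Create Action = "create"
	Update Action = "update"
	Delete Action = "delete"
)

// Change is one line of a simplified plan.
type Change struct {
	Address string
	Action  Action
}

// Plan compares desired configuration with recorded state. Keys are
// resource addresses; each value stands in for all of a resource's attributes.
func Plan(desired, state map[string]string) []Change {
	var changes []Change
	for addr, want := range desired {
		have, exists := state[addr]
		switch {
		case !exists:
			changes = append(changes, Change{addr, Create})
		case have != want:
			changes = append(changes, Change{addr, Update})
		}
	}
	for addr := range state {
		if _, wanted := desired[addr]; !wanted {
			changes = append(changes, Change{addr, Delete})
		}
	}
	// Map iteration order is random, so sort for a stable plan.
	sort.Slice(changes, func(i, j int) bool { return changes[i].Address < changes[j].Address })
	return changes
}

func main() {
	desired := map[string]string{
		"aws_instance.web": "t3.small",
		"aws_vpc.main":     "10.0.0.0/16",
	}
	state := map[string]string{
		"aws_instance.web":  "t3.micro",
		"aws_s3_bucket.old": "logs",
	}
	// Prints:
	// update aws_instance.web
	// delete aws_s3_bucket.old
	// create aws_vpc.main
	for _, c := range Plan(desired, state) {
		fmt.Printf("%s %s\n", c.Action, c.Address)
	}
}
```

Apply would then execute exactly these changes and write the results back to state. This is why a reviewed plan is a reliable preview.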

02-install-and-cli

Installing Terraform

Linux (APT)

curl -fsSL https://apt.releases.hashicorp.com/gpg | sudo gpg --dearmor -o /usr/share/keyrings/hashicorp-archive-keyring.gpg
echo "deb [signed-by=/usr/share/keyrings/hashicorp-archive-keyring.gpg] https://apt.releases.hashicorp.com $(lsb_release -cs) main" | sudo tee /etc/apt/sources.list.d/hashicorp.list
sudo apt update && sudo apt install terraform

macOS

brew tap hashicorp/tap
brew install hashicorp/tap/terraform

Verify

terraform version
# Terraform v1.6.0

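Core CLI commands

| Command | Purpose |
|---|---|
| terraform init | Initialize the working directory (download providers) |
| terraform fmt | Format code |
| terraform validate | Validate the configuration |
| terraform plan | Preview changes (nothing is applied) |
| terraform apply | Apply changes |
| terraform destroy | Destroy all resources managed by this configuration |
| terraform state list | List the resources tracked in state |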

💡 Best practice: always run plan before apply


03-hello-world-aws

Prerequisites

  • An AWS account and an IAM user with EC2 permissions
  • The AWS CLI configured: aws configure

Step 1: Create main.tf

provider "aws" {
  region = "us-east-1"
}

resource "aws_instance" "web" {
  ami           = "ami-0c02fb55956c7d316" # Amazon Linux 2 in us-east-1; AMI IDs are region-specific and get deprecated over time
  instance_type = "t3.micro"

  tags = {
    Name = "HelloWorld"
  }
}

Step 2: Initialize and deploy

terraform init
terraform plan
terraform apply

Step 3: Verify

aws ec2 describe-instances --filters "Name=tag:Name,Values=HelloWorld"

Step 4: Clean up

terraform destroy

⚠️ Note: don't run this directly in a production account! Use a sandbox account.


04-configuration-syntax

HCL (HashiCorp Configuration Language) basics

Variable definitions

variable "region" {
  description = "AWS region"
  type        = string
  default     = "us-east-1"
}

Resource references

resource "aws_vpc" "main" {
  cidr_block = "10.0.0.0/16"
}

resource "aws_subnet" "public" {
  vpc_id     = aws_vpc.main.id  # reference the VPC's ID (this also creates an implicit dependency)
  cidr_block = "10.0.1.0/24"
}

Expressions and functions

# Conditional expression
instance_type = var.env == "prod" ? "m5.large" : "t3.micro"

# Functions
tags = merge(var.common_tags, { Name = "web-${var.env}" })
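The two expressions above can be mirrored in Go to show their semantics, assuming a string-to-string tag map. The conditional picks one of two values. merge combines maps left to right, so a key in a later argument overrides the same key in an earlier one. The helper names are illustrative only.

```go
package main

import "fmt"

// instanceType mirrors: var.env == "prod" ? "m5.large" : "t3.micro"
func instanceType(env string) string {
	if env == "prod" {
		return "m5.large"
	}
	return "t3.micro"
}

// merge mimics Terraform's merge(): later maps win on duplicate keys.
func merge(maps ...map[string]string) map[string]string {
	out := make(map[string]string)
	for _, m := range maps {
		for k, v := range m {
			out[k] = v
		}
	}
	return out
}

func main() {
	env := "prod"
	common := map[string]string{"Team": "platform", "Name": "default"}
	tags := merge(common, map[string]string{"Name": "web-" + env})
	fmt.Println(instanceType(env), tags["Name"], tags["Team"]) // m5.large web-prod platform
}
```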

for_each loops

locals {
  subnets = ["10.0.1.0/24", "10.0.2.0/24"]
}

resource "aws_subnet" "public" {
  for_each   = toset(local.subnets)
  vpc_id     = aws_vpc.main.id
  cidr_block = each.value
}

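💡 Tip: prefer for_each over count. count addresses instances by list index, so removing an element from the middle shifts every later index and forces Terraform to change or recreate those resources. for_each addresses instances by key, so only the removed key is affected.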
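A minimal Go sketch of why the tip holds. It uses a hypothetical churn metric that counts addresses whose bound value changes or disappears between two runs. With count, dropping the middle subnet shifts later indices. With for_each, only the removed key is affected. Terraform's real addressing and plan output are richer than this.

```go
package main

import "fmt"

// countAddresses keys each item by list position, like count.index.
func countAddresses(resource string, items []string) map[string]string {
	out := make(map[string]string)
	for i, item := range items {
		out[fmt.Sprintf("%s[%d]", resource, i)] = item
	}
	return out
}

// forEachAddresses keys each item by its own value, like each.key on a set.
func forEachAddresses(resource string, items []string) map[string]string {
	out := make(map[string]string)
	for _, item := range items {
		out[fmt.Sprintf("%s[%q]", resource, item)] = item
	}
	return out
}

// churn counts addresses whose bound value changes or disappears between
// two runs; Terraform would have to change or destroy each of them.
func churn(before, after map[string]string) int {
	n := 0
	for addr, v := range before {
		if after[addr] != v {
			n++
		}
	}
	return n
}

func main() {
	before := []string{"10.0.1.0/24", "10.0.2.0/24", "10.0.3.0/24"}
	after := []string{"10.0.1.0/24", "10.0.3.0/24"} // drop the middle subnet
	r := "aws_subnet.public"
	fmt.Printf("count: %d, for_each: %d\n",
		churn(countAddresses(r, before), countAddresses(r, after)),
		churn(forEachAddresses(r, before), forEachAddresses(r, after)))
	// prints: count: 2, for_each: 1
}
```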


05-state-management

What is state?

  • Maps the resources in your configuration to the real objects that were created
  • Stored by default in terraform.tfstate (JSON format)
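The state file's layout is an internal format that Terraform may change, so scripts should prefer terraform show -json or terraform state list. With that caveat, this Go sketch decodes a small subset of the version-4 layout (resources[].module, mode, type, name and instances[].index_key). It rebuilds resource addresses roughly as terraform state list prints them. The sample JSON is made up.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Minimal subset of the (internal) state v4 layout.
type stateFile struct {
	Version   int        `json:"version"`
	Resources []resource `json:"resources"`
}

type resource struct {
	Module    string     `json:"module"`
	Mode      string     `json:"mode"` // "managed" or "data"
	Type      string     `json:"type"`
	Name      string     `json:"name"`
	Instances []instance `json:"instances"`
}

type instance struct {
	IndexKey   any            `json:"index_key"` // number (count) or string (for_each)
	Attributes map[string]any `json:"attributes"`
}

// Addresses returns resource addresses, roughly as `terraform state list` shows them.
func Addresses(raw []byte) ([]string, error) {
	var s stateFile
	if err := json.Unmarshal(raw, &s); err != nil {
		return nil, err
	}
	var out []string
	for _, r := range s.Resources {
		base := r.Type + "." + r.Name
		if r.Mode == "data" {
			base = "data." + base
		}
		if r.Module != "" {
			base = r.Module + "." + base
		}
		for _, inst := range r.Instances {
			switch k := inst.IndexKey.(type) {
			case float64: // JSON numbers decode to float64
				out = append(out, fmt.Sprintf("%s[%d]", base, int(k)))
			case string:
				out = append(out, fmt.Sprintf("%s[%q]", base, k))
			default: // no index key: a single instance
				out = append(out, base)
			}
		}
	}
	return out, nil
}

func main() {
	raw := []byte(`{
  "version": 4,
  "resources": [
    {"mode": "managed", "type": "aws_instance", "name": "web",
     "instances": [{"attributes": {"id": "i-1234567890abcdef0"}}]},
    {"module": "module.prod_vpc", "mode": "managed", "type": "aws_vpc", "name": "this",
     "instances": [{"attributes": {"id": "vpc-0abc"}}]}
  ]
}`)
	addrs, err := Addresses(raw)
	if err != nil {
		panic(err)
	}
	// Prints aws_instance.web, then module.prod_vpc.aws_vpc.this
	for _, a := range addrs {
		fmt.Println(a)
	}
}
```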

Remote backends

Keep state off local disks, where it can be lost, and enable team collaboration.

AWS S3 + DynamoDB (recommended)

terraform {
  backend "s3" {
    bucket         = "my-terraform-state"
    key            = "prod/web/terraform.tfstate"
    region         = "us-east-1"
    dynamodb_table = "terraform-locks"
    encrypt        = true
  }
}

🔒 The DynamoDB table provides state locking, which keeps concurrent runs from conflicting
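Conceptually, the lock works like a put-if-absent. The backend writes an item keyed by LockID only if no such item already exists, so a second run fails until the first releases the lock. The Go sketch below simulates that with an in-memory map guarded by a mutex. LockTable is hypothetical, and the bucket/key LockID value is only an approximation of what the backend writes.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// LockTable is an in-memory stand-in for the DynamoDB lock table.
type LockTable struct {
	mu    sync.Mutex
	items map[string]string // LockID -> owner
}

var ErrLocked = errors.New("state is locked")

func NewLockTable() *LockTable {
	return &LockTable{items: make(map[string]string)}
}

// Acquire succeeds only if no item with this LockID exists,
// like a conditional PutItem.
func (t *LockTable) Acquire(lockID, owner string) error {
	t.mu.Lock()
	defer t.mu.Unlock()
	if holder, taken := t.items[lockID]; taken {
		return fmt.Errorf("%w: held by %s", ErrLocked, holder)
	}
	t.items[lockID] = owner
	return nil
}

// Release deletes the lock item, like DeleteItem at the end of a run.
func (t *LockTable) Release(lockID string) {
	t.mu.Lock()
	defer t.mu.Unlock()
	delete(t.items, lockID)
}

func main() {
	table := NewLockTable()
	id := "my-terraform-state/prod/web/terraform.tfstate" // approximated as bucket/key
	fmt.Println(table.Acquire(id, "alice"))  // <nil>
	fmt.Println(table.Acquire(id, "ci-bot")) // state is locked: held by alice
	table.Release(id)
	fmt.Println(table.Acquire(id, "ci-bot")) // <nil>
}
```

This is also why a crashed run can leave a stale lock behind: nothing released the item, which is the situation terraform force-unlock exists for.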

State operation tips

# Import an existing resource
terraform import aws_instance.web i-1234567890abcdef0

# Move a resource to a new address (refactoring)
terraform state mv aws_instance.web module.web.aws_instance.main

# Stop tracking a resource without destroying it (use with caution!)
terraform state rm aws_instance.broken
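terraform state mv only rewrites state. The real object keeps running, and its binding moves to a new address, so the next plan does not destroy and recreate it. Here is a minimal Go sketch, assuming state is modeled as a map from address to object ID (a hypothetical simplification):

```go
package main

import "fmt"

// moveState rebinds the object at src to dst. Nothing changes in the
// cloud; only the address-to-object mapping in state is rewritten.
func moveState(state map[string]string, src, dst string) error {
	id, ok := state[src]
	if !ok {
		return fmt.Errorf("no resource at %s", src)
	}
	if _, taken := state[dst]; taken {
		return fmt.Errorf("destination %s already exists", dst)
	}
	delete(state, src)
	state[dst] = id
	return nil
}

func main() {
	state := map[string]string{"aws_instance.web": "i-1234567890abcdef0"}
	if err := moveState(state, "aws_instance.web", "module.web.aws_instance.main"); err != nil {
		panic(err)
	}
	fmt.Println(state) // map[module.web.aws_instance.main:i-1234567890abcdef0]
}
```

By contrast, state rm corresponds to deleting the map entry outright: the object still exists in the cloud, but Terraform no longer knows about it.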

06-modules

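Why modules?

  • Reuse: write once, call from many places
  • Abstraction: hide complexity (such as the details of building a VPC)
  • Standardization: a shared set of best practices for the team

Module directory structure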

modules/
└── vpc/
    ├── main.tf
    ├── variables.tf
    └── outputs.tf

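Module example (modules/vpc/)

# modules/vpc/variables.tf
variable "cidr_block" {
  type = string
}

# modules/vpc/main.tf
resource "aws_vpc" "this" {
  cidr_block = var.cidr_block
}

# modules/vpc/outputs.tf
output "vpc_id" {
  value = aws_vpc.this.id
}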

Calling the module

module "prod_vpc" {
  source     = "./modules/vpc"
  cidr_block = "10.0.0.0/16"
}

# Reference the module's output
resource "aws_subnet" "public" {
  vpc_id     = module.prod_vpc.vpc_id
  cidr_block = "10.0.1.0/24"
}

Publishing and sourcing modules

  • Local path: ./modules/vpc
  • Git: github.com/org/terraform-aws-vpc?ref=v1.0.0
  • Terraform Registry: terraform-aws-modules/vpc/aws

👉 Next: variables and outputs


07-variables-and-outputs

Variable types

variable "instance_count" {
  type = number
}

variable "tags" {
  type    = map(string)
  default = {}
}

variable "db_password" {
  type      = string
  sensitive = true  # redacted in plan/apply output, but still stored in plain text in state
}
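A rough Go sketch of what sensitive = true does and does not do, assuming a simplified variable model. The value is masked in rendered output, similar to the (sensitive value) placeholder in Terraform's plan. The program still holds it in full, just as Terraform still writes it to state. Variable and Render are hypothetical names.

```go
package main

import (
	"fmt"
	"sort"
)

// Variable is a simplified input variable with a sensitive flag.
type Variable struct {
	Value     string
	Sensitive bool
}

// Render returns plan-style lines in name order, masking sensitive values.
// Masking affects display only; the real value is untouched.
func Render(vars map[string]Variable) []string {
	names := make([]string, 0, len(vars))
	for n := range vars {
		names = append(names, n)
	}
	sort.Strings(names)
	lines := make([]string, 0, len(names))
	for _, n := range names {
		shown := fmt.Sprintf("%q", vars[n].Value)
		if vars[n].Sensitive {
			shown = "(sensitive value)"
		}
		lines = append(lines, fmt.Sprintf("%s = %s", n, shown))
	}
	return lines
}

func main() {
	vars := map[string]Variable{
		"db_password": {Value: "s3cr3t", Sensitive: true},
		"region":      {Value: "us-east-1"},
	}
	// Prints:
	// db_password = (sensitive value)
	// region = "us-east-1"
	for _, l := range Render(vars) {
		fmt.Println(l)
	}
}
```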

Output definitions

output "public_ip" {
  value       = aws_instance.web.public_ip
  description = "Web server public IP"
}

Sensitive data security

tfvars files

# prod.tfvars
region        = "us-west-2"
instance_type = "m5.large"

Usage:

terraform apply -var-file="prod.tfvars"
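When the same variable is set in several places, Terraform's documented rule is that later sources override earlier ones. The order is: environment variables (TF_VAR_name), terraform.tfvars, terraform.tfvars.json, *.auto.tfvars / *.auto.tfvars.json in lexical order, then -var and -var-file flags in command-line order. A variable's default applies only when none of these sets it. The Go sketch below models that as ordered layers; the Layer type and the sample values are hypothetical.

```go
package main

import "fmt"

// Layer is one source of variable values, e.g. "terraform.tfvars".
type Layer struct {
	Source string
	Values map[string]string
}

// Resolve applies layers in order: later layers override earlier ones.
// It returns the final values and the source each value came from.
func Resolve(layers []Layer) (map[string]string, map[string]string) {
	values := make(map[string]string)
	origin := make(map[string]string)
	for _, l := range layers {
		for k, v := range l.Values {
			values[k] = v
			origin[k] = l.Source
		}
	}
	return values, origin
}

func main() {
	layers := []Layer{
		{"variable defaults", map[string]string{"region": "us-east-1", "instance_type": "t3.micro"}},
		{"TF_VAR_region", map[string]string{"region": "eu-west-1"}},
		{"-var-file=prod.tfvars", map[string]string{"region": "us-west-2", "instance_type": "m5.large"}},
	}
	values, origin := Resolve(layers)
	// Prints:
	// region = us-west-2 (from -var-file=prod.tfvars)
	// instance_type = m5.large (from -var-file=prod.tfvars)
	for _, k := range []string{"region", "instance_type"} {
		fmt.Printf("%s = %s (from %s)\n", k, values[k], origin[k])
	}
}
```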

08-multi-cloud-strategy

Unified multi-cloud management

# providers.tf
provider "aws" {
  region = "us-east-1"
  alias  = "us_east"
}

provider "azurerm" {
  features {}
  alias = "eastus"
}

# resources.tf
module "aws_web" {
  source    = "./modules/web"
  providers = { aws = aws.us_east }
}

module "azure_web" {
  source    = "./modules/azure-web"
  providers = { azurerm = azurerm.eastus }
}

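Abstraction-layer design (recommended)

  • Build a unified interface module that hides cloud differences
  • Example: a module "database" that internally creates RDS or Azure SQL depending on a cloud parameter

💡 Challenge: networking, security groups, IAM, and similar concerns cannot be fully abstracted, so design them carefully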
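Here is a minimal Go sketch of the dispatch such a database module would perform. It assumes a hypothetical cloud input that accepts aws or azure. In HCL this is usually done with a conditional count on each provider-specific resource (for example count = var.cloud == "aws" ? 1 : 0). The resource type names below are real, but the mapping is illustrative. It deliberately rejects unsupported clouds instead of silently creating nothing.

```go
package main

import "fmt"

// DatabaseSpec is a hypothetical cloud-neutral input for a "database" module.
type DatabaseSpec struct {
	Cloud string // "aws" or "azure"
	Name  string
}

// resourcesFor lists the provider-specific resources the wrapper would create.
func resourcesFor(spec DatabaseSpec) ([]string, error) {
	switch spec.Cloud {
	case "aws":
		return []string{"aws_db_instance." + spec.Name}, nil
	case "azure":
		// Azure SQL needs a logical server as well as the database itself.
		return []string{"azurerm_mssql_server." + spec.Name, "azurerm_mssql_database." + spec.Name}, nil
	default:
		return nil, fmt.Errorf("unsupported cloud %q", spec.Cloud)
	}
}

func main() {
	for _, c := range []string{"aws", "azure", "gcp"} {
		res, err := resourcesFor(DatabaseSpec{Cloud: c, Name: "orders"})
		fmt.Println(c, res, err)
	}
}
```

The Azure branch needing two resources where AWS needs one is a small instance of the challenge noted above: the interface can be unified, but the underlying shapes differ.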


09-on-prem-with-terraform

Supported on-premises platforms

  • VMware vSphere (vsphere provider)
  • Proxmox (proxmox provider)
  • Libvirt (libvirt provider)
  • Custom scripts (null_resource + local-exec)

Example: Proxmox VM

provider "proxmox" {
  pm_api_url      = "https://proxmox.example.com:8006/api2/json"
  pm_user         = "terraform@pve"
  pm_password     = var.proxmox_password
  pm_tls_insecure = true  # skips TLS certificate checks; acceptable only for self-signed certificates in a lab
}

resource "proxmox_vm_qemu" "web" {
  name        = "web-server"
  target_node = "pve-node1"
  clone       = "ubuntu-template"
  cores       = 2
  memory      = 2048
}

⚠️ Note: on-premises environments need a reachable API endpoint and securely stored credentials


10-secrets-management

Security principles

  • Never commit secrets to Git
  • Principle of least privilege
  • Automatic rotation

Integration options

AWS SSM Parameter Store

data "aws_ssm_parameter" "db_password" {
  name = "/prod/db/password"
}

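resource "aws_rds_cluster" "main" {
  # Other required arguments (engine, etc.) omitted for brevity.
  # Note: values read through data sources are also saved in state.
  master_password = data.aws_ssm_parameter.db_password.value
}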

HashiCorp Vault

provider "vault" {
  address = "https://vault.example.com"
}

data "vault_generic_secret" "db" {
  path = "secret/data/prod/db"
}

resource "aws_rds_cluster" "main" {
  master_password = data.vault_generic_secret.db.data["password"]
}

Terraform Cloud/Enterprise

Use the workspace Variables feature and mark secret values as Sensitive


11-remote-backend-best-practices

Complete S3 + DynamoDB backend setup

1. Create the S3 bucket (with versioning and encryption enabled)

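# us-east-1 needs no location constraint; other regions also need --create-bucket-configuration LocationConstraint=<region>
aws s3api create-bucket --bucket my-terraform-state --region us-east-1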
aws s3api put-bucket-versioning --bucket my-terraform-state --versioning-configuration Status=Enabled
aws s3api put-bucket-encryption --bucket my-terraform-state --server-side-encryption-configuration '{
  "Rules": [{"ApplyServerSideEncryptionByDefault": {"SSEAlgorithm": "AES256"}}]
}'

2. Create the DynamoDB table

# The S3 backend expects the partition key to be named LockID (type String)
aws dynamodb create-table \
  --table-name terraform-locks \
  --attribute-definitions AttributeName=LockID,AttributeType=S \
  --key-schema AttributeName=LockID,KeyType=HASH \
  --billing-mode PAY_PER_REQUEST

3. Configure the backend

terraform {
  backend "s3" {
    bucket         = "my-terraform-state"
    key            = "global/s3/terraform.tfstate"
    region         = "us-east-1"
    dynamodb_table = "terraform-locks"
    encrypt        = true
  }
}

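🔐 Minimize IAM permissions. Grant only what the backend needs: s3:ListBucket on the bucket, s3:GetObject and s3:PutObject on the state object, and dynamodb:GetItem, dynamodb:PutItem, and dynamodb:DeleteItem on the lock table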
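As a sketch, this Go program renders a least-privilege IAM policy for one backend state object. The action list follows the S3 backend documentation at the time of writing; recent versions also list dynamodb:DescribeTable, so verify against your Terraform version. The bucket, key, and table ARN (including the example account ID 111122223333) are placeholders, and backendPolicy is a hypothetical helper.

```go
package main

import (
	"encoding/json"
	"fmt"
)

type Statement struct {
	Effect   string   `json:"Effect"`
	Action   []string `json:"Action"`
	Resource string   `json:"Resource"`
}

type Policy struct {
	Version   string      `json:"Version"`
	Statement []Statement `json:"Statement"`
}

// backendPolicy scopes each permission to the narrowest resource:
// list on the bucket, read/write on the one state object, lock ops on the table.
func backendPolicy(bucket, key, tableARN string) Policy {
	return Policy{
		Version: "2012-10-17",
		Statement: []Statement{
			{Effect: "Allow", Action: []string{"s3:ListBucket"}, Resource: "arn:aws:s3:::" + bucket},
			{Effect: "Allow", Action: []string{"s3:GetObject", "s3:PutObject"}, Resource: "arn:aws:s3:::" + bucket + "/" + key},
			{Effect: "Allow", Action: []string{"dynamodb:DescribeTable", "dynamodb:GetItem", "dynamodb:PutItem", "dynamodb:DeleteItem"}, Resource: tableARN},
		},
	}
}

func main() {
	p := backendPolicy("my-terraform-state", "global/s3/terraform.tfstate",
		"arn:aws:dynamodb:us-east-1:111122223333:table/terraform-locks") // placeholder account ID
	out, err := json.MarshalIndent(p, "", "  ")
	if err != nil {
		panic(err)
	}
	fmt.Println(string(out))
}
```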


12-ci-cd-integration

GitHub Actions example

name: Terraform
on:
  push:
    branches: [ main ]

jobs:
  terraform:
    runs-on: ubuntu-latest
    # Credentials for every step (init also needs them when the backend is S3)
    env:
      AWS_ACCESS_KEY_ID: ${{ secrets.AWS_ACCESS_KEY_ID }}
      AWS_SECRET_ACCESS_KEY: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
    steps:
      - uses: actions/checkout@v4

      - name: Setup Terraform
        uses: hashicorp/setup-terraform@v3

      - name: Terraform Init
        run: terraform init

      - name: Terraform Validate
        run: terraform validate

      - name: Terraform Plan
        run: terraform plan -out=tfplan

      - name: Terraform Apply
        if: github.ref == 'refs/heads/main'
        run: terraform apply -auto-approve tfplan

Approval workflow (enterprise)

  • Use Terraform Cloud Run Triggers plus manual approval
  • Or pause the CI pipeline until a human confirms

13-testing-terraform

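The testing pyramid

  • Unit tests: check HCL logic (terraform validate + checkov)
  • Integration tests: deploy to a temporary environment and verify (Terratest)
  • Compliance tests: scan against security policies (OPA/Conftest)

Terratest example (Go)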

package test

import (
	"testing"

	"github.com/gruntwork-io/terratest/modules/terraform"
	"github.com/stretchr/testify/assert"
)

func TestTerraformHelloWorld(t *testing.T) {
	terraformOptions := &terraform.Options{
		// Assumes this example defines an output named "public_ip"
		TerraformDir: "../examples/hello-world",
	}

	// Destroy everything when the test ends, even if it fails
	defer terraform.Destroy(t, terraformOptions)
	terraform.InitAndApply(t, terraformOptions)

	// Verify the output
	output := terraform.Output(t, terraformOptions, "public_ip")
	assert.NotEmpty(t, output)
}

🧪 Run: go test -v -timeout 30m . (Go's default 10-minute test timeout is often too short for real deployments)


14-workspaces-and-environments

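Workspaces vs. separate directories

| Approach | Pros | Cons |
|---|---|---|
| Workspaces | Single codebase | Hard to vary configuration per environment |
| Separate directories | Fully isolated environments | Duplicated code |

Recommended: separate directories + reusable modules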

environments/
├── dev/
│   ├── main.tf
│   └── terraform.tfvars
├── staging/
└── prod/

When workspaces fit

  • Temporary test environments (terraform workspace new test-123)
  • Multi-tenant SaaS (one workspace per customer)
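With the S3 backend, all workspaces share one bucket. The default workspace uses key as-is, while any other workspace's state is stored under <workspace_key_prefix>/<workspace>/<key>, where workspace_key_prefix defaults to env:. The Go sketch below computes that object key. stateKey is a hypothetical helper, and treating an empty prefix as the default is a simplification.

```go
package main

import "fmt"

// stateKey returns the S3 object key holding a workspace's state.
// An empty prefix is treated as the backend default, "env:".
func stateKey(key, workspace, prefix string) string {
	if workspace == "default" {
		return key
	}
	if prefix == "" {
		prefix = "env:"
	}
	return fmt.Sprintf("%s/%s/%s", prefix, workspace, key)
}

func main() {
	key := "prod/web/terraform.tfstate"
	fmt.Println(stateKey(key, "default", ""))  // prod/web/terraform.tfstate
	fmt.Println(stateKey(key, "test-123", "")) // env:/test-123/prod/web/terraform.tfstate
}
```

Because every workspace lives in the same bucket behind the same credentials, workspaces isolate state but not access. This is one more reason to prefer separate directories (and often separate accounts) for prod.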

15-policy-as-code

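Using Open Policy Agent (OPA)

1. Define the policy (policy.rego)

package terraform

import rego.v1

# Deny any plan that deletes (or replaces) an S3 bucket. Binding one
# resource change to rc checks type and action on the same resource.
deny contains msg if {
  rc := input.resource_changes[_]
  rc.type == "aws_s3_bucket"
  rc.change.actions[_] == "delete"
  msg := sprintf("Deleting S3 buckets is not allowed: %s", [rc.address])
}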

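2. Scan the plan

# tfplan is the file written by terraform plan -out=tfplan
terraform show -json tfplan > plan.json
# Conftest evaluates only the "main" namespace by default
conftest test plan.json -p policy.rego --namespace terraform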
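The same rule can be checked without OPA. This Go sketch reads the JSON from terraform show -json tfplan and decodes only resource_changes[].address, .type, and .change.actions. It reports every S3 bucket whose actions include delete; a replacement appears as both delete and create. denyBucketDeletes is a hypothetical helper, not part of any tool, and the sample plan is made up.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// planJSON models only the plan fields this check needs.
type planJSON struct {
	ResourceChanges []struct {
		Address string `json:"address"`
		Type    string `json:"type"`
		Change  struct {
			Actions []string `json:"actions"`
		} `json:"change"`
	} `json:"resource_changes"`
}

// denyBucketDeletes returns one message per S3 bucket the plan would delete.
// Type and action are checked on the same resource change.
func denyBucketDeletes(raw []byte) ([]string, error) {
	var p planJSON
	if err := json.Unmarshal(raw, &p); err != nil {
		return nil, err
	}
	var msgs []string
	for _, rc := range p.ResourceChanges {
		if rc.Type != "aws_s3_bucket" {
			continue
		}
		for _, a := range rc.Change.Actions {
			if a == "delete" {
				msgs = append(msgs, fmt.Sprintf("Deleting S3 buckets is not allowed: %s", rc.Address))
				break
			}
		}
	}
	return msgs, nil
}

func main() {
	plan := []byte(`{"resource_changes": [
  {"address": "aws_s3_bucket.logs", "type": "aws_s3_bucket", "change": {"actions": ["delete", "create"]}},
  {"address": "aws_instance.web", "type": "aws_instance", "change": {"actions": ["delete"]}}
]}`)
	msgs, err := denyBucketDeletes(plan)
	if err != nil {
		panic(err)
	}
	fmt.Println(msgs) // [Deleting S3 buckets is not allowed: aws_s3_bucket.logs]
}
```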

Sentinel (Terraform Cloud/Enterprise)

  • Enforces policies inside Terraform Cloud
  • Supports more complex logic (e.g. cost control)

A1-cheat-sheet

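Common commands

# Initialize
terraform init

# Format (including subdirectories)
terraform fmt -recursive

# Validate
terraform validate

# Preview
terraform plan -var="env=prod"

# Apply (skips the confirmation prompt)
terraform apply -auto-approve

# Destroy (skips the confirmation prompt)
terraform destroy -auto-approve

# Inspect state
terraform state list
terraform state show aws_instance.web

Quick debugging

TF_LOG=DEBUG terraform apply  # print detailed logs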

A2-troubleshooting

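Common errors

1. Error: Invalid for_each argument

  • Cause: the for_each map keys or set elements depend on values that are only known after apply (e.g. attributes of resources not yet created)
  • Fix: derive the keys from values known at plan time (variables, literals), or create the dependencies first with terraform apply -target=<address>

2. BucketRegionError: incorrect region

  • Cause: the state bucket is in a different region from the one configured
  • Fix: set the bucket's actual region in the backend configuration

3. ResourceInUse: resource is in use

  • Cause: another resource depends on it (e.g. an EIP associated with an instance)
  • Fix: detach it first, or declare the dependency (an attribute reference or depends_on) so Terraform orders operations correctly; lifecycle { ignore_changes = [...] } helps only when the error is triggered by an attribute change you can safely leave unmanaged

4. State lock conflict

  • Fix: terraform force-unlock <LOCK_ID>, only after confirming no other run still holds the lock

🆘 Last resort: edit state by hand (terraform state pull → edit → terraform state push). Extremely dangerous!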
