The Problem with Clicking
You click through the AWS console to set up a VPC, a few subnets, security groups, an RDS instance. It works. You document the steps in a Confluence page.
Six months later, you need a staging environment. You follow the Confluence doc. It's 30% out of date. You end up with a staging environment that's subtly different from production. You spend two days debugging a prod issue that didn't reproduce in staging.
Infrastructure as Code (IaC) solves this. Your infrastructure is a file. You version it. You review it. You apply it the same way every time.
The Main Options
Terraform: The most widely adopted IaC tool. Supports every major cloud and hundreds of services. Uses its own declarative language (HCL). The state file is both its superpower and its biggest footgun.
AWS CDK / Pulumi: Write infrastructure in a general-purpose programming language (TypeScript, Python). More flexible than Terraform for complex logic. CDK is AWS-only, so it suits teams already deep in AWS; Pulumi supports multiple clouds.
Ansible: Configuration management more than provisioning. Great for managing what's running on servers, less ideal for creating cloud resources.
For most teams starting out, Terraform is the right choice. It has the largest ecosystem and the most tutorials.
Your First Terraform Configuration
Here's a minimal but real example: a private S3 bucket with versioning enabled.
terraform {
  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 5.0"
    }
  }
}

provider "aws" {
  region = var.aws_region
}

variable "aws_region" {
  default = "us-east-1"
}

variable "bucket_name" {
  type = string
}

resource "aws_s3_bucket" "main" {
  bucket = var.bucket_name
}

resource "aws_s3_bucket_versioning" "main" {
  bucket = aws_s3_bucket.main.id
  versioning_configuration {
    status = "Enabled"
  }
}

resource "aws_s3_bucket_public_access_block" "main" {
  bucket                  = aws_s3_bucket.main.id
  block_public_acls       = true
  block_public_policy     = true
  ignore_public_acls      = true
  restrict_public_buckets = true
}
Apply it:
terraform init # downloads the AWS provider
terraform plan # shows what will be created
terraform apply # creates the resources
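The configuration above declares bucket_name without a default, so Terraform will prompt for it on every plan and apply. One way to avoid the prompt is a terraform.tfvars file next to your .tf files, which Terraform loads automatically. A minimal sketch; the bucket name is a placeholder, not a value from this guide:

```hcl
# terraform.tfvars
# Placeholder values: S3 bucket names are globally unique, so pick your own.
bucket_name = "example-team-artifacts"

# Optional: overrides the default declared in the aws_region variable block.
aws_region = "us-east-1"
```

Keeping one such file per environment is a common way to reuse the same configuration with different values.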
Managing State Remotely
Terraform tracks what it created in a state file. By default this is local, which means you can't collaborate and you'll lose it if your laptop dies.
Always store state remotely:
terraform {
  backend "s3" {
    bucket         = "my-terraform-state"
    key            = "production/terraform.tfstate"
    region         = "us-east-1"
    dynamodb_table = "terraform-locks"
    encrypt        = true
  }
}
The DynamoDB table prevents two people from running terraform apply at the same time and corrupting the state.
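The backend block assumes the state bucket and lock table already exist. A sketch of how you might create them, assuming a small separate bootstrap configuration that is applied once with local state and already has an AWS provider configured. The resource labels (terraform_state, terraform_locks) are illustrative; the bucket and table names match the backend example above.

```hcl
# Bootstrap resources for remote state. Apply once, before enabling the backend.

resource "aws_s3_bucket" "terraform_state" {
  bucket = "my-terraform-state"
}

# Versioning keeps earlier copies of the state file, so a bad write can be rolled back.
resource "aws_s3_bucket_versioning" "terraform_state" {
  bucket = aws_s3_bucket.terraform_state.id
  versioning_configuration {
    status = "Enabled"
  }
}

# State files can contain secrets, so block every form of public access.
resource "aws_s3_bucket_public_access_block" "terraform_state" {
  bucket                  = aws_s3_bucket.terraform_state.id
  block_public_acls       = true
  block_public_policy     = true
  ignore_public_acls      = true
  restrict_public_buckets = true
}

# The S3 backend expects a string partition key named exactly "LockID".
resource "aws_dynamodb_table" "terraform_locks" {
  name         = "terraform-locks"
  billing_mode = "PAY_PER_REQUEST"
  hash_key     = "LockID"

  attribute {
    name = "LockID"
    type = "S"
  }
}
```

On-demand billing (PAY_PER_REQUEST) is a reasonable default here because the lock table sees only a handful of reads and writes per run.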
The IaC Workflow That Works
1. Write your change in .tf files
2. Open a PR: treat infrastructure changes like code changes
3. terraform plan in CI: post the plan output to the PR for review
4. Review the plan, not just the diff. The plan tells you what will actually happen.
5. Merge → apply: automate terraform apply on merge to main
6. Never apply manually from your laptop in production
Step 4 is the one teams skip. The HCL diff shows what changed in your code. The plan output shows what will change in your infrastructure. They're not the same thing, and both matter.
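To make the gap between diff and plan concrete, consider the bucket from the earlier example. This is an illustrative sketch, and the lifecycle block is an optional addition rather than part of the original configuration:

```hcl
resource "aws_s3_bucket" "main" {
  # Changing this value is a one-line HCL diff. Because a bucket's name
  # cannot be changed in place, the plan reports a replacement:
  # destroy the existing bucket, then create a new one.
  bucket = var.bucket_name

  # Optional guard: Terraform returns an error instead of producing
  # a plan that would destroy this resource.
  lifecycle {
    prevent_destroy = true
  }
}
```

A reviewer who reads only the diff sees a rename. A reviewer who reads the plan sees a destroy.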
What to Codify First
Start small:
- DNS records
- S3 buckets and IAM policies
- Security groups
- Environment variables / Parameter Store entries
Save the complex stuff (networking, RDS, EKS clusters) for after you're comfortable with the workflow. The patterns are the same; the blast radius of mistakes is larger.
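As a sketch of what a low-risk first change might look like, here is a DNS record and a Parameter Store entry. Every name and value below (the zone_id variable, the hostnames, the parameter path) is hypothetical, and an AWS provider is assumed to be configured as in the earlier example:

```hcl
# ID of a Route 53 hosted zone that already exists in your account.
variable "zone_id" {
  type = string
}

# A CNAME pointing a friendly hostname at a placeholder target.
resource "aws_route53_record" "app" {
  zone_id = var.zone_id
  name    = "app.example.com"
  type    = "CNAME"
  ttl     = 300
  records = ["app-lb.example.net"]
}

# A non-secret setting stored in Parameter Store.
resource "aws_ssm_parameter" "log_level" {
  name  = "/example-app/staging/LOG_LEVEL"
  type  = "String"
  value = "info"
}
```

Both resources are cheap to create and quick to fix if something goes wrong, which makes them good practice for the plan, review, and apply loop.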
The Payoff
Once your infrastructure is code, you get things that seemed hard:
- New environments in minutes, not days
- Disaster recovery that actually works
- Infrastructure changes reviewed like code, not communicated via Slack
- Automated security scanning of resource configurations
The upfront investment is a few days. The compounding return is years.