WordPress Infrastructure as Code with Terraform: Provisioning Production-Ready AWS Architecture
Running WordPress on a single server works until it does not. The moment traffic spikes, a disk fills up, or a PHP process hangs, that lone instance becomes the single point of failure your business never planned for. The answer is not to manually provision more servers and hope the configuration stays consistent. The answer is to treat your infrastructure like software: versioned, tested, repeatable, and reviewed by your team before any change hits production.
Terraform, created by HashiCorp, gives you exactly that capability. You declare what your infrastructure should look like in HCL (HashiCorp Configuration Language), and Terraform figures out how to make reality match your declaration. For WordPress specifically, this means you can spin up an entire production environment on AWS with a single command: load balancers, auto-scaling application servers, managed databases with failover, Redis caching layers, shared file systems, CDN distributions, and secrets management. Every piece is documented in code, and every change goes through version control.
This article walks through building a production-grade AWS architecture for WordPress using Terraform. We will cover module design, networking, compute, databases, caching, file storage, CDN integration, secrets management, state handling, and cost optimization. Every Terraform block is annotated and explained. By the end, you will have a complete configuration you can adapt for your own deployments.
Why Infrastructure as Code for WordPress
WordPress powers over 40% of the web, yet a surprising number of production WordPress installations still run on manually configured servers. An engineer SSHs in, installs packages, edits configuration files, and the server works. Six months later, nobody remembers exactly what was done or why. When the server needs replacing, the team reverse-engineers the setup from memory and scattered notes.
Infrastructure as Code eliminates this problem entirely. Your Terraform files become the single source of truth. New team members read the code to understand the architecture. Changes go through pull requests with peer review. Rolling back a bad change means reverting a commit. Spinning up a staging environment identical to production takes minutes instead of days.
For WordPress workloads specifically, IaC solves several pain points that are unique to the platform:
- Media uploads need shared storage. When you scale horizontally, uploaded files must be accessible from every application server. EFS handles this, but configuring it manually across multiple instances is error-prone.
- Database connections need careful management. WordPress uses persistent connections by default, and auto-scaling can overwhelm an RDS instance if connection limits are not planned correctly.
- Object caching transforms performance. Redis or Memcached can cut database queries by 80% or more, but the caching layer must be provisioned with the right instance type and memory allocation.
- SSL termination and CDN configuration interact. CloudFront, the Application Load Balancer, and WordPress all need to agree on protocol handling, or you end up with redirect loops.
Terraform lets you solve all of these problems once, encode the solutions in version-controlled files, and replicate them across environments without drift.
Terraform Module Design for WordPress
A well-structured Terraform project uses modules to group related resources. Each module handles one concern, accepts inputs through variables, and exposes outputs that other modules can reference. For a WordPress deployment on AWS, the module layout looks like this:
terraform-wordpress/
├── main.tf # Root module, composes all child modules
├── variables.tf # Top-level input variables
├── outputs.tf # Top-level outputs (URLs, endpoints)
├── terraform.tfvars # Environment-specific values
├── backend.tf # Remote state configuration
├── modules/
│ ├── networking/ # VPC, subnets, route tables, NAT
│ │ ├── main.tf
│ │ ├── variables.tf
│ │ └── outputs.tf
│ ├── security/ # Security groups, NACLs
│ │ ├── main.tf
│ │ ├── variables.tf
│ │ └── outputs.tf
│ ├── compute/ # Launch templates, ASG, ALB
│ │ ├── main.tf
│ │ ├── variables.tf
│ │ ├── outputs.tf
│ │ └── user_data.sh
│ ├── database/ # RDS MySQL, read replicas
│ │ ├── main.tf
│ │ ├── variables.tf
│ │ └── outputs.tf
│ ├── cache/ # ElastiCache Redis
│ │ ├── main.tf
│ │ ├── variables.tf
│ │ └── outputs.tf
│ ├── storage/ # EFS for wp-content/uploads
│ │ ├── main.tf
│ │ ├── variables.tf
│ │ └── outputs.tf
│ ├── cdn/ # CloudFront distribution
│ │ ├── main.tf
│ │ ├── variables.tf
│ │ └── outputs.tf
│ └── secrets/ # AWS Secrets Manager, SSM params
│ ├── main.tf
│ ├── variables.tf
│ └── outputs.tf
This separation means your database team can review changes to the database module without wading through networking code. Your security team can audit the security module independently. And when AWS releases a new RDS engine version, you update one module and the change propagates cleanly.
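The backend.tf file in the tree handles remote state, which the rest of the workflow depends on. A minimal sketch, assuming you have created an S3 bucket and a DynamoDB lock table out of band (both names below are placeholders):

```hcl
# backend.tf - store state in S3, lock with DynamoDB
# The bucket and table are created once, outside this configuration
terraform {
  backend "s3" {
    bucket         = "my-wordpress-terraform-state" # placeholder name
    key            = "wordpress/terraform.tfstate"
    region         = "us-east-1"
    dynamodb_table = "terraform-locks" # placeholder name
    encrypt        = true
  }
}
```

Remote state with locking is what lets several engineers run terraform plan and apply against the same environment without trampling each other's changes.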
The Root Module
The root module ties everything together. It calls each child module and passes outputs between them. Here is the skeleton:
# main.tf - Root module
terraform {
required_version = ">= 1.5.0"
required_providers {
aws = {
source = "hashicorp/aws"
version = "~> 5.0"
}
}
}
provider "aws" {
region = var.aws_region
default_tags {
tags = {
Project = "wordpress"
Environment = var.environment
ManagedBy = "terraform"
}
}
}
module "networking" {
source = "./modules/networking"
environment = var.environment
vpc_cidr = var.vpc_cidr
availability_zones = var.availability_zones
}
module "security" {
source = "./modules/security"
vpc_id = module.networking.vpc_id
environment = var.environment
}
module "secrets" {
source = "./modules/secrets"
environment = var.environment
db_password = var.db_password
}
module "database" {
source = "./modules/database"
environment = var.environment
vpc_id = module.networking.vpc_id
private_subnet_ids = module.networking.private_subnet_ids
db_security_group_id = module.security.db_security_group_id
db_password_arn = module.secrets.db_password_arn
}
module "cache" {
source = "./modules/cache"
environment = var.environment
private_subnet_ids = module.networking.private_subnet_ids
cache_security_group_id = module.security.cache_security_group_id
}
module "storage" {
source = "./modules/storage"
environment = var.environment
vpc_id = module.networking.vpc_id
private_subnet_ids = module.networking.private_subnet_ids
efs_security_group_id = module.security.efs_security_group_id
}
module "compute" {
source = "./modules/compute"
environment = var.environment
vpc_id = module.networking.vpc_id
public_subnet_ids = module.networking.public_subnet_ids
private_subnet_ids = module.networking.private_subnet_ids
app_security_group_id = module.security.app_security_group_id
alb_security_group_id = module.security.alb_security_group_id
db_endpoint = module.database.primary_endpoint
db_password_arn = module.secrets.db_password_arn
redis_endpoint = module.cache.redis_endpoint
efs_id = module.storage.efs_id
instance_type = var.instance_type
ami_id = var.ami_id
certificate_arn = var.certificate_arn
min_size = var.asg_min_size
max_size = var.asg_max_size
desired_capacity = var.asg_desired_capacity
}
module "cdn" {
source = "./modules/cdn"
environment = var.environment
alb_dns_name = module.compute.alb_dns_name
domain_name = var.domain_name
certificate_arn = var.certificate_arn
}
Notice how each module receives only the information it needs. The database module gets subnet IDs and a security group but never sees the compute configuration. This principle of least privilege applies to your Terraform code just as it applies to IAM policies.
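The secrets module referenced above is the smallest of the set and illustrates this input/output contract well. A minimal sketch of modules/secrets/main.tf (resource naming is illustrative):

```hcl
# modules/secrets/main.tf (sketch)
variable "environment" {
  type = string
}

variable "db_password" {
  type      = string
  sensitive = true
}

# Store the database password in Secrets Manager
resource "aws_secretsmanager_secret" "db_password" {
  name = "${var.environment}/wordpress/db-password"
}

resource "aws_secretsmanager_secret_version" "db_password" {
  secret_id     = aws_secretsmanager_secret.db_password.id
  secret_string = var.db_password
}

# Other modules reference the secret by ARN, never the value itself
output "db_password_arn" {
  value = aws_secretsmanager_secret.db_password.arn
}
```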
VPC, Subnets, and Security Groups
The networking foundation determines everything that follows. A poorly designed VPC creates problems that cascade through every layer of the stack. For WordPress on AWS, you need public subnets for the Application Load Balancer, private subnets for the application servers and database, and NAT Gateways so private instances can reach the internet for package updates.
VPC and Subnet Configuration
# modules/networking/main.tf
resource "aws_vpc" "wordpress" {
cidr_block = var.vpc_cidr
enable_dns_support = true
enable_dns_hostnames = true
tags = {
Name = "${var.environment}-wordpress-vpc"
}
}
# Public subnets - one per AZ, for ALB and NAT Gateways
resource "aws_subnet" "public" {
count = length(var.availability_zones)
vpc_id = aws_vpc.wordpress.id
cidr_block = cidrsubnet(var.vpc_cidr, 4, count.index)
availability_zone = var.availability_zones[count.index]
map_public_ip_on_launch = true
tags = {
Name = "${var.environment}-public-${var.availability_zones[count.index]}"
Tier = "public"
}
}
# Private subnets - one per AZ, for EC2 instances, RDS, ElastiCache
resource "aws_subnet" "private" {
count = length(var.availability_zones)
vpc_id = aws_vpc.wordpress.id
cidr_block = cidrsubnet(var.vpc_cidr, 4, count.index + length(var.availability_zones))
availability_zone = var.availability_zones[count.index]
tags = {
Name = "${var.environment}-private-${var.availability_zones[count.index]}"
Tier = "private"
}
}
# Internet Gateway for public subnets
resource "aws_internet_gateway" "wordpress" {
vpc_id = aws_vpc.wordpress.id
tags = {
Name = "${var.environment}-wordpress-igw"
}
}
# Elastic IP for NAT Gateway
resource "aws_eip" "nat" {
count = 1 # Single NAT for cost savings; use count = length(var.availability_zones) for HA
domain = "vpc"
tags = {
Name = "${var.environment}-nat-eip"
}
}
# NAT Gateway in the first public subnet
resource "aws_nat_gateway" "wordpress" {
count = 1
allocation_id = aws_eip.nat[0].id
subnet_id = aws_subnet.public[0].id
tags = {
Name = "${var.environment}-nat-gw"
}
depends_on = [aws_internet_gateway.wordpress]
}
# Route table for public subnets - routes to Internet Gateway
resource "aws_route_table" "public" {
vpc_id = aws_vpc.wordpress.id
route {
cidr_block = "0.0.0.0/0"
gateway_id = aws_internet_gateway.wordpress.id
}
tags = {
Name = "${var.environment}-public-rt"
}
}
# Route table for private subnets - routes to NAT Gateway
resource "aws_route_table" "private" {
vpc_id = aws_vpc.wordpress.id
route {
cidr_block = "0.0.0.0/0"
nat_gateway_id = aws_nat_gateway.wordpress[0].id
}
tags = {
Name = "${var.environment}-private-rt"
}
}
# Associate public subnets with public route table
resource "aws_route_table_association" "public" {
count = length(var.availability_zones)
subnet_id = aws_subnet.public[count.index].id
route_table_id = aws_route_table.public.id
}
# Associate private subnets with private route table
resource "aws_route_table_association" "private" {
count = length(var.availability_zones)
subnet_id = aws_subnet.private[count.index].id
route_table_id = aws_route_table.private.id
}
The cidrsubnet function automatically carves the VPC CIDR into smaller blocks. With a /16 VPC (10.0.0.0/16) and a newbits value of 4, each subnet gets a /20 block, providing 4,091 usable IP addresses per subnet. That is more than enough for even large WordPress deployments.
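You can check the arithmetic in terraform console or with a throwaway locals block. Assuming the 10.0.0.0/16 VPC and three availability zones:

```hcl
locals {
  vpc_cidr = "10.0.0.0/16"

  # Public subnets use indexes 0..2
  public_a = cidrsubnet(local.vpc_cidr, 4, 0) # "10.0.0.0/20"
  public_b = cidrsubnet(local.vpc_cidr, 4, 1) # "10.0.16.0/20"

  # Private subnets use indexes offset by the AZ count (3)
  private_a = cidrsubnet(local.vpc_cidr, 4, 3) # "10.0.48.0/20"
  private_b = cidrsubnet(local.vpc_cidr, 4, 4) # "10.0.64.0/20"
}
```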
A single NAT Gateway keeps costs down for development and staging environments. For production, you should deploy one NAT Gateway per Availability Zone so that a zone failure does not cut off internet access for instances in surviving zones. Change the count parameter and adjust the route tables accordingly.
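A sketch of that HA variant, with one NAT Gateway and one private route table per AZ (this replaces the single-NAT resources shown above; names are kept the same for clarity):

```hcl
# One EIP and one NAT Gateway per AZ
resource "aws_eip" "nat" {
  count  = length(var.availability_zones)
  domain = "vpc"
}

resource "aws_nat_gateway" "wordpress" {
  count         = length(var.availability_zones)
  allocation_id = aws_eip.nat[count.index].id
  subnet_id     = aws_subnet.public[count.index].id
  depends_on    = [aws_internet_gateway.wordpress]
}

# One private route table per AZ, each pointing at its local NAT Gateway
resource "aws_route_table" "private" {
  count  = length(var.availability_zones)
  vpc_id = aws_vpc.wordpress.id

  route {
    cidr_block     = "0.0.0.0/0"
    nat_gateway_id = aws_nat_gateway.wordpress[count.index].id
  }
}

resource "aws_route_table_association" "private" {
  count          = length(var.availability_zones)
  subnet_id      = aws_subnet.private[count.index].id
  route_table_id = aws_route_table.private[count.index].id
}
```

With this layout, an AZ losing its NAT Gateway only affects instances in that same AZ.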
Security Groups
Security groups act as virtual firewalls for each tier. The key principle: only allow traffic that has a legitimate reason to flow between components.
# modules/security/main.tf
# ALB Security Group - accepts HTTP/HTTPS from anywhere
resource "aws_security_group" "alb" {
name_prefix = "${var.environment}-alb-"
description = "Security group for WordPress ALB"
vpc_id = var.vpc_id
ingress {
description = "HTTP from anywhere"
from_port = 80
to_port = 80
protocol = "tcp"
cidr_blocks = ["0.0.0.0/0"]
}
ingress {
description = "HTTPS from anywhere"
from_port = 443
to_port = 443
protocol = "tcp"
cidr_blocks = ["0.0.0.0/0"]
}
egress {
from_port = 0
to_port = 0
protocol = "-1"
cidr_blocks = ["0.0.0.0/0"]
}
lifecycle {
create_before_destroy = true
}
tags = {
Name = "${var.environment}-alb-sg"
}
}
# Application Security Group - accepts traffic only from ALB
resource "aws_security_group" "app" {
name_prefix = "${var.environment}-app-"
description = "Security group for WordPress application servers"
vpc_id = var.vpc_id
ingress {
description = "HTTP from ALB (health checks use the same port)"
from_port = 80
to_port = 80
protocol = "tcp"
security_groups = [aws_security_group.alb.id]
}
egress {
from_port = 0
to_port = 0
protocol = "-1"
cidr_blocks = ["0.0.0.0/0"]
}
lifecycle {
create_before_destroy = true
}
tags = {
Name = "${var.environment}-app-sg"
}
}
# Database Security Group - accepts MySQL only from app servers
resource "aws_security_group" "db" {
name_prefix = "${var.environment}-db-"
description = "Security group for WordPress RDS"
vpc_id = var.vpc_id
ingress {
description = "MySQL from application servers"
from_port = 3306
to_port = 3306
protocol = "tcp"
security_groups = [aws_security_group.app.id]
}
egress {
from_port = 0
to_port = 0
protocol = "-1"
cidr_blocks = ["0.0.0.0/0"]
}
lifecycle {
create_before_destroy = true
}
tags = {
Name = "${var.environment}-db-sg"
}
}
# ElastiCache Security Group - accepts Redis only from app servers
resource "aws_security_group" "cache" {
name_prefix = "${var.environment}-cache-"
description = "Security group for WordPress ElastiCache"
vpc_id = var.vpc_id
ingress {
description = "Redis from application servers"
from_port = 6379
to_port = 6379
protocol = "tcp"
security_groups = [aws_security_group.app.id]
}
egress {
from_port = 0
to_port = 0
protocol = "-1"
cidr_blocks = ["0.0.0.0/0"]
}
lifecycle {
create_before_destroy = true
}
tags = {
Name = "${var.environment}-cache-sg"
}
}
# EFS Security Group - accepts NFS only from app servers
resource "aws_security_group" "efs" {
name_prefix = "${var.environment}-efs-"
description = "Security group for WordPress EFS"
vpc_id = var.vpc_id
ingress {
description = "NFS from application servers"
from_port = 2049
to_port = 2049
protocol = "tcp"
security_groups = [aws_security_group.app.id]
}
egress {
from_port = 0
to_port = 0
protocol = "-1"
cidr_blocks = ["0.0.0.0/0"]
}
lifecycle {
create_before_destroy = true
}
tags = {
Name = "${var.environment}-efs-sg"
}
}
Every security group references other security groups rather than CIDR blocks for internal traffic. This is intentional. When an auto-scaling group launches a new instance, that instance automatically inherits the correct access rules through its security group membership. You never need to update firewall rules when instances come and go.
The create_before_destroy lifecycle rule prevents downtime during security group updates. Terraform creates the new group, attaches it to resources, and only then deletes the old group.
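Inline rule blocks work here because no two groups reference each other. If you ever need mutual references, which create a dependency cycle with inline rules, the AWS provider's standalone rule resources break the cycle. One rule from the app group as a sketch:

```hcl
# Standalone rule: decouples the rule from the group definition, so
# both groups can be created first and cross-referenced afterwards
resource "aws_vpc_security_group_ingress_rule" "app_from_alb" {
  security_group_id            = aws_security_group.app.id
  referenced_security_group_id = aws_security_group.alb.id
  from_port                    = 80
  to_port                      = 80
  ip_protocol                  = "tcp"
  description                  = "HTTP from ALB"
}
```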
Auto-Scaling Groups with Custom AMIs
Running WordPress on a single EC2 instance limits you to vertical scaling: bigger instance, bigger cost, same single point of failure. Auto-scaling groups let you scale horizontally. You define a launch template that describes your ideal WordPress server, set minimum and maximum instance counts, and let AWS handle the rest.
Building Custom AMIs with Packer
Before Terraform can launch instances, you need an Amazon Machine Image that has WordPress, PHP, Nginx, and all dependencies pre-installed. Building this with Packer (another HashiCorp tool) keeps your AMI creation reproducible. Here is a condensed Packer template:
# wordpress-ami.pkr.hcl
source "amazon-ebs" "wordpress" {
ami_name = "wordpress-${formatdate("YYYYMMDD-hhmm", timestamp())}"
instance_type = "t3.medium"
region = "us-east-1"
source_ami_filter {
filters = {
name = "ubuntu/images/hvm-ssd/ubuntu-jammy-22.04-amd64-server-*"
root-device-type = "ebs"
virtualization-type = "hvm"
}
owners = ["099720109477"] # Canonical
most_recent = true
}
ssh_username = "ubuntu"
}
build {
sources = ["source.amazon-ebs.wordpress"]
provisioner "shell" {
inline = [
"sudo apt-get update",
"sudo apt-get install -y nginx php8.1-fpm php8.1-mysql php8.1-redis php8.1-curl php8.1-gd php8.1-xml php8.1-mbstring php8.1-zip php8.1-intl nfs-common awscli git binutils",
"# amazon-efs-utils is not in Ubuntu's repositories; build the .deb from source",
"git clone https://github.com/aws/efs-utils /tmp/efs-utils",
"cd /tmp/efs-utils && ./build-deb.sh && sudo apt-get install -y ./build/amazon-efs-utils*.deb",
"sudo systemctl enable nginx php8.1-fpm",
"cd /var/www && sudo wget -q https://wordpress.org/latest.tar.gz",
"sudo tar -xzf latest.tar.gz -C /var/www/html --strip-components=1",
"sudo chown -R www-data:www-data /var/www/html",
]
}
provisioner "file" {
source = "configs/nginx-wordpress.conf"
destination = "/tmp/wordpress.conf"
}
provisioner "shell" {
inline = [
"sudo mv /tmp/wordpress.conf /etc/nginx/sites-available/wordpress",
"sudo ln -sf /etc/nginx/sites-available/wordpress /etc/nginx/sites-enabled/",
"sudo rm -f /etc/nginx/sites-enabled/default",
]
}
}
The AMI bakes in everything that does not change between deployments: the operating system, web server, PHP runtime, and WordPress core files. Configuration that varies by environment (database credentials, Redis endpoints, EFS mount points) gets injected at boot time through the launch template’s user data script.
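A sample terraform.tfvars showing how the pieces connect (all values are illustrative; the AMI ID comes from the Packer build above, and the database password is better supplied through the TF_VAR_db_password environment variable than committed to a file):

```hcl
# terraform.tfvars (example values)
aws_region           = "us-east-1"
environment          = "production"
vpc_cidr             = "10.0.0.0/16"
availability_zones   = ["us-east-1a", "us-east-1b", "us-east-1c"]
instance_type        = "t3.large"
ami_id               = "ami-0123456789abcdef0" # output of the Packer build
asg_min_size         = 2
asg_max_size         = 6
asg_desired_capacity = 2
domain_name          = "example.com"
certificate_arn      = "arn:aws:acm:us-east-1:123456789012:certificate/example"
# db_password is deliberately absent: export TF_VAR_db_password instead
```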
Launch Template and Auto-Scaling Group
# modules/compute/main.tf
# Application Load Balancer
resource "aws_lb" "wordpress" {
name = "${var.environment}-wordpress-alb"
internal = false
load_balancer_type = "application"
security_groups = [var.alb_security_group_id]
subnets = var.public_subnet_ids
enable_deletion_protection = var.environment == "production" ? true : false
tags = {
Name = "${var.environment}-wordpress-alb"
}
}
# Target group for WordPress instances
resource "aws_lb_target_group" "wordpress" {
name = "${var.environment}-wp-tg"
port = 80
protocol = "HTTP"
vpc_id = var.vpc_id
health_check {
enabled = true
path = "/wp-login.php"
port = "traffic-port"
healthy_threshold = 2
unhealthy_threshold = 3
timeout = 10
interval = 30
matcher = "200,302"
}
stickiness {
type = "lb_cookie"
cookie_duration = 3600
enabled = true
}
tags = {
Name = "${var.environment}-wp-tg"
}
}
# HTTPS listener (primary)
resource "aws_lb_listener" "https" {
load_balancer_arn = aws_lb.wordpress.arn
port = "443"
protocol = "HTTPS"
ssl_policy = "ELBSecurityPolicy-TLS13-1-2-2021-06"
certificate_arn = var.certificate_arn
default_action {
type = "forward"
target_group_arn = aws_lb_target_group.wordpress.arn
}
}
# HTTP listener - redirect to HTTPS
resource "aws_lb_listener" "http" {
load_balancer_arn = aws_lb.wordpress.arn
port = "80"
protocol = "HTTP"
default_action {
type = "redirect"
redirect {
port = "443"
protocol = "HTTPS"
status_code = "HTTP_301"
}
}
}
# IAM role for EC2 instances
resource "aws_iam_role" "wordpress" {
name = "${var.environment}-wordpress-instance-role"
assume_role_policy = jsonencode({
Version = "2012-10-17"
Statement = [
{
Action = "sts:AssumeRole"
Effect = "Allow"
Principal = {
Service = "ec2.amazonaws.com"
}
}
]
})
}
# Policy: allow reading secrets from Secrets Manager and SSM
resource "aws_iam_role_policy" "secrets_access" {
name = "${var.environment}-secrets-access"
role = aws_iam_role.wordpress.id
policy = jsonencode({
Version = "2012-10-17"
Statement = [
{
Effect = "Allow"
Action = [
"secretsmanager:GetSecretValue",
"ssm:GetParameter",
"ssm:GetParameters"
]
Resource = [
var.db_password_arn,
"arn:aws:ssm:*:*:parameter/${var.environment}/wordpress/*"
]
}
]
})
}
# Attach SSM managed policy for Session Manager access (replaces SSH)
resource "aws_iam_role_policy_attachment" "ssm" {
role = aws_iam_role.wordpress.name
policy_arn = "arn:aws:iam::aws:policy/AmazonSSMManagedInstanceCore"
}
resource "aws_iam_instance_profile" "wordpress" {
name = "${var.environment}-wordpress-instance-profile"
role = aws_iam_role.wordpress.name
}
# Launch template defines how each instance is configured
resource "aws_launch_template" "wordpress" {
name_prefix = "${var.environment}-wordpress-"
image_id = var.ami_id
instance_type = var.instance_type
vpc_security_group_ids = [var.app_security_group_id]
iam_instance_profile {
arn = aws_iam_instance_profile.wordpress.arn
}
# User data script runs once, when each new instance launches
user_data = base64encode(templatefile("${path.module}/user_data.sh", {
environment = var.environment
db_endpoint = var.db_endpoint
db_password_arn = var.db_password_arn
redis_endpoint = var.redis_endpoint
efs_id = var.efs_id
aws_region = data.aws_region.current.name
}))
monitoring {
enabled = true
}
metadata_options {
http_endpoint = "enabled"
http_tokens = "required" # Enforce IMDSv2
http_put_response_hop_limit = 1
}
tag_specifications {
resource_type = "instance"
tags = {
Name = "${var.environment}-wordpress-app"
}
}
lifecycle {
create_before_destroy = true
}
}
data "aws_region" "current" {}
# Auto-scaling group manages instance lifecycle
resource "aws_autoscaling_group" "wordpress" {
name_prefix = "${var.environment}-wordpress-"
desired_capacity = var.desired_capacity
min_size = var.min_size
max_size = var.max_size
vpc_zone_identifier = var.private_subnet_ids
target_group_arns = [aws_lb_target_group.wordpress.arn]
health_check_type = "ELB"
health_check_grace_period = 300
launch_template {
id = aws_launch_template.wordpress.id
# Pin to the template's latest version number (not the "$Latest" alias)
# so a new AMI changes the ASG and triggers the rolling instance refresh
version = aws_launch_template.wordpress.latest_version
}
instance_refresh {
strategy = "Rolling"
preferences {
min_healthy_percentage = 50
instance_warmup = 120
}
}
tag {
key = "Name"
value = "${var.environment}-wordpress"
propagate_at_launch = true
}
lifecycle {
create_before_destroy = true
}
}
# Scale up policy - triggered by high CPU
resource "aws_autoscaling_policy" "scale_up" {
name = "${var.environment}-wordpress-scale-up"
autoscaling_group_name = aws_autoscaling_group.wordpress.name
adjustment_type = "ChangeInCapacity"
scaling_adjustment = 1
cooldown = 300
}
resource "aws_cloudwatch_metric_alarm" "high_cpu" {
alarm_name = "${var.environment}-wordpress-high-cpu"
comparison_operator = "GreaterThanThreshold"
evaluation_periods = 2
metric_name = "CPUUtilization"
namespace = "AWS/EC2"
period = 120
statistic = "Average"
threshold = 70
alarm_description = "Scale up when CPU exceeds 70% for 4 minutes"
alarm_actions = [aws_autoscaling_policy.scale_up.arn]
dimensions = {
AutoScalingGroupName = aws_autoscaling_group.wordpress.name
}
}
# Scale down policy - triggered by low CPU
resource "aws_autoscaling_policy" "scale_down" {
name = "${var.environment}-wordpress-scale-down"
autoscaling_group_name = aws_autoscaling_group.wordpress.name
adjustment_type = "ChangeInCapacity"
scaling_adjustment = -1
cooldown = 300
}
resource "aws_cloudwatch_metric_alarm" "low_cpu" {
alarm_name = "${var.environment}-wordpress-low-cpu"
comparison_operator = "LessThanThreshold"
evaluation_periods = 3
metric_name = "CPUUtilization"
namespace = "AWS/EC2"
period = 120
statistic = "Average"
threshold = 30
alarm_description = "Scale down when CPU below 30% for 6 minutes"
alarm_actions = [aws_autoscaling_policy.scale_down.arn]
dimensions = {
AutoScalingGroupName = aws_autoscaling_group.wordpress.name
}
}
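The pair of simple-scaling policies above is explicit and easy to reason about. A target tracking policy is a common alternative that replaces both policies and both alarms with a single resource (sketch):

```hcl
# Alternative: one target tracking policy; AWS creates and manages
# the CloudWatch alarms needed to hold average CPU near the target
resource "aws_autoscaling_policy" "cpu_target" {
  name                   = "${var.environment}-wordpress-cpu-target"
  autoscaling_group_name = aws_autoscaling_group.wordpress.name
  policy_type            = "TargetTrackingScaling"

  target_tracking_configuration {
    predefined_metric_specification {
      predefined_metric_type = "ASGAverageCPUUtilization"
    }
    target_value = 50.0
  }
}
```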
The User Data Script
The user data script runs when each instance boots. It mounts EFS, retrieves secrets, writes the WordPress configuration, and starts the services. This is the bridge between your baked AMI and your environment-specific settings.
#!/bin/bash
# modules/compute/user_data.sh
set -euo pipefail
# Mount EFS for shared uploads directory
mkdir -p /var/www/html/wp-content/uploads
mount -t efs -o tls ${efs_id}:/ /var/www/html/wp-content/uploads
echo "${efs_id}:/ /var/www/html/wp-content/uploads efs _netdev,tls 0 0" >> /etc/fstab
# Retrieve database password from Secrets Manager
DB_PASSWORD=$(aws secretsmanager get-secret-value \
--secret-id "${db_password_arn}" \
--query 'SecretString' \
--output text \
--region ${aws_region})
# Generate WordPress salts
WP_SALTS=$(curl -s https://api.wordpress.org/secret-key/1.1/salt/)
# Write wp-config.php
# Write wp-config.php; template vars (${efs_id}-style) are rendered by
# Terraform, shell vars ($DB_PASSWORD, $WP_SALTS) by this script at boot.
# Condensed to the settings that matter in this architecture.
cat > /var/www/html/wp-config.php << WPCONFIG
<?php
define('DB_NAME', 'wordpress');
define('DB_USER', 'wordpress');
define('DB_PASSWORD', '$DB_PASSWORD');
define('DB_HOST', '${db_endpoint}'); // RDS endpoint already includes the port
define('WP_REDIS_HOST', '${redis_endpoint}');
define('WP_REDIS_PORT', 6379);
// ALB terminates TLS; trust the forwarded header to avoid redirect loops
define('FORCE_SSL_ADMIN', true);
if (isset(\$_SERVER['HTTP_X_FORWARDED_PROTO']) && \$_SERVER['HTTP_X_FORWARDED_PROTO'] === 'https') {
    \$_SERVER['HTTPS'] = 'on';
}
// Force code changes through the deployment pipeline, not the dashboard editor
define('DISALLOW_FILE_EDIT', true);
$WP_SALTS
\$table_prefix = 'wp_';
if ( ! defined('ABSPATH') ) { define('ABSPATH', __DIR__ . '/'); }
require_once ABSPATH . 'wp-settings.php';
WPCONFIG
systemctl restart php8.1-fpm nginx
There are several WordPress-specific details worth calling out in this script. The FORCE_SSL_ADMIN directive combined with the HTTP_X_FORWARDED_PROTO check prevents the infamous redirect loop that occurs when the ALB terminates SSL and forwards plain HTTP to WordPress. Without this, WordPress sees an HTTP request, tries to redirect to HTTPS, the ALB forwards it as HTTP again, and the browser gives up after too many redirects.
The DISALLOW_FILE_EDIT flag is critical in a multi-server environment. If an admin edits a theme file through the WordPress editor, that change only applies to whichever server handled the request. The other instances still run the old code. Disabling the editor forces all code changes through your deployment pipeline where they apply consistently.
Amazon RDS with Read Replicas and Failover
WordPress is a database-heavy application. Every page load triggers multiple queries, and complex sites with many plugins can execute hundreds of queries per request. RDS gives you a managed MySQL (or MariaDB) instance with automated backups, patching, and Multi-AZ failover without the operational burden of running your own database server.
# modules/database/main.tf
# Subnet group tells RDS which subnets to use
resource "aws_db_subnet_group" "wordpress" {
name = "${var.environment}-wordpress-db-subnet"
subnet_ids = var.private_subnet_ids
tags = {
Name = "${var.environment}-wordpress-db-subnet"
}
}
# Parameter group for MySQL tuning
resource "aws_db_parameter_group" "wordpress" {
family = "mysql8.0"
name = "${var.environment}-wordpress-params"
# Increase max connections for auto-scaling environment
parameter {
name = "max_connections"
value = "500"
}
# Enable slow query log for performance analysis
parameter {
name = "slow_query_log"
value = "1"
}
parameter {
name = "long_query_time"
value = "2"
}
# InnoDB buffer pool - let RDS manage based on instance memory
parameter {
name = "innodb_buffer_pool_size"
value = "{DBInstanceClassMemory*3/4}"
}
# Query cache is deprecated in MySQL 8.0, use Redis instead
# WordPress benefits more from object caching than query caching
tags = {
Name = "${var.environment}-wordpress-params"
}
}
# Primary RDS instance with Multi-AZ for automatic failover
resource "aws_db_instance" "wordpress" {
identifier = "${var.environment}-wordpress-primary"
engine = "mysql"
engine_version = "8.0"
instance_class = var.environment == "production" ? "db.r6g.large" : "db.t4g.medium"
allocated_storage = 100
max_allocated_storage = 500 # Auto-scaling storage
db_name = "wordpress"
username = "wordpress"
password = data.aws_secretsmanager_secret_version.db_password.secret_string
db_subnet_group_name = aws_db_subnet_group.wordpress.name
vpc_security_group_ids = [var.db_security_group_id]
parameter_group_name = aws_db_parameter_group.wordpress.name
multi_az = var.environment == "production" ? true : false
publicly_accessible = false
# Backup configuration
backup_retention_period = 14
backup_window = "03:00-04:00"
maintenance_window = "sun:04:00-sun:05:00"
# Encryption at rest
storage_encrypted = true
# Performance Insights for query analysis
performance_insights_enabled = true
performance_insights_retention_period = 7
# Deletion protection for production
deletion_protection = var.environment == "production" ? true : false
# Final snapshot before deletion
skip_final_snapshot = var.environment == "production" ? false : true
final_snapshot_identifier = var.environment == "production" ? "${var.environment}-wordpress-final-${formatdate("YYYYMMDD", timestamp())}" : null
tags = {
Name = "${var.environment}-wordpress-primary"
}
}
# Read replica for offloading read-heavy queries
resource "aws_db_instance" "wordpress_replica" {
count = var.environment == "production" ? 1 : 0
identifier = "${var.environment}-wordpress-replica"
replicate_source_db = aws_db_instance.wordpress.identifier
instance_class = "db.r6g.large"
publicly_accessible = false
vpc_security_group_ids = [var.db_security_group_id]
parameter_group_name = aws_db_parameter_group.wordpress.name
performance_insights_enabled = true
performance_insights_retention_period = 7
tags = {
Name = "${var.environment}-wordpress-replica"
}
}
data "aws_secretsmanager_secret_version" "db_password" {
secret_id = var.db_password_arn
}
output "primary_endpoint" {
value = aws_db_instance.wordpress.endpoint
}
output "replica_endpoint" {
value = var.environment == "production" ? aws_db_instance.wordpress_replica[0].endpoint : null
}
Using Read Replicas with WordPress
WordPress does not natively support read replicas. Out of the box, every query goes to the same database host. To split reads and writes, you need a drop-in plugin like HyperDB (from Automattic) or LudicrousDB. These plugins intercept WordPress database queries, examine whether they are reads (SELECT) or writes (INSERT, UPDATE, DELETE), and route them to the appropriate server.
Add the replica endpoint to your wp-config.php as a secondary database server. The HyperDB configuration looks like this:
// db-config.php (HyperDB configuration)
$wpdb->add_database(array(
'host' => 'primary-endpoint.rds.amazonaws.com',
'user' => 'wordpress',
'password' => $db_password,
'name' => 'wordpress',
'write' => 1, // Primary handles all writes
'read' => 2, // Lower priority: reads fall back here if the replica is down
'dataset' => 'global',
'timeout' => 0.2,
));
$wpdb->add_database(array(
'host' => 'replica-endpoint.rds.amazonaws.com',
'user' => 'wordpress',
'password' => $db_password,
'name' => 'wordpress',
'write' => 0, // Replica never handles writes
'read' => 1, // Preferred for reads: lower values win in HyperDB
'dataset' => 'global',
'timeout' => 0.2,
));
The read values control routing priority: HyperDB tries lower numbers first, so the replica ('read' => 1) serves reads and the primary ('read' => 2) acts as the fallback. If the replica goes down, reads transparently fall back to the primary. This setup can cut the load on your primary instance by 60-80% on read-heavy WordPress sites, which is most of them.
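The replica endpoint still has to reach the application servers somehow. One option is to publish it as an SSM parameter that the user data script can read at boot when rendering db-config.php; a sketch for the database module (the parameter name is illustrative, but it matches the /environment/wordpress/* path the instance IAM policy already allows):

```hcl
# Publish the replica endpoint for the app servers to consume at boot
resource "aws_ssm_parameter" "replica_endpoint" {
  count = var.environment == "production" ? 1 : 0
  name  = "/${var.environment}/wordpress/db-replica-endpoint"
  type  = "String"
  value = aws_db_instance.wordpress_replica[0].endpoint
}
```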
ElastiCache Redis for Object Cache and Sessions
Object caching is the single most effective performance optimization for WordPress after page caching. Every time WordPress needs an option value, a user's metadata, or a transient, it queries the database. With a persistent object cache backed by Redis, these values are stored in memory and retrieved in microseconds instead of milliseconds.
In a horizontally scaled environment, Redis serves a second critical purpose: session storage. PHP sessions are stored on the local filesystem by default. When a user's requests get routed to different servers by the load balancer, their session data disappears. Storing sessions in Redis makes them available from any application server.
# modules/cache/main.tf
# Subnet group for ElastiCache
resource "aws_elasticache_subnet_group" "wordpress" {
name = "${var.environment}-wordpress-cache-subnet"
subnet_ids = var.private_subnet_ids
tags = {
Name = "${var.environment}-wordpress-cache-subnet"
}
}
# Parameter group for Redis tuning
resource "aws_elasticache_parameter_group" "wordpress" {
family = "redis7"
name = "${var.environment}-wordpress-redis-params"
# Maximum memory policy: evict least recently used keys when memory is full
parameter {
name = "maxmemory-policy"
value = "allkeys-lru"
}
# Disable snapshotting for cache-only use (saves memory)
parameter {
name = "save"
value = ""
}
tags = {
Name = "${var.environment}-wordpress-redis-params"
}
}
# Redis replication group with automatic failover
resource "aws_elasticache_replication_group" "wordpress" {
replication_group_id = "${var.environment}-wp-redis"
description = "Redis cluster for WordPress object cache and sessions"
node_type = var.environment == "production" ? "cache.r6g.large" : "cache.t4g.medium"
num_cache_clusters = var.environment == "production" ? 2 : 1
port = 6379
subnet_group_name = aws_elasticache_subnet_group.wordpress.name
security_group_ids = [var.cache_security_group_id]
parameter_group_name = aws_elasticache_parameter_group.wordpress.name
automatic_failover_enabled = var.environment == "production" ? true : false
multi_az_enabled = var.environment == "production" ? true : false
at_rest_encryption_enabled = true
transit_encryption_enabled = false # Set to true if you need encryption in transit
# Maintenance and snapshot windows
maintenance_window = "sun:05:00-sun:06:00"
snapshot_retention_limit = 0 # No snapshots for cache-only use
tags = {
Name = "${var.environment}-wordpress-redis"
}
}
output "redis_endpoint" {
value = aws_elasticache_replication_group.wordpress.primary_endpoint_address
}
The allkeys-lru eviction policy matters for WordPress. When Redis runs out of memory, it evicts the least recently used keys to make room for new ones. With the default noeviction policy, a full instance rejects writes instead, and the object cache plugin silently falls back to slower database queries. Evicting stale keys keeps the cache writable and effective even under memory pressure.
For production deployments, two cache clusters with automatic failover give you a standby node that promotes to primary within seconds if the active node fails. Your WordPress site keeps serving cached content without interruption.
Configuring the Redis Object Cache Plugin
On the WordPress side, install the Redis Object Cache plugin by Till Kruss. The plugin reads the Redis connection details from wp-config.php constants that the user data script already writes. The key constants are:
// Already in wp-config.php from user_data.sh
define('WP_REDIS_HOST', 'primary-endpoint.cache.amazonaws.com');
define('WP_REDIS_PORT', 6379);
define('WP_REDIS_DATABASE', 0); // DB 0 for object cache
define('WP_REDIS_TIMEOUT', 1); // 1 second connection timeout
define('WP_REDIS_READ_TIMEOUT', 1); // 1 second read timeout
// For PHP session storage, add to php.ini or pool config:
// session.save_handler = redis
// session.save_path = "tcp://primary-endpoint.cache.amazonaws.com:6379?database=1"
Using a separate Redis database (database 1) for sessions keeps them isolated from the object cache. When you flush the object cache during a deployment, sessions remain intact and users stay logged in.
EFS for Shared wp-content/uploads
WordPress stores uploaded media files on the local filesystem by default. In a single-server setup, this works fine. In a multi-server auto-scaling environment, it becomes a problem immediately. A user uploads an image, and it lands on server A. The next request goes to server B, which has no knowledge of that file. The image returns a 404.
Amazon Elastic File System (EFS) provides a shared NFS filesystem that multiple EC2 instances can mount simultaneously. Every server sees the same files, and writes from one instance are immediately visible to all others.
# modules/storage/main.tf
# EFS filesystem for shared WordPress uploads
resource "aws_efs_file_system" "wordpress" {
creation_token = "${var.environment}-wordpress-uploads"
encrypted = true
performance_mode = "generalPurpose"
throughput_mode = "bursting" # Use "elastic" for unpredictable workloads
lifecycle_policy {
transition_to_ia = "AFTER_30_DAYS" # Move old files to Infrequent Access tier
}
lifecycle_policy {
transition_to_primary_storage_class = "AFTER_1_ACCESS" # Move back on access
}
tags = {
Name = "${var.environment}-wordpress-uploads"
}
}
# Mount targets - one per AZ so instances in any AZ can connect
resource "aws_efs_mount_target" "wordpress" {
count = length(var.private_subnet_ids)
file_system_id = aws_efs_file_system.wordpress.id
subnet_id = var.private_subnet_ids[count.index]
security_groups = [var.efs_security_group_id]
}
# EFS access point with POSIX permissions matching www-data
resource "aws_efs_access_point" "wordpress" {
file_system_id = aws_efs_file_system.wordpress.id
posix_user {
gid = 33 # www-data group
uid = 33 # www-data user
}
root_directory {
path = "/uploads"
creation_info {
owner_gid = 33
owner_uid = 33
permissions = "0755"
}
}
tags = {
Name = "${var.environment}-wordpress-uploads-ap"
}
}
# Backup policy - daily automated backups
resource "aws_efs_backup_policy" "wordpress" {
file_system_id = aws_efs_file_system.wordpress.id
backup_policy {
status = "ENABLED"
}
}
output "efs_id" {
value = aws_efs_file_system.wordpress.id
}
output "efs_dns_name" {
value = aws_efs_file_system.wordpress.dns_name
}
The lifecycle policies deserve attention. WordPress sites accumulate media files over years, but older content gets accessed far less frequently than recent uploads. The Infrequent Access tier costs about $0.016 per GB-month compared to $0.30 for the standard tier. By automatically transitioning files older than 30 days, you can cut storage costs by 90% for archival media while keeping everything accessible.
The access point sets POSIX user and group IDs to 33, which is the numeric ID for the www-data user on Ubuntu systems. This prevents permission conflicts between the web server and the filesystem. Without this, Nginx or PHP-FPM might not be able to read or write uploaded files, causing mysterious upload failures.
EFS Performance Considerations
EFS has higher latency than local EBS storage. For WordPress media uploads, this latency is usually acceptable because users do not upload files as frequently as they request pages. However, you should NOT mount your entire WordPress installation on EFS. PHP files, theme files, and plugins should live on the local filesystem (or EBS) for fast execution. Only the wp-content/uploads directory belongs on EFS.
If your site serves a lot of media directly (rather than through a CDN), consider EFS Elastic throughput mode instead of Bursting. Bursting provides a baseline of 50 MiB/s per TiB of storage and allows bursts up to 100 MiB/s, but once you exhaust your burst credits, throughput drops dramatically. Elastic mode scales automatically based on workload and charges per-transfer, which is more predictable for high-traffic sites.
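If you want the throughput mode to follow the environment rather than being hardcoded, the same conditional pattern used elsewhere in this configuration works here too. A hedged sketch of a variant of the filesystem resource above (the conditional is an assumption, not part of the original module):

```hcl
# Hypothetical variant of the EFS resource: production gets Elastic
# throughput for sustained media traffic, staging keeps cheaper Bursting.
resource "aws_efs_file_system" "wordpress" {
  creation_token = "${var.environment}-wordpress-uploads"
  encrypted      = true

  performance_mode = "generalPurpose"

  # Elastic scales with demand and bills per GiB transferred; Bursting is
  # cheaper for low, spiky workloads but can exhaust its burst credits.
  throughput_mode = var.environment == "production" ? "elastic" : "bursting"
}
```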
CloudFront CDN Integration
A Content Delivery Network caches your site's static assets at edge locations worldwide, reducing load on your origin servers and improving page load times for visitors far from your AWS region. CloudFront integrates tightly with other AWS services and supports custom cache behaviors that map well to WordPress URL patterns.
# modules/cdn/main.tf
resource "aws_cloudfront_distribution" "wordpress" {
enabled = true
is_ipv6_enabled = true
comment = "${var.environment} WordPress CDN"
default_root_object = ""
price_class = "PriceClass_100" # US, Canada, Europe only; use PriceClass_All for global
aliases = [var.domain_name, "www.${var.domain_name}"]
web_acl_id = var.waf_web_acl_arn # Optional: attach AWS WAF
# Origin pointing to the ALB
origin {
domain_name = var.alb_dns_name
origin_id = "wordpress-alb"
custom_origin_config {
http_port = 80
https_port = 443
origin_protocol_policy = "https-only"
origin_ssl_protocols = ["TLSv1.2"]
}
custom_header {
name = "X-CloudFront-Secret"
value = var.cloudfront_secret # Verify requests actually come from CloudFront
}
}
# Default behavior - dynamic WordPress pages (no caching)
default_cache_behavior {
allowed_methods = ["DELETE", "GET", "HEAD", "OPTIONS", "PATCH", "POST", "PUT"]
cached_methods = ["GET", "HEAD"]
target_origin_id = "wordpress-alb"
forwarded_values {
query_string = true
headers = ["Host", "Authorization", "CloudFront-Forwarded-Proto"]
cookies {
forward = "whitelist"
whitelisted_names = [
"wordpress_*",
"wp-settings-*",
"comment_author_*",
]
}
}
viewer_protocol_policy = "redirect-to-https"
min_ttl = 0
default_ttl = 0 # Do not cache dynamic pages by default
max_ttl = 0
compress = true
}
# Cache static assets aggressively
ordered_cache_behavior {
path_pattern = "/wp-content/*"
allowed_methods = ["GET", "HEAD", "OPTIONS"]
cached_methods = ["GET", "HEAD"]
target_origin_id = "wordpress-alb"
forwarded_values {
query_string = true
headers = ["Host", "Origin", "Access-Control-Request-Headers", "Access-Control-Request-Method"]
cookies {
forward = "none"
}
}
viewer_protocol_policy = "redirect-to-https"
min_ttl = 0
default_ttl = 86400 # 24 hours
max_ttl = 31536000 # 1 year
compress = true
}
# Cache wp-includes static files
ordered_cache_behavior {
path_pattern = "/wp-includes/*"
allowed_methods = ["GET", "HEAD", "OPTIONS"]
cached_methods = ["GET", "HEAD"]
target_origin_id = "wordpress-alb"
forwarded_values {
query_string = true
headers = ["Host"]
cookies {
forward = "none"
}
}
viewer_protocol_policy = "redirect-to-https"
min_ttl = 0
default_ttl = 86400
max_ttl = 31536000
compress = true
}
# Never cache admin or login pages
ordered_cache_behavior {
path_pattern = "/wp-admin/*"
allowed_methods = ["DELETE", "GET", "HEAD", "OPTIONS", "PATCH", "POST", "PUT"]
cached_methods = ["GET", "HEAD"]
target_origin_id = "wordpress-alb"
forwarded_values {
query_string = true
headers = ["*"]
cookies {
forward = "all"
}
}
viewer_protocol_policy = "redirect-to-https"
min_ttl = 0
default_ttl = 0
max_ttl = 0
compress = true
}
# SSL certificate
viewer_certificate {
acm_certificate_arn = var.certificate_arn
ssl_support_method = "sni-only"
minimum_protocol_version = "TLSv1.2_2021"
}
restrictions {
geo_restriction {
restriction_type = "none"
}
}
tags = {
Name = "${var.environment}-wordpress-cdn"
}
}
# DNS record pointing to CloudFront
resource "aws_route53_record" "wordpress" {
zone_id = var.hosted_zone_id
name = var.domain_name
type = "A"
alias {
name = aws_cloudfront_distribution.wordpress.domain_name
zone_id = aws_cloudfront_distribution.wordpress.hosted_zone_id
evaluate_target_health = false
}
}
output "cloudfront_domain" {
value = aws_cloudfront_distribution.wordpress.domain_name
}
output "cloudfront_distribution_id" {
value = aws_cloudfront_distribution.wordpress.id
}
The cache behavior ordering matters. CloudFront evaluates ordered behaviors from top to bottom and uses the first matching pattern. Static assets under /wp-content/ and /wp-includes/ get cached for up to a year. Admin pages never get cached. Everything else (the default behavior) passes through to the origin without caching, which lets WordPress handle dynamic content generation, cookies, and authenticated sessions correctly.
The X-CloudFront-Secret custom header is a security measure. Without it, anyone who discovers your ALB's DNS name can bypass CloudFront entirely, skipping your CDN cache and any WAF rules attached to the distribution. Add a check in your Nginx configuration that rejects requests missing this header:
# In nginx WordPress server block
if ($http_x_cloudfront_secret != "your-secret-value-here") {
return 403;
}
Cache Invalidation Strategy
When you publish a new post or update content, the cached version on CloudFront becomes stale. There are two approaches to handle this. The first is to use short TTLs for HTML pages and rely on CloudFront's cache headers. Set Cache-Control: s-maxage=300 in your Nginx config for non-admin pages, giving you a 5-minute cache that balances performance with freshness. The second approach is to create CloudFront invalidations programmatically using a WordPress plugin that calls the CloudFront API when content changes. The first approach is simpler and works well for most sites. The second is better for news sites or any situation where stale content has business impact.
Secrets Management with AWS Secrets Manager and SSM
Hardcoding credentials in Terraform files is a security antipattern that will eventually burn you. Database passwords, API keys, and other sensitive values must never appear in version control. AWS provides two services for this: Secrets Manager for credentials that need rotation, and Systems Manager Parameter Store for configuration values.
# modules/secrets/main.tf
# Database password stored in Secrets Manager
resource "aws_secretsmanager_secret" "db_password" {
name = "${var.environment}/wordpress/db-password"
description = "WordPress database password"
recovery_window_in_days = var.environment == "production" ? 30 : 0
tags = {
Name = "${var.environment}-wordpress-db-password"
}
}
resource "aws_secretsmanager_secret_version" "db_password" {
secret_id = aws_secretsmanager_secret.db_password.id
secret_string = var.db_password
}
# Optional: automatic rotation for the database password
resource "aws_secretsmanager_secret_rotation" "db_password" {
count = var.environment == "production" ? 1 : 0
secret_id = aws_secretsmanager_secret.db_password.id
rotation_lambda_arn = var.rotation_lambda_arn
rotation_rules {
automatically_after_days = 30
}
}
# SSM Parameter Store for non-secret configuration
resource "aws_ssm_parameter" "wordpress_config" {
for_each = {
"site_url" = "https://${var.domain_name}"
"admin_email" = var.admin_email
"cache_ttl" = "3600"
"max_upload_size" = "64M"
}
name = "/${var.environment}/wordpress/${each.key}"
type = "String"
value = each.value
tags = {
Name = "${var.environment}-wordpress-${each.key}"
}
}
# SSM Parameter for sensitive values that don't need rotation
resource "aws_ssm_parameter" "stripe_key" {
name = "/${var.environment}/wordpress/stripe_secret_key"
type = "SecureString"
value = var.stripe_secret_key
tags = {
Name = "${var.environment}-wordpress-stripe-key"
}
}
output "db_password_arn" {
value = aws_secretsmanager_secret.db_password.arn
}
The difference between Secrets Manager and Parameter Store comes down to features and cost. Secrets Manager costs $0.40 per secret per month plus $0.05 per 10,000 API calls, but it supports automatic rotation with Lambda functions. Parameter Store standard parameters are free (up to 10,000 per account); advanced parameters cost $0.05 per parameter per month. Use Secrets Manager for database passwords that should rotate automatically. Use Parameter Store for everything else.
In your user data script, the EC2 instance retrieves secrets at boot time using the AWS CLI. The IAM role attached to the instance profile grants permission to read only the specific secrets it needs. If an instance is compromised, the attacker can access the WordPress database password but not, for example, the production Stripe key for a different application.
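That least-privilege grant can be expressed as an IAM policy attached to the instance role. A minimal sketch; the `var.instance_role_name` input and the SSM path pattern are assumptions layered on the resources defined above:

```hcl
# Hypothetical least-privilege policy: the instance role may read only
# the WordPress secrets and parameters for its own environment.
data "aws_iam_policy_document" "read_wp_secrets" {
  statement {
    effect    = "Allow"
    actions   = ["secretsmanager:GetSecretValue"]
    resources = [aws_secretsmanager_secret.db_password.arn]
  }

  statement {
    effect    = "Allow"
    actions   = ["ssm:GetParameter", "ssm:GetParametersByPath"]
    resources = ["arn:aws:ssm:*:*:parameter/${var.environment}/wordpress/*"]
  }
}

resource "aws_iam_role_policy" "read_wp_secrets" {
  name   = "${var.environment}-wordpress-read-secrets"
  role   = var.instance_role_name # assumed input from the compute module
  policy = data.aws_iam_policy_document.read_wp_secrets.json
}
```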
Handling Secrets in Terraform Variables
The initial secret values still need to come from somewhere. The most common approach is to pass them through environment variables or a .tfvars file that is excluded from version control:
# terraform.tfvars (DO NOT commit this file)
db_password = "your-strong-generated-password"
stripe_secret_key = "sk_live_..."
# .gitignore
*.tfvars
*.tfvars.json
.terraform/
*.tfstate
*.tfstate.backup
For team environments, use a CI/CD pipeline that pulls secrets from a secure vault (like HashiCorp Vault or 1Password) and injects them as environment variables during terraform apply. This way, no human ever needs to see the production database password.
Terraform State Management and Team Workflows
Terraform tracks the current state of your infrastructure in a state file. This file maps Terraform resource addresses to real AWS resource IDs. Without it, Terraform cannot determine what already exists, what needs updating, and what should be destroyed. By default, state lives in a local terraform.tfstate file. For teams, this is unworkable. Two engineers running terraform apply simultaneously from different machines will corrupt the state or create duplicate resources.
Remote State with S3 and DynamoDB
The standard approach is to store state in an S3 bucket with DynamoDB-based locking. The S3 bucket provides durable, versioned storage. DynamoDB provides a distributed lock that prevents concurrent applies.
# backend.tf
terraform {
backend "s3" {
bucket = "mycompany-terraform-state"
key = "wordpress/production/terraform.tfstate"
region = "us-east-1"
encrypt = true
dynamodb_table = "terraform-state-lock"
}
}
# You need to create the S3 bucket and DynamoDB table before using this backend.
# Use a separate bootstrap Terraform configuration or create them manually.
# bootstrap/main.tf (run once, store state locally)
resource "aws_s3_bucket" "terraform_state" {
bucket = "mycompany-terraform-state"
tags = {
Name = "Terraform State"
}
}
resource "aws_s3_bucket_versioning" "terraform_state" {
bucket = aws_s3_bucket.terraform_state.id
versioning_configuration {
status = "Enabled"
}
}
resource "aws_s3_bucket_server_side_encryption_configuration" "terraform_state" {
bucket = aws_s3_bucket.terraform_state.id
rule {
apply_server_side_encryption_by_default {
sse_algorithm = "aws:kms"
}
}
}
resource "aws_s3_bucket_public_access_block" "terraform_state" {
bucket = aws_s3_bucket.terraform_state.id
block_public_acls = true
block_public_policy = true
ignore_public_acls = true
restrict_public_buckets = true
}
resource "aws_dynamodb_table" "terraform_locks" {
name = "terraform-state-lock"
billing_mode = "PAY_PER_REQUEST"
hash_key = "LockID"
attribute {
name = "LockID"
type = "S"
}
tags = {
Name = "Terraform State Lock"
}
}
S3 bucket versioning is essential. If a state file becomes corrupted or someone runs a destructive operation by accident, you can recover a previous version. The KMS encryption matters because state contains sensitive data: not just endpoints and resource ARNs, but plaintext secret values, such as the database password written by the aws_secretsmanager_secret_version resource.
Team Workflow with Terraform Cloud or CI/CD
For teams, the safest workflow prevents anyone from running terraform apply on their local machine. Instead, all changes go through a code review process:
- An engineer creates a branch and modifies Terraform files.
- They open a pull request. The CI pipeline runs terraform plan and posts the output as a PR comment.
- A teammate reviews the plan, checking for unexpected resource deletions or modifications.
- After approval and merge, the CD pipeline runs terraform apply against the main branch.
Here is a GitHub Actions workflow that implements this pattern:
# .github/workflows/terraform.yml
name: Terraform WordPress Infrastructure
on:
pull_request:
paths:
- 'terraform/**'
push:
branches:
- main
paths:
- 'terraform/**'
permissions:
id-token: write # For OIDC auth with AWS
contents: read
pull-requests: write
jobs:
plan:
runs-on: ubuntu-latest
if: github.event_name == 'pull_request'
steps:
- uses: actions/checkout@v4
- uses: hashicorp/setup-terraform@v3
with:
terraform_version: 1.6.0
- uses: aws-actions/configure-aws-credentials@v4
with:
role-to-assume: arn:aws:iam::123456789012:role/terraform-github-actions
aws-region: us-east-1
- name: Terraform Init
run: terraform init
working-directory: terraform
- name: Terraform Plan
id: plan
run: terraform plan -no-color -out=tfplan
working-directory: terraform
- name: Comment PR with Plan
uses: actions/github-script@v7
with:
script: |
const plan = `${{ steps.plan.outputs.stdout }}`;
github.rest.issues.createComment({
issue_number: context.issue.number,
owner: context.repo.owner,
repo: context.repo.repo,
body: `## Terraform Plan\n\`\`\`\n${plan}\n\`\`\``
});
apply:
runs-on: ubuntu-latest
if: github.ref == 'refs/heads/main' && github.event_name == 'push'
steps:
- uses: actions/checkout@v4
- uses: hashicorp/setup-terraform@v3
with:
terraform_version: 1.6.0
- uses: aws-actions/configure-aws-credentials@v4
with:
role-to-assume: arn:aws:iam::123456789012:role/terraform-github-actions
aws-region: us-east-1
- name: Terraform Init
run: terraform init
working-directory: terraform
- name: Terraform Apply
run: terraform apply -auto-approve
working-directory: terraform
The OIDC authentication (id-token: write) removes the need for stored AWS access keys. GitHub Actions assumes an IAM role directly, and the temporary credentials expire after the workflow completes. This is far safer than storing long-lived access keys as repository secrets.
Cost Estimation and Right-Sizing
Infrastructure as Code does not mean unlimited infrastructure. Every resource in your Terraform configuration has a cost, and those costs add up quickly on AWS. Understanding the cost profile of each component helps you make informed decisions about instance sizing, redundancy, and optimization.
Monthly Cost Breakdown
Here is a realistic cost estimate for a production WordPress deployment on AWS using the configuration described in this article. All prices are for the us-east-1 region as of early 2022.
Compute (EC2 Auto-Scaling Group)
- 2x t3.large instances (baseline): ~$120/month
- Reserved instances (1-year, no upfront): ~$80/month (33% savings)
- Savings plans (1-year, compute): ~$85/month
Database (RDS MySQL)
- db.r6g.large Multi-AZ primary: ~$350/month
- db.r6g.large read replica: ~$175/month
- 100 GB storage: ~$23/month
- Reserved instances (1-year): ~$230/month for primary (34% savings)
Cache (ElastiCache Redis)
- cache.r6g.large with replica: ~$290/month
- Reserved nodes (1-year): ~$190/month (34% savings)
Storage (EFS)
- 10 GB Standard: ~$3/month
- 50 GB Infrequent Access: ~$0.80/month
- Practically free for most WordPress sites
Networking
- NAT Gateway (1x): ~$32/month + $0.045/GB processed
- Application Load Balancer: ~$16/month + $0.008/LCU-hour
- Data transfer: Varies widely, budget $20-100/month
CDN (CloudFront)
- Depends entirely on traffic volume
- First 10 TB/month: $0.085/GB
- 10 million HTTP requests: ~$7.50
- Typical WordPress site (100K monthly visitors): $10-30/month
Total estimated range: $700-1,200/month for production, $200-400/month for staging (smaller instances, no Multi-AZ, no replicas).
Right-Sizing Strategies
These costs assume a medium-traffic WordPress site. You can reduce them significantly with some targeted choices:
Start small and scale up. Use t3.medium instances instead of t3.large and monitor CPU credit consumption. If your application consistently uses less than 20% CPU, you are paying for capacity you do not need. The auto-scaling group handles traffic spikes, so your base instances can be smaller than you think.
Use Graviton (ARM) instances. The r6g, t4g, and m6g instance families use AWS Graviton processors and cost about 20% less than their x86 equivalents while delivering equal or better performance. WordPress runs perfectly on ARM since PHP has had ARM support for years. The configuration in this article already defaults to Graviton types; if you are migrating an existing deployment, swap t3.large for t4g.large and db.r5.large for db.r6g.large to capture these savings.
Reserve what you can predict. Your baseline instances (minimum ASG capacity), primary database, and Redis cluster run 24/7. Reserved instances or Savings Plans cut those costs by 30-60% depending on the commitment term and payment option. Only reserve the baseline. Let on-demand pricing handle the variable auto-scaling capacity.
Skip the read replica until you need it. For most WordPress sites, the primary RDS instance handles both reads and writes without breaking a sweat. Add the replica when Performance Insights shows your read IOPS consistently exceeding 70% of the instance maximum, or when you need the additional failover protection.
Integrate Infracost into your CI pipeline. Infracost is an open-source tool that estimates costs from Terraform plan output. Add it to your GitHub Actions workflow so every pull request shows the cost impact of infrastructure changes. Engineers see "This change adds $150/month" directly on the PR, which drives better cost-aware decisions.
# Add to your GitHub Actions workflow
- name: Infracost Breakdown
uses: infracost/actions/setup@v2
with:
api-key: ${{ secrets.INFRACOST_API_KEY }}
- name: Generate Infracost JSON
run: infracost breakdown --path=terraform --format=json --out-file=/tmp/infracost.json
- name: Post Infracost Comment
run: |
infracost comment github \
--path=/tmp/infracost.json \
--repo=$GITHUB_REPOSITORY \
--pull-request=${{ github.event.pull_request.number }} \
--github-token=${{ github.token }}
Complete Annotated Variables and Outputs
With all the modules defined, here are the top-level variables and outputs that tie the configuration together. These represent the knobs and dials you adjust for each environment.
# variables.tf - Top-level input variables
variable "aws_region" {
description = "AWS region for all resources"
type = string
default = "us-east-1"
}
variable "environment" {
description = "Deployment environment (staging, production)"
type = string
validation {
condition = contains(["staging", "production"], var.environment)
error_message = "Environment must be staging or production."
}
}
variable "vpc_cidr" {
description = "CIDR block for the VPC"
type = string
default = "10.0.0.0/16"
}
variable "availability_zones" {
description = "List of AZs to deploy across"
type = list(string)
default = ["us-east-1a", "us-east-1b", "us-east-1c"]
}
variable "instance_type" {
description = "EC2 instance type for WordPress servers"
type = string
default = "t4g.medium"
}
variable "ami_id" {
description = "AMI ID for WordPress servers (built with Packer)"
type = string
}
variable "asg_min_size" {
description = "Minimum number of instances in the auto-scaling group"
type = number
default = 2
}
variable "asg_max_size" {
description = "Maximum number of instances in the auto-scaling group"
type = number
default = 10
}
variable "asg_desired_capacity" {
description = "Starting number of instances in the auto-scaling group"
type = number
default = 2
}
variable "db_password" {
description = "Master password for the RDS instance"
type = string
sensitive = true
}
variable "domain_name" {
description = "Primary domain name for the WordPress site"
type = string
}
variable "certificate_arn" {
description = "ARN of the ACM certificate for SSL"
type = string
}
variable "stripe_secret_key" {
description = "Stripe secret key for payment processing"
type = string
sensitive = true
default = ""
}
variable "admin_email" {
description = "WordPress admin email address"
type = string
default = "admin@yourdomain.com"
}
# outputs.tf - Top-level outputs
output "alb_dns_name" {
description = "DNS name of the Application Load Balancer"
value = module.compute.alb_dns_name
}
output "cloudfront_domain" {
description = "CloudFront distribution domain name"
value = module.cdn.cloudfront_domain
}
output "cloudfront_distribution_id" {
description = "CloudFront distribution ID (for cache invalidation)"
value = module.cdn.cloudfront_distribution_id
}
output "rds_primary_endpoint" {
description = "RDS primary instance endpoint"
value = module.database.primary_endpoint
}
output "rds_replica_endpoint" {
description = "RDS read replica endpoint (null if not production)"
value = module.database.replica_endpoint
}
output "redis_endpoint" {
description = "ElastiCache Redis primary endpoint"
value = module.cache.redis_endpoint
}
output "efs_id" {
description = "EFS filesystem ID"
value = module.storage.efs_id
}
output "vpc_id" {
description = "VPC ID"
value = module.networking.vpc_id
}
output "estimated_monthly_cost" {
description = "Rough monthly cost estimate"
value = "Run 'infracost breakdown --path=.' for detailed cost estimate"
}
Environment-Specific Configurations
Create separate .tfvars files for each environment. The staging configuration uses smaller instances, skips Multi-AZ, and disables read replicas:
# staging.tfvars
environment = "staging"
instance_type = "t4g.small"
asg_min_size = 1
asg_max_size = 2
asg_desired_capacity = 1
domain_name = "staging.yourdomain.com"
certificate_arn = "arn:aws:acm:us-east-1:123456789012:certificate/staging-cert-id"
admin_email = "admin@yourdomain.com"
# production.tfvars
environment = "production"
instance_type = "t4g.large"
asg_min_size = 2
asg_max_size = 10
asg_desired_capacity = 2
domain_name = "yourdomain.com"
certificate_arn = "arn:aws:acm:us-east-1:123456789012:certificate/prod-cert-id"
admin_email = "admin@yourdomain.com"
Deploy to staging with terraform apply -var-file=staging.tfvars and production with terraform apply -var-file=production.tfvars. Same code, different parameters, consistent architecture across environments.
Operational Considerations and Day-Two Operations
Provisioning infrastructure is only the beginning. Running WordPress in production on AWS requires ongoing operational attention. Terraform helps with the infrastructure layer, but you need additional tooling and processes for application-level concerns.
Deployments and Rolling Updates
WordPress deployments in an auto-scaled environment work differently than deploying to a single server. You cannot just SSH in and run git pull. Two approaches work well:
AMI-based deployments. Build a new AMI with the updated WordPress code using Packer. Update the AMI ID in your Terraform variables. Run terraform apply. The auto-scaling group performs a rolling replacement, launching new instances with the updated AMI and draining old instances. This is the most reliable approach because each instance boots from a known-good image.
CodeDeploy-based deployments. Keep the AMI stable and use AWS CodeDeploy to push application code to running instances. This is faster because you do not need to build a new AMI for every code change, but it introduces configuration drift between the AMI and the running state of instances. If an instance is replaced by auto-scaling, it launches with the old code until CodeDeploy catches up.
For WordPress specifically, AMI-based deployments are usually the better choice. WordPress code changes are infrequent (core updates, plugin updates, theme changes), and each change should be tested before deployment anyway. The extra time to build an AMI is a worthwhile tradeoff for the consistency guarantee.
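For the rolling replacement to happen automatically when the AMI ID changes, the auto-scaling group needs an instance refresh configured. A hedged sketch of the relevant fragment, assuming the ASG resource lives in the compute module described earlier:

```hcl
# Fragment for the aws_autoscaling_group resource in modules/compute:
# when the launch template's AMI changes, replace instances gradually
# while keeping at least half the capacity in service.
resource "aws_autoscaling_group" "wordpress" {
  # ... existing ASG arguments (name, sizes, launch template, etc.) ...

  instance_refresh {
    strategy = "Rolling"
    preferences {
      min_healthy_percentage = 50
      instance_warmup        = 300 # seconds before a new instance counts as healthy
    }
  }
}
```

With this in place, updating var.ami_id and running terraform apply drains and replaces instances batch by batch instead of all at once.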
Database Migrations
WordPress handles its own database schema migrations through the wp_upgrade() function. When you update WordPress core, the first admin page load detects the version mismatch and runs the necessary database updates. In a multi-server environment, this means the first server to run the update modifies the shared database. Subsequent servers detect that the migration has already completed and skip it.
This usually works without issues, but there is a race condition risk if multiple servers boot simultaneously with a new WordPress version. To eliminate this risk, include a pre-deployment step that runs wp core update-db via WP-CLI on a single instance before updating the auto-scaling group.
Monitoring and Alerting
Add CloudWatch alarms for the metrics that actually matter for WordPress performance:
# monitoring.tf (add to root module or create a monitoring module)
# RDS: High CPU indicates slow queries or undersized instance
resource "aws_cloudwatch_metric_alarm" "rds_cpu" {
alarm_name = "${var.environment}-rds-high-cpu"
comparison_operator = "GreaterThanThreshold"
evaluation_periods = 3
  metric_name         = "CPUUtilization"
  namespace           = "AWS/RDS"
  period              = 300
  statistic           = "Average"
  threshold           = 80
  alarm_description   = "RDS CPU exceeds 80% for 15 minutes"
  alarm_actions       = [aws_sns_topic.alerts.arn]

  dimensions = {
    DBInstanceIdentifier = module.database.primary_instance_id
  }
}

# RDS: Free storage running low
resource "aws_cloudwatch_metric_alarm" "rds_storage" {
  alarm_name          = "${var.environment}-rds-low-storage"
  comparison_operator = "LessThanThreshold"
  evaluation_periods  = 1
  metric_name         = "FreeStorageSpace"
  namespace           = "AWS/RDS"
  period              = 300
  statistic           = "Average"
  threshold           = 5368709120 # 5 GB in bytes
  alarm_description   = "RDS free storage below 5 GB"
  alarm_actions       = [aws_sns_topic.alerts.arn]

  dimensions = {
    DBInstanceIdentifier = module.database.primary_instance_id
  }
}

# ElastiCache: Memory usage approaching limit
resource "aws_cloudwatch_metric_alarm" "redis_memory" {
  alarm_name          = "${var.environment}-redis-high-memory"
  comparison_operator = "GreaterThanThreshold"
  evaluation_periods  = 2
  metric_name         = "DatabaseMemoryUsagePercentage"
  namespace           = "AWS/ElastiCache"
  period              = 300
  statistic           = "Average"
  threshold           = 80
  alarm_description   = "Redis memory usage exceeds 80%"
  alarm_actions       = [aws_sns_topic.alerts.arn]

  dimensions = {
    CacheClusterId = "${var.environment}-wp-redis-001"
  }
}

# ALB: High 5xx error rate indicates application problems
resource "aws_cloudwatch_metric_alarm" "alb_5xx" {
  alarm_name          = "${var.environment}-alb-5xx-errors"
  comparison_operator = "GreaterThanThreshold"
  evaluation_periods  = 2
  metric_name         = "HTTPCode_Target_5XX_Count"
  namespace           = "AWS/ApplicationELB"
  period              = 300
  statistic           = "Sum"
  threshold           = 50
  alarm_description   = "More than 50 5xx errors in 5 minutes"
  alarm_actions       = [aws_sns_topic.alerts.arn]

  dimensions = {
    LoadBalancer = module.compute.alb_arn_suffix
  }
}

# SNS topic for alarm notifications
resource "aws_sns_topic" "alerts" {
  name = "${var.environment}-wordpress-alerts"
}

resource "aws_sns_topic_subscription" "email" {
  topic_arn = aws_sns_topic.alerts.arn
  protocol  = "email"
  endpoint  = var.admin_email
}
These four alarms catch the most common failure modes in WordPress AWS deployments. High RDS CPU usually means a rogue plugin is running unoptimized queries. Low storage means your database is growing faster than expected, often from logging plugins or accumulated post revisions. High Redis memory suggests your eviction policy needs tuning or your instance size needs increasing. And a spike in 5xx errors on the ALB signals application crashes that need immediate attention.
Backup and Disaster Recovery
The Terraform configuration already includes several backup mechanisms: RDS automated backups with 14-day retention, EFS automatic backups through AWS Backup, and S3 versioning on the Terraform state bucket. For a complete disaster recovery plan, add cross-region replication for your RDS backups and S3 assets:
# Cross-region RDS backup replication
resource "aws_db_instance_automated_backups_replication" "wordpress" {
  count    = var.environment == "production" ? 1 : 0
  provider = aws.dr_region # Requires a second provider alias for the DR region

  source_db_instance_arn = aws_db_instance.wordpress.arn
  retention_period       = 7
}
Test your recovery process regularly. A backup you have never restored from is a backup you cannot trust. Schedule quarterly recovery drills where you restore the database to a new RDS instance, mount the EFS filesystem, and verify the site functions correctly.
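One way to script the restore half of such a drill is to let Terraform itself stand up a disposable instance from the most recent snapshot. This is a sketch, not part of the configuration built earlier; the identifiers and instance class below are assumptions:

```hcl
# Hypothetical drill configuration: restore the latest snapshot into a
# temporary instance, verify the site against it, then destroy it.
data "aws_db_snapshot" "latest" {
  db_instance_identifier = module.database.primary_instance_id
  most_recent            = true
}

resource "aws_db_instance" "restore_drill" {
  identifier          = "${var.environment}-wp-restore-drill"
  instance_class      = "db.t3.medium" # smaller than production is fine for a drill
  snapshot_identifier = data.aws_db_snapshot.latest.id
  skip_final_snapshot = true # this instance is disposable
}
```

Keeping the drill in a separate Terraform workspace or directory keeps its state isolated, so destroying the drill instance can never touch production resources.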
Common Pitfalls and How to Avoid Them
Across dozens of WordPress-on-AWS Terraform deployments, certain mistakes appear repeatedly. Knowing these in advance saves hours of debugging.
WordPress cron and auto-scaling do not mix well. WordPress uses a pseudo-cron system triggered by page visits. In an auto-scaled environment, cron events can fire multiple times (once per instance) or not at all (if instances are scaling down). Disable WP-Cron in wp-config.php with define('DISABLE_WP_CRON', true) and replace it with a real cron job using EventBridge (CloudWatch Events) that hits wp-cron.php on a single instance through the ALB.
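An EventBridge rule cannot call an HTTP endpoint directly, but an API destination can. The following is a hedged sketch of that wiring; the resource names, var.site_url, and the IAM role are assumptions, not part of the configuration built earlier:

```hcl
# A connection is required for API destinations; WordPress ignores the
# extra header, it exists only to satisfy the required auth block.
resource "aws_cloudwatch_event_connection" "wp_cron" {
  name               = "${var.environment}-wp-cron"
  authorization_type = "API_KEY"

  auth_parameters {
    api_key {
      key   = "X-WP-Cron-Source" # hypothetical header name
      value = "eventbridge"
    }
  }
}

resource "aws_cloudwatch_event_api_destination" "wp_cron" {
  name                = "${var.environment}-wp-cron"
  connection_arn      = aws_cloudwatch_event_connection.wp_cron.arn
  invocation_endpoint = "${var.site_url}/wp-cron.php" # assumed variable
  http_method         = "GET"
}

resource "aws_cloudwatch_event_rule" "wp_cron" {
  name                = "${var.environment}-wp-cron"
  schedule_expression = "rate(5 minutes)"
}

resource "aws_cloudwatch_event_target" "wp_cron" {
  rule     = aws_cloudwatch_event_rule.wp_cron.name
  arn      = aws_cloudwatch_event_api_destination.wp_cron.arn
  role_arn = aws_iam_role.eventbridge_invoke.arn # assumed role with events:InvokeApiDestination
}
```

Because the request goes through the ALB, exactly one instance handles each scheduled invocation, which is the single-runner behavior WP-Cron needs.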
ALB health checks can trigger the WordPress installation wizard. If your health check hits a page that requires database access before WordPress is fully configured, the health check "succeeds" with a 200 status but actually returns the installation page. Use /wp-login.php as the health check path and accept both 200 and 302 status codes. The login page always exists and requires a working database connection.
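In Terraform, that health check configuration looks something like the sketch below; the target group name and the vpc_id reference are assumptions:

```hcl
resource "aws_lb_target_group" "wordpress" {
  name_prefix = "wp-"
  port        = 80
  protocol    = "HTTP"
  vpc_id      = module.network.vpc_id # assumed module output

  health_check {
    path                = "/wp-login.php"
    matcher             = "200,302" # accept the login page or a redirect to it
    interval            = 30
    healthy_threshold   = 2
    unhealthy_threshold = 3
  }
}
```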
Security group changes can cause downtime. Terraform sometimes needs to destroy and recreate security groups when their configuration changes significantly. If instances reference the old security group, they lose network access during the transition. The create_before_destroy lifecycle rule on security groups prevents this, but you must also use name_prefix instead of name to avoid naming conflicts between the old and new groups.
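A minimal sketch of that pattern (the group name and vpc_id reference are assumptions):

```hcl
resource "aws_security_group" "app" {
  name_prefix = "${var.environment}-wp-app-" # name_prefix avoids a conflict with the old group's name
  vpc_id      = module.network.vpc_id        # assumed module output

  lifecycle {
    create_before_destroy = true # the replacement group is attached before the old one is destroyed
  }
}
```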
EFS mount can block instance boot. If the EFS mount target is unavailable when an instance boots, the mount command hangs indefinitely unless you set a timeout. Use the _netdev mount option in fstab, which tells the OS to wait for network availability before mounting. Also add noresvport to handle NFS port reuse during reconnection after brief network interruptions.
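Assuming the filesystem is mounted at /var/www/html/wp-content (the filesystem ID below is a placeholder), the resulting /etc/fstab entry would look like:

```
# _netdev: wait for networking before mounting; noresvport: tolerate NFS port reuse on reconnect
fs-0123456789abcdef0.efs.us-east-1.amazonaws.com:/ /var/www/html/wp-content nfs4 nfsvers=4.1,rsize=1048576,wsize=1048576,hard,timeo=600,retrans=2,_netdev,noresvport 0 0
```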
Terraform state can contain sensitive data. Your state file includes every attribute of every resource, including database passwords if you pass them as variables. Always encrypt your state bucket with KMS, restrict access to the bucket with IAM policies, and set encrypt = true in the S3 backend configuration so state objects are encrypted at rest. Never share state files over unencrypted channels.
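A hedged example of a backend stanza with those protections in place; the bucket name, table name, and key ARN are placeholders:

```hcl
terraform {
  backend "s3" {
    bucket         = "example-wp-terraform-state" # placeholder bucket name
    key            = "wordpress/production/terraform.tfstate"
    region         = "us-east-1"
    encrypt        = true # server-side encryption at rest
    kms_key_id     = "arn:aws:kms:us-east-1:111122223333:key/EXAMPLE" # placeholder CMK ARN
    dynamodb_table = "terraform-locks" # state locking
  }
}
```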
Putting It All Together
The architecture described in this article gives you a WordPress deployment that handles traffic spikes through auto-scaling, survives Availability Zone failures through Multi-AZ RDS and distributed subnets, accelerates global performance through CloudFront, and stores every infrastructure decision in version-controlled code that your team can review, discuss, and improve.
To deploy this from scratch, follow these steps:
- Bootstrap the state backend. Run the bootstrap Terraform configuration to create the S3 bucket and DynamoDB table for remote state.
- Build the WordPress AMI. Run Packer to create an AMI with your OS, web server, PHP, and WordPress pre-installed.
- Set your variables. Create a production.tfvars file with your AMI ID, domain name, certificate ARN, and other environment-specific values. Store sensitive values in environment variables or a secrets manager.
- Initialize Terraform. Run terraform init to download providers and configure the remote backend.
- Review the plan. Run terraform plan -var-file=production.tfvars and carefully review every resource that will be created. Pay special attention to the security groups, IAM policies, and public accessibility settings.
- Apply. Run terraform apply -var-file=production.tfvars and confirm. The first apply takes 15-20 minutes, mostly waiting for the RDS instance to provision.
- Configure DNS. If you are not using Route 53 (or if your domain is registered elsewhere), create a CNAME record pointing your domain to the CloudFront distribution domain name.
- Complete WordPress setup. Visit your domain and complete the WordPress installation wizard. After that, install and activate the Redis Object Cache plugin, configure HyperDB if using a read replica, and disable WP-Cron in favor of EventBridge.
Every subsequent infrastructure change follows the same pull-request-driven workflow: modify Terraform files, open a PR, review the plan, merge, and let CI/CD apply the changes. Your WordPress infrastructure becomes as manageable and auditable as your application code.
The Terraform configurations in this article are designed as a starting point. Your specific requirements will drive modifications. Maybe you need a WAF to block malicious traffic patterns. Maybe you want to use Aurora Serverless instead of standard RDS to handle unpredictable traffic without pre-provisioning capacity. Maybe you need a VPN connection to an on-premises network for a hybrid deployment. The modular structure makes these additions straightforward: add a new module, wire it into the root module, and apply.
Infrastructure as Code is not just about automation. It is about bringing software engineering discipline to infrastructure management. Version control, code review, automated testing, and continuous deployment are practices that have made application development more reliable and more collaborative over the past two decades. Your infrastructure deserves the same rigor. Terraform gives you the tools. The architecture patterns in this article give you the blueprint. What you build with them is up to you.
Marcus Chen
Staff engineer with 12 years in WordPress infrastructure. Previously at Automattic and a large media company. Writes about hosting platforms, caching, and deployment pipelines.