Zero-Downtime Deployment Strategies for WordPress: Blue-Green, Canary, and Rolling Updates in Practice
Why Downtime During WordPress Deployments Is Unacceptable
Every second your WordPress site is offline costs money. For WooCommerce stores processing orders around the clock, a five-minute maintenance window at the wrong time can mean thousands in lost revenue. For media sites riding a traffic spike, a deployment-induced outage destroys the moment. And for SaaS platforms built on WordPress, downtime erodes the trust you spent months building.
The traditional WordPress deployment method is alarmingly primitive: SSH into the server, pull the latest code, maybe run a quick search-replace, and hope nothing breaks. If something does break, you scramble to fix it while users stare at a white screen or a maintenance page. This approach worked when WordPress powered simple blogs. It does not work when WordPress powers businesses.
Zero-downtime deployment eliminates the gap between “old version running” and “new version running.” Users never see a maintenance page. Requests are never dropped. The transition from one release to the next is invisible to anyone visiting your site.
This article covers three primary strategies for achieving zero-downtime WordPress deployments: blue-green deployments, canary releases, and rolling updates. We will walk through real configurations for Nginx, HAProxy, Deployer PHP, Kubernetes, and GitHub Actions. We will address the hard problems that WordPress-specific concerns introduce, including database migrations, shared uploads directories, WooCommerce session persistence, and instant rollback procedures.
None of this is theoretical. Every configuration example in this article has been tested in production environments handling real traffic.
Symlink-Based Atomic Deployments with Deployer PHP
Before discussing blue-green or canary strategies, you need a deployment mechanism that can switch between releases instantly. Symlink-based atomic deployment is the foundation that makes everything else possible.
The concept is straightforward. Each deployment creates a new directory containing the full application code. A symlink called current points to the active release directory. Switching releases means updating where the symlink points. This operation is atomic on Linux filesystems, meaning there is no moment where the symlink points to nothing.
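Under the hood, the atomic part is worth spelling out: a plain ln -sfn is not atomic (it unlinks the old symlink, then creates the new one, leaving a tiny gap). The safe pattern builds the new symlink under a temporary name and renames it over current, because rename(2) replaces the link in a single atomic step. A minimal sketch with illustrative /tmp paths:

```shell
# Minimal sketch of an atomic release switch (paths are illustrative).
# `ln -sfn` alone unlinks then recreates, so it is NOT atomic; instead,
# build the symlink under a temp name and rename(2) it over `current`.
DEPLOY_PATH=/tmp/demo-site
mkdir -p "$DEPLOY_PATH/releases/42" "$DEPLOY_PATH/releases/43"

# Initial state: current points at release 42 (creation itself is not
# the atomic part; only the switch below is)
ln -sfn "$DEPLOY_PATH/releases/42" "$DEPLOY_PATH/current"

# Deploy release 43: point a temp symlink at it, then rename over current
ln -sfn "$DEPLOY_PATH/releases/43" "$DEPLOY_PATH/current.tmp"
mv -T "$DEPLOY_PATH/current.tmp" "$DEPLOY_PATH/current"

readlink "$DEPLOY_PATH/current"   # -> /tmp/demo-site/releases/43
```

Deployer performs this same temp-name-plus-rename dance for you on every deploy.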
Deployer PHP (deployer.org) is the most popular tool for this pattern in PHP projects. Here is a practical Deployer configuration for WordPress:
<?php
// deploy.php
namespace Deployer;
require 'recipe/common.php';
set('application', 'wordpress-site');
set('repository', '[email protected]:yourorg/wordpress-site.git');
set('keep_releases', 5);
// Shared files and directories persist across releases
set('shared_files', [
'wp-config.php',
'.htaccess',
]);
set('shared_dirs', [
'wp-content/uploads',
'wp-content/cache',
'wp-content/wflogs',
]);
// Writable directories
set('writable_dirs', [
'wp-content/uploads',
'wp-content/cache',
]);
host('production')
->set('hostname', 'prod-server.example.com')
->set('remote_user', 'deploy')
->set('deploy_path', '/var/www/wordpress-site');
host('staging')
->set('hostname', 'staging.example.com')
->set('remote_user', 'deploy')
->set('deploy_path', '/var/www/staging-wordpress');
// Build assets before uploading
task('build:assets', function () {
runLocally('cd wp-content/themes/your-theme && npm ci && npm run build');
});
// Upload compiled assets
task('upload:assets', function () {
upload('wp-content/themes/your-theme/dist/', '{{release_path}}/wp-content/themes/your-theme/dist/');
});
// Flush object cache after deploy
task('cache:flush', function () {
run('cd {{release_path}} && wp cache flush --allow-root');
});
// Flush OPcache ({{opcache_key}} must be set to match OPCACHE_RESET_KEY on the server)
task('opcache:reset', function () {
run('curl -s "https://{{hostname}}/opcache-reset.php?key={{opcache_key}}" || true');
});
// Full deployment pipeline
after('deploy:update_code', 'build:assets');
after('deploy:update_code', 'upload:assets');
after('deploy:symlink', 'cache:flush');
after('deploy:symlink', 'opcache:reset');
after('deploy:failed', 'deploy:unlock');
The directory structure on your server looks like this after several deployments:
/var/www/wordpress-site/
├── current -> /var/www/wordpress-site/releases/42
├── releases/
│ ├── 38/
│ ├── 39/
│ ├── 40/
│ ├── 41/
│ └── 42/ <-- active release
└── shared/
├── wp-config.php
├── .htaccess
└── wp-content/
├── uploads/
├── cache/
└── wflogs/
When Deployer runs, it clones your repository into releases/43/, links the shared files and directories (uploads, wp-config.php, and so on) from shared/ into that release, runs any build tasks, and then atomically switches the current symlink to point at releases/43/.
Your Nginx configuration points its root directive at the current symlink:
server {
    listen 80;
    server_name example.com;
    root /var/www/wordpress-site/current;
    index index.php;
    location / {
        try_files $uri $uri/ /index.php?$args;
    }
    location ~ \.php$ {
        fastcgi_pass unix:/run/php/php8.1-fpm.sock;
        include fastcgi_params;
        # Critical: resolve the symlink so OPcache keys on the real path
        fastcgi_param SCRIPT_FILENAME $realpath_root$fastcgi_script_name;
        fastcgi_param DOCUMENT_ROOT $realpath_root;
    }
}
Pay close attention to the $realpath_root variable in the PHP location block. This resolves the symlink to the actual filesystem path, which prevents OPcache from serving stale bytecode after a release switch. Without this, PHP-FPM may continue serving cached opcodes from the previous release directory even after the symlink has changed.
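The difference between the two variables is easy to see from a shell: $document_root stays constant across releases (it is the symlink path), while $realpath_root resolves to the release directory behind it and therefore changes on every switch, which is what gives OPcache a fresh cache key per release. A small demo with made-up paths:

```shell
# Illustrative demo of what $realpath_root evaluates to (made-up paths).
# $document_root is the symlink itself; $realpath_root is the resolved
# release directory, which changes on every deploy.
SITE=/tmp/realpath-demo
mkdir -p "$SITE/releases/41" "$SITE/releases/42"
ln -sfn "$SITE/releases/42" "$SITE/current"

echo "document_root: $SITE/current"
echo "realpath_root: $(readlink -f "$SITE/current")"
```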
An alternative approach to the OPcache problem is to reset OPcache after each deployment. You can do this with a small PHP file that calls opcache_reset(), triggered via a curl request in your deployment pipeline. Some teams do both: use $realpath_root and reset OPcache.
OPcache Reset Script
<?php
// opcache-reset.php - place in web root, protect with a secret key
$secret = getenv('OPCACHE_RESET_KEY');
// Refuse outright if no key is configured; compare in constant time
if (!$secret || !isset($_GET['key']) || !hash_equals($secret, (string) $_GET['key'])) {
    http_response_code(403);
    exit('Forbidden');
}
if (function_exists('opcache_reset')) {
    opcache_reset();
    echo 'OPcache cleared';
} else {
    echo 'OPcache not available';
}
Blue-Green Deployment: Dual DocumentRoot with Nginx Traffic Switching
Blue-green deployment takes the atomic switching concept further by maintaining two complete, independent environments. One environment (let’s call it “blue”) serves live traffic. The other (“green”) sits idle or serves as a staging target. When you deploy, you deploy to the idle environment, verify it works, and then switch traffic from blue to green.
The key difference from simple symlink deployment is that blue-green gives you a full, running environment to test against before any traffic hits it. With symlink-based deployment, the new release goes live the instant the symlink changes. With blue-green, you can hit the green environment with test requests, run smoke tests, check database connectivity, and verify plugin compatibility before switching a single real user over.
Here is how to set this up with Nginx using two upstream blocks:
# /etc/nginx/conf.d/upstream.conf
upstream blue_backend {
server 127.0.0.1:8081;
}
upstream green_backend {
server 127.0.0.1:8082;
}
Each backend runs its own PHP-FPM pool with a separate document root:
# /etc/php/8.1/fpm/pool.d/blue.conf
[blue]
user = www-data
group = www-data
listen = 127.0.0.1:8081
pm = dynamic
pm.max_children = 20
pm.start_servers = 5
pm.min_spare_servers = 3
pm.max_spare_servers = 10
php_admin_value[open_basedir] = /var/www/blue:/tmp
env[WP_ENV] = blue
# /etc/php/8.1/fpm/pool.d/green.conf
[green]
user = www-data
group = www-data
listen = 127.0.0.1:8082
pm = dynamic
pm.max_children = 20
pm.start_servers = 5
pm.min_spare_servers = 3
pm.max_spare_servers = 10
php_admin_value[open_basedir] = /var/www/green:/tmp
env[WP_ENV] = green
The Nginx server block uses a variable to determine which upstream receives traffic:
# /etc/nginx/conf.d/upstream-map.conf
map $host $active_backend {
default blue_backend;
}
# /etc/nginx/sites-available/wordpress.conf
server {
listen 443 ssl http2;
server_name example.com;
ssl_certificate /etc/letsencrypt/live/example.com/fullchain.pem;
ssl_certificate_key /etc/letsencrypt/live/example.com/privkey.pem;
# Dynamic root based on active environment
set $doc_root /var/www/blue;
if ($active_backend = green_backend) {
set $doc_root /var/www/green;
}
root $doc_root;
location / {
try_files $uri $uri/ /index.php?$args;
}
location ~ \.php$ {
fastcgi_pass $active_backend;
fastcgi_param SCRIPT_FILENAME $document_root$fastcgi_script_name;
include fastcgi_params;
}
# Static assets with long cache
location ~* \.(js|css|png|jpg|jpeg|gif|ico|svg|woff2)$ {
expires 30d;
add_header Cache-Control "public, immutable";
}
}
Switching traffic is done by updating the map directive and reloading Nginx:
#!/bin/bash
# switch-environment.sh
CURRENT=$(grep -oP 'default \K\w+_backend' /etc/nginx/conf.d/upstream-map.conf)
if [ "$CURRENT" = "blue_backend" ]; then
NEW="green_backend"
else
NEW="blue_backend"
fi
sed -i "s/default ${CURRENT}/default ${NEW}/" /etc/nginx/conf.d/upstream-map.conf
# Test config before reloading
nginx -t && systemctl reload nginx
echo "Switched from $CURRENT to $NEW"
The nginx -t command validates the configuration before reloading. If the configuration is invalid, the reload never happens, and the previous environment continues serving traffic. The reload itself is graceful: Nginx finishes processing in-flight requests with old worker processes while new workers pick up the updated configuration.
Smoke Testing the Idle Environment
Before switching, you should run automated tests against the idle environment. The PHP-FPM ports speak FastCGI rather than HTTP, so expose each environment through a small internal-only Nginx server block (say, blue on port 9081 and green on 9082) and point the tests at whichever one is idle:
#!/bin/bash
# smoke-test.sh
IDLE_PORT=$1              # 9081 (blue) or 9082 (green) internal vhost
SITE_HOST="example.com"   # Send the real Host header so WordPress does not canonical-redirect

# Basic health check
HTTP_CODE=$(curl -s -o /dev/null -w "%{http_code}" -H "Host: ${SITE_HOST}" "http://127.0.0.1:${IDLE_PORT}/")
if [ "$HTTP_CODE" != "200" ]; then
    echo "FAIL: Homepage returned $HTTP_CODE"
    exit 1
fi

# Check that WordPress loaded correctly
BODY=$(curl -s -H "Host: ${SITE_HOST}" "http://127.0.0.1:${IDLE_PORT}/")
if ! echo "$BODY" | grep -q "wp-content"; then
    echo "FAIL: Response does not look like WordPress"
    exit 1
fi

# Check REST API
API_CODE=$(curl -s -o /dev/null -w "%{http_code}" -H "Host: ${SITE_HOST}" "http://127.0.0.1:${IDLE_PORT}/wp-json/wp/v2/posts?per_page=1")
if [ "$API_CODE" != "200" ]; then
    echo "FAIL: REST API returned $API_CODE"
    exit 1
fi

# Check WooCommerce endpoint if applicable
WC_CODE=$(curl -s -o /dev/null -w "%{http_code}" -H "Host: ${SITE_HOST}" "http://127.0.0.1:${IDLE_PORT}/shop/")
if [ "$WC_CODE" != "200" ]; then
    echo "WARN: WooCommerce shop returned $WC_CODE"
fi

echo "All smoke tests passed"
exit 0
Handling Database Migrations During Zero-Downtime Deploys
The database is where zero-downtime deployment gets genuinely difficult for WordPress. Code can be swapped atomically. Databases cannot.
WordPress itself handles schema changes through its dbDelta() function during updates. But custom plugins, themes, and WooCommerce extensions often need their own migrations. The fundamental challenge: how do you run a migration that changes the database schema while the old code is still handling requests?
The answer is backward-compatible migrations. Every database change must be deployable in a way that does not break the currently running version of the code.
The Expand-Contract Pattern
This pattern splits destructive migrations into two or three phases:
Phase 1: Expand. Add new columns, tables, or indexes. Do not remove or rename anything. The old code ignores the new columns, so it continues working. Deploy the new code that uses the new columns.
Phase 2: Migrate Data. If you need to move data from old columns to new ones, do it now. Both old and new code can run against this schema.
Phase 3: Contract. In a future release, after all servers run the new code, remove the old columns. This is the only step that could break old code, but no old code is running anymore.
Here is a practical example. Suppose you need to rename a column in a custom table from user_email to subscriber_email:
// Release 1: Expand - add the new column, keep the old one
function wpkite_migration_001_expand() {
global $wpdb;
$table = $wpdb->prefix . 'wpkite_subscribers';
// Add new column
$wpdb->query("ALTER TABLE {$table} ADD COLUMN subscriber_email VARCHAR(255) AFTER user_email");
// Copy data
$wpdb->query("UPDATE {$table} SET subscriber_email = user_email WHERE subscriber_email IS NULL");
// Add index on new column
$wpdb->query("CREATE INDEX idx_subscriber_email ON {$table} (subscriber_email)");
}
// Update code to write to BOTH columns
function wpkite_add_subscriber($email, $source) {
global $wpdb;
$wpdb->insert(
$wpdb->prefix . 'wpkite_subscribers',
[
'user_email' => $email, // old column (for backward compat)
'subscriber_email' => $email, // new column
'source' => $source,
'status' => 'active',
]
);
}
// Release 2: Contract - remove old column (deployed weeks later)
function wpkite_migration_002_contract() {
global $wpdb;
$table = $wpdb->prefix . 'wpkite_subscribers';
// Drop the old index first (if one exists); MySQL drops a single-column
// index automatically when its column goes, so a DROP INDEX afterwards
// would fail
$wpdb->query("DROP INDEX idx_user_email ON {$table}");
$wpdb->query("ALTER TABLE {$table} DROP COLUMN user_email");
}
Running Migrations Safely
Never run migrations as part of the symlink switch. Run them after the new release is uploaded but before the symlink changes. If a migration fails, the deployment stops, and the old code continues serving traffic unchanged.
// In Deployer: run migrations before switching symlink
task('database:migrate', function () {
run('cd {{release_path}} && wp eval-file scripts/run-migrations.php');
});
before('deploy:symlink', 'database:migrate');
For large tables (millions of rows), ALTER TABLE operations can lock the table for minutes. Use tools like pt-online-schema-change or gh-ost to perform online schema changes without locking:
# Using pt-online-schema-change for lock-free ALTER TABLE
pt-online-schema-change \
--alter "ADD COLUMN subscriber_email VARCHAR(255) AFTER user_email" \
--execute \
--max-load Threads_running=25 \
--critical-load Threads_running=50 \
D=wordpress,t=wp_wpkite_subscribers,u=admin,p=secret
This tool creates a shadow copy of the table, applies the change to the copy, migrates rows in small batches using triggers, and then performs an atomic rename. The table remains fully available for reads and writes throughout the process.
Shared Persistent Storage: wp-content/uploads Across Releases
WordPress stores uploaded media in wp-content/uploads/. This directory must persist across deployments. If each release gets its own uploads directory, users would lose access to all previously uploaded images and documents.
Deployer handles this through its shared_dirs configuration, which creates a symlink from each release’s wp-content/uploads to a single shared directory. But this introduces its own challenges in multi-server environments.
Single Server: Symlink Approach
On a single server, the shared directory approach is simple and reliable:
# Directory structure
/var/www/site/shared/wp-content/uploads/ # Actual files live here
/var/www/site/releases/42/wp-content/uploads -> /var/www/site/shared/wp-content/uploads/
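Deployer's shared_dirs handling boils down to replacing each release's own copy of the directory with a symlink into shared/. A hand-rolled sketch of the same wiring, with illustrative paths:

```shell
# Simplified version of what Deployer does for each shared_dirs entry:
# the release's own copy of the directory is removed and replaced with
# a symlink into shared/. Paths are illustrative.
SITE=/tmp/shared-demo
RELEASE="$SITE/releases/42"
mkdir -p "$SITE/shared/wp-content/uploads" "$RELEASE/wp-content/uploads"

# Drop the per-release directory and link to the shared one
rm -rf "$RELEASE/wp-content/uploads"
ln -s "$SITE/shared/wp-content/uploads" "$RELEASE/wp-content/uploads"

# A file written through the release path lands in shared storage
touch "$RELEASE/wp-content/uploads/logo.png"
ls "$SITE/shared/wp-content/uploads"   # logo.png appears here
```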
Multi-Server: Shared Filesystem
When running WordPress across multiple servers (for blue-green or rolling updates), all servers need access to the same uploads. The most common solutions:
NFS Mount:
# /etc/fstab on each web server
nfs-server:/exports/wp-uploads /var/www/site/shared/wp-content/uploads nfs defaults,noatime,_netdev 0 0
NFS works but adds a network dependency. If the NFS server goes down, your entire site breaks. Use NFSv4 with proper caching to reduce latency:
# Mount with caching options
nfs-server:/exports/wp-uploads /var/www/site/shared/wp-content/uploads nfs4 rsize=1048576,wsize=1048576,hard,timeo=600,retrans=2,_netdev 0 0
Object Storage with Plugin:
A more scalable solution is offloading media to S3 or a compatible object storage service. Plugins like WP Offload Media or custom implementations using the WordPress media filters can redirect uploads to S3:
// wp-config.php
define('AS3CF_SETTINGS', serialize([
'provider' => 'aws',
'access-key-id' => getenv('AWS_ACCESS_KEY_ID'),
'secret-access-key' => getenv('AWS_SECRET_ACCESS_KEY'),
'bucket' => 'my-wp-uploads',
'region' => 'us-east-1',
'copy-to-s3' => true,
'serve-from-s3' => true,
'remove-local-file' => true,
]));
With S3-backed media, the uploads directory becomes stateless. Each server can operate independently without shared filesystem concerns. This is the recommended approach for any multi-server WordPress deployment.
GlusterFS for Self-Hosted Clusters:
For teams that prefer self-hosted solutions without cloud vendor lock-in, GlusterFS provides a replicated filesystem:
# On storage nodes
gluster volume create wp-uploads replica 2 \
storage1:/data/brick1/wp-uploads \
storage2:/data/brick1/wp-uploads
gluster volume start wp-uploads
# On web servers
mount -t glusterfs storage1:/wp-uploads /var/www/site/shared/wp-content/uploads
Canary Releases with Load Balancer Percentage Routing
Canary deployment sends a small percentage of traffic to the new version while the majority continues hitting the old version. If the new version behaves well (low error rate, acceptable response times), you gradually increase the percentage until all traffic goes to the new version.
This strategy is less common in traditional WordPress hosting but becomes practical when you run WordPress behind a load balancer. HAProxy is particularly well-suited for canary routing because of its flexible backend weighting system.
HAProxy Configuration for Canary Routing
# /etc/haproxy/haproxy.cfg
global
    log /dev/log local0
    maxconn 4096
    stats socket /var/run/haproxy.sock mode 600 level admin

defaults
    log global
    mode http
    option httplog
    option dontlognull
    timeout connect 5s
    timeout client 30s
    timeout server 30s
    retries 3

frontend wordpress_front
    bind *:443 ssl crt /etc/ssl/certs/example.com.pem
    # Session affinity for WooCommerce carts is handled in the backends
    # with cookie-based stickiness (stick rules are not valid in a
    # frontend; see the WooCommerce section for the cookie setup)
    default_backend wp_stable

backend wp_stable
    balance roundrobin
    option httpchk GET /wp-login.php
    http-check expect status 200
    server wp-stable-1 10.0.1.10:80 check weight 100
    server wp-stable-2 10.0.1.11:80 check weight 100

backend wp_canary
    balance roundrobin
    option httpchk GET /wp-login.php
    http-check expect status 200
    server wp-canary-1 10.0.1.20:80 check weight 100
To route a percentage of traffic to the canary, use HAProxy ACLs with the rand function:
frontend wordpress_front
    bind *:443 ssl crt /etc/ssl/certs/example.com.pem
    # Route 5% of traffic to canary
    acl is_canary rand(100) lt 5
    # Don't canary logged-in users or admin pages. WordPress appends a
    # site-specific hash to its cookie names, so match a substring of
    # the Cookie header rather than an exact cookie name
    acl is_admin path_beg /wp-admin
    acl is_logged_in req.hdr(Cookie) -m sub wordpress_logged_in
    use_backend wp_canary if is_canary !is_admin !is_logged_in
    default_backend wp_stable
This sends 5% of anonymous, non-admin traffic to the canary backend. Logged-in users and admin requests always go to the stable backend, which prevents inconsistencies in the WordPress admin experience during deployments.
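The split is probabilistic per request, so the observed canary share only converges on 5% over many requests. A quick simulation of the same bucketing logic (plain awk standing in for HAProxy's rand() fetch):

```shell
# Simulate HAProxy's `rand(100) lt 5` split over 10,000 requests.
# Each request draws an independent number in 0..99 and is routed to
# the canary when it falls below the threshold.
CANARY=$(awk 'BEGIN {
  srand();
  for (i = 0; i < 10000; i++) if (int(rand() * 100) < 5) n++;
  print n + 0
}')
echo "canary requests: $CANARY of 10000"   # roughly 500, i.e. about 5%
```

At low traffic volumes the actual share can drift noticeably from the nominal percentage, which is worth remembering when you interpret canary error counts.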
Gradually Increasing Canary Traffic
You can raise the canary percentage in steps. The rand(100) threshold is baked into the configuration at load time, so each step rewrites the config and performs a graceful reload, which does not drop in-flight connections:
#!/bin/bash
# canary-promote.sh - Gradually increase canary traffic
CFG="/etc/haproxy/haproxy.cfg"

set_canary_pct() {
    local pct=$1
    echo "Setting canary to ${pct}%"
    # Rewrite the ACL threshold, validate, then reload gracefully
    sed -i "s/rand(100) lt [0-9]\+/rand(100) lt ${pct}/" "$CFG"
    haproxy -c -f "$CFG" && systemctl reload haproxy
}

# Progressive rollout
set_canary_pct 5
echo "Waiting 10 minutes at 5%..."
sleep 600

# Check canary errors before proceeding. Field 15 of the stats CSV is
# eresp (response errors); read the backend's aggregate row.
ERRORS=$(curl -s "http://localhost:8404/stats;csv" | awk -F, '/^wp_canary,BACKEND/ {print $15}')
if [ "${ERRORS:-0}" -gt 0 ]; then
    echo "Canary returned $ERRORS errors, rolling back"
    set_canary_pct 0
    exit 1
fi

set_canary_pct 25
echo "Waiting 10 minutes at 25%..."
sleep 600
set_canary_pct 50
echo "Waiting 10 minutes at 50%..."
sleep 600
set_canary_pct 100
echo "Canary promoted to 100%"
Monitoring the Canary
The canary is useless without monitoring. You need to compare error rates, response times, and application-level metrics between the stable and canary backends. A basic approach uses HAProxy stats combined with server-side logging:
# Add response time tracking to HAProxy
frontend wordpress_front
    # Log backend response time
    log-format "%ci:%cp [%tr] %ft %b/%s %TR/%Tw/%Tc/%Tr/%Ta %ST %B %CC %CS %tsc %ac/%fc/%bc/%sc/%rc %sq/%bq %hr %hs %{+Q}r"

# Tag canary requests in the canary backend rather than the frontend:
# re-evaluating a rand() ACL in the frontend would roll a second,
# independent number and could disagree with the routing decision
backend wp_canary
    http-request set-header X-Canary true
On the WordPress side, you can log the canary header for application-level monitoring:
// In your theme's functions.php or a must-use plugin
add_action('shutdown', function() {
if (isset($_SERVER['HTTP_X_CANARY'])) {
$response_time = microtime(true) - $_SERVER['REQUEST_TIME_FLOAT'];
error_log(sprintf(
'CANARY request=%s time=%.4f status=%d memory=%d',
$_SERVER['REQUEST_URI'],
$response_time,
http_response_code(),
memory_get_peak_usage(true)
));
}
});
Rolling Updates in Containerized WordPress (Kubernetes)
If you run WordPress in containers (Docker/Kubernetes), rolling updates are the standard zero-downtime strategy. Kubernetes replaces pods one at a time, waiting for each new pod to pass health checks before terminating the old one.
WordPress Kubernetes Deployment
# wordpress-deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: wordpress
  labels:
    app: wordpress
spec:
  replicas: 4
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1        # Add 1 new pod at a time
      maxUnavailable: 0  # Never have fewer than desired replicas
  selector:
    matchLabels:
      app: wordpress
  template:
    metadata:
      labels:
        app: wordpress
    spec:
      containers:
        - name: wordpress
          image: your-registry.com/wordpress:v2.3.1
          ports:
            - containerPort: 80
          env:
            - name: WORDPRESS_DB_HOST
              valueFrom:
                secretKeyRef:
                  name: wordpress-db
                  key: host
            - name: WORDPRESS_DB_NAME
              valueFrom:
                secretKeyRef:
                  name: wordpress-db
                  key: name
            - name: WORDPRESS_DB_USER
              valueFrom:
                secretKeyRef:
                  name: wordpress-db
                  key: user
            - name: WORDPRESS_DB_PASSWORD
              valueFrom:
                secretKeyRef:
                  name: wordpress-db
                  key: password
          resources:
            requests:
              memory: "256Mi"
              cpu: "250m"
            limits:
              memory: "512Mi"
              cpu: "500m"
          readinessProbe:
            httpGet:
              path: /wp-login.php
              port: 80
            initialDelaySeconds: 10
            periodSeconds: 5
            failureThreshold: 3
          livenessProbe:
            httpGet:
              path: /wp-login.php
              port: 80
            initialDelaySeconds: 30
            periodSeconds: 10
            failureThreshold: 5
          volumeMounts:
            - name: uploads
              mountPath: /var/www/html/wp-content/uploads
      volumes:
        - name: uploads
          persistentVolumeClaim:
            claimName: wordpress-uploads-pvc
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: wordpress-uploads-pvc
spec:
  accessModes:
    - ReadWriteMany # Required for multiple pods
  resources:
    requests:
      storage: 50Gi
  storageClassName: efs-sc # Amazon EFS or similar RWX storage
The WordPress Docker Image
Your Docker image should contain the full WordPress installation plus your custom theme and plugins, baked in at build time:
# Dockerfile
FROM wordpress:6.4-php8.2-fpm
# Install additional PHP extensions
RUN docker-php-ext-install pdo_mysql opcache
# OPcache settings for production
# (opcache.fast_shutdown was removed in PHP 7.2, so it is omitted here)
RUN { \
echo 'opcache.memory_consumption=256'; \
echo 'opcache.interned_strings_buffer=16'; \
echo 'opcache.max_accelerated_files=20000'; \
echo 'opcache.revalidate_freq=0'; \
echo 'opcache.validate_timestamps=0'; \
echo 'opcache.save_comments=1'; \
} > /usr/local/etc/php/conf.d/opcache-recommended.ini
# Copy custom theme
COPY wp-content/themes/your-theme/ /var/www/html/wp-content/themes/your-theme/
# Copy must-use plugins
COPY wp-content/mu-plugins/ /var/www/html/wp-content/mu-plugins/
# Copy plugins (version-locked via composer)
COPY vendor/ /var/www/html/vendor/
COPY wp-content/plugins/ /var/www/html/wp-content/plugins/
# Set ownership
RUN chown -R www-data:www-data /var/www/html
Setting opcache.validate_timestamps=0 is safe in containers because the code never changes inside a running container. When you deploy a new image, Kubernetes creates new pods with the new code, and OPcache starts fresh.
Kubernetes Service and Ingress
# wordpress-service.yaml
apiVersion: v1
kind: Service
metadata:
  name: wordpress
spec:
  selector:
    app: wordpress
  ports:
    - port: 80
      targetPort: 80
  type: ClusterIP
---
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: wordpress
  annotations:
    nginx.ingress.kubernetes.io/proxy-body-size: "64m"
    nginx.ingress.kubernetes.io/proxy-read-timeout: "300"
spec:
  tls:
    - hosts:
        - example.com
      secretName: tls-example-com
  rules:
    - host: example.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: wordpress
                port:
                  number: 80
During a rolling update, Kubernetes performs these steps for each pod:
1. Creates a new pod with the updated image.
2. Waits for the readiness probe to pass (confirming WordPress is responding on the new pod).
3. Adds the new pod to the Service endpoints (it starts receiving traffic).
4. Sends SIGTERM to an old pod.
5. Removes the old pod from Service endpoints (it stops receiving new traffic).
6. Waits for the termination grace period (default 30 seconds) to allow in-flight requests to complete.
7. Kills the old pod.
With maxSurge: 1 and maxUnavailable: 0, you always have at least 4 healthy pods serving traffic. The update proceeds one pod at a time, so for 4 replicas the full rollout takes several minutes.
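A rough back-of-envelope for rollout duration, assuming about 15 seconds for a pod to pass its readiness probe (an assumption; measure your own pods) plus the default 30-second termination grace period:

```shell
# Back-of-envelope rollout duration: with maxSurge=1 pods are replaced
# sequentially, so total time is roughly
#   replicas * (time-to-ready + termination grace period)
# The 15s time-to-ready figure is an assumption; tune it to your probes.
REPLICAS=4
TIME_TO_READY=15    # seconds until readinessProbe passes
GRACE_PERIOD=30     # terminationGracePeriodSeconds default
TOTAL=$(( REPLICAS * (TIME_TO_READY + GRACE_PERIOD) ))
echo "estimated rollout: ${TOTAL}s"   # 180s for this example
```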
Handling PHP Sessions in Kubernetes
WordPress plugins that use PHP sessions will break in a multi-pod setup because the default file-based session handler stores session data locally to each pod. Use Redis for session storage:
// In wp-config.php or a mu-plugin: redirect PHP sessions to Redis
// (requires the phpredis extension)
ini_set('session.save_handler', 'redis');
ini_set('session.save_path', 'tcp://redis-service:6379?auth=' . getenv('REDIS_PASSWORD'));
Rollback Procedures: Instant Symlink Rollback vs. Database Challenges
When a deployment goes wrong, you need to revert quickly. The rollback strategy depends on what you deployed and whether database changes were involved.
Code-Only Rollback: The Easy Case
If your deployment only changed PHP files, templates, or assets (no database schema changes), rollback is instant with symlink-based deployments:
# Deployer built-in rollback
dep rollback production
This switches the current symlink back to the previous release directory. The operation takes less than a second. For blue-green deployments, you run the traffic switch script to point back to the previous environment.
In Kubernetes, rollback is equally straightforward:
# Roll back to previous revision
kubectl rollout undo deployment/wordpress
# Or roll back to a specific revision
kubectl rollout undo deployment/wordpress --to-revision=5
# Check rollout history
kubectl rollout history deployment/wordpress
Database Rollback: The Hard Case
If your deployment included database migrations, rolling back the code does not undo the database changes. This is why the expand-contract pattern discussed earlier is so important. If you followed it, the old code is compatible with the new schema, and a code rollback works without touching the database.
But what if a migration went wrong and corrupted data? You need a pre-migration database snapshot:
#!/bin/bash
# pre-deploy-snapshot.sh
TIMESTAMP=$(date +%Y%m%d_%H%M%S)
DB_NAME="wordpress_production"
BACKUP_DIR="/var/backups/deploy-snapshots"
# Create snapshot before migration
mysqldump --single-transaction --quick \
--host=db-server \
--user=backup_user \
--password="$DB_BACKUP_PASSWORD" \
"$DB_NAME" | gzip > "${BACKUP_DIR}/${DB_NAME}_${TIMESTAMP}.sql.gz"
# Record snapshot in deployment metadata
echo "$TIMESTAMP" > /var/www/site/current/.deploy-snapshot
echo "Snapshot created: ${BACKUP_DIR}/${DB_NAME}_${TIMESTAMP}.sql.gz"
Restoring from a snapshot means data loss: any orders, comments, or user registrations that happened between the snapshot and the rollback will be gone. This is the nuclear option. For high-traffic WooCommerce sites, you might lose dozens of orders during even a brief window.
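If you must restore, script the snapshot selection instead of eyeballing filenames, and print the restore command rather than executing it so an operator explicitly confirms the data loss. A sketch using the naming scheme from pre-deploy-snapshot.sh (demo paths, with stand-in snapshot files created for illustration):

```shell
# Pick the newest pre-deploy snapshot by its sortable timestamp name
# and print (not run) the restore command, forcing a human to confirm
# the data loss. Demo paths; filenames mirror pre-deploy-snapshot.sh.
BACKUP_DIR=/tmp/deploy-snapshots-demo
mkdir -p "$BACKUP_DIR"
touch "$BACKUP_DIR/wordpress_production_20240101_120000.sql.gz" \
      "$BACKUP_DIR/wordpress_production_20240102_090000.sql.gz"

# YYYYMMDD_HHMMSS timestamps sort lexically, so `sort | tail` is newest
LATEST=$(ls -1 "$BACKUP_DIR"/*.sql.gz | sort | tail -n 1)
echo "To restore, run:"
echo "gunzip < $LATEST | mysql wordpress_production"
```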
A safer approach for non-destructive migrations (the expand phase of expand-contract) is to simply leave the new columns in place. They consume a negligible amount of space, and the old code ignores them.
Automated Rollback Triggers
You can automate rollback based on health checks:
#!/bin/bash
# post-deploy-monitor.sh - Run after deployment, auto-rollback on failure
SITE_URL="https://example.com"
MAX_ERRORS=3
CHECK_INTERVAL=10
CHECK_DURATION=120 # Monitor for 2 minutes
errors=0
checks=0
start_time=$(date +%s)
while true; do
current_time=$(date +%s)
elapsed=$((current_time - start_time))
if [ $elapsed -ge $CHECK_DURATION ]; then
echo "Monitoring complete. Deployment looks healthy."
exit 0
fi
HTTP_CODE=$(curl -s -o /dev/null -w "%{http_code}" --max-time 10 "$SITE_URL")
checks=$((checks + 1))
if [ "$HTTP_CODE" != "200" ]; then
errors=$((errors + 1))
echo "Check $checks: FAIL (HTTP $HTTP_CODE) - Error count: $errors"
if [ $errors -ge $MAX_ERRORS ]; then
echo "ERROR THRESHOLD REACHED. Initiating rollback..."
dep rollback production
# Notify team
curl -X POST "$SLACK_WEBHOOK" \
-H 'Content-type: application/json' \
-d '{"text":"AUTOMATED ROLLBACK triggered for production. '"$errors"' consecutive failures detected."}'
exit 1
fi
else
errors=0 # Reset consecutive error count
echo "Check $checks: OK (HTTP $HTTP_CODE)"
fi
sleep $CHECK_INTERVAL
done
WooCommerce Considerations: Order Tables, Sessions, and Cart Persistence
WooCommerce adds significant complexity to zero-downtime deployments because of its stateful nature. Customers have active sessions, carts with items, and potentially in-progress checkout flows.
Session Persistence During Deployment
WooCommerce stores session data in the wp_woocommerce_sessions table by default. This means sessions survive code deployments as long as the database remains the same, which it does in all the strategies we have discussed.
However, if you are using file-based PHP sessions (some plugins do this), sessions will be lost when traffic switches to a new server or container. Always verify your session storage mechanism before implementing zero-downtime deploys.
For load-balanced environments, you must ensure session affinity (sticky sessions) or use centralized session storage:
# HAProxy sticky session configuration for WooCommerce
backend wp_servers
balance roundrobin
# Stick on the WooCommerce session cookie
cookie SERVERID insert indirect nocache
server wp1 10.0.1.10:80 check cookie wp1
server wp2 10.0.1.11:80 check cookie wp2
server wp3 10.0.1.12:80 check cookie wp3
WooCommerce High-Performance Order Storage (HPOS)
WooCommerce's High-Performance Order Storage (HPOS), introduced as an opt-in feature and enabled by default for new stores since WooCommerce 8.2, moves orders from WordPress post meta into dedicated custom tables. This affects deployment migrations because the order schema differs from standard WordPress tables.
When deploying WooCommerce updates that involve HPOS migrations, follow these additional precautions:
// Check HPOS status before running migrations
function check_hpos_status() {
if (class_exists('Automattic\WooCommerce\Utilities\OrderUtil')) {
$hpos_enabled = \Automattic\WooCommerce\Utilities\OrderUtil::custom_orders_table_usage_is_enabled();
$sync_enabled = get_option('woocommerce_custom_orders_table_data_sync_enabled');
error_log(sprintf(
'HPOS Status: enabled=%s sync=%s',
$hpos_enabled ? 'yes' : 'no',
$sync_enabled
));
return [
'hpos_enabled' => $hpos_enabled,
'sync_enabled' => $sync_enabled === 'yes',
];
}
return ['hpos_enabled' => false, 'sync_enabled' => false];
}
Cart and Checkout During Deployment
A customer in the middle of checkout when a deployment happens should not have their cart emptied or their payment flow interrupted. Since WooCommerce cart data lives in the session table (database) and payment processing happens through external gateways (Stripe, PayPal), the deployment itself does not interrupt these flows.
The risk is in code changes that alter the checkout flow. If your new release changes form field names, modifies validation rules, or restructures the checkout template, a customer who loaded the old checkout page might submit a form that the new code does not understand.
Mitigate this by ensuring backward compatibility in form handlers for at least one release cycle:
// Accept both old and new field names during transition
function handle_checkout_submission() {
    // New field name
    $phone = isset($_POST['billing_phone_number']) ? sanitize_text_field($_POST['billing_phone_number']) : '';

    // Fall back to old field name
    if (empty($phone) && isset($_POST['billing_phone'])) {
        $phone = sanitize_text_field($_POST['billing_phone']);
    }

    // Process order...
}
WooCommerce Scheduled Actions
WooCommerce uses Action Scheduler for background tasks like processing pending orders, sending emails, and syncing inventory. During a rolling deployment, you may have old and new code running simultaneously, both processing scheduled actions.
To prevent conflicts, designate a single worker for scheduled actions:
# In Kubernetes, run a separate deployment for WP-Cron/Action Scheduler
apiVersion: apps/v1
kind: Deployment
metadata:
  name: wordpress-worker
spec:
  replicas: 1  # Only ONE worker to prevent duplicate processing
  selector:
    matchLabels:
      app: wordpress-worker
  template:
    metadata:
      labels:
        app: wordpress-worker
    spec:
      containers:
        - name: wordpress-cron
          image: your-registry.com/wordpress:v2.3.1
          command: ["/bin/sh", "-c"]
          args:
            - |
              while true; do
                php /var/www/html/wp-cron.php
                sleep 60
              done
          env:
            - name: DISABLE_WP_CRON
              value: "true"
And disable WP-Cron on the web-serving pods:
// wp-config.php
define('DISABLE_WP_CRON', true);
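Outside Kubernetes, the same single-worker pattern is a system cron entry on one designated host. A sketch; the path and every-minute schedule are assumptions:

```
# /etc/cron.d/wordpress-cron — installed on ONE designated server only.
# All servers set DISABLE_WP_CRON, so scheduled actions are never
# processed by two hosts at once.
* * * * * www-data php /var/www/current/wp-cron.php >/dev/null 2>&1
```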
CI/CD Integration: Triggering Deploys from GitHub Actions
Automating the full pipeline from code push to production deployment eliminates human error and makes deployments routine rather than events.
GitHub Actions Workflow
# .github/workflows/deploy.yml
name: Deploy WordPress

on:
  push:
    branches: [main]
  workflow_dispatch:
    inputs:
      environment:
        description: 'Deployment target'
        required: true
        default: 'staging'
        type: choice
        options:
          - staging
          - production

jobs:
  test:
    runs-on: ubuntu-latest
    services:
      mysql:
        image: mysql:8.0
        env:
          MYSQL_ROOT_PASSWORD: test
          MYSQL_DATABASE: wordpress_test
        ports:
          - 3306:3306
        options: --health-cmd="mysqladmin ping" --health-interval=10s --health-timeout=5s --health-retries=3
    steps:
      - uses: actions/checkout@v4

      - name: Setup PHP
        uses: shivammathur/setup-php@v2
        with:
          php-version: '8.2'
          extensions: mysqli, pdo_mysql, gd, zip, opcache
          tools: composer, wp-cli

      - name: Install dependencies
        # Keep dev dependencies: PHPStan and PHPUnit run in this job
        run: composer install --optimize-autoloader

      - name: Run PHP lint
        # xargs propagates a nonzero exit if any file fails to lint
        run: find wp-content/themes/your-theme -name "*.php" -print0 | xargs -0 -n1 php -l

      - name: Run PHPStan
        run: vendor/bin/phpstan analyse wp-content/themes/your-theme --level=6

      - name: Build assets
        run: |
          cd wp-content/themes/your-theme
          npm ci
          npm run build

      - name: Run integration tests
        env:
          WP_TESTS_DB_HOST: 127.0.0.1
          WP_TESTS_DB_NAME: wordpress_test
          WP_TESTS_DB_USER: root
          WP_TESTS_DB_PASS: test
        run: vendor/bin/phpunit

  deploy-staging:
    needs: test
    if: github.ref == 'refs/heads/main'
    runs-on: ubuntu-latest
    environment: staging
    steps:
      - uses: actions/checkout@v4

      - name: Setup PHP
        uses: shivammathur/setup-php@v2
        with:
          php-version: '8.2'
          tools: composer

      - name: Install Deployer
        run: composer global require deployer/deployer

      - name: Setup SSH
        uses: webfactory/ssh-agent@v0.9.0
        with:
          ssh-private-key: ${{ secrets.DEPLOY_SSH_KEY }}

      - name: Add known hosts
        run: |
          mkdir -p ~/.ssh
          ssh-keyscan -H ${{ secrets.STAGING_HOST }} >> ~/.ssh/known_hosts

      - name: Build assets
        run: |
          cd wp-content/themes/your-theme
          npm ci
          npm run build

      - name: Deploy to staging
        run: dep deploy staging -v

      - name: Smoke test staging
        run: |
          sleep 5
          HTTP_CODE=$(curl -s -o /dev/null -w "%{http_code}" https://staging.example.com)
          if [ "$HTTP_CODE" != "200" ]; then
            echo "Staging smoke test failed with HTTP $HTTP_CODE"
            dep rollback staging
            exit 1
          fi

  deploy-production:
    needs: deploy-staging
    if: github.event.inputs.environment == 'production' || (github.ref == 'refs/heads/main' && github.event_name == 'push')
    runs-on: ubuntu-latest
    environment: production
    steps:
      - uses: actions/checkout@v4

      - name: Setup PHP
        uses: shivammathur/setup-php@v2
        with:
          php-version: '8.2'
          tools: composer

      - name: Install Deployer
        run: composer global require deployer/deployer

      - name: Setup SSH
        uses: webfactory/ssh-agent@v0.9.0
        with:
          ssh-private-key: ${{ secrets.DEPLOY_SSH_KEY }}

      - name: Add known hosts
        run: |
          mkdir -p ~/.ssh
          ssh-keyscan -H ${{ secrets.PRODUCTION_HOST }} >> ~/.ssh/known_hosts

      - name: Build assets
        run: |
          cd wp-content/themes/your-theme
          npm ci
          npm run build

      - name: Create database snapshot
        run: |
          ssh deploy@${{ secrets.PRODUCTION_HOST }} \
            "mysqldump --single-transaction --quick wordpress_prod | gzip > /var/backups/pre-deploy-$(date +%Y%m%d_%H%M%S).sql.gz"

      - name: Deploy to production
        run: dep deploy production -v

      - name: Post-deploy monitoring
        run: |
          FAILURES=0
          for i in $(seq 1 12); do
            HTTP_CODE=$(curl -s -o /dev/null -w "%{http_code}" --max-time 10 https://example.com)
            if [ "$HTTP_CODE" != "200" ]; then
              echo "Health check failed: HTTP $HTTP_CODE"
              FAILURES=$((FAILURES + 1))
              if [ "$FAILURES" -ge 3 ]; then
                echo "Rolling back..."
                dep rollback production
                exit 1
              fi
            else
              FAILURES=0
            fi
            sleep 10
          done
          echo "Production deployment verified"

      - name: Notify Slack
        if: always()
        uses: 8398a7/action-slack@v3
        with:
          status: ${{ job.status }}
          fields: repo,message,commit,author,action
        env:
          SLACK_WEBHOOK_URL: ${{ secrets.SLACK_WEBHOOK }}
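The snapshot step in the workflow pairs with a restore path for the worst case. A sketch of selecting and restoring the newest dump; the helper name is hypothetical, and the backup directory matches the path used in the snapshot step:

```shell
#!/bin/sh
# Print the newest pre-deploy snapshot in a backup directory
latest_snapshot() {
    ls -1t "$1"/pre-deploy-*.sql.gz 2>/dev/null | head -n 1
}

# Worst-case restore, run on the database host:
# gunzip -c "$(latest_snapshot /var/backups)" | mysql wordpress_prod
```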
Container-Based CI/CD with GitHub Actions
For Kubernetes-based deployments, the workflow builds a Docker image instead of using Deployer:
# .github/workflows/deploy-k8s.yml
name: Deploy WordPress (Kubernetes)

on:
  push:
    branches: [main]

jobs:
  build-and-deploy:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - name: Build theme assets
        run: |
          cd wp-content/themes/your-theme
          npm ci
          npm run build

      - name: Set up Docker Buildx
        uses: docker/setup-buildx-action@v3

      - name: Login to container registry
        uses: docker/login-action@v3
        with:
          registry: your-registry.com
          username: ${{ secrets.REGISTRY_USER }}
          password: ${{ secrets.REGISTRY_PASSWORD }}

      - name: Build and push image
        uses: docker/build-push-action@v5
        with:
          context: .
          push: true
          tags: |
            your-registry.com/wordpress:${{ github.sha }}
            your-registry.com/wordpress:latest
          cache-from: type=gha
          cache-to: type=gha,mode=max

      - name: Deploy to Kubernetes
        uses: azure/k8s-deploy@v4
        with:
          manifests: |
            k8s/wordpress-deployment.yaml
          images: |
            your-registry.com/wordpress:${{ github.sha }}
          strategy: canary
          percentage: 20

      - name: Monitor canary
        run: |
          echo "Waiting for canary to stabilize..."
          sleep 120
          kubectl get pods -l app=wordpress
          # Check for crash loops
          RESTARTS=$(kubectl get pods -l app=wordpress -o jsonpath='{.items[*].status.containerStatuses[0].restartCount}')
          for count in $RESTARTS; do
            if [ "$count" -gt 2 ]; then
              echo "Pod restart detected, rejecting canary"
              kubectl rollout undo deployment/wordpress
              exit 1
            fi
          done

      - name: Promote canary
        run: |
          kubectl set image deployment/wordpress wordpress=your-registry.com/wordpress:${{ github.sha }}
          kubectl rollout status deployment/wordpress --timeout=300s
Real Nginx and HAProxy Configuration Examples
Here are the complete, production-ready configurations that tie these strategies together.
Full Nginx Configuration for Blue-Green with SSL
# /etc/nginx/nginx.conf
user www-data;
worker_processes auto;
pid /run/nginx.pid;
worker_rlimit_nofile 65535;

events {
    worker_connections 4096;
    multi_accept on;
    use epoll;
}

http {
    sendfile on;
    tcp_nopush on;
    tcp_nodelay on;
    keepalive_timeout 65;
    types_hash_max_size 2048;
    client_max_body_size 64m;

    include /etc/nginx/mime.types;
    default_type application/octet-stream;

    # Logging
    log_format detailed '$remote_addr - $remote_user [$time_local] '
                        '"$request" $status $body_bytes_sent '
                        '"$http_referer" "$http_user_agent" '
                        'rt=$request_time uct=$upstream_connect_time '
                        'uht=$upstream_header_time urt=$upstream_response_time '
                        'backend=$upstream_addr';
    access_log /var/log/nginx/access.log detailed;
    error_log /var/log/nginx/error.log;

    # Gzip
    gzip on;
    gzip_vary on;
    gzip_proxied any;
    gzip_comp_level 6;
    gzip_types text/plain text/css application/json application/javascript text/xml application/xml text/javascript image/svg+xml;

    # Upstream definitions
    upstream blue {
        server unix:/run/php/php-fpm-blue.sock;
    }
    upstream green {
        server unix:/run/php/php-fpm-green.sock;
    }

    # Active environment - change this to switch
    map $uri $active_env {
        default blue;
    }

    # Rate limiting
    limit_req_zone $binary_remote_addr zone=wp_login:10m rate=3r/s;
    limit_req_zone $binary_remote_addr zone=wp_xmlrpc:10m rate=1r/s;

    # FastCGI cache
    fastcgi_cache_path /var/cache/nginx/wordpress
                       levels=1:2
                       keys_zone=wordpress:100m
                       max_size=1g
                       inactive=60m
                       use_temp_path=off;
    fastcgi_cache_key "$scheme$request_method$host$request_uri";

    server {
        listen 80;
        server_name example.com www.example.com;
        return 301 https://example.com$request_uri;
    }

    server {
        listen 443 ssl http2;
        server_name example.com;

        ssl_certificate /etc/letsencrypt/live/example.com/fullchain.pem;
        ssl_certificate_key /etc/letsencrypt/live/example.com/privkey.pem;
        ssl_protocols TLSv1.2 TLSv1.3;
        ssl_ciphers ECDHE-ECDSA-AES128-GCM-SHA256:ECDHE-RSA-AES128-GCM-SHA256:ECDHE-ECDSA-AES256-GCM-SHA384:ECDHE-RSA-AES256-GCM-SHA384;
        ssl_prefer_server_ciphers off;
        ssl_session_cache shared:SSL:10m;
        ssl_session_timeout 1d;
        ssl_session_tickets off;

        # HSTS
        add_header Strict-Transport-Security "max-age=63072000" always;

        # Dynamic document root based on active environment
        set $doc_root_blue /var/www/blue/current;
        set $doc_root_green /var/www/green/current;

        # Default to blue
        set $active_root $doc_root_blue;
        set $active_upstream blue;

        # Switch based on map
        if ($active_env = green) {
            set $active_root $doc_root_green;
            set $active_upstream green;
        }

        root $active_root;
        index index.php;

        # Security headers
        add_header X-Frame-Options "SAMEORIGIN" always;
        add_header X-Content-Type-Options "nosniff" always;
        add_header X-XSS-Protection "1; mode=block" always;

        # Rate limit xmlrpc (1 r/s with a small burst)
        location = /xmlrpc.php {
            limit_req zone=wp_xmlrpc burst=2 nodelay;
            include fastcgi_params;
            fastcgi_pass $active_upstream;
            fastcgi_param SCRIPT_FILENAME $realpath_root$fastcgi_script_name;
        }

        # Rate limit wp-login
        location = /wp-login.php {
            limit_req zone=wp_login burst=5 nodelay;
            include fastcgi_params;
            fastcgi_pass $active_upstream;
            fastcgi_param SCRIPT_FILENAME $realpath_root$fastcgi_script_name;
            fastcgi_param DOCUMENT_ROOT $realpath_root;
        }

        # WordPress admin - no caching
        location /wp-admin/ {
            try_files $uri $uri/ /index.php?$args;

            location ~ \.php$ {
                include fastcgi_params;
                fastcgi_pass $active_upstream;
                fastcgi_param SCRIPT_FILENAME $realpath_root$fastcgi_script_name;
                fastcgi_param DOCUMENT_ROOT $realpath_root;
                fastcgi_no_cache 1;
                fastcgi_cache_bypass 1;
            }
        }

        # Static assets
        location ~* \.(js|css|png|jpg|jpeg|gif|ico|svg|woff|woff2|ttf|eot|webp|avif)$ {
            expires 30d;
            add_header Cache-Control "public, immutable";
            log_not_found off;
            access_log off;
        }

        # Main location
        location / {
            try_files $uri $uri/ /index.php?$args;
        }

        # PHP handling with FastCGI cache
        location ~ \.php$ {
            try_files $uri =404;
            include fastcgi_params;
            fastcgi_pass $active_upstream;
            fastcgi_param SCRIPT_FILENAME $realpath_root$fastcgi_script_name;
            fastcgi_param DOCUMENT_ROOT $realpath_root;

            # FastCGI cache settings
            fastcgi_cache wordpress;
            fastcgi_cache_valid 200 10m;
            fastcgi_cache_valid 404 1m;

            # Don't cache logged-in users or POST requests
            set $skip_cache 0;
            if ($request_method = POST) {
                set $skip_cache 1;
            }
            if ($http_cookie ~* "wordpress_logged_in|comment_author|woocommerce_cart_hash|woocommerce_items_in_cart") {
                set $skip_cache 1;
            }
            if ($request_uri ~* "/wp-admin/|/wp-json/|/xmlrpc.php|wp-.*\.php|/feed/|index\.php|sitemap") {
                set $skip_cache 1;
            }
            fastcgi_cache_bypass $skip_cache;
            fastcgi_no_cache $skip_cache;

            # Add cache status header for debugging
            add_header X-Cache-Status $upstream_cache_status;
        }

        # Deny access to sensitive files
        location ~ /\.(ht|git|env) {
            deny all;
        }
        location ~ /wp-config\.php$ {
            deny all;
        }
    }
}
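Switching the active environment means rewriting the map's `default` and reloading. A sketch of a flip script, assuming the map block lives in /etc/nginx/nginx.conf as shown; the script name and function are illustrative:

```shell
#!/bin/sh
# switch-env.sh — flip the active blue/green environment (sketch)

# Rewrite the map default that drives $active_env in an nginx config
set_active_env() {  # $1 = config file, $2 = blue|green
    sed -E -i "s/default +(blue|green);/default $2;/" "$1"
}

case "$1" in
    blue|green)
        set_active_env /etc/nginx/nginx.conf "$1"
        # Validate before reloading; nginx keeps serving on the old
        # config if the reload is skipped
        nginx -t && systemctl reload nginx
        ;;
    *)
        echo "Usage: $0 {blue|green}"
        ;;
esac
```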
Full HAProxy Configuration for Canary with Health Checks
# /etc/haproxy/haproxy.cfg
global
    log stdout format raw local0
    maxconn 10000
    tune.ssl.default-dh-param 2048
    ssl-default-bind-ciphers ECDHE-ECDSA-AES128-GCM-SHA256:ECDHE-RSA-AES128-GCM-SHA256
    ssl-default-bind-options ssl-min-ver TLSv1.2 no-tls-tickets
    stats socket /var/run/haproxy.sock mode 660 level admin
    stats timeout 30s

defaults
    log global
    mode http
    option httplog
    option dontlognull
    option http-server-close
    option forwardfor except 127.0.0.0/8
    timeout connect 5s
    timeout client 30s
    timeout server 60s
    timeout http-request 10s
    timeout http-keep-alive 15s
    timeout queue 30s
    retries 3
    errorfile 503 /etc/haproxy/errors/503.http

# Stats page for monitoring
listen stats
    bind *:8404
    stats enable
    stats uri /stats
    stats refresh 10s
    stats admin if TRUE
    stats auth admin:$STATS_PASSWORD

frontend http_front
    bind *:80
    redirect scheme https code 301 if !{ ssl_fc }

frontend https_front
    bind *:443 ssl crt /etc/ssl/certs/example.com.pem alpn h2,http/1.1

    # Security headers
    http-response set-header Strict-Transport-Security "max-age=63072000; includeSubDomains"
    http-response set-header X-Frame-Options SAMEORIGIN
    http-response set-header X-Content-Type-Options nosniff

    # ACLs for routing
    acl is_admin path_beg /wp-admin /wp-login.php
    acl is_api path_beg /wp-json
    acl is_cron path_beg /wp-cron.php
    acl is_logged_in req.cook(wordpress_logged_in) -m found
    acl is_woo_cart req.cook(woocommerce_items_in_cart) -m found

    # Canary selection: rand(1000) lt 50 = 5% of requests.
    # HAProxy cannot define one ACL in terms of other ACLs, so the
    # "anonymous traffic only" condition is spelled out inline below.
    acl canary_selected rand(1000) lt 50

    # Tag canary requests
    http-request set-header X-Canary true if canary_selected !is_admin !is_logged_in !is_woo_cart !is_cron

    # Routing rules
    use_backend wp_canary if canary_selected !is_admin !is_logged_in !is_woo_cart !is_cron
    use_backend wp_stable if is_admin or is_logged_in or is_woo_cart
    default_backend wp_stable

backend wp_stable
    balance leastconn
    option httpchk GET /wp-login.php HTTP/1.1\r\nHost:\ example.com
    http-check expect status 200
    # Sticky sessions for WooCommerce
    cookie SERVERID insert indirect nocache
    server stable-1 10.0.1.10:80 check inter 5s fall 3 rise 2 cookie s1 weight 100
    server stable-2 10.0.1.11:80 check inter 5s fall 3 rise 2 cookie s2 weight 100
    server stable-3 10.0.1.12:80 check inter 5s fall 3 rise 2 cookie s3 weight 100

backend wp_canary
    balance leastconn
    option httpchk GET /wp-login.php HTTP/1.1\r\nHost:\ example.com
    http-check expect status 200
    # If canary is down, fall back to stable
    option allbackups
    server canary-1 10.0.2.10:80 check inter 3s fall 2 rise 2 weight 100
    server stable-1 10.0.1.10:80 check backup
The backup server entry in the canary backend is a safety net: if the canary fails health checks, traffic automatically falls back to a stable server instead of returning errors (option allbackups spreads that traffic across every backup server rather than only the first). A bad canary deployment therefore self-heals from the user’s perspective.
Environment Switch Script for HAProxy
#!/bin/bash
# haproxy-canary-control.sh
SOCKET="/var/run/haproxy.sock"

case "$1" in
    status)
        echo "show stat" | socat stdio $SOCKET | grep -E "wp_stable|wp_canary" | \
            awk -F, '{printf "%-20s %-15s status=%s weight=%s\n", $1, $2, $18, $19}'
        ;;
    set-canary)
        PCT=$2
        if [ -z "$PCT" ]; then
            echo "Usage: $0 set-canary <percent>"
            exit 1
        fi
        # Calculate threshold out of 1000
        THRESHOLD=$((PCT * 10))
        # Update HAProxy config
        sed -i "s/rand(1000) lt [0-9]*/rand(1000) lt ${THRESHOLD}/" /etc/haproxy/haproxy.cfg
        # Validate and reload
        haproxy -c -f /etc/haproxy/haproxy.cfg
        if [ $? -eq 0 ]; then
            systemctl reload haproxy
            echo "Canary set to ${PCT}%"
        else
            echo "Config validation failed; reload skipped. Fix /etc/haproxy/haproxy.cfg before retrying."
            exit 1
        fi
        ;;
    disable-canary)
        $0 set-canary 0
        ;;
    promote-canary)
        $0 set-canary 100
        echo "Canary promoted. Update stable servers and reset canary percentage."
        ;;
    drain-canary)
        echo "set server wp_canary/canary-1 state drain" | socat stdio $SOCKET
        echo "Canary server draining. Existing connections will complete."
        ;;
    *)
        echo "Usage: $0 {status|set-canary <percent>|disable-canary|promote-canary|drain-canary}"
        exit 1
        ;;
esac
Putting It All Together: Choosing the Right Strategy
Each zero-downtime strategy suits different operational contexts. Here is a practical decision framework.
Symlink-based atomic deployment is the right choice when you run a single server or a small cluster where each server gets the same code deployed sequentially. It is the simplest to implement, requires no additional infrastructure, and works with any hosting provider that gives you SSH access. Start here if you are currently deploying via FTP or manual git pulls.
Blue-green deployment makes sense when you need a pre-production verification step on the exact production infrastructure. If you have been burned by deployments that worked in staging but failed in production because of environment differences, blue-green eliminates that variable. The cost is maintaining two parallel environments, which doubles your server resources (though the idle environment can run at reduced capacity).
Canary releases are appropriate when you have enough traffic that statistical significance matters. Sending 5% of traffic to a new release only provides useful signal if 5% of your traffic is more than a handful of requests per minute. For a site serving 100 requests per minute, 5 requests per minute on the canary can reveal issues within a few minutes. For a site serving 10 requests per minute, you might wait an hour before the canary has handled enough requests to draw conclusions.
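That back-of-the-envelope math is easy to script into a pre-deploy check. A sketch; the function name and the 60-request sample size in the usage note are illustrative, not a statistical recommendation:

```shell
#!/bin/sh
# Estimate minutes for a canary to accumulate a target sample size.
# Usage: canary_wait_minutes <total_rpm> <canary_percent> <target_samples>
canary_wait_minutes() {
    rpm=$1; pct=$2; samples=$3
    # Requests per minute reaching the canary (integer floor)
    canary_rpm=$(( rpm * pct / 100 ))
    if [ "$canary_rpm" -eq 0 ]; then
        echo "canary receives under 1 request/minute; raise the percentage" >&2
        return 1
    fi
    # Minutes until the canary has seen the target sample (ceiling)
    echo $(( (samples + canary_rpm - 1) / canary_rpm ))
}

# 100 req/min site, 5% canary, 60-request sample:
# canary_wait_minutes 100 5 60   -> 12 (minutes)
```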
Rolling updates in Kubernetes are the natural choice if you have already containerized your WordPress application. They provide a smooth upgrade path with built-in health checks and automatic rollback. The overhead is maintaining a Kubernetes cluster, which is not trivial, but the operational benefits extend far beyond deployment.
Combining Strategies
These strategies are not mutually exclusive. A common production setup combines several:
1. Symlink-based atomic deployment within each server (Deployer manages individual server releases).
2. Blue-green at the infrastructure level (two pools of servers, traffic switches between them).
3. Canary at the load balancer level (small percentage of traffic to the new blue/green environment before full switch).
4. Automated rollback based on error rate monitoring in the CI/CD pipeline.
This layered approach provides multiple safety nets. If the canary detects an issue, only 5% of traffic was affected. If the canary passes but a problem emerges after full promotion, the blue-green switch reverses traffic instantly. And if a server within the active pool has issues, the symlink rollback on that specific server takes effect in under a second.
The key to making any of these strategies work for WordPress is addressing the stateful components: the database, the uploads directory, the object cache, and user sessions. Stateless code deployments are the easy part. Managing shared state across releases, environments, and containers is where the real engineering happens.
Start with symlink-based deployment and Deployer. Get comfortable with atomic releases and instant rollback. Then layer on blue-green or canary releases as your traffic and reliability requirements demand. The investment in deployment infrastructure pays for itself the first time a deploy goes wrong and you recover in seconds instead of scrambling for minutes.
Tom Bradley
DevOps engineer focused on WordPress deployment automation. Builds CI/CD pipelines and infrastructure-as-code solutions for WordPress agencies.