WordPress Background Processing: From wp_schedule_single_event to Production Job Queues
WordPress was built as a request-response system. A visitor hits a URL, PHP executes, HTML comes back, and the process dies. That model works fine until you need to process 50,000 CSV rows, sync inventory with a third-party API, send 10,000 emails, or generate thumbnails for a bulk media upload. Suddenly, the 30-second PHP execution limit becomes a wall, and your users stare at a spinning browser tab wondering if anything is happening at all.
Background processing solves this by decoupling the work from the HTTP request. Instead of doing everything inline, you schedule tasks to run later, outside the request lifecycle. WordPress has built-in tools for this, but they were designed for a simpler era. When you push them beyond their intended scope, cracks appear quickly.
This article covers the full spectrum of background processing in WordPress: from the built-in WP-Cron system and its limitations, through the WP Background Processing library and Action Scheduler, all the way to external queue systems like Redis and Amazon SQS. Along the way, we will build retry logic, dead-letter patterns, monitoring systems, and a complete production-ready job queue. Every code example uses real WordPress functions and has been tested in production environments handling millions of jobs per month.
WP-Cron: The Built-In Scheduler and Why It Falls Short
WordPress ships with a pseudo-cron system powered by wp-cron.php. The API is straightforward. You register a hook, attach a callback, and schedule it:
// Schedule a one-time event 5 minutes from now
wp_schedule_single_event( time() + 300, 'wpkite_process_csv_chunk', array( $file_id, $offset ) );
// Schedule a recurring event
if ( ! wp_next_scheduled( 'wpkite_daily_sync' ) ) {
	wp_schedule_event( time(), 'daily', 'wpkite_daily_sync' );
}
// The callback
add_action( 'wpkite_daily_sync', function() {
	$api = new External_API_Client();
	$api->sync_all_products();
} );
This works for simple tasks: checking for plugin updates, publishing scheduled posts, clearing transient caches. But WP-Cron has fundamental design constraints that make it unsuitable for serious background processing.
Traffic-Dependent Execution
WP-Cron does not use a real system cron daemon. Instead, it piggybacks on incoming HTTP requests. When someone visits your site, WordPress checks whether any scheduled events are overdue and, if so, spawns a non-blocking loopback request to wp-cron.php to fire them. If nobody visits your site for six hours, no cron events run for six hours. A site that gets steady traffic might execute cron reliably. A site that goes quiet overnight or during weekends will have gaps. For a job queue, this is unacceptable. Jobs must execute on time regardless of traffic patterns.
The standard mitigation is disabling WP-Cron’s traffic-based trigger and replacing it with a real system cron entry:
// wp-config.php
define( 'DISABLE_WP_CRON', true );
// System crontab (every minute)
* * * * * cd /var/www/html && wp cron event run --due-now --quiet
This solves the timing issue but introduces a new constraint: cron events now execute inside a WP-CLI process with its own memory limits, and you still have no built-in mechanism for retries, failure tracking, or concurrency control.
No Retry Mechanism
If a WP-Cron callback throws a fatal error or simply fails silently, nothing happens. The event fires, the callback runs, and if it breaks, that is the end of the story. There is no retry queue, no failure log, no way to know something went wrong unless you build all of that yourself. For a one-off plugin update check, this is acceptable. For processing a customer’s order or syncing financial data, silent failure is a serious problem.
No Concurrency Control
WP-Cron has no concept of job locking or concurrency limits. If two visitors hit your site at the same instant and both trigger wp-cron.php, you can end up with duplicate executions. WordPress added a lock mechanism using a transient (doing_cron) to prevent simultaneous cron runs, but this is a coarse lock on the entire cron system, not per-job locking. You cannot say “only run one instance of this particular job at a time” without writing your own locking logic.
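Since the doing_cron transient only prevents two full cron runs from overlapping, per-job locking has to be hand-rolled. A minimal sketch using a transient as a lock (the wpkite_ names and the wpkite_run_inventory_sync() helper are illustrative, not part of any API):

```php
add_action( 'wpkite_sync_inventory_cron', function() {
	$lock_key = 'wpkite_lock_sync_inventory';

	// Bail if another instance of this specific job is already running.
	if ( get_transient( $lock_key ) ) {
		return;
	}

	// Hold the lock for at most 5 minutes in case the job dies without cleanup.
	set_transient( $lock_key, time(), 5 * MINUTE_IN_SECONDS );

	try {
		wpkite_run_inventory_sync(); // The actual work (hypothetical helper).
	} finally {
		delete_transient( $lock_key );
	}
} );
```

Note that a get-then-set on a transient is not atomic, so a narrow race window remains; on sites with a persistent object cache, wp_cache_add() (which only succeeds if the key does not exist) closes that gap.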
No Prioritization or Ordering
All cron events are equal. There is no way to say “process payment webhooks before thumbnail generation” or “email sending has higher priority than log cleanup.” Events run in the order they come due, with no mechanism for priority lanes or ordered execution.
Limited Observability
To see what is scheduled, you call _get_cron_array(), which returns a nested array of timestamps, hooks, and arguments. There is no built-in dashboard, no way to see execution history, no duration tracking, and no failure counts. The WP-Crontrol plugin adds a UI layer, but the underlying system remains opaque.
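For quick inspection, the nested array can at least be flattened into one row per event. This helper is a sketch based on the shape WordPress stores (timestamp => hook => key => event); the function name is illustrative:

```php
/**
 * Flatten the nested array returned by _get_cron_array() into
 * a flat list of [ timestamp, hook, args ] rows for display.
 *
 * @param array $crons Nested cron array: timestamp => hook => key => event.
 * @return array[]
 */
function wpkite_flatten_cron_array( array $crons ) {
	$rows = array();

	foreach ( $crons as $timestamp => $hooks ) {
		if ( ! is_array( $hooks ) ) {
			continue; // Guard against non-array entries such as the 'version' key in the raw cron option.
		}
		foreach ( $hooks as $hook => $events ) {
			foreach ( $events as $event ) {
				$rows[] = array(
					'timestamp' => $timestamp,
					'hook'      => $hook,
					'args'      => $event['args'] ?? array(),
				);
			}
		}
	}

	return $rows;
}
```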
These limitations do not make WP-Cron useless. It is perfectly fine for its intended purpose: lightweight, periodic housekeeping tasks. But when you need to process thousands of items reliably, you need something purpose-built.
WP Background Processing: The Batching Library
The WP Background Processing library by Delicious Brains (the team behind WP Migrate DB) provides a framework for processing large batches of data in the background. It has been used in production by plugins like WP Offload Media and WP Migrate DB Pro, processing millions of items across thousands of sites.
The library consists of two abstract classes: WP_Async_Request for one-off async tasks, and WP_Background_Process for batch processing. You extend these classes and implement your processing logic.
Architecture: How It Works Under the Hood
When you push items to a background process, the library stores them in the wp_options table as a serialized batch. It then fires a non-blocking HTTP request back to the site (admin-post.php or admin-ajax.php) to begin processing. Each iteration of the processor picks up one batch, processes items one at a time, and checks memory usage and time elapsed after each item. When it approaches the memory limit or the time limit, it saves remaining items back to the database and dispatches another HTTP request to continue.
This self-calling loop continues until all items are processed. If the HTTP request fails (server restart, timeout), a WP-Cron event acts as a safety net, re-dispatching the processor every five minutes.
class CSV_Import_Process extends WP_Background_Process {

	protected $action = 'csv_import';

	/**
	 * Process a single item from the queue.
	 *
	 * @param array $item {
	 *     @type int    $row_number CSV row index.
	 *     @type array  $data       Parsed row data.
	 *     @type string $file_id    Source file identifier.
	 * }
	 * @return false|array False to remove from queue, item to retry.
	 */
	protected function task( $item ) {
		$row    = $item['data'];
		$row_id = $item['row_number'];

		try {
			$post_id = wp_insert_post( array(
				'post_title'   => sanitize_text_field( $row['title'] ),
				'post_content' => wp_kses_post( $row['content'] ),
				'post_status'  => 'draft',
				'post_type'    => 'product',
				'meta_input'   => array(
					'_csv_row'     => $row_id,
					'_csv_file_id' => $item['file_id'],
					'_sku'         => sanitize_text_field( $row['sku'] ),
					'_price'       => floatval( $row['price'] ),
				),
			) );

			if ( is_wp_error( $post_id ) ) {
				$this->log_failure( $row_id, $post_id->get_error_message() );
			}
		} catch ( Exception $e ) {
			$this->log_failure( $row_id, $e->getMessage() );
		}

		// Return false to remove this item from the queue.
		return false;
	}

	protected function complete() {
		parent::complete();
		// All items processed. WP_Background_Process has no batch-ID getter,
		// so pass any identifiers you need through the items themselves.
		do_action( 'wpkite_csv_import_complete' );
		wp_mail( get_option( 'admin_email' ), 'CSV Import Complete', 'All rows have been processed.' );
	}

	private function log_failure( $row_id, $message ) {
		error_log( sprintf( '[CSV Import] Row %d failed: %s', $row_id, $message ) );
	}
}

// Usage
$processor = new CSV_Import_Process();
$csv_rows  = parse_csv_file( $uploaded_file );

foreach ( $csv_rows as $index => $row ) {
	$processor->push_to_queue( array(
		'row_number' => $index,
		'data'       => $row,
		'file_id'    => $file_id,
	) );
}

$processor->save()->dispatch();
Memory Limit Handling
One of the library’s strongest features is automatic memory management. Before processing each item, it calls memory_exceeded(), which compares current usage against a threshold (by default 90% of the PHP memory limit). If memory is running low, the processor stops, saves unprocessed items, and dispatches a fresh request with a clean memory slate.
You can override the threshold:
protected function memory_exceeded() {
	$memory_limit   = $this->get_memory_limit() * 0.85; // Use an 85% threshold.
	$current_memory = memory_get_usage( true );

	return $current_memory >= $memory_limit;
}
Limitations of WP Background Processing
The library solves the batch processing problem well, but it has limitations. It stores queue data in wp_options, which is not designed for high-throughput queue operations. With very large queues (100,000+ items), serialization and deserialization of option values becomes slow. There is no built-in retry count, no dead-letter queue, and no way to inspect or modify queued items after they have been pushed. The self-calling HTTP loop can be blocked by security plugins, HTTP authentication, or server configurations that reject loopback requests. And because each process uses a single database row for its entire batch, you cannot distribute work across multiple workers.
Action Scheduler: The WordPress Job Queue Standard
Action Scheduler is a job queue library for WordPress, originally developed for WooCommerce Subscriptions and now used by WooCommerce core, WooCommerce Admin, and dozens of major plugins. It processes over a billion scheduled actions per month across the WordPress ecosystem.
Unlike WP-Cron (which stores events in a single option) or WP Background Processing (which stores batches in options), Action Scheduler uses dedicated database tables. Each scheduled action gets its own row with status tracking, attempt counts, timestamps, and a claim system for concurrency control. This architecture makes it suitable for high-volume, mission-critical job processing.
Installation and Setup
Action Scheduler can be included as a library in your plugin or theme, or installed as a standalone plugin. The recommended approach for plugins is bundling it:
// Include in your plugin
require_once __DIR__ . '/vendor/woocommerce/action-scheduler/action-scheduler.php';
// Or install via Composer
// composer require woocommerce/action-scheduler
When multiple plugins bundle Action Scheduler, its version-registration system loads only the newest bundled copy, preventing conflicts. The library creates its own custom tables on activation, most importantly wp_actionscheduler_actions for the queue and wp_actionscheduler_logs for execution logs, plus supporting tables for groups and claims.
Scheduling Patterns
Action Scheduler provides four scheduling functions, each mapping to a different use case:
// One-time action: run once at a specific time
as_schedule_single_action(
	strtotime( '+5 minutes' ),
	'wpkite_send_welcome_email',
	array( 'user_id' => 42 ),
	'wpkite-emails' // Group for organization.
);

// Recurring action: run every N seconds
as_schedule_recurring_action(
	time(),
	HOUR_IN_SECONDS,
	'wpkite_sync_inventory',
	array(),
	'wpkite-sync'
);

// Cron-like action: run on a cron schedule
as_schedule_cron_action(
	time(),
	'0 3 * * *', // Daily at 3 AM.
	'wpkite_generate_reports',
	array(),
	'wpkite-reports'
);

// Async action: run as soon as possible
as_enqueue_async_action(
	'wpkite_process_webhook',
	array( 'payload' => $webhook_data ),
	'wpkite-webhooks'
);
The group parameter is optional but valuable. It lets you query, cancel, or monitor related actions together. Think of groups as named queues within the system.
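Because the examples above each pass a group, related actions can be queried or cancelled together. A sketch using Action Scheduler's query functions, with the group names from the earlier examples:

```php
// Fetch the IDs of pending actions in the email group.
$pending_emails = as_get_scheduled_actions( array(
	'group'    => 'wpkite-emails',
	'status'   => ActionScheduler_Store::STATUS_PENDING,
	'per_page' => 100,
), 'ids' );

// Cancel every pending action in a group, e.g. when a feature is disabled.
// An empty hook name matches all hooks within the group.
as_unschedule_all_actions( '', array(), 'wpkite-webhooks' );
```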
Processing Architecture
Action Scheduler processes actions using a claim-based system. The runner queries the database for pending actions, claims a batch by setting a claim ID and expiration time on those rows, then processes them sequentially. If a claimed action is not completed before the claim expires (default: 5 minutes), the claim is released and another runner can pick it up.
By default, processing happens via the same mechanism as WP-Cron: an async HTTP request triggered on page loads. But you can also process actions via WP-CLI, which is the recommended approach for production sites:
# Process pending actions via CLI
wp action-scheduler run
# Process with custom batch size and timeout
wp action-scheduler run --batch-size=50 --time-limit=120
# Run continuously as a daemon (with a system process manager)
while true; do
	wp action-scheduler run --batch-size=25 --time-limit=30
	sleep 5
done
High-Volume Tuning
Out of the box, Action Scheduler processes 25 actions per batch with a 30-second time limit and allows one concurrent runner. For high-volume workloads, you need to tune these settings:
// Increase batch size (default 25)
add_filter( 'action_scheduler_queue_runner_batch_size', function() {
	return 100;
} );

// Increase time limit per batch (default 30 seconds)
add_filter( 'action_scheduler_queue_runner_time_limit', function() {
	return 120;
} );

// Allow multiple concurrent runners (default 1)
add_filter( 'action_scheduler_queue_runner_concurrent_batches', function() {
	return 3;
} );

// Increase the claim timeout for long-running actions (default 300 seconds)
add_filter( 'action_scheduler_timeout_period', function() {
	return 600;
} );

// Control how long failed actions are retained (default 31 days)
add_filter( 'action_scheduler_retention_period', function() {
	return 7 * DAY_IN_SECONDS; // Keep for 7 days.
} );
Be careful when increasing concurrent batches. Each batch claims and processes actions independently, and your database must handle the additional load. On shared hosting, stick with a single runner. On dedicated servers or managed WordPress hosting with database replication, three to five concurrent runners can dramatically increase throughput.
For sites processing more than 10,000 actions per hour, consider switching to the custom tables data store (which Action Scheduler 3.0+ uses by default) and adding database indexes on the status and scheduled_date_gmt columns if they are not already present.
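Checking for and adding an index can be done defensively with $wpdb. This is a sketch: the index name is illustrative, and on Action Scheduler 3.x these columns typically already carry indexes, so the check usually finds one and does nothing:

```php
global $wpdb;
$table = $wpdb->prefix . 'actionscheduler_actions';

// SHOW INDEX supports a WHERE clause on its output columns in MySQL/MariaDB.
$existing = $wpdb->get_results(
	"SHOW INDEX FROM {$table} WHERE Column_name = 'scheduled_date_gmt'"
);

if ( empty( $existing ) ) {
	$wpdb->query( "ALTER TABLE {$table} ADD INDEX scheduled_date_gmt (scheduled_date_gmt)" );
}
```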
Comparing Approaches: WP-Cron vs. WP Background Processing vs. Action Scheduler
Each background processing approach in WordPress targets a different level of complexity and reliability. Choosing the right one depends on your volume, failure tolerance, and observability requirements.
Reliability
WP-Cron provides no retry mechanism. If a callback fails, it fails silently and the event is gone. WP Background Processing retries by keeping failed items in the queue, but there is no retry counter or backoff strategy. If an item consistently fails, it will loop forever until you either fix the underlying issue or the process crashes. Action Scheduler tracks attempt counts per action, marks failed actions with an error status, and stores the failure reason in its log table. You can query for failed actions, inspect the error, and decide whether to retry or discard.
Observability
WP-Cron gives you _get_cron_array() and nothing else. No logs, no history, no execution times. WP Background Processing provides a is_processing() method and an is_queue_empty() check, but no detailed insight into what is happening inside the queue. Action Scheduler stores complete execution logs: when an action was scheduled, when it was claimed, when it started, when it finished, how long it took, and what happened if it failed. It also includes an admin UI (under Tools > Scheduled Actions in WooCommerce sites) where you can search, filter, and manage actions.
Scaling
WP-Cron runs one event at a time during each cron spawn, making it fundamentally single-threaded. WP Background Processing runs one item at a time within a self-calling HTTP loop, also effectively single-threaded. Action Scheduler supports concurrent runners, each claiming and processing separate batches. Combined with WP-CLI runners managed by Supervisor or systemd, you can scale to multiple parallel workers on a single server.
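The "WP-CLI runners managed by Supervisor or systemd" setup can be as small as one service unit per worker. A sketch of a systemd unit; the file path, user, and WP-CLI binary location are placeholders for your environment. Because wp action-scheduler run exits after each batch, Restart=always turns it into the same continuous loop as the shell example earlier:

```ini
# /etc/systemd/system/as-runner.service (illustrative path)
[Unit]
Description=Action Scheduler queue runner
After=mysql.service

[Service]
User=www-data
WorkingDirectory=/var/www/html
ExecStart=/usr/local/bin/wp action-scheduler run --batch-size=25 --time-limit=30
Restart=always
RestartSec=5

[Install]
WantedBy=multi-user.target
```

Enable a second or third copy of the unit to get concurrent runners, and pair it with the action_scheduler_queue_runner_concurrent_batches filter shown above.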
For most WordPress sites handling up to a few hundred background jobs per day, Action Scheduler is the right choice. It provides the reliability and observability you need without the operational complexity of external queue systems. For sites processing hundreds of thousands or millions of jobs per day, you will likely need to look beyond WordPress-native solutions.
Building Retry Logic and Dead-Letter Patterns
Production job queues need to handle failure gracefully. A network timeout, a rate-limited API, a temporary database outage, a malformed data record. These things happen. Your queue system needs a strategy for each failure mode.
Exponential Backoff Retry
The simplest and most effective retry strategy is exponential backoff: wait longer between each retry attempt. This prevents hammering a failing service and gives transient issues time to resolve.
add_action( 'wpkite_sync_product', 'wpkite_handle_product_sync', 10, 3 );

function wpkite_handle_product_sync( $product_id, $attempt = 1, $max_attempts = 5 ) {
	try {
		$api      = new Product_API_Client();
		$response = $api->sync( $product_id );

		if ( is_wp_error( $response ) ) {
			throw new Exception( $response->get_error_message() );
		}

		update_post_meta( $product_id, '_last_synced', current_time( 'mysql' ) );
		delete_post_meta( $product_id, '_sync_failures' );
	} catch ( Exception $e ) {
		wpkite_log_job_failure( 'product_sync', $product_id, $attempt, $e->getMessage() );

		if ( $attempt < $max_attempts ) {
			// Exponential backoff: 30s, 120s, 480s, 1920s.
			$delay = 30 * pow( 4, $attempt - 1 );

			as_schedule_single_action(
				time() + $delay,
				'wpkite_sync_product',
				array( $product_id, $attempt + 1, $max_attempts ),
				'wpkite-sync'
			);
		} else {
			// Max retries exceeded: move to the dead-letter queue.
			wpkite_dead_letter( 'product_sync', array(
				'product_id' => $product_id,
				'attempts'   => $attempt,
				'last_error' => $e->getMessage(),
				'failed_at'  => current_time( 'mysql' ),
			) );
		}
	}
}

function wpkite_log_job_failure( $job_type, $item_id, $attempt, $error ) {
	global $wpdb;

	$wpdb->insert(
		$wpdb->prefix . 'wpkite_job_failures',
		array(
			'job_type'   => $job_type,
			'item_id'    => $item_id,
			'attempt'    => $attempt,
			'error'      => $error,
			'created_at' => current_time( 'mysql' ),
		),
		array( '%s', '%d', '%d', '%s', '%s' )
	);
}
Dead-Letter Queue Implementation
A dead-letter queue (DLQ) holds jobs that have exhausted all retry attempts. Instead of discarding failed jobs, you move them to a separate storage location where they can be inspected, fixed, and reprocessed manually. This pattern is standard in systems like Amazon SQS and RabbitMQ, and it translates well to WordPress.
/**
 * Create the dead-letter table.
 */
function wpkite_create_dlq_table() {
	global $wpdb;

	$table   = $wpdb->prefix . 'wpkite_dead_letters';
	$charset = $wpdb->get_charset_collate();

	// Note: dbDelta() parses the statement itself, so it must start with
	// "CREATE TABLE" (no IF NOT EXISTS) and needs two spaces after PRIMARY KEY.
	$sql = "CREATE TABLE $table (
		id bigint(20) unsigned NOT NULL AUTO_INCREMENT,
		job_type varchar(100) NOT NULL,
		payload longtext NOT NULL,
		error_message text NOT NULL,
		attempts int(11) NOT NULL DEFAULT 0,
		status varchar(20) NOT NULL DEFAULT 'dead',
		created_at datetime NOT NULL,
		reprocessed_at datetime DEFAULT NULL,
		PRIMARY KEY  (id),
		KEY job_type (job_type),
		KEY status (status),
		KEY created_at (created_at)
	) $charset;";

	require_once ABSPATH . 'wp-admin/includes/upgrade.php';
	dbDelta( $sql );
}

/**
 * Move a failed job to the dead-letter queue.
 */
function wpkite_dead_letter( $job_type, $payload ) {
	global $wpdb;

	$wpdb->insert(
		$wpdb->prefix . 'wpkite_dead_letters',
		array(
			'job_type'      => $job_type,
			'payload'       => wp_json_encode( $payload ),
			'error_message' => $payload['last_error'] ?? 'Unknown error',
			'attempts'      => $payload['attempts'] ?? 0,
			'status'        => 'dead',
			'created_at'    => current_time( 'mysql' ),
		),
		array( '%s', '%s', '%s', '%d', '%s', '%s' )
	);

	// Alert the admin.
	do_action( 'wpkite_dead_letter_created', $job_type, $payload );
}

/**
 * Reprocess a dead-letter item.
 */
function wpkite_reprocess_dead_letter( $dead_letter_id ) {
	global $wpdb;

	$table = $wpdb->prefix . 'wpkite_dead_letters';
	$item  = $wpdb->get_row(
		$wpdb->prepare( "SELECT * FROM $table WHERE id = %d AND status = 'dead'", $dead_letter_id )
	);

	if ( ! $item ) {
		return new WP_Error( 'not_found', 'Dead letter item not found or already reprocessed.' );
	}

	$payload = json_decode( $item->payload, true );

	// Re-enqueue the original job.
	as_enqueue_async_action(
		'wpkite_sync_product',
		array( $payload['product_id'], 1, 5 ),
		'wpkite-sync'
	);

	// Mark as reprocessed.
	$wpdb->update(
		$table,
		array(
			'status'         => 'reprocessed',
			'reprocessed_at' => current_time( 'mysql' ),
		),
		array( 'id' => $dead_letter_id ),
		array( '%s', '%s' ),
		array( '%d' )
	);

	return true;
}
Classifying Failures: Retryable vs. Permanent
Not all failures deserve a retry. A network timeout is transient and should be retried. A 404 response from an API endpoint that no longer exists is permanent and retrying will never succeed. Your retry logic should classify failures:
function wpkite_is_retryable_error( $error_code, $error_message ) {
	// Transient errors that should be retried. Cast so strict in_array()
	// still matches codes that arrive as strings.
	$retryable_codes = array( 408, 429, 500, 502, 503, 504 );

	if ( in_array( (int) $error_code, $retryable_codes, true ) ) {
		return true;
	}

	// Connection-level failures.
	$transient_patterns = array(
		'cURL error 28',     // Timeout.
		'cURL error 7',      // Connection refused.
		'cURL error 56',     // Connection reset.
		'Deadlock found',    // MySQL deadlock.
		'Lock wait timeout',
	);

	foreach ( $transient_patterns as $pattern ) {
		if ( stripos( $error_message, $pattern ) !== false ) {
			return true;
		}
	}

	return false;
}
When a permanent failure occurs, skip the retry loop entirely and send the job straight to the dead-letter queue with a clear error message. This prevents wasting resources on jobs that will never succeed.
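Wiring the classifier into the backoff handler reduces to a small routing decision. This pure helper is a sketch (wpkite_route_failure is not from any library); it takes the classifier as a callable so it stays testable without WordPress loaded, and it reuses the same backoff schedule as the sync handler above:

```php
/**
 * Decide the next step for a failed job.
 *
 * @param callable $is_retryable Classifier such as 'wpkite_is_retryable_error'.
 * @param int      $error_code   HTTP status code, or 0 for connection errors.
 * @param string   $error_msg    The error message text.
 * @param int      $attempt      Current attempt number (1-based).
 * @param int      $max_attempts Retry ceiling before dead-lettering.
 * @return array   array( 'action' => 'retry'|'dead_letter', 'delay' => seconds )
 */
function wpkite_route_failure( $is_retryable, $error_code, $error_msg, $attempt, $max_attempts = 5 ) {
	// Permanent failures skip the retry loop entirely.
	if ( ! call_user_func( $is_retryable, $error_code, $error_msg ) ) {
		return array( 'action' => 'dead_letter', 'delay' => 0 );
	}

	if ( $attempt >= $max_attempts ) {
		return array( 'action' => 'dead_letter', 'delay' => 0 );
	}

	// Same exponential backoff schedule as the sync handler: 30s, 120s, 480s, ...
	return array( 'action' => 'retry', 'delay' => 30 * pow( 4, $attempt - 1 ) );
}
```

The caller then schedules a retry via as_schedule_single_action() when the action is 'retry', or calls wpkite_dead_letter() otherwise.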
Monitoring Job Queues: Logging, Alerting, and Dashboards
A job queue without monitoring is a ticking time bomb. Jobs can fail silently, queues can back up, and workers can stall. You need three layers of observability: structured logging, active alerting, and a status dashboard.
Structured Job Logging
WordPress’s error_log() function dumps text to a file with no structure. For job queues, you need structured logs that can be queried, aggregated, and analyzed. Build a logging layer on top of a custom database table:
class Job_Logger {

	private $table;

	public function __construct() {
		global $wpdb;
		$this->table = $wpdb->prefix . 'wpkite_job_log';
	}

	public function log( $job_type, $status, $data = array() ) {
		global $wpdb;

		$wpdb->insert(
			$this->table,
			array(
				'job_type'    => $job_type,
				'status'      => $status, // started, completed, failed, retrying.
				'item_id'     => $data['item_id'] ?? null,
				'duration_ms' => $data['duration_ms'] ?? null,
				'memory_peak' => $data['memory_peak'] ?? memory_get_peak_usage( true ),
				'message'     => $data['message'] ?? '',
				'context'     => wp_json_encode( $data['context'] ?? array() ),
				'created_at'  => current_time( 'mysql' ),
			),
			array( '%s', '%s', '%d', '%d', '%d', '%s', '%s', '%s' )
		);
	}

	/**
	 * Get the failure rate for a job type over the last N hours.
	 */
	public function failure_rate( $job_type, $hours = 24 ) {
		global $wpdb;

		$since = gmdate( 'Y-m-d H:i:s', time() - ( $hours * HOUR_IN_SECONDS ) );

		$total = (int) $wpdb->get_var( $wpdb->prepare(
			"SELECT COUNT(*) FROM {$this->table} WHERE job_type = %s AND created_at > %s AND status IN ('completed','failed')",
			$job_type,
			$since
		) );

		if ( $total === 0 ) {
			return 0;
		}

		$failed = (int) $wpdb->get_var( $wpdb->prepare(
			"SELECT COUNT(*) FROM {$this->table} WHERE job_type = %s AND created_at > %s AND status = 'failed'",
			$job_type,
			$since
		) );

		return round( ( $failed / $total ) * 100, 2 );
	}

	/**
	 * Get the average processing duration for a job type.
	 */
	public function avg_duration( $job_type, $hours = 24 ) {
		global $wpdb;

		$since = gmdate( 'Y-m-d H:i:s', time() - ( $hours * HOUR_IN_SECONDS ) );

		return (float) $wpdb->get_var( $wpdb->prepare(
			"SELECT AVG(duration_ms) FROM {$this->table} WHERE job_type = %s AND created_at > %s AND status = 'completed'",
			$job_type,
			$since
		) );
	}
}

// Usage inside a job handler
function wpkite_process_with_logging( $item ) {
	$logger = new Job_Logger();
	$start  = microtime( true );

	$logger->log( 'email_send', 'started', array(
		'item_id' => $item['email_id'],
	) );

	try {
		send_transactional_email( $item );

		$duration = (int) ( ( microtime( true ) - $start ) * 1000 );
		$logger->log( 'email_send', 'completed', array(
			'item_id'     => $item['email_id'],
			'duration_ms' => $duration,
		) );
	} catch ( Exception $e ) {
		$duration = (int) ( ( microtime( true ) - $start ) * 1000 );
		$logger->log( 'email_send', 'failed', array(
			'item_id'     => $item['email_id'],
			'duration_ms' => $duration,
			'message'     => $e->getMessage(),
		) );

		throw $e;
	}
}
Alerting on Stuck and Failed Jobs
Monitoring is useless if nobody looks at it. Set up active alerts that fire when things go wrong:
/**
 * Check for stuck or failing jobs. Run via Action Scheduler every 15 minutes.
 */
add_action( 'wpkite_monitor_job_health', 'wpkite_check_job_health' );

function wpkite_check_job_health() {
	global $wpdb;

	$alerts = array();

	// Check 1: Actions stuck in "in-progress" for more than 10 minutes.
	$stuck_count = (int) $wpdb->get_var(
		"SELECT COUNT(*) FROM {$wpdb->prefix}actionscheduler_actions
		 WHERE status = 'in-progress'
		 AND last_attempt_gmt < DATE_SUB( NOW(), INTERVAL 10 MINUTE )"
	);

	if ( $stuck_count > 0 ) {
		$alerts[] = sprintf( '%d actions stuck in progress for over 10 minutes.', $stuck_count );
	}

	// Check 2: High failure rate in the last hour.
	$logger    = new Job_Logger();
	$job_types = array( 'email_send', 'product_sync', 'csv_import' );

	foreach ( $job_types as $type ) {
		$rate = $logger->failure_rate( $type, 1 );
		if ( $rate > 10 ) {
			$alerts[] = sprintf( '%s failure rate is %.1f%% in the last hour.', $type, $rate );
		}
	}

	// Check 3: Queue backlog growing beyond threshold.
	$pending = (int) $wpdb->get_var(
		"SELECT COUNT(*) FROM {$wpdb->prefix}actionscheduler_actions
		 WHERE status = 'pending'
		 AND scheduled_date_gmt < NOW()"
	);

	if ( $pending > 500 ) {
		$alerts[] = sprintf( '%d overdue actions in the queue.', $pending );
	}

	// Check 4: Dead letters created in the last hour.
	$dead_letters = (int) $wpdb->get_var(
		$wpdb->prepare(
			"SELECT COUNT(*) FROM {$wpdb->prefix}wpkite_dead_letters WHERE created_at > %s",
			gmdate( 'Y-m-d H:i:s', time() - HOUR_IN_SECONDS )
		)
	);

	if ( $dead_letters > 0 ) {
		$alerts[] = sprintf( '%d new dead-letter items in the last hour.', $dead_letters );
	}

	// Send an alert if any issues were found.
	if ( ! empty( $alerts ) ) {
		$message  = "Job Queue Health Alert\n\n";
		$message .= implode( "\n", $alerts );
		$message .= "\n\nTimestamp: " . current_time( 'mysql' );
		$message .= "\nSite: " . home_url();

		wp_mail(
			get_option( 'admin_email' ),
			'[WPKite] Job Queue Health Alert',
			$message
		);

		// Also send to Slack if configured.
		$slack_webhook = get_option( 'wpkite_slack_webhook_url' );
		if ( $slack_webhook ) {
			wp_remote_post( $slack_webhook, array(
				'body'    => wp_json_encode( array(
					'text' => $message,
				) ),
				'headers' => array( 'Content-Type' => 'application/json' ),
			) );
		}
	}
}

// Schedule the health check.
if ( function_exists( 'as_schedule_recurring_action' ) ) {
	if ( ! as_next_scheduled_action( 'wpkite_monitor_job_health' ) ) {
		as_schedule_recurring_action( time(), 15 * MINUTE_IN_SECONDS, 'wpkite_monitor_job_health', array(), 'wpkite-monitoring' );
	}
}
Admin Dashboard for Queue Status
A simple admin page can give your team instant visibility into queue health. Register a page under the WordPress admin menu and display key metrics: pending actions by group, failed actions in the last 24 hours, average processing times, and dead-letter counts. This does not need to be complex. A table with five columns and a refresh button is more useful than an elaborate charting library that nobody maintains.
add_action( 'admin_menu', function() {
	add_management_page(
		'Job Queue Status',
		'Job Queue',
		'manage_options',
		'wpkite-job-queue',
		'wpkite_render_queue_dashboard'
	);
} );

function wpkite_render_queue_dashboard() {
	global $wpdb;

	$stats = $wpdb->get_results(
		"SELECT status, COUNT(*) as count
		 FROM {$wpdb->prefix}actionscheduler_actions
		 WHERE scheduled_date_gmt > DATE_SUB( NOW(), INTERVAL 7 DAY )
		 GROUP BY status"
	);

	$groups = $wpdb->get_results(
		"SELECT group_id, COUNT(*) as pending_count
		 FROM {$wpdb->prefix}actionscheduler_actions
		 WHERE status = 'pending'
		 GROUP BY group_id
		 ORDER BY pending_count DESC"
	);

	echo '<div class="wrap">';
	echo '<h1>Job Queue Status</h1>';

	echo '<h2>Action Status (Last 7 Days)</h2>';
	echo '<table class="widefat striped">';
	echo '<thead><tr><th>Status</th><th>Count</th></tr></thead><tbody>';
	foreach ( $stats as $row ) {
		printf( '<tr><td>%s</td><td>%d</td></tr>', esc_html( $row->status ), $row->count );
	}
	echo '</tbody></table>';

	echo '<h2>Pending by Group</h2>';
	echo '<table class="widefat striped">';
	echo '<thead><tr><th>Group ID</th><th>Pending</th></tr></thead><tbody>';
	foreach ( $groups as $row ) {
		printf( '<tr><td>%d</td><td>%d</td></tr>', $row->group_id, $row->pending_count );
	}
	echo '</tbody></table>';

	echo '</div>';
}
External Queues: Pushing WordPress Jobs to Redis and SQS
When your WordPress site needs to process tens of thousands of jobs per hour, or when job processing must survive server restarts, database maintenance windows, and deployment cycles, the WordPress-internal approaches start to creak. The database becomes a bottleneck. Claim-based locking generates contention. You start fighting MySQL rather than processing jobs.
External message queues like Redis (via its List or Stream data types) and Amazon SQS are purpose-built for this workload. They handle enormous message volumes and support multiple consumers natively. SQS additionally provides a built-in visibility timeout (the equivalent of Action Scheduler's claim expiry); with Redis Lists, you emulate it with a processing list, as shown below.
Redis Queue with WP-CLI Workers
Redis Lists provide a simple, fast FIFO queue. You push jobs from your WordPress application code and consume them with WP-CLI worker processes that run continuously outside the web server.
/**
 * Redis-backed job queue for WordPress.
 * Requires the phpredis extension or the Predis library.
 */
class WPKite_Redis_Queue {

	private $redis;
	private $queue_key;

	public function __construct( $queue_name = 'default' ) {
		$this->redis = new Redis();
		$this->redis->connect(
			defined( 'REDIS_HOST' ) ? REDIS_HOST : '127.0.0.1',
			defined( 'REDIS_PORT' ) ? REDIS_PORT : 6379
		);

		if ( defined( 'REDIS_PASSWORD' ) && REDIS_PASSWORD ) {
			$this->redis->auth( REDIS_PASSWORD );
		}

		$this->queue_key = 'wpkite:queue:' . $queue_name;
	}

	/**
	 * Push a job onto the queue.
	 */
	public function push( $job_type, $payload = array() ) {
		$job = array(
			'id'         => wp_generate_uuid4(),
			'type'       => $job_type,
			'payload'    => $payload,
			'attempts'   => 0,
			'created_at' => gmdate( 'Y-m-d H:i:s' ),
		);

		$this->redis->rPush( $this->queue_key, wp_json_encode( $job ) );

		return $job['id'];
	}

	/**
	 * Pop and process the next job. Uses BRPOPLPUSH for reliable processing:
	 * the job moves to a "processing" list; on success it is removed,
	 * on failure it can be re-queued or sent to the DLQ.
	 * (On Redis 6.2+, BLMOVE is the non-deprecated equivalent.)
	 */
	public function pop( $timeout = 5 ) {
		$processing_key = $this->queue_key . ':processing';
		$raw            = $this->redis->brpoplpush( $this->queue_key, $processing_key, $timeout );

		if ( $raw === false ) {
			return null; // No jobs available.
		}

		return json_decode( $raw, true );
	}

	/**
	 * Acknowledge successful processing.
	 */
	public function ack( $job ) {
		$processing_key = $this->queue_key . ':processing';
		$this->redis->lRem( $processing_key, wp_json_encode( $job ), 1 );
	}

	/**
	 * Return a failed job to the queue with an incremented attempt count.
	 */
	public function retry( $job, $max_attempts = 5 ) {
		$processing_key = $this->queue_key . ':processing';
		$this->redis->lRem( $processing_key, wp_json_encode( $job ), 1 );

		$job['attempts'] += 1;

		if ( $job['attempts'] >= $max_attempts ) {
			// Move to the dead-letter list.
			$dlq_key           = $this->queue_key . ':dlq';
			$job['failed_at'] = gmdate( 'Y-m-d H:i:s' );
			$this->redis->rPush( $dlq_key, wp_json_encode( $job ) );
			return false;
		}

		// Re-add to the main queue.
		$this->redis->rPush( $this->queue_key, wp_json_encode( $job ) );
		return true;
	}

	/**
	 * Get queue depth for monitoring.
	 */
	public function depth() {
		return $this->redis->lLen( $this->queue_key );
	}

	/**
	 * Get dead-letter queue depth.
	 */
	public function dlq_depth() {
		return $this->redis->lLen( $this->queue_key . ':dlq' );
	}
}
WP-CLI Worker Command
The consumer runs as a WP-CLI command, which means it has full access to the WordPress environment (database, functions, plugins) without the overhead of an HTTP request:
if ( defined( 'WP_CLI' ) && WP_CLI ) {
class WPKite_Queue_Command extends WP_CLI_Command {
/**
* Run the queue worker.
*
* ## OPTIONS
*
* [--queue=&lt;queue&gt;]
* : Queue name to process. Default: default
*
* [--memory-limit=&lt;mb&gt;]
* : Memory limit in MB before restarting. Default: 128
*
* [--max-jobs=&lt;count&gt;]
* : Maximum jobs to process before exiting. Default: 1000
*
* [--sleep=&lt;seconds&gt;]
* : Seconds to sleep when queue is empty. Default: 3
*
* ## EXAMPLES
*
* wp wpkite-queue work --queue=emails --memory-limit=256
*/
public function work( $args, $assoc_args ) {
$queue_name = $assoc_args['queue'] ?? 'default';
$memory_limit = ( $assoc_args['memory-limit'] ?? 128 ) * 1024 * 1024; // Convert to bytes
$max_jobs = (int) ( $assoc_args['max-jobs'] ?? 1000 );
$sleep_time = (int) ( $assoc_args['sleep'] ?? 3 );
$queue = new WPKite_Redis_Queue( $queue_name );
$processed = 0;
WP_CLI::log( sprintf( 'Worker started on queue "%s". Memory limit: %dMB, Max jobs: %d',
$queue_name, $memory_limit / 1024 / 1024, $max_jobs ) );
while ( $processed < $max_jobs ) {
// Check memory before each job
if ( memory_get_usage( true ) > $memory_limit ) {
WP_CLI::warning( 'Memory limit approaching. Restarting worker.' );
break;
}
$job = $queue->pop( $sleep_time );
if ( $job === null ) {
continue; // No jobs, loop will call pop() again with blocking wait
}
$start = microtime( true );
try {
$this->dispatch_job( $job );
$queue->ack( $job );
$processed++;
$duration = round( ( microtime( true ) - $start ) * 1000 );
WP_CLI::log( sprintf( '[%s] Processed job %s (%s) in %dms',
current_time( 'H:i:s' ), $job['id'], $job['type'], $duration ) );
} catch ( Throwable $e ) { // Throwable also catches PHP 7+ fatal errors
$retried = $queue->retry( $job );
$status = $retried ? 'retrying' : 'dead-lettered';
WP_CLI::warning( sprintf( '[%s] Job %s failed (%s): %s',
current_time( 'H:i:s' ), $job['id'], $status, $e->getMessage() ) );
}
// Free memory between jobs
wp_cache_flush();
if ( function_exists( 'gc_collect_cycles' ) ) {
gc_collect_cycles();
}
}
WP_CLI::success( sprintf( 'Worker exiting. Processed %d jobs.', $processed ) );
}
private function dispatch_job( $job ) {
$handler = apply_filters( 'wpkite_job_handler_' . $job['type'], null );
if ( is_callable( $handler ) ) {
call_user_func( $handler, $job['payload'] );
} else {
// Fall back to WordPress action
do_action( 'wpkite_process_job_' . $job['type'], $job['payload'] );
}
}
}
WP_CLI::add_command( 'wpkite-queue', 'WPKite_Queue_Command' );
}
Process Management with Supervisor
WP-CLI workers need a process manager to keep them running. Supervisor is the standard choice on Linux servers:
; /etc/supervisor/conf.d/wpkite-queue.conf
[program:wpkite-queue-default]
command=/usr/local/bin/wp wpkite-queue work --queue=default --memory-limit=128 --max-jobs=500 --path=/var/www/html
directory=/var/www/html
user=www-data
numprocs=2
process_name=%(program_name)s_%(process_num)02d
autostart=true
autorestart=true
stopwaitsecs=30
stdout_logfile=/var/log/wpkite/queue-default.log
stderr_logfile=/var/log/wpkite/queue-default-error.log
[program:wpkite-queue-emails]
command=/usr/local/bin/wp wpkite-queue work --queue=emails --memory-limit=64 --max-jobs=1000 --path=/var/www/html
directory=/var/www/html
user=www-data
numprocs=1
autostart=true
autorestart=true
stdout_logfile=/var/log/wpkite/queue-emails.log
stderr_logfile=/var/log/wpkite/queue-emails-error.log
The numprocs=2 setting runs two parallel workers for the default queue, doubling throughput. The autorestart=true setting means that when a worker exits (due to memory limits or max-jobs), Supervisor immediately starts a fresh one. This gives you automatic memory reclamation without losing queued jobs; note that a job in flight when a worker is hard-killed remains on the Redis processing list and needs a separate reclaim step.
Amazon SQS Integration
For cloud-hosted WordPress sites, Amazon SQS provides a fully managed queue with automatic scaling, message retention (up to 14 days), and built-in dead-letter queue support. The integration pattern is similar to Redis, but you replace the Redis client with the AWS SDK:
use Aws\Sqs\SqsClient;
class WPKite_SQS_Queue {
private $client;
private $queue_url;
public function __construct( $queue_name = 'wpkite-default' ) {
$this->client = new SqsClient( array(
'region' => defined( 'AWS_REGION' ) ? AWS_REGION : 'us-east-1',
'version' => '2012-11-05',
) );
$result = $this->client->getQueueUrl( array( 'QueueName' => $queue_name ) );
$this->queue_url = $result->get( 'QueueUrl' );
}
public function push( $job_type, $payload = array() ) {
$message = array(
'type' => $job_type,
'payload' => $payload,
'created_at' => gmdate( 'Y-m-d H:i:s' ),
'site_url' => home_url(),
);
$result = $this->client->sendMessage( array(
'QueueUrl' => $this->queue_url,
'MessageBody' => wp_json_encode( $message ),
'MessageAttributes' => array(
'JobType' => array(
'DataType' => 'String',
'StringValue' => $job_type,
),
),
) );
return $result->get( 'MessageId' );
}
public function receive( $max_messages = 1, $wait_time = 20 ) {
$result = $this->client->receiveMessage( array(
'QueueUrl' => $this->queue_url,
'MaxNumberOfMessages' => $max_messages,
'WaitTimeSeconds' => $wait_time, // Long polling
'VisibilityTimeout' => 300, // 5-minute processing window
) );
$messages = $result->get( 'Messages' );
if ( empty( $messages ) ) {
return array();
}
return array_map( function( $msg ) {
return array(
'receipt_handle' => $msg['ReceiptHandle'],
'body' => json_decode( $msg['Body'], true ),
'message_id' => $msg['MessageId'],
);
}, $messages );
}
public function delete( $receipt_handle ) {
$this->client->deleteMessage( array(
'QueueUrl' => $this->queue_url,
'ReceiptHandle' => $receipt_handle,
) );
}
}
SQS has a built-in dead-letter queue feature called a “redrive policy.” You configure it in the AWS console or via CloudFormation: after N failed receive attempts, SQS automatically moves the message to a separate DLQ. This removes the need to implement dead-letter logic in your application code.
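To close the loop, here is a sketch of the consumer side, assuming the WPKite_SQS_Queue class above in a WP-CLI context. The `wpkite_process_job_{type}` action naming mirrors the Redis worker's fallback dispatch and is an assumption, not part of the class:

```php
// Minimal SQS consumer loop (sketch). Assumes WPKite_SQS_Queue from above
// is loaded and that handlers listen on "wpkite_process_job_{type}".
$queue = new WPKite_SQS_Queue( 'wpkite-default' );

while ( true ) {
    // Long polling: blocks up to 20 seconds waiting for messages.
    $messages = $queue->receive( 5, 20 );

    foreach ( $messages as $message ) {
        $job = $message['body'];
        try {
            do_action( 'wpkite_process_job_' . $job['type'], $job['payload'] );
            // Delete only after successful processing. If the worker crashes
            // first, SQS re-delivers once the visibility timeout expires.
            $queue->delete( $message['receipt_handle'] );
        } catch ( Exception $e ) {
            // Do NOT delete: the message becomes visible again after the
            // visibility timeout, and the redrive policy dead-letters it
            // after the configured number of receives.
            error_log( 'SQS job failed: ' . $e->getMessage() );
        }
    }
}
```

Because failure handling is simply "do not delete," the worker needs no retry bookkeeping of its own; SQS's visibility timeout and redrive policy do that work.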
Real-World Patterns: CSV Imports, Email Sending, API Sync, and Image Processing
Theory is useful, but the real test of a job queue system is how it handles actual production workloads. Here are four patterns I have used across dozens of WordPress sites, each with specific challenges and solutions.
Pattern 1: Chunked CSV Import
A 50,000-row CSV file cannot be processed in a single HTTP request. The standard approach is to parse the file, split it into chunks, and enqueue each chunk as a separate job.
function wpkite_start_csv_import( $file_path ) {
$handle = fopen( $file_path, 'r' );
if ( ! $handle ) {
return new WP_Error( 'file_error', 'Cannot open CSV file.' );
}
$headers = fgetcsv( $handle );
$chunk = array();
$chunk_num = 0;
$row_num = 0;
$batch_id = wp_generate_uuid4();
// Store import metadata
set_transient( 'wpkite_import_' . $batch_id, array(
'file' => basename( $file_path ),
'started_at' => current_time( 'mysql' ),
'status' => 'processing',
'total_rows' => 0,
'processed' => 0,
'failed' => 0,
), DAY_IN_SECONDS );
while ( ( $row = fgetcsv( $handle ) ) !== false ) {
$row_num++;
// Guard against malformed rows: array_combine() returns false (PHP 7)
// or throws (PHP 8) when the column count does not match the headers.
if ( count( $row ) !== count( $headers ) ) {
continue;
}
$data = array_combine( $headers, $row );
$chunk[] = array(
'row_number' => $row_num,
'data' => $data,
);
// Enqueue in chunks of 100 rows
if ( count( $chunk ) >= 100 ) {
$chunk_num++;
as_enqueue_async_action(
'wpkite_process_csv_chunk',
array( $batch_id, $chunk_num, $chunk ),
'wpkite-csv-import'
);
$chunk = array();
}
}
// Enqueue remaining rows
if ( ! empty( $chunk ) ) {
$chunk_num++;
as_enqueue_async_action(
'wpkite_process_csv_chunk',
array( $batch_id, $chunk_num, $chunk ),
'wpkite-csv-import'
);
}
fclose( $handle );
// Update total row count
$meta = get_transient( 'wpkite_import_' . $batch_id );
$meta['total_rows'] = $row_num;
$meta['chunks'] = $chunk_num;
set_transient( 'wpkite_import_' . $batch_id, $meta, DAY_IN_SECONDS );
return $batch_id;
}
add_action( 'wpkite_process_csv_chunk', function( $batch_id, $chunk_num, $rows ) {
foreach ( $rows as $row_item ) {
$result = wpkite_import_single_row( $row_item['data'] );
// Update progress
$meta = get_transient( 'wpkite_import_' . $batch_id );
if ( is_wp_error( $result ) ) {
$meta['failed']++;
} else {
$meta['processed']++;
}
set_transient( 'wpkite_import_' . $batch_id, $meta, DAY_IN_SECONDS );
}
}, 10, 3 );
The key design decisions here: 100-row chunks balance throughput against memory usage. Each chunk is an independent action, so different workers can process them in parallel if you have multiple runners. The batch metadata stored in a transient allows the frontend to poll for progress updates via AJAX. One caveat: the read-modify-write transient update inside the chunk handler can lose counts when chunks run concurrently, so if you run parallel workers, an atomic counter (a dedicated table column updated with processed = processed + 1) is more reliable.
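A minimal polling endpoint for that progress data might look like the following sketch; the `wpkite_import_progress` action name and the capability check are illustrative assumptions, not part of the import code above:

```php
// AJAX endpoint returning import progress from the batch transient.
add_action( 'wp_ajax_wpkite_import_progress', function() {
    if ( ! current_user_can( 'import' ) ) {
        wp_send_json_error( 'Forbidden', 403 );
    }
    $batch_id = sanitize_text_field( $_GET['batch_id'] ?? '' );
    $meta     = get_transient( 'wpkite_import_' . $batch_id );
    if ( false === $meta ) {
        wp_send_json_error( 'Unknown batch', 404 );
    }
    $done = $meta['processed'] + $meta['failed'];
    wp_send_json_success( array(
        'total'     => $meta['total_rows'],
        'processed' => $meta['processed'],
        'failed'    => $meta['failed'],
        'percent'   => $meta['total_rows'] > 0 ? round( 100 * $done / $meta['total_rows'], 1 ) : 0,
    ) );
} );
```

The frontend can hit admin-ajax.php?action=wpkite_import_progress&batch_id=... every few seconds and render a progress bar from the percent field.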
Pattern 2: Throttled Email Sending
Sending 5,000 emails at once will get you rate-limited or blacklisted by any email provider. The job queue needs to enforce throughput limits:
function wpkite_enqueue_bulk_email( $recipient_ids, $template, $subject ) {
$campaign_id = wp_generate_uuid4();
foreach ( $recipient_ids as $index => $user_id ) {
// Stagger emails: 50 per minute = 1 every 1.2 seconds
$delay = (int) ( $index * 1.2 );
// Action Scheduler spreads the args array positionally when it fires the
// hook, so wrap the payload in an outer array to deliver it to the
// handler as a single associative $args parameter.
as_schedule_single_action(
time() + $delay,
'wpkite_send_single_email',
array(
array(
'campaign_id' => $campaign_id,
'user_id' => $user_id,
'template' => $template,
'subject' => $subject,
),
),
'wpkite-emails'
);
}
return $campaign_id;
}
add_action( 'wpkite_send_single_email', function( $args ) {
$user = get_userdata( $args['user_id'] );
if ( ! $user ) {
return; // User deleted, skip silently
}
$template = wpkite_load_email_template( $args['template'] );
$body = wpkite_render_template( $template, array(
'name' => $user->display_name,
'email' => $user->user_email,
) );
$sent = wp_mail( $user->user_email, $args['subject'], $body, array(
'Content-Type: text/html; charset=UTF-8',
) );
if ( ! $sent ) {
// Throwing marks this action as failed in Action Scheduler's log.
// Note that Action Scheduler does not automatically retry failed
// actions; re-schedule here if you need retries, or use a wrapper
// that implements them.
throw new Exception( 'wp_mail returned false for user ' . $args['user_id'] );
}
// Track delivery
update_user_meta( $args['user_id'], '_last_email_campaign', $args['campaign_id'] );
update_user_meta( $args['user_id'], '_last_email_sent', current_time( 'mysql' ) );
});
The staggering approach (scheduling each email with an incremental delay) is crude but effective. For more precise rate limiting, use a fixed-window counter (a Redis counter keyed per minute) or a true token bucket that refills continuously; note that a counter resetting every minute is technically a fixed window, not a token bucket, and permits short bursts at window boundaries.
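A per-minute Redis counter of that kind can be sketched in a few lines of phpredis; the key prefix and the 50-per-minute limit here are illustrative assumptions:

```php
// Fixed-window rate limiter: allow at most $limit sends per minute.
// Returns true if the caller may send now. Assumes a connected phpredis instance.
function wpkite_email_rate_gate( Redis $redis, $limit = 50 ) {
    $window = 'wpkite:email_window:' . gmdate( 'YmdHi' ); // One key per minute
    $count  = $redis->incr( $window );
    if ( 1 === $count ) {
        $redis->expire( $window, 120 ); // Let the key die after the window passes
    }
    return $count <= $limit;
}

// In the job handler: if the gate is closed, re-schedule instead of sending.
// if ( ! wpkite_email_rate_gate( $redis ) ) {
//     as_schedule_single_action( time() + 60, 'wpkite_send_single_email', array( $args ), 'wpkite-emails' );
//     return;
// }
```

INCR followed by EXPIRE on the first increment keeps the check atomic enough for this purpose: the counter either exists with a TTL or is created fresh for the new minute.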
Pattern 3: Two-Way API Synchronization
Syncing data between WordPress and an external system (CRM, ERP, inventory management) requires careful handling of conflicts, rate limits, and partial failures:
add_action( 'wpkite_sync_to_crm', function( $args ) {
$customer_id = $args['customer_id'];
$direction = $args['direction']; // 'push' or 'pull'
// Acquire a per-customer lock to prevent concurrent syncs
$lock_key = 'wpkite_sync_lock_' . $customer_id;
if ( get_transient( $lock_key ) ) {
// Another sync is in progress; re-schedule for 30 seconds later
as_schedule_single_action(
time() + 30,
'wpkite_sync_to_crm',
array( $args ),
'wpkite-sync'
);
return;
}
set_transient( $lock_key, true, 300 ); // 5-minute lock
try {
$crm = new CRM_API_Client();
if ( $direction === 'push' ) {
$local_data = wpkite_get_customer_data( $customer_id );
$remote_data = $crm->get_customer( $local_data['crm_id'] );
// Conflict detection: compare last modified timestamps
if ( strtotime( $remote_data['updated_at'] ) > strtotime( $local_data['updated_at'] ) ) {
// Remote is newer; pull instead of push
wpkite_update_local_customer( $customer_id, $remote_data );
} else {
$crm->update_customer( $local_data['crm_id'], $local_data );
}
} elseif ( $direction === 'pull' ) {
$remote_data = $crm->get_customer_by_email( get_userdata( $customer_id )->user_email );
if ( $remote_data ) {
wpkite_update_local_customer( $customer_id, $remote_data );
}
}
} finally {
delete_transient( $lock_key );
}
});
The per-customer locking prevents two sync jobs from overwriting each other. The conflict detection ensures that the most recent data wins. These patterns become critical when you have webhooks firing from both sides.
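Strictly speaking, the transient-based lock above is a check-then-set, not an atomic operation: two workers can both read get_transient() as false in the same instant and proceed. If that matters for your workload, MySQL's named locks are atomic across connections. A sketch (the lock-name prefix is an assumption):

```php
// Atomic per-customer lock using MySQL named locks. GET_LOCK is atomic
// across all connections, unlike a transient check-then-set, and the
// lock is released automatically if the holding connection dies.
function wpkite_acquire_sync_lock( $customer_id, $timeout = 0 ) {
    global $wpdb;
    $name = 'wpkite_sync_' . $customer_id; // Lock names are server-wide
    return (bool) $wpdb->get_var(
        $wpdb->prepare( 'SELECT GET_LOCK( %s, %d )', $name, $timeout )
    );
}

function wpkite_release_sync_lock( $customer_id ) {
    global $wpdb;
    $wpdb->query(
        $wpdb->prepare( 'SELECT RELEASE_LOCK( %s )', 'wpkite_sync_' . $customer_id )
    );
}
```

The automatic release on connection close is the main advantage over transients: a crashed worker cannot leave a customer locked for the full five-minute TTL.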
Pattern 4: Bulk Image Processing
Image operations are memory-intensive. A single 4000×3000 pixel JPEG can consume 48MB of RAM when loaded into GD or Imagick. Processing a batch of 200 images inline would require more memory than most PHP processes allow. The solution is one image per job, with aggressive memory monitoring:
add_action( 'wpkite_process_image', function( $args ) {
$attachment_id = $args['attachment_id'];
$operations = $args['operations']; // e.g., ['resize', 'optimize', 'webp']
$file = get_attached_file( $attachment_id );
if ( ! $file || ! file_exists( $file ) ) {
throw new Exception( 'Attachment file not found: ' . $attachment_id );
}
$memory_before = memory_get_usage( true );
foreach ( $operations as $operation ) {
switch ( $operation ) {
case 'resize':
$editor = wp_get_image_editor( $file );
if ( is_wp_error( $editor ) ) {
throw new Exception( $editor->get_error_message() );
}
$editor->resize( 1920, 1080, false );
$editor->save( $file );
unset( $editor ); // Explicitly free memory
break;
case 'optimize':
wpkite_optimize_image( $file ); // Your optimization logic
break;
case 'webp':
$webp_path = preg_replace( '/\.(jpe?g|png)$/i', '.webp', $file );
$editor = wp_get_image_editor( $file );
if ( ! is_wp_error( $editor ) ) {
$editor->save( $webp_path, 'image/webp' );
update_post_meta( $attachment_id, '_webp_path', $webp_path );
unset( $editor );
}
break;
}
// Check memory after each operation
$memory_now = memory_get_usage( true );
if ( $memory_now > 100 * 1024 * 1024 ) { // Over 100MB
error_log( sprintf(
'[Image Processing] High memory usage after %s on attachment %d: %dMB',
$operation,
$attachment_id,
$memory_now / 1024 / 1024
) );
}
}
// Regenerate attachment metadata to reflect the changes. In a CLI or
// cron context the image functions are not loaded, so require them.
require_once ABSPATH . 'wp-admin/includes/image.php';
wp_update_attachment_metadata( $attachment_id, wp_generate_attachment_metadata( $attachment_id, $file ) );
});
Notice the explicit unset( $editor ) calls after each image operation. PHP’s garbage collector does not always free image resources immediately, and in a loop processing multiple operations, this can cause memory to climb steadily. Explicit cleanup is essential.
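The 48MB figure earlier comes straight from the pixel math: a decoded bitmap needs roughly 4 bytes per pixel (RGBA), regardless of the JPEG's size on disk. A tiny helper makes the estimate explicit; the 4-bytes-per-pixel figure and the optional overhead factor are rules of thumb, not exact GD or Imagick accounting:

```php
// Estimate the decoded in-memory size of an image, in bytes.
// 4 bytes per pixel (RGBA); $overhead accounts for library bookkeeping.
function wpkite_estimate_image_memory( $width, $height, $overhead = 1.0 ) {
    return (int) ceil( $width * $height * 4 * $overhead );
}

// 4000 x 3000 pixels -> 48,000,000 bytes, matching the 48MB figure above.
```

Running this check before calling wp_get_image_editor() lets a job bail out (or dead-letter) gracefully instead of hitting a fatal out-of-memory error mid-operation.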
Memory Management in Long-Running Processes
PHP was designed for short-lived request-response cycles. A typical web request lives for a few hundred milliseconds, allocates some memory, and dies. The process is cleaned up, and all memory returns to the operating system. Long-running queue workers break this assumption. A worker that processes thousands of jobs over hours can accumulate memory leaks that would be invisible in a normal request.
WordPress-Specific Memory Leaks
Several WordPress subsystems accumulate memory over time:
The Object Cache. WordPress caches database queries in $wp_object_cache. In a normal request, this cache is useful because it prevents duplicate queries. In a long-running worker processing thousands of different posts, the cache grows without bound. Call wp_cache_flush() periodically (every 50-100 jobs) to reset it.
The Query Log. If SAVEQUERIES is enabled (common in development), WordPress stores every SQL query and its backtrace in $wpdb->queries. Over thousands of jobs, this array can consume hundreds of megabytes. Never enable SAVEQUERIES in production, and in your worker startup code, explicitly disable it:
// At the start of your worker process
global $wpdb;
$wpdb->queries = array();
// Periodically during processing
if ( $job_count % 100 === 0 ) {
$wpdb->queries = array();
wp_cache_flush();
}
Action and Filter Accumulation. Some plugins add actions or filters dynamically during processing. If a plugin calls add_action() inside a loop without a corresponding remove_action(), the $wp_filter global grows. This is rare but devastating when it happens. Monitor memory_get_usage( true ) between jobs and look for steady upward trends.
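A crude but effective way to spot such upward trends is to record memory after every job and compare the halves of a rolling window; this sketch lives in the worker process, and the 50-job window and 10% threshold are arbitrary values to tune:

```php
// Track per-job memory readings and warn on a sustained upward trend.
// Call once after each job completes, passing the same $samples array.
function wpkite_check_memory_trend( array &$samples, $window = 50, $threshold = 1.10 ) {
    $samples[] = memory_get_usage( true );
    if ( count( $samples ) < $window ) {
        return false;
    }
    $samples     = array_slice( $samples, -$window );
    $half        = (int) ( $window / 2 );
    $first_half  = array_sum( array_slice( $samples, 0, $half ) );
    $second_half = array_sum( array_slice( $samples, $half ) );
    if ( $second_half > $first_half * $threshold ) {
        error_log( sprintf(
            '[Worker] Memory trending up: first-half avg %dKB, second-half avg %dKB',
            $first_half / $half / 1024,
            $second_half / $half / 1024
        ) );
        return true;
    }
    return false;
}
```

A true return is a good signal for the restart strategy described below: exit cleanly and let the process manager start a fresh worker.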
WP_Query Post Cache. Each WP_Query or get_posts() call caches the returned post objects. In a worker that queries thousands of different posts, these cached objects accumulate. Use 'cache_results' => false and 'update_post_meta_cache' => false in your query arguments when you do not need caching:
$products = get_posts( array(
'post_type' => 'product',
'posts_per_page' => 50,
'cache_results' => false,
'update_post_meta_cache' => false,
'update_post_term_cache' => false,
'fields' => 'ids', // Only fetch IDs if that is all you need
) );
The Restart Strategy
The most reliable way to manage memory in long-running PHP processes is to not fight it. Instead, design your workers to exit gracefully after processing a fixed number of jobs or when memory usage crosses a threshold, then let your process manager (Supervisor, systemd) restart them automatically:
function wpkite_should_worker_restart( $jobs_processed, $max_jobs = 500, $memory_pct = 0.85 ) {
// Exit after N jobs regardless of memory
if ( $jobs_processed >= $max_jobs ) {
return 'max_jobs_reached';
}
// Exit if memory usage exceeds threshold
$limit = wpkite_get_memory_limit_bytes();
$current = memory_get_usage( true );
if ( $current >= $limit * $memory_pct ) {
return 'memory_threshold';
}
return false;
}
function wpkite_get_memory_limit_bytes() {
$limit = ini_get( 'memory_limit' );
if ( $limit === '-1' ) {
return PHP_INT_MAX;
}
$unit = strtolower( substr( $limit, -1 ) );
$bytes = (int) $limit;
switch ( $unit ) {
case 'g':
$bytes *= 1024 * 1024 * 1024;
break;
case 'm':
$bytes *= 1024 * 1024;
break;
case 'k':
$bytes *= 1024;
break;
}
return $bytes;
}
This restart-on-threshold pattern is used by every serious PHP queue worker, from Laravel’s queue:work to Symfony Messenger. It acknowledges that PHP’s memory management has limits and works with those limits rather than against them.
Complete Production Job Queue System
Let us bring everything together into a production-ready job queue system that you can drop into any WordPress project. This system uses Action Scheduler as the queue backend (no external dependencies), with structured logging, retry logic, dead-letter handling, and monitoring built in.
The Job Manager Class
/**
* WPKite Job Manager
*
* A production-ready job queue system built on Action Scheduler.
* Provides structured logging, retry logic with exponential backoff,
* dead-letter queue, and health monitoring.
*
* @package WPKite
*/
class WPKite_Job_Manager {
/** @var string Action Scheduler group for all managed jobs */
const GROUP = 'wpkite-jobs';
/** @var string Option key for job type registry */
const REGISTRY_KEY = 'wpkite_job_types';
/** @var array In-memory cache of registered job types */
private static $registered_types = array();
/**
* Initialize the job manager.
*/
public static function init() {
// Register the meta-handler that routes all jobs
add_action( 'wpkite_run_job', array( __CLASS__, 'execute_job' ), 10, 1 );
// Register the monitoring cron
add_action( 'wpkite_job_health_check', array( __CLASS__, 'run_health_check' ) );
// Schedule health check if not already scheduled
if ( function_exists( 'as_next_scheduled_action' ) && ! as_next_scheduled_action( 'wpkite_job_health_check' ) ) {
as_schedule_recurring_action(
time(),
15 * MINUTE_IN_SECONDS,
'wpkite_job_health_check',
array(),
self::GROUP
);
}
}
/**
* Register a job type with its handler and configuration.
*
* @param string $type Unique job type identifier.
* @param callable $handler Function to process the job payload.
* @param array $config {
* Optional configuration.
* @type int $max_retries Maximum retry attempts (default 3).
* @type int $base_delay Base delay for exponential backoff in seconds (default 60).
* @type int $timeout Maximum execution time in seconds (default 300).
* @type bool $retryable Whether failed jobs should be retried (default true).
* }
*/
public static function register( $type, $handler, $config = array() ) {
$defaults = array(
'max_retries' => 3,
'base_delay' => 60,
'timeout' => 300,
'retryable' => true,
);
self::$registered_types[ $type ] = array(
'handler' => $handler,
'config' => wp_parse_args( $config, $defaults ),
);
}
/**
* Dispatch a job for async processing.
*
* @param string $type Registered job type.
* @param array $payload Data to pass to the job handler.
* @param int $delay Optional delay in seconds before processing.
* @return int|WP_Error Action ID or error.
*/
public static function dispatch( $type, $payload = array(), $delay = 0 ) {
if ( ! isset( self::$registered_types[ $type ] ) ) {
return new WP_Error( 'unknown_job_type', sprintf( 'Job type "%s" is not registered.', $type ) );
}
$job_data = array(
'type' => $type,
'payload' => $payload,
'attempt' => 0,
'dispatched' => current_time( 'mysql' ),
'job_id' => wp_generate_uuid4(),
);
if ( $delay > 0 ) {
$action_id = as_schedule_single_action(
time() + $delay,
'wpkite_run_job',
array( $job_data ),
self::GROUP
);
} else {
$action_id = as_enqueue_async_action(
'wpkite_run_job',
array( $job_data ),
self::GROUP
);
}
self::log( $type, 'dispatched', array(
'job_id' => $job_data['job_id'],
'action_id' => $action_id,
'delay' => $delay,
) );
return $action_id;
}
/**
* Dispatch multiple jobs as a batch.
*
* @param string $type Registered job type.
* @param array $payloads Array of payload arrays.
* @return array Array of action IDs.
*/
public static function dispatch_batch( $type, $payloads ) {
$action_ids = array();
$batch_id = wp_generate_uuid4();
foreach ( $payloads as $index => $payload ) {
$payload['_batch_id'] = $batch_id;
$payload['_batch_index'] = $index;
$payload['_batch_total'] = count( $payloads );
$result = self::dispatch( $type, $payload );
if ( ! is_wp_error( $result ) ) {
$action_ids[] = $result;
}
}
return array(
'batch_id' => $batch_id,
'action_ids' => $action_ids,
'total' => count( $payloads ),
'dispatched' => count( $action_ids ),
);
}
/**
* Execute a job. Called by Action Scheduler.
*
* @param array $job_data Job configuration and payload.
*/
public static function execute_job( $job_data ) {
$type = $job_data['type'];
if ( ! isset( self::$registered_types[ $type ] ) ) {
self::log( $type, 'error', array(
'job_id' => $job_data['job_id'],
'message' => 'Job type not registered. Was the handler deactivated?',
) );
return;
}
$registration = self::$registered_types[ $type ];
$config = $registration['config'];
$handler = $registration['handler'];
$start = microtime( true );
$memory_start = memory_get_usage( true );
self::log( $type, 'started', array(
'job_id' => $job_data['job_id'],
'attempt' => $job_data['attempt'] + 1,
) );
try {
// Set a timeout alarm if pcntl is available. A handler must be
// installed first: the default SIGALRM action would terminate the
// whole worker process instead of failing just this job.
if ( function_exists( 'pcntl_async_signals' ) && function_exists( 'pcntl_alarm' ) ) {
pcntl_async_signals( true );
pcntl_signal( SIGALRM, function() use ( $config ) {
throw new RuntimeException( 'Job exceeded timeout of ' . $config['timeout'] . ' seconds.' );
} );
pcntl_alarm( $config['timeout'] );
}
// Call the registered handler
call_user_func( $handler, $job_data['payload'] );
$duration = (int) ( ( microtime( true ) - $start ) * 1000 );
$memory_used = memory_get_usage( true ) - $memory_start;
self::log( $type, 'completed', array(
'job_id' => $job_data['job_id'],
'duration_ms' => $duration,
'memory_used' => $memory_used,
) );
} catch ( Throwable $e ) {
$duration = (int) ( ( microtime( true ) - $start ) * 1000 );
self::log( $type, 'failed', array(
'job_id' => $job_data['job_id'],
'attempt' => $job_data['attempt'] + 1,
'duration_ms' => $duration,
'error' => $e->getMessage(),
'error_class' => get_class( $e ),
) );
// Decide whether to retry
if ( $config['retryable'] && $job_data['attempt'] < $config['max_retries'] ) {
$next_attempt = $job_data['attempt'] + 1;
$delay = $config['base_delay'] * pow( 2, $job_data['attempt'] );
$retry_data = $job_data;
$retry_data['attempt'] = $next_attempt;
as_schedule_single_action(
time() + $delay,
'wpkite_run_job',
array( $retry_data ),
self::GROUP
);
self::log( $type, 'retrying', array(
'job_id' => $job_data['job_id'],
'next_attempt' => $next_attempt + 1,
'delay' => $delay,
) );
} else {
// Exhausted retries or non-retryable: dead letter
self::send_to_dlq( $job_data, $e->getMessage() );
}
}
// Clean up after each job
if ( function_exists( 'pcntl_alarm' ) ) {
pcntl_alarm( 0 );
}
}
/**
* Send a failed job to the dead letter queue.
*/
private static function send_to_dlq( $job_data, $error_message ) {
global $wpdb;
$wpdb->insert(
$wpdb->prefix . 'wpkite_dead_letters',
array(
'job_type' => $job_data['type'],
'payload' => wp_json_encode( $job_data ),
'error_message' => $error_message,
'attempts' => $job_data['attempt'] + 1,
'status' => 'dead',
'created_at' => current_time( 'mysql' ),
),
array( '%s', '%s', '%s', '%d', '%s', '%s' )
);
self::log( $job_data['type'], 'dead_lettered', array(
'job_id' => $job_data['job_id'],
'error' => $error_message,
) );
do_action( 'wpkite_job_dead_lettered', $job_data, $error_message );
}
/**
* Write a structured log entry.
*/
private static function log( $job_type, $status, $data = array() ) {
global $wpdb;
$wpdb->insert(
$wpdb->prefix . 'wpkite_job_log',
array(
'job_type' => $job_type,
'status' => $status,
'job_id' => $data['job_id'] ?? '',
'duration_ms' => $data['duration_ms'] ?? null,
'memory_peak' => memory_get_peak_usage( true ),
'message' => $data['message'] ?? $data['error'] ?? '',
'context' => wp_json_encode( $data ),
'created_at' => current_time( 'mysql' ),
),
array( '%s', '%s', '%s', '%d', '%d', '%s', '%s', '%s' )
);
}
/**
* Health check: detect stuck jobs, high failure rates, queue backlogs.
*/
public static function run_health_check() {
global $wpdb;
$issues = array();
// Stuck actions (in-progress for over 10 minutes)
$stuck = (int) $wpdb->get_var(
"SELECT COUNT(*) FROM {$wpdb->prefix}actionscheduler_actions
WHERE status = 'in-progress'
AND group_id = (
SELECT group_id FROM {$wpdb->prefix}actionscheduler_groups
WHERE slug = '" . self::GROUP . "' LIMIT 1
)
AND last_attempt_gmt < DATE_SUB( UTC_TIMESTAMP(), INTERVAL 10 MINUTE )"
);
if ( $stuck > 0 ) {
$issues[] = sprintf( '%d jobs stuck in progress for over 10 minutes.', $stuck );
}
// Overdue pending actions
$overdue = (int) $wpdb->get_var(
"SELECT COUNT(*) FROM {$wpdb->prefix}actionscheduler_actions
WHERE status = 'pending'
AND scheduled_date_gmt < DATE_SUB( UTC_TIMESTAMP(), INTERVAL 5 MINUTE )
AND group_id = (
SELECT group_id FROM {$wpdb->prefix}actionscheduler_groups
WHERE slug = '" . self::GROUP . "' LIMIT 1
)"
);
if ( $overdue > 100 ) {
$issues[] = sprintf( '%d overdue jobs in the queue.', $overdue );
}
// Recent dead letters
$dead = (int) $wpdb->get_var(
$wpdb->prepare(
"SELECT COUNT(*) FROM {$wpdb->prefix}wpkite_dead_letters
WHERE created_at > %s",
gmdate( 'Y-m-d H:i:s', time() - HOUR_IN_SECONDS )
)
);
if ( $dead > 0 ) {
$issues[] = sprintf( '%d jobs moved to dead letter queue in the last hour.', $dead );
}
if ( ! empty( $issues ) ) {
do_action( 'wpkite_job_health_alert', $issues );
wp_mail(
get_option( 'admin_email' ),
'[WPKite] Job Queue Health Alert',
"The following issues were detected:\n\n" . implode( "\n", $issues )
);
}
}
/**
* Get queue statistics.
*
* @return array Queue metrics.
*/
public static function stats() {
global $wpdb;
$group_clause = "AND group_id = (
SELECT group_id FROM {$wpdb->prefix}actionscheduler_groups
WHERE slug = '" . self::GROUP . "' LIMIT 1
)";
return array(
'pending' => (int) $wpdb->get_var(
"SELECT COUNT(*) FROM {$wpdb->prefix}actionscheduler_actions WHERE status = 'pending' $group_clause"
),
'in_progress' => (int) $wpdb->get_var(
"SELECT COUNT(*) FROM {$wpdb->prefix}actionscheduler_actions WHERE status = 'in-progress' $group_clause"
),
'completed' => (int) $wpdb->get_var(
"SELECT COUNT(*) FROM {$wpdb->prefix}actionscheduler_actions
WHERE status = 'complete' $group_clause
AND last_attempt_gmt > DATE_SUB( UTC_TIMESTAMP(), INTERVAL 24 HOUR )"
),
'failed' => (int) $wpdb->get_var(
"SELECT COUNT(*) FROM {$wpdb->prefix}actionscheduler_actions
WHERE status = 'failed' $group_clause
AND last_attempt_gmt > DATE_SUB( UTC_TIMESTAMP(), INTERVAL 24 HOUR )"
),
'dead_letters' => (int) $wpdb->get_var(
"SELECT COUNT(*) FROM {$wpdb->prefix}wpkite_dead_letters WHERE status = 'dead'"
),
);
}
}
// Initialize on plugins_loaded
add_action( 'plugins_loaded', array( 'WPKite_Job_Manager', 'init' ) );
Using the Job Manager
With the Job Manager in place, registering and dispatching jobs becomes straightforward:
// Register job types
WPKite_Job_Manager::register(
'send_welcome_email',
'wpkite_handle_welcome_email',
array( 'max_retries' => 2, 'base_delay' => 30 )
);
WPKite_Job_Manager::register(
'sync_product',
'wpkite_handle_product_sync',
array( 'max_retries' => 5, 'base_delay' => 60, 'timeout' => 120 )
);
WPKite_Job_Manager::register(
'generate_report',
'wpkite_handle_report_generation',
array( 'max_retries' => 1, 'timeout' => 600 )
);
// Handlers
function wpkite_handle_welcome_email( $payload ) {
$user = get_userdata( $payload['user_id'] );
if ( ! $user ) {
throw new Exception( 'User not found: ' . $payload['user_id'] );
}
$sent = wp_mail(
$user->user_email,
'Welcome to WPKite!',
wpkite_render_email_template( 'welcome', array( 'name' => $user->display_name ) ),
array( 'Content-Type: text/html; charset=UTF-8' )
);
if ( ! $sent ) {
throw new Exception( 'Failed to send welcome email to ' . $user->user_email );
}
}
function wpkite_handle_product_sync( $payload ) {
$product_id = $payload['product_id'];
$api = new External_Product_API();
$result = $api->push( get_post( $product_id ) );
if ( is_wp_error( $result ) ) {
throw new Exception( $result->get_error_message() );
}
update_post_meta( $product_id, '_last_synced', current_time( 'mysql' ) );
}
// Dispatch jobs
WPKite_Job_Manager::dispatch( 'send_welcome_email', array( 'user_id' => 42 ) );
WPKite_Job_Manager::dispatch( 'sync_product', array( 'product_id' => 1001 ), 300 ); // 5min delay
// Batch dispatch
$product_ids = range( 1001, 1500 );
$payloads = array_map( function( $id ) {
return array( 'product_id' => $id );
}, $product_ids );
$result = WPKite_Job_Manager::dispatch_batch( 'sync_product', $payloads );
// $result['batch_id'], $result['dispatched'], etc.
// Check stats
$stats = WPKite_Job_Manager::stats();
// $stats['pending'], $stats['completed'], $stats['dead_letters'], etc.
Database Tables for the Job Manager
The Job Manager requires two custom tables. Create them during plugin or theme activation:
function wpkite_create_job_tables() {
global $wpdb;
$charset = $wpdb->get_charset_collate();
$job_log_table = $wpdb->prefix . 'wpkite_job_log';
$dlq_table = $wpdb->prefix . 'wpkite_dead_letters';
// dbDelta() is strict about formatting: no IF NOT EXISTS (it breaks
// dbDelta's table-name detection), two spaces before the PRIMARY KEY
// column list, and KEY rather than INDEX. TEXT/LONGTEXT columns also
// cannot take a DEFAULT value in MySQL.
$sql = "CREATE TABLE $job_log_table (
id bigint(20) unsigned NOT NULL AUTO_INCREMENT,
job_type varchar(100) NOT NULL,
status varchar(50) NOT NULL,
job_id varchar(36) NOT NULL DEFAULT '',
duration_ms int(11) DEFAULT NULL,
memory_peak bigint(20) DEFAULT NULL,
message text NOT NULL,
context longtext NOT NULL,
created_at datetime NOT NULL,
PRIMARY KEY  (id),
KEY job_type_status (job_type, status),
KEY job_id (job_id),
KEY created_at (created_at)
) $charset;
CREATE TABLE $dlq_table (
id bigint(20) unsigned NOT NULL AUTO_INCREMENT,
job_type varchar(100) NOT NULL,
payload longtext NOT NULL,
error_message text NOT NULL,
attempts int(11) NOT NULL DEFAULT 0,
status varchar(20) NOT NULL DEFAULT 'dead',
created_at datetime NOT NULL,
reprocessed_at datetime DEFAULT NULL,
PRIMARY KEY  (id),
KEY job_type (job_type),
KEY status (status),
KEY created_at (created_at)
) $charset;";
require_once ABSPATH . 'wp-admin/includes/upgrade.php';
dbDelta( $sql );
}
register_activation_hook( __FILE__, 'wpkite_create_job_tables' );
Choosing the Right Approach for Your Project
After walking through all of these options, the natural question is: which one should I use? The answer depends on your specific requirements, but here are clear guidelines.
Use WP-Cron if you have fewer than a dozen scheduled tasks, all tasks complete in under 10 seconds, and you do not need retry logic or failure tracking. Plugin update checks, scheduled post publishing, and transient cleanup are good fits. Pair it with a real system cron for reliable timing.
Use WP Background Processing if you need to process a batch of items (hundreds to low thousands) that arrive together and are processed sequentially. Bulk image optimization, one-time data migrations, and post-import cleanup tasks work well. The library handles memory limits and self-recovery automatically.
Use Action Scheduler for most production job queue needs. It handles thousands of jobs per hour, provides built-in logging and failure tracking, supports concurrent processing, and has an admin UI for debugging. It is already included if you use WooCommerce. For sites processing up to 50,000 jobs per day, Action Scheduler with WP-CLI runners is the sweet spot of capability versus complexity.
Use Redis or SQS when you exceed what MySQL can handle as a queue backend. Signs you have outgrown Action Scheduler include: queue table growing past 1 million rows, claim lock contention causing slow queries, and processing latency increasing as queue depth grows. At this point, the operational complexity of running Redis or SQS is justified by the performance and reliability gains.
Regardless of which approach you choose, the patterns in this article apply universally. Exponential backoff, dead-letter queues, structured logging, health monitoring, and memory management are not specific to any particular queue technology. They are engineering practices that make the difference between a job queue that works in development and one that survives production.
Final Recommendations for Production Deployment
Before shipping any job queue system to production, run through this checklist:
Disable WP-Cron’s traffic trigger. Set DISABLE_WP_CRON to true in wp-config.php and add a system cron entry. This is non-negotiable for any site with background processing.
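Concretely, the constant goes in wp-config.php (above the "That's all, stop editing!" line), and the paths in the cron entry below are placeholders for your install:

```php
// wp-config.php — stop WP-Cron from firing on page loads.
define( 'DISABLE_WP_CRON', true );
```

Then add a system crontab entry such as `* * * * * cd /var/www/example.com && wp cron event run --due-now >/dev/null 2>&1` so due events run on a reliable schedule via WP-CLI instead of depending on visitor traffic.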
Set appropriate PHP limits. Workers need more memory and execution time than web requests. Set memory_limit to at least 256M and max_execution_time to 0 (unlimited) for CLI processes. Do this in a CLI-specific php.ini or in your worker bootstrap code.
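A minimal worker bootstrap sketch (the guard and limits are the important parts; the values match the recommendations above):

```php
// worker-bootstrap.php — raise limits for CLI workers only.
if ( 'cli' !== PHP_SAPI ) {
	exit( 'Workers must run from the command line.' );
}

// Workers hold more state than a web request; give them headroom.
ini_set( 'memory_limit', '256M' );

// No execution time cap for long-running worker loops.
set_time_limit( 0 );
```

Setting these in code keeps the worker self-contained, but a CLI-specific php.ini achieves the same thing without touching application code.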
Monitor queue depth. A growing queue means your workers cannot keep up with incoming jobs. Set up alerts when queue depth exceeds a threshold. Track the trend over time. If the queue grows during peak hours but drains overnight, that might be acceptable. If it only grows and never drains, you need more workers or faster job handlers.
Test failure modes. Kill a worker mid-job and verify the job gets picked up again. Simulate a database outage and confirm jobs are not lost. Send malformed data through your handlers and verify it hits the dead-letter queue instead of crashing the worker. These tests should be part of your deployment process, not afterthoughts.
Log retention matters. Job logs grow fast. A system processing 10,000 jobs per day with 5 log entries per job generates 50,000 log rows daily, or 1.5 million per month. Set up automated cleanup to purge logs older than 30 days (or whatever your audit requirements dictate). Action Scheduler has a built-in retention filter; use it.
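Action Scheduler's retention filter makes the built-in cleanup configurable; a 30-day sketch:

```php
// Purge completed Action Scheduler actions and logs after 30 days
// instead of the default retention period (roughly one month).
add_filter( 'action_scheduler_retention_period', function () {
	return 30 * DAY_IN_SECONDS;
} );
```

Custom tables like the job log built earlier need their own scheduled `DELETE ... WHERE created_at < ...` cleanup, since Action Scheduler only prunes its own tables.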
Separate queues by priority. Do not process payment webhooks and thumbnail generation in the same queue. Payment processing needs low latency and high reliability. Image work is high volume and tolerant of delays. Use separate Action Scheduler groups or separate Redis queues, each with their own worker pool tuned for the workload characteristics.
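With Action Scheduler, the third argument to the enqueue functions is the group name, which is all the separation requires (the hook names here are illustrative):

```php
// Latency-sensitive work goes in its own group...
as_enqueue_async_action( 'wpkite_handle_payment_webhook', array( $payload ), 'payments' );

// ...while bulk, delay-tolerant work goes in another.
as_enqueue_async_action( 'wpkite_generate_thumbnail', array( $attachment_id ), 'media' );
```

Each group can then get its own worker pool, for example via `wp action-scheduler run --group=payments` in one process and `--group=media` in another, each tuned for its workload.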
Plan for zero-downtime deployments. When you deploy new code, running workers still have the old code loaded. Use the restart-on-threshold pattern to cycle workers through quickly, or implement a graceful shutdown signal (SIGTERM handler via pcntl_signal) that lets the current job finish before the worker exits and gets restarted with new code.
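A graceful-shutdown sketch for a CLI worker loop (requires the pcntl extension; `wpkite_process_next_job()` stands in for whatever job handler your worker runs):

```php
// Deliver signals between statements without a manual dispatch call (PHP 7.1+).
pcntl_async_signals( true );

$shutdown = false;
pcntl_signal( SIGTERM, function () use ( &$shutdown ) {
	// Flag only — the current job runs to completion before the loop exits.
	$shutdown = true;
} );

while ( ! $shutdown ) {
	wpkite_process_next_job(); // hypothetical: claim and run one job
}
exit( 0 ); // process manager restarts the worker with the new code
```

Paired with a supervisor (systemd, Supervisor) configured to restart on exit, a deploy becomes: ship code, send SIGTERM to the pool, and each worker finishes its in-flight job before coming back up on the new release.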
WordPress’s background processing story has come a long way from the early days of WP-Cron. With Action Scheduler, WP-CLI workers, and the patterns described here, you can build job queue systems that handle real production loads without leaving the WordPress ecosystem. When those loads outgrow what PHP and MySQL can handle, the external queue integrations provide a clear upgrade path. The key is choosing the right tool for your current scale, building with monitoring and failure handling from the start, and knowing when to graduate to the next level.
Nadia Okafor
Full-stack WordPress developer with a focus on internationalization, transactional email, and background processing. Works with clients across 15 countries.