WordPress and Elasticsearch: Building Custom Search Experiences with ElasticPress and Relevance Tuning
Why Default WordPress Search Falls Short
WordPress ships with a search system built on MySQL LIKE queries. For a small blog with fifty posts, this works fine. But the moment your site grows beyond a few hundred posts, adds custom post types, stores data in Advanced Custom Fields, or needs to handle WooCommerce product catalogs, the default search becomes a liability.
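The core problem is architectural. MySQL was designed for relational data storage, not full-text search at scale. WordPress search runs queries like WHERE post_content LIKE '%search_term%' (with matching clauses for the title and excerpt). A pattern that starts with a wildcard cannot use a regular B-tree index, so the database scans every candidate row on every request. Ranking is crude: WordPress orders matches with a fixed CASE expression that favors title hits, then falls back to publish date, with no notion of how often or how rarely a term appears. There is no understanding of stemming, synonyms, or typos. A user searching for “running shoes” will not find a product described as “jogging sneakers,” and a search for “runing” finds nothing.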
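Elasticsearch solves this by providing a purpose-built search engine backed by Apache Lucene. It indexes your content into an inverted index structure and scores results by relevance. It handles stemming and typo tolerance through analyzers and fuzzy matching; synonym pairs like “jogging sneakers” still need a synonym list you supply. It returns results in milliseconds even across millions of documents. ElasticPress, maintained by 10up, bridges WordPress and Elasticsearch so that eligible WP_Query calls are routed to Elasticsearch without rewriting your theme or plugin code. Eligible calls are search queries by default, plus any query passed 'ep_integrate' => true.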
This article covers the full pipeline: connecting Elasticsearch to WordPress through ElasticPress, building custom index mappings, tuning relevance with field weights and function scores, creating faceted search interfaces with aggregations, implementing autocomplete, handling multilingual content, and monitoring search quality in production.
Elasticsearch Fundamentals for WordPress Developers
Before touching any WordPress code, you need a mental model of how Elasticsearch organizes and retrieves data.
Documents, Indices, and Shards
Elasticsearch stores data as JSON documents. Each document represents a single searchable item, like a WordPress post, a WooCommerce product, or a user profile. Documents live inside an index, which is roughly analogous to a database table. Each index is split into shards, and each shard is a self-contained Lucene index that can live on any node in an Elasticsearch cluster.
When you index a WordPress post, ElasticPress converts it to a JSON document like this:
{
"post_id": 1042,
"post_title": "How to Optimize WordPress Database Queries",
"post_content": "Database optimization is critical for WordPress sites...",
"post_type": "post",
"post_status": "publish",
"post_date": "2022-10-15 08:30:00",
"terms": {
"category": [
{ "term_id": 5, "slug": "performance", "name": "Performance" }
],
"post_tag": [
{ "term_id": 22, "slug": "mysql", "name": "MySQL" },
{ "term_id": 31, "slug": "optimization", "name": "Optimization" }
]
},
"meta": {
"reading_time": [
{ "value": "8", "raw": "8", "long": 8, "double": 8.0 }
],
"difficulty_level": [
{ "value": "intermediate", "raw": "intermediate" }
]
}
}
Elasticsearch analyzes this document during indexing. Each text field passes through its analyzer: tokenization, then filters such as lowercasing, stop-word removal, and stemming. The resulting terms are stored in an inverted index. At query time, the search text runs through the same analyzer, and Elasticsearch compares the analyzed terms to find matches. Keyword, numeric, and date fields are not analyzed this way; they are indexed as exact values.
Inverted Index and Relevance Scoring
An inverted index maps every unique term to the list of documents that contain it. If three posts contain the word “caching,” the inverted index entry for “caching” points to those three document IDs. It also records how many times the term appears in each document (term frequency) and where it appears (positions, which phrase queries rely on).
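Here is a toy inverted index that makes the structure concrete. It is a sketch, not Lucene: the tokenizer is a naive lowercase split, and the names tokenize and buildInvertedIndex are illustrative. Each term maps to the documents containing it and the positions where it occurs. The term frequency is simply the number of recorded positions.

```javascript
// Toy inverted index: term -> Map(docId -> positions[]).
// tokenize() is a naive lowercase split, not Elasticsearch's standard analyzer.
function tokenize(text) {
  return text.toLowerCase().split(/[^a-z0-9]+/).filter(Boolean);
}

function buildInvertedIndex(docs) {
  const index = new Map();
  for (const [docId, text] of Object.entries(docs)) {
    tokenize(text).forEach((term, position) => {
      if (!index.has(term)) index.set(term, new Map());
      const postings = index.get(term);
      if (!postings.has(docId)) postings.set(docId, []);
      postings.get(docId).push(position);
    });
  }
  return index;
}

const docs = {
  101: "Caching plugins for WordPress",
  102: "Object caching and page caching explained",
  103: "Optimizing database queries",
};
const index = buildInvertedIndex(docs);

// Which posts mention "caching", and how often (term frequency)?
for (const [docId, positions] of index.get("caching")) {
  console.log(`post ${docId}: tf=${positions.length}, positions=${positions}`);
}
```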
Elasticsearch uses BM25 (Okapi Best Match 25) as its default scoring algorithm. BM25 combines three factors:

- Term frequency: how often the term appears in the field, with diminishing returns.
- Inverse document frequency: how rare the term is across all documents.
- Field length normalization: a match in a short field counts for more than the same match in a long one.

Scores are computed per field. As a result, a match in the short title “WordPress Caching Guide” contributes more for the query “caching” than a single occurrence buried in a 3,000-word article body.
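The textbook BM25 formula shows the effect of field length normalization. This sketch assumes Elasticsearch's default parameters (k1 = 1.2, b = 0.75) and Lucene-style IDF. Lucene's real implementation differs in details such as a constant factor and how field lengths are stored, so the absolute numbers will not match Elasticsearch's. The corpus statistics are made up for illustration.

```javascript
// Textbook BM25 for one term in one field.
// k1 = 1.2 and b = 0.75 are Elasticsearch's defaults; Lucene's implementation
// differs in small details, so absolute scores will not match Elasticsearch.
function idf(docCount, docFreq) {
  return Math.log(1 + (docCount - docFreq + 0.5) / (docFreq + 0.5));
}

function bm25(tf, fieldLength, avgFieldLength, docCount, docFreq, k1 = 1.2, b = 0.75) {
  // Length normalization: fields longer than average push the denominator up
  const lengthNorm = 1 - b + b * (fieldLength / avgFieldLength);
  return idf(docCount, docFreq) * ((tf * (k1 + 1)) / (tf + k1 * lengthNorm));
}

// Hypothetical corpus: 1,000 posts, 40 contain "caching", average length 400 tokens.
// Same term frequency (1), different field lengths:
const shortField = bm25(1, 3, 400, 1000, 40); // e.g. a 3-word field
const longField = bm25(1, 3000, 400, 1000, 40); // e.g. a 3,000-word field
console.log(shortField.toFixed(2), longField.toFixed(2));
```

Rare terms also carry more weight: idf(1000, 5) is larger than idf(1000, 500). Extra occurrences of a term help, but each one adds less than the last.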
Analyzers and Tokenization
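When Elasticsearch indexes a text field, it passes the text through an analyzer. An analyzer is a chain of three stages:

- Character filters, such as stripping HTML tags.
- A tokenizer, which splits text into individual tokens, for example at whitespace and punctuation boundaries.
- Token filters, such as lowercasing, removing stop words, and applying stemming.

The built-in standard analyzer is minimal. It has no character filters, uses the standard tokenizer (Unicode word boundaries), and applies only the lowercase token filter; stop words are disabled by default.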
Consider the text “WordPress’s built-in search isn’t great.” The standard analyzer produces these tokens: ["wordpress’s", "built", "in", "search", "isn’t", "great"]. The built-in english analyzer adds possessive stemming, English stop-word removal, and Porter stemming. It produces ["wordpress", "built", "search", "isn’t", "great"]. It strips the possessive and the stop word “in,” but leaves the contraction alone, because “isn’t” is not on its stop-word list.
Understanding this pipeline is critical because it directly affects what queries will match. If your analyzer stems “running” to “run,” then a search for “runs” (also stemmed to “run”) will match documents about running. This is the foundation of search quality that MySQL LIKE queries cannot replicate.
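The three-stage chain can be sketched as composed functions. This is a toy: the HTML regex and tokenizer are rough stand-ins for html_strip and the standard tokenizer. The stop list is a short illustrative subset, and crudeStem is a hand-rolled suffix stripper, not Porter or Snowball. The point is that index time and query time run the same chain, so “Running” and “runs” meet at the same term.

```javascript
// Toy analyzer with Elasticsearch's three stages (crude stand-ins throughout).
const charFilters = [(text) => text.replace(/<[^>]*>/g, " ")];
const tokenizer = (text) => text.split(/[^\p{L}\p{N}']+/u).filter(Boolean);
const STOP_WORDS = new Set(["a", "an", "and", "in", "is", "of", "the", "to"]);

// Hand-rolled suffix stripper, NOT a real Porter/Snowball stemmer
function crudeStem(token) {
  if (token.endsWith("'s")) return token.slice(0, -2);
  if (token.endsWith("ing") && token.length > 5) {
    let stem = token.slice(0, -3);
    // Collapse a doubled final consonant (runn -> run), loosely like Porter step 1b
    const last = stem[stem.length - 1];
    if (last === stem[stem.length - 2] && !"aeioulsz".includes(last)) {
      stem = stem.slice(0, -1);
    }
    return stem;
  }
  if (token.endsWith("s") && !token.endsWith("ss") && token.length > 3) {
    return token.slice(0, -1);
  }
  return token;
}

const tokenFilters = [
  (tokens) => tokens.map((t) => t.toLowerCase()),
  (tokens) => tokens.filter((t) => !STOP_WORDS.has(t)),
  (tokens) => tokens.map(crudeStem),
];

function analyze(text) {
  const filtered = charFilters.reduce((acc, f) => f(acc), text);
  return tokenFilters.reduce((tokens, f) => f(tokens), tokenizer(filtered));
}

// Document text and query text go through the same chain:
console.log(analyze("<p>Running shoes</p>"), analyze("runs"));
```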
ElasticPress Architecture: How It Intercepts WP_Query
ElasticPress works by hooking into WordPress at a fundamental level. When a WP_Query fires, ElasticPress checks whether it should be offloaded to Elasticsearch. If so, it translates the WP_Query arguments into an Elasticsearch query DSL request, sends it to the cluster, receives scored results, and returns them to WordPress in the expected format. Your theme code, pagination, and template hierarchy remain unchanged.
The Interception Mechanism
ElasticPress hooks into the pre_get_posts action and the posts_pre_query filter. When a query is eligible for Elasticsearch (based on post type, search context, and feature settings), ElasticPress short-circuits the MySQL query entirely. It builds an Elasticsearch request from the WP_Query arguments, fires it against the cluster, and maps the returned document IDs back to WordPress post objects.
Here is a simplified view of the flow:
// 1. Standard WordPress query
$query = new WP_Query([
's' => 'caching plugins',
'post_type' => 'post',
'posts_per_page' => 10,
]);
// 2. ElasticPress intercepts via posts_pre_query
// 3. Translates to Elasticsearch DSL:
{
"query": {
"bool": {
"should": [
{
"multi_match": {
"query": "caching plugins",
"fields": ["post_title^3", "post_content", "post_excerpt^2"],
"type": "cross_fields",
"operator": "and"
}
}
],
"minimum_should_match": 1,
"filter": [
{ "term": { "post_type.raw": "post" } },
{ "term": { "post_status": "publish" } }
]
}
},
"size": 10
}
// 4. Elasticsearch returns scored document IDs
// 5. ElasticPress builds WP_Post objects from the fields in each returned document
// 6. Results returned to the calling code
The translation layer handles taxonomy queries, meta queries, date queries, orderby parameters, and pagination. ElasticPress maps each of them as follows:

- tax_query becomes Elasticsearch term and terms filters.
- meta_query becomes term, range, and exists queries on typed sub-fields of the meta object, such as meta.price.raw or meta.price.long.
- date_query becomes range filters on the post_date field.
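Here is a heavily simplified sketch of that translation, written as a hypothetical function toElasticsearchDsl. It shows how pagination becomes from/size and how tax_query clauses become filters. It handles only s, post_type, posts_per_page, paged, and a tax_query whose terms are given as an array. ElasticPress's real translation is far more complete.

```javascript
// Hypothetical, heavily simplified WP_Query-args -> Elasticsearch DSL translator.
function toElasticsearchDsl(args) {
  const size = args.posts_per_page ?? 10;
  const page = Math.max(1, args.paged ?? 1);

  // Exact-value restrictions go in filter context (no scoring)
  const filter = [{ term: { post_status: "publish" } }];
  if (args.post_type) {
    filter.push({ term: { "post_type.raw": args.post_type } });
  }
  for (const clause of args.tax_query ?? []) {
    const field = `terms.${clause.taxonomy}.${clause.field ?? "term_id"}`;
    filter.push({ terms: { [field]: clause.terms } });
  }

  const bool = { filter };
  if (args.s) {
    bool.should = [{
      multi_match: {
        query: args.s,
        fields: ["post_title^3", "post_content", "post_excerpt^2"],
        type: "cross_fields",
        operator: "and",
      },
    }];
    // Once a bool has a filter clause, should clauses are optional by default;
    // require one so results must actually match the search text
    bool.minimum_should_match = 1;
  }

  return { query: { bool }, from: (page - 1) * size, size };
}

const dsl = toElasticsearchDsl({
  s: "caching plugins",
  post_type: "post",
  posts_per_page: 10,
  paged: 3,
  tax_query: [{ taxonomy: "category", field: "slug", terms: ["performance"] }],
});
console.log(JSON.stringify(dsl, null, 2));
```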
Feature Architecture
ElasticPress organizes its functionality into “features” that can be toggled independently. The core features include:
Post Search powers the main site search by intercepting search queries and routing them through Elasticsearch.
Autosuggest provides real-time search suggestions as users type.
WooCommerce integrates product search, layered navigation, and product ordering with Elasticsearch.
Related Posts uses a More Like This query to find content similar to the currently viewed post.
Protected Content indexes draft, scheduled, and private posts for admin-side search.
Custom Search Results allows editors to pin specific results to the top for given search terms.
Each feature registers its own filters on the Elasticsearch query DSL, the indexing process, and the mapping schema. This modular design means you can extend individual features without touching the core plugin.
Custom Index Mappings: ACF Fields, WooCommerce Attributes, and Custom Meta
Out of the box, ElasticPress indexes standard WordPress fields: title, content, excerpt, date, author, taxonomies, and post meta. But real-world sites store critical data in places the default mapping does not optimize for. ACF fields might hold product specifications. WooCommerce attributes define filterable product properties. Custom meta fields store pricing tiers, event dates, or location data.
Extending the Mapping for ACF Fields
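ElasticPress stores post meta in a generic structure. Each meta key holds an array of objects containing the raw string plus best-effort typed variants, such as long and double for numeric values. That works for basic filtering, but every meta key gets the same generic treatment, and you cannot declare types ElasticPress does not infer, such as geo_point. To index ACF fields with explicit types, add them to the mapping yourself.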
/**
* Add custom mappings for ACF fields to the Elasticsearch index.
*/
add_filter( 'ep_post_mapping', function( $mapping ) {
// Add a dedicated object for ACF fields with explicit types
$mapping['mappings']['properties']['acf'] = [
'type' => 'object',
'properties' => [
'event_date' => [
'type' => 'date',
'format' => 'yyyy-MM-dd',
],
'ticket_price' => [
'type' => 'float',
],
'venue_location' => [
'type' => 'geo_point',
],
'difficulty_level' => [
'type' => 'keyword',
],
'event_description' => [
'type' => 'text',
'analyzer' => 'standard',
],
],
];
return $mapping;
});
/**
* Populate ACF fields during post sync.
*/
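add_filter( 'ep_post_sync_args_post_prepare_meta', function( $post_args, $post_id ) {
// Unformatted value (false as the third argument): ACF date pickers store Ymd,
// which is converted here to the yyyy-MM-dd format declared in the mapping
$raw_date = get_field( 'event_date', $post_id, false );
$event_date = $raw_date ? DateTime::createFromFormat( 'Ymd', $raw_date ) : false;
$price = get_field( 'ticket_price', $post_id );
// ACF's Google Map field returns lat/lng keys; geo_point expects lat/lon
$map = get_field( 'venue_location', $post_id );
$post_args['acf'] = [
'event_date' => $event_date ? $event_date->format( 'Y-m-d' ) : null,
'ticket_price' => is_numeric( $price ) ? (float) $price : null,
'venue_location' => ( is_array( $map ) && isset( $map['lat'], $map['lng'] ) )
? [ 'lat' => (float) $map['lat'], 'lon' => (float) $map['lng'] ]
: null,
'difficulty_level' => get_field( 'difficulty_level', $post_id ) ?: '',
'event_description' => get_field( 'event_description', $post_id ) ?: '',
];
return $post_args;
}, 10, 2 );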
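After adding these filters, run a full reindex with --setup. In recent ElasticPress versions the command is wp elasticpress sync --setup; older versions use wp elasticpress index --setup. The --setup flag deletes and recreates the index with the new mapping before syncing. This is required whenever you change field types, because Elasticsearch cannot change the type of an existing field in place.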
WooCommerce Product Attributes
WooCommerce stores product attributes as taxonomy terms (for global attributes) or post meta (for custom product-level attributes). ElasticPress’s WooCommerce feature handles global attributes automatically, but custom attributes and computed fields need manual indexing.
/**
* Index WooCommerce product data for search and filtering.
*/
add_filter( 'ep_post_sync_args_post_prepare_meta', function( $post_args, $post_id ) {
if ( 'product' !== get_post_type( $post_id ) ) {
return $post_args;
}
$product = wc_get_product( $post_id );
if ( ! $product ) {
return $post_args;
}
$sale_price = $product->get_sale_price();
$post_args['woo_data'] = [
'regular_price' => (float) $product->get_regular_price(),
// Leave sale_price empty when there is no sale, rather than indexing 0
'sale_price' => is_numeric( $sale_price ) ? (float) $sale_price : null,
'effective_price' => (float) $product->get_price(),
'sku' => $product->get_sku(),
'stock_status' => $product->get_stock_status(),
'average_rating' => (float) $product->get_average_rating(),
'review_count' => (int) $product->get_review_count(),
'total_sales' => (int) get_post_meta( $post_id, 'total_sales', true ),
'is_on_sale' => $product->is_on_sale(),
// Cast to numbers so these unmapped fields are not dynamically mapped as text
'weight' => (float) $product->get_weight(),
'dimensions' => [
'length' => (float) $product->get_length(),
'width' => (float) $product->get_width(),
'height' => (float) $product->get_height(),
],
];
// Index all product attributes for faceted filtering
$attributes = $product->get_attributes();
$post_args['product_attributes'] = [];
foreach ( $attributes as $attr_name => $attribute ) {
if ( $attribute->is_taxonomy() ) {
$terms = wp_get_post_terms( $post_id, $attribute->get_name(), [ 'fields' => 'names' ] );
$post_args['product_attributes'][ $attr_name ] = $terms;
} else {
$post_args['product_attributes'][ $attr_name ] = $attribute->get_options();
}
}
return $post_args;
}, 10, 2 );
/**
* Add WooCommerce field mappings.
*/
add_filter( 'ep_post_mapping', function( $mapping ) {
$mapping['mappings']['properties']['woo_data'] = [
'type' => 'object',
'properties' => [
'regular_price' => [ 'type' => 'float' ],
'sale_price' => [ 'type' => 'float' ],
'effective_price' => [ 'type' => 'float' ],
'sku' => [ 'type' => 'keyword' ],
'stock_status' => [ 'type' => 'keyword' ],
'average_rating' => [ 'type' => 'float' ],
'review_count' => [ 'type' => 'integer' ],
'total_sales' => [ 'type' => 'integer' ],
'is_on_sale' => [ 'type' => 'boolean' ],
],
];
$mapping['mappings']['properties']['product_attributes'] = [
'type' => 'object',
'dynamic' => true,
];
return $mapping;
});
With this setup, you can run Elasticsearch queries that filter by price range, stock availability, and average rating, all without touching MySQL.
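Here is a sketch of the request body such a query might use. It is written as a hypothetical helper, buildProductFilters, against the woo_data field names from the mapping above. Every clause sits in filter context, so it narrows the results without affecting relevance scores.

```javascript
// Hypothetical helper: build a filter-only product query against the
// woo_data fields from the mapping above.
function buildProductFilters({ minPrice, maxPrice, inStockOnly = false, minRating } = {}) {
  const filter = [];
  if (minPrice !== undefined || maxPrice !== undefined) {
    const range = {};
    if (minPrice !== undefined) range.gte = minPrice;
    if (maxPrice !== undefined) range.lte = maxPrice;
    filter.push({ range: { "woo_data.effective_price": range } });
  }
  if (inStockOnly) {
    filter.push({ term: { "woo_data.stock_status": "instock" } });
  }
  if (minRating !== undefined) {
    filter.push({ range: { "woo_data.average_rating": { gte: minRating } } });
  }
  // Filter context: clauses include or exclude documents without scoring them
  return { query: { bool: { filter } } };
}

const body = buildProductFilters({ minPrice: 25, maxPrice: 100, inStockOnly: true, minRating: 4 });
console.log(JSON.stringify(body, null, 2));
```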
Handling Custom Meta with Type Awareness
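A common pitfall is how numbers in post meta reach the index. WordPress stores every meta value as a string, so update_post_meta( $post_id, 'view_count', 1500 ) saves “1500.” That string is what ElasticPress puts in meta.view_count.value and meta.view_count.raw. Range queries and sorts against those string fields compare character by character, so “900” sorts after “1500.” ElasticPress does derive numeric sub-fields such as meta.view_count.long. Still, for values that drive filtering, sorting, and scoring, an explicitly typed field is easier to reason about.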
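A few lines of JavaScript show the difference between string and numeric comparison. The values are illustrative. Keyword fields in Elasticsearch compare terms lexicographically in the same way.

```javascript
// Meta values as WordPress stores them: strings.
const viewCounts = ["1500", "900", "12000"];

// String comparison is character by character...
const sortedAsStrings = [...viewCounts].sort();
// ...numeric comparison uses the value.
const sortedAsNumbers = viewCounts.map(Number).sort((a, b) => a - b);

// A "900 to 1500" range over strings matches nothing, because "900" > "1500".
const stringRangeHits = viewCounts.filter((v) => v >= "900" && v <= "1500");
const numericRangeHits = viewCounts.map(Number).filter((v) => v >= 900 && v <= 1500);

console.log(sortedAsStrings, sortedAsNumbers, stringRangeHits, numericRangeHits);
```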
The solution is to create typed fields in your mapping and populate them during sync:
add_filter( 'ep_post_mapping', function( $mapping ) {
$mapping['mappings']['properties']['typed_meta'] = [
'type' => 'object',
'properties' => [
'view_count' => [ 'type' => 'integer' ],
'reading_time' => [ 'type' => 'integer' ],
'content_score' => [ 'type' => 'float' ],
'is_featured' => [ 'type' => 'boolean' ],
'publish_region' => [ 'type' => 'keyword' ],
],
];
return $mapping;
});
add_filter( 'ep_post_sync_args_post_prepare_meta', function( $post_args, $post_id ) {
$post_args['typed_meta'] = [
'view_count' => (int) get_post_meta( $post_id, 'view_count', true ),
'reading_time' => (int) get_post_meta( $post_id, 'reading_time', true ),
'content_score' => (float) get_post_meta( $post_id, 'content_score', true ),
'is_featured' => (bool) get_post_meta( $post_id, 'is_featured', true ),
'publish_region' => get_post_meta( $post_id, 'publish_region', true ) ?: 'global',
];
return $post_args;
}, 10, 2 );
Relevance Tuning: Field Weighting, Function Scores, and Boosting
The default search results from ElasticPress are a significant improvement over MySQL, but they still use generic field weights. For most sites, the title should matter more than the body, tags should carry meaning, and recent content should rank higher than older content. Relevance tuning is where you shape these behaviors.
Field Weighting
ElasticPress exposes basic field weighting in its admin settings. For full control, use the ep_formatted_args filter, which gives you access to the complete Elasticsearch query DSL just before it is sent to the cluster.
/**
* Customize field weights for search relevance.
*/
add_filter( 'ep_formatted_args', function( $formatted_args, $args ) {
// Only modify search queries
if ( empty( $args['s'] ) ) {
return $formatted_args;
}
// ElasticPress's date weighting may already have wrapped the query in function_score
$search_query = &$formatted_args['query'];
if ( isset( $search_query['function_score']['query'] ) ) {
$search_query = &$search_query['function_score']['query'];
}
// Walk through the query to find multi_match clauses
if ( isset( $search_query['bool']['should'] ) ) {
foreach ( $search_query['bool']['should'] as &$clause ) {
if ( isset( $clause['multi_match'] ) ) {
// Override field weights
$clause['multi_match']['fields'] = [
'post_title^5', // Title is most important
'post_title.analyzed^3', // Only matches if you add this sub-field to the mapping
'post_excerpt^2', // Excerpt is a focused summary
'post_content', // Body has base weight of 1
'terms.post_tag.name^2', // Tags indicate topic relevance
'terms.category.name^1.5', // Categories are broader signals
'meta.custom_subtitle.value^2', // Custom fields can carry weight
'post_author.display_name', // Author matches are sometimes useful
];
}
}
unset( $clause ); // Break the reference left over from the foreach
}
unset( $search_query );
return $formatted_args;
}, 20, 2 );
The caret notation (^5) multiplies the score contribution from that field. A title match at weight 5 counts five times as much as the same match would at weight 1. Each field is scored separately with its own BM25 statistics, so weights shift the balance between fields rather than guaranteeing that a title match always wins. Finding the right weights requires experimentation. Start with title at 5x, excerpt at 2x, and body at 1x, then adjust based on real search queries from your analytics.
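How boosts combine depends on the multi_match type. This sketch models only best_fields, multi_match's default type. There, the score is the best boosted field score plus tie_breaker times the other matching fields' boosted scores. ElasticPress's default search also uses phrase and cross_fields clauses, which blend fields differently. The raw per-field scores are invented for the example.

```javascript
// Parse "field^boost" specs the way multi_match reads them (default boost 1)
function parseFieldSpec(spec) {
  const [field, boost] = spec.split("^");
  return { field, boost: boost === undefined ? 1 : Number(boost) };
}

// best_fields: best boosted field score + tie_breaker * sum of the others
function bestFieldsScore(rawFieldScores, fieldSpecs, tieBreaker = 0) {
  const boosted = fieldSpecs
    .map(parseFieldSpec)
    .filter(({ field }) => field in rawFieldScores)
    .map(({ field, boost }) => rawFieldScores[field] * boost)
    .sort((a, b) => b - a);
  if (boosted.length === 0) return 0;
  const [best, ...others] = boosted;
  return best + tieBreaker * others.reduce((sum, s) => sum + s, 0);
}

// A weaker raw title match (1.2) outweighs a stronger body match (2.0) at ^5
const specs = ["post_title^5", "post_excerpt^2", "post_content"];
const score = bestFieldsScore({ post_title: 1.2, post_content: 2.0 }, specs, 0.3);
console.log(score.toFixed(2));
```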
Function Score: Boosting by Freshness and Popularity
Field weighting controls which fields matter. Function scores control how external signals (date, popularity, rating) influence ranking. A function_score query wraps your original query and applies mathematical functions to modify each document’s score.
/**
* Apply function score for freshness and popularity boosting.
*/
add_filter( 'ep_formatted_args', function( $formatted_args, $args ) {
if ( empty( $args['s'] ) ) {
return $formatted_args;
}
$original_query = $formatted_args['query'];
$formatted_args['query'] = [
'function_score' => [
'query' => $original_query,
'functions' => [
// Boost newer content with exponential decay
[
'exp' => [
'post_date' => [
'origin' => 'now',
'scale' => '90d',
'offset' => '7d',
'decay' => 0.5,
],
],
'weight' => 2,
],
// Boost popular content by view count
[
'field_value_factor' => [
'field' => 'typed_meta.view_count',
'factor' => 1.2,
'modifier' => 'log1p',
'missing' => 1,
],
'weight' => 1.5,
],
// Boost featured posts
[
'filter' => [
'term' => [
'typed_meta.is_featured' => true,
],
],
'weight' => 3,
],
// Slight boost for posts with higher content scores
[
'field_value_factor' => [
'field' => 'typed_meta.content_score',
'factor' => 1,
'modifier' => 'sqrt',
'missing' => 0,
],
'weight' => 1,
],
],
'score_mode' => 'sum',
'boost_mode' => 'multiply',
'max_boost' => 10,
],
];
return $formatted_args;
}, 25, 2 );
Let me break down the key parameters here.
The exp decay function reduces the boost as a document’s post_date moves away from the origin (now). Content published within the last 7 days (the offset) gets the full boost. Past the offset, the boost shrinks exponentially. It reaches half (decay: 0.5) at 97 days, the offset plus the 90-day scale. At 187 days it is a quarter, and it keeps falling toward zero.
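The decay curve can be reproduced in a few lines. This sketch follows the exp decay formula from Elasticsearch's function_score documentation, with distances in days and the parameters from the filter above. The function name expDecay is illustrative.

```javascript
// exp decay, as in Elasticsearch's function_score:
//   score = exp(lambda * max(0, |distance| - offset)), lambda = ln(decay) / scale
// so the score is 1.0 within the offset and exactly `decay` at offset + scale.
function expDecay(distanceDays, { scaleDays = 90, offsetDays = 7, decay = 0.5 } = {}) {
  const lambda = Math.log(decay) / scaleDays;
  return Math.exp(lambda * Math.max(0, Math.abs(distanceDays) - offsetDays));
}

for (const ageDays of [0, 7, 30, 97, 187, 365]) {
  // The filter above multiplies this value by its weight of 2
  console.log(`${ageDays} days old: ${expDecay(ageDays).toFixed(3)}`);
}
```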
The field_value_factor function uses a document field value directly in scoring. The log1p modifier computes log10(1 + factor × value), a logarithmic curve. Going from 100 to 1,000 views therefore matters far more than going from 10,000 to 10,900 views. The missing parameter supplies a value for documents that lack the field; factor and modifier are still applied to it.
The score_mode determines how the results of multiple functions combine: multiply, sum, avg, first, max, or min. The boost_mode determines how the combined function score interacts with the original query score: multiply, replace, sum, avg, max, or min. With multiply, a highly relevant document that is also recent and popular will score dramatically higher than one that matches only on text relevance. The max_boost setting caps the combined function score at 10, so no document’s text score can be amplified more than tenfold.
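This sketch recomputes the function score for one hypothetical document using the configuration above: score_mode sum, boost_mode multiply, and max_boost 10. The document's query score (4.2), age, view count, and content score are invented for illustration. Only the modifiers used here are modeled.

```javascript
// field_value_factor: apply `factor`, then the modifier. Elasticsearch's
// "log1p" is log10(1 + x), and `missing` fills in absent values.
function fieldValueFactor(value, { factor = 1, modifier = "none", missing = 0 } = {}) {
  const x = factor * (value ?? missing);
  switch (modifier) {
    case "log1p":
      return Math.log10(1 + x);
    case "sqrt":
      return Math.sqrt(x);
    case "none":
      return x;
    default:
      throw new Error(`modifier not covered by this sketch: ${modifier}`);
  }
}

// score_mode "sum" adds the weighted function results; max_boost caps that
// sum; boost_mode "multiply" then multiplies it into the query score.
function combineFunctionScore(queryScore, weightedValues, maxBoost = Infinity) {
  const functionScore = weightedValues.reduce((sum, v) => sum + v, 0);
  return queryScore * Math.min(functionScore, maxBoost);
}

// A two-day-old, featured post with 1,000 views and content_score 4:
const weighted = [
  1 * 2, // exp decay is 1.0 inside the 7-day offset, times weight 2
  fieldValueFactor(1000, { factor: 1.2, modifier: "log1p", missing: 1 }) * 1.5,
  3, // the featured filter function contributes its weight
  fieldValueFactor(4, { modifier: "sqrt" }) * 1,
];
const finalScore = combineFunctionScore(4.2, weighted, 10);
console.log(finalScore);
```

Here the weighted functions sum to about 11.6, so max_boost clamps the multiplier to 10.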
Query-Time Boosting for Specific Contexts
Sometimes you need relevance adjustments that depend on context. A search on a WooCommerce shop should boost in-stock products. A search on a news site should heavily favor recent content. You can conditionally apply boosts based on the page context:
/**
* Context-aware relevance boosting.
*/
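add_filter( 'ep_formatted_args', function( $formatted_args, $args ) {
if ( empty( $args['s'] ) ) {
return $formatted_args;
}
$boosts = [];
// Boost in-stock products when searching the shop
if ( isset( $args['post_type'] ) && $args['post_type'] === 'product' ) {
$boosts[] = [
'term' => [
'woo_data.stock_status' => [
'value' => 'instock',
'boost' => 5,
],
],
];
}
// On the blog, heavily boost content from the last 30 days
if ( isset( $args['post_type'] ) && $args['post_type'] === 'post' ) {
$boosts[] = [
'range' => [
'post_date' => [
'gte' => 'now-30d',
'boost' => 3,
],
],
];
}
if ( $boosts ) {
// Keep the original query as a required clause so the boosts only reorder
// matching documents; on their own, should clauses could let unrelated
// in-stock or recent posts into the results
$formatted_args['query'] = [
'bool' => [
'must' => [ $formatted_args['query'] ],
'should' => $boosts,
],
];
}
// Penalize out-of-stock items: a should clause with a low boost still adds
// score, so demote them with a boosting query instead
if ( isset( $args['post_type'] ) && $args['post_type'] === 'product' ) {
$formatted_args['query'] = [
'boosting' => [
'positive' => $formatted_args['query'],
'negative' => [ 'term' => [ 'woo_data.stock_status' => 'outofstock' ] ],
'negative_boost' => 0.1,
],
];
}
return $formatted_args;
}, 30, 2 );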
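Where boost clauses go matters because of how bool queries count should clauses. When a bool has no must or filter clauses, at least one should clause has to match. A boost-only should clause can therefore let a document in without matching the search text. The toy model below ignores scoring and writes clauses as predicates. It compares appending a boost to a should-only bool with pairing it with a must clause.

```javascript
// Toy model of bool-query matching (scoring ignored). Clauses are predicates.
function boolMatches(doc, { must = [], filter = [], should = [], minimumShouldMatch } = {}) {
  // Elasticsearch's default: 1 when the bool has should clauses but no must
  // or filter clauses; otherwise 0 (should clauses then only add score)
  const required =
    minimumShouldMatch ??
    (must.length === 0 && filter.length === 0 && should.length > 0 ? 1 : 0);
  const shouldHits = should.filter((clause) => clause(doc)).length;
  return (
    must.every((clause) => clause(doc)) &&
    filter.every((clause) => clause(doc)) &&
    shouldHits >= required
  );
}

const matchesText = (p) => p.title.toLowerCase().includes("red shirt");
const inStock = (p) => p.stock === "instock";
const unrelated = { title: "Blue mug", stock: "instock" };
const relevant = { title: "Red shirt, cotton", stock: "outofstock" };

// Boost appended to a should-only bool: the unrelated product leaks in
console.log(boolMatches(unrelated, { should: [matchesText, inStock] }));
// Text query as must, boost as optional should: only real matches remain
console.log(boolMatches(unrelated, { must: [matchesText], should: [inStock] }));
```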
Building Faceted Search with Aggregations
Faceted search lets users refine results by clicking filters: category, price range, rating, author, date range. In Elasticsearch, facets are powered by aggregations, which compute summary data (counts, ranges, averages) across the result set in a single query.
Adding Aggregations to ElasticPress Queries
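ElasticPress’s core search does not request aggregations (its optional Facets feature adds its own for taxonomy filters). You need to inject them into the query DSL and then process the results. Note that ElasticPress applies its own post type and post status restrictions through post_filter, which aggregations ignore. If those restrictions matter for your counts, wrap your aggregations in a filter aggregation that reuses $formatted_args['post_filter'].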
/**
* Add aggregations for faceted search.
*/
add_filter( 'ep_formatted_args', function( $formatted_args, $args ) {
if ( empty( $args['s'] ) ) {
return $formatted_args;
}
$formatted_args['aggs'] = [
// Category facet
'category_facet' => [
'terms' => [
'field' => 'terms.category.slug',
'size' => 20,
'order' => [ '_count' => 'desc' ],
],
],
// Tag facet
'tag_facet' => [
'terms' => [
'field' => 'terms.post_tag.slug',
'size' => 30,
],
],
// Author facet
'author_facet' => [
'terms' => [
'field' => 'post_author.display_name.raw',
'size' => 15,
],
],
// Date histogram for temporal filtering
'date_histogram' => [
'date_histogram' => [
'field' => 'post_date',
'calendar_interval' => 'month',
'format' => 'yyyy-MM',
'min_doc_count' => 1,
],
],
// Price range facet for products
'price_ranges' => [
'range' => [
'field' => 'woo_data.effective_price',
'ranges' => [
[ 'key' => 'under_25', 'to' => 25 ],
[ 'key' => '25_to_50', 'from' => 25, 'to' => 50 ],
[ 'key' => '50_to_100', 'from' => 50, 'to' => 100 ],
[ 'key' => '100_to_250', 'from' => 100, 'to' => 250 ],
[ 'key' => 'over_250', 'from' => 250 ],
],
],
],
// Average rating stats
'rating_stats' => [
'stats' => [
'field' => 'woo_data.average_rating',
],
],
];
return $formatted_args;
}, 20, 2 );
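The two most common aggregation types here can be approximated over plain objects, which helps when checking what a facet should display. This is a sketch with illustrative data. termsAgg counts each document at most once per value, with ties broken by key here for deterministic output. rangeAgg follows Elasticsearch's rule that from is inclusive and to is exclusive, so a product priced exactly 25 lands in 25_to_50, not under_25.

```javascript
// Toy terms aggregation: count documents per value, most frequent first.
function termsAgg(docs, field, size = 10) {
  const counts = new Map();
  for (const doc of docs) {
    // A document counts at most once per distinct value
    for (const value of new Set([].concat(doc[field] ?? []))) {
      counts.set(value, (counts.get(value) ?? 0) + 1);
    }
  }
  return [...counts]
    .map(([key, doc_count]) => ({ key, doc_count }))
    .sort((a, b) => b.doc_count - a.doc_count || String(a.key).localeCompare(String(b.key)))
    .slice(0, size);
}

// Toy range aggregation: "from" is inclusive, "to" is exclusive.
function rangeAgg(docs, field, ranges) {
  return ranges.map(({ key, from = -Infinity, to = Infinity }) => ({
    key,
    doc_count: docs.filter((d) => typeof d[field] === "number" && d[field] >= from && d[field] < to).length,
  }));
}

const products = [
  { category: ["performance", "security"], price: 25 },
  { category: ["performance"], price: 50 },
  { category: ["hosting"], price: 19.99 },
];
const priceRanges = [
  { key: "under_25", to: 25 },
  { key: "25_to_50", from: 25, to: 50 },
  { key: "50_to_100", from: 50, to: 100 },
];
console.log(termsAgg(products, "category"));
console.log(rangeAgg(products, "price", priceRanges));
```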
Retrieving and Displaying Aggregation Results
After ElasticPress executes the query, you need to capture the aggregation data from the raw Elasticsearch response and pass it to your template.
/**
* Store aggregation results from the Elasticsearch response.
*/
add_action( 'ep_valid_response', function( $response, $query, $args ) {
if ( ! isset( $response['aggregations'] ) ) {
return;
}
// Store aggregations in a global for template access
global $wpkite_search_facets;
$wpkite_search_facets = $response['aggregations'];
}, 10, 3 );
/**
* Render category facets in the search sidebar.
*/
function wpkite_render_category_facets() {
global $wpkite_search_facets;
if ( empty( $wpkite_search_facets['category_facet']['buckets'] ) ) {
return;
}
$active_category = isset( $_GET['filter_category'] )
? sanitize_text_field( $_GET['filter_category'] )
: '';
echo '<div class="search-facet">';
echo '<h3>Categories</h3>';
echo '<ul>';
foreach ( $wpkite_search_facets['category_facet']['buckets'] as $bucket ) {
$slug = esc_attr( $bucket['key'] );
$count = (int) $bucket['doc_count'];
$term = get_term_by( 'slug', $slug, 'category' );
$name = $term ? esc_html( $term->name ) : esc_html( $slug );
$active = ( $slug === $active_category ) ? ' class="active"' : '';
$url = add_query_arg( 'filter_category', $slug );
printf(
'<li%s><a href="%s">%s (%d)</a></li>',
$active,
esc_url( $url ),
$name,
$count
);
}
echo '</ul>';
echo '</div>';
}
/**
* Apply facet filters to the Elasticsearch query.
*/
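add_filter( 'ep_formatted_args', function( $formatted_args, $args ) {
if ( empty( $args['s'] ) ) {
return $formatted_args;
}
$filters = [];
// Category filter
if ( ! empty( $_GET['filter_category'] ) ) {
$filters[] = [
'term' => [ 'terms.category.slug' => sanitize_text_field( wp_unslash( $_GET['filter_category'] ) ) ],
];
}
// Price range filter
if ( ! empty( $_GET['price_min'] ) || ! empty( $_GET['price_max'] ) ) {
$range = [];
if ( ! empty( $_GET['price_min'] ) ) {
$range['gte'] = (float) $_GET['price_min'];
}
if ( ! empty( $_GET['price_max'] ) ) {
$range['lte'] = (float) $_GET['price_max'];
}
$filters[] = [
'range' => [ 'woo_data.effective_price' => $range ],
];
}
// Rating filter
if ( ! empty( $_GET['min_rating'] ) ) {
$filters[] = [
'range' => [
'woo_data.average_rating' => [
'gte' => (float) $_GET['min_rating'],
],
],
];
}
if ( ! $filters ) {
return $formatted_args;
}
// ElasticPress's date weighting may already have wrapped the query in function_score
$search_query = &$formatted_args['query'];
if ( isset( $search_query['function_score']['query'] ) ) {
$search_query = &$search_query['function_score']['query'];
}
$search_query['bool']['filter'] = array_merge( $search_query['bool']['filter'] ?? [], $filters );
// Assumes ElasticPress's default bool/should search query. Adding filter clauses
// makes should clauses optional by default, which would return every document
// that passes the filters; keep the search text required
$search_query['bool']['minimum_should_match'] = 1;
unset( $search_query );
return $formatted_args;
}, 15, 2 );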
Post-Filter Aggregations
There is a subtlety with aggregations and filters. If you add a category filter to the main query, the category facet will only count the selected category, so users lose sight of the other categories they could switch to. The solution is to apply facet selections through post_filter, which leaves the aggregations running on the unfiltered query. This version replaces the category block in the previous filter:
add_filter( 'ep_formatted_args', function( $formatted_args, $args ) {
if ( empty( $args['s'] ) ) {
return $formatted_args;
}
// Move facet filters to post_filter so aggregations remain unaffected
if ( ! empty( $_GET['filter_category'] ) ) {
$category = sanitize_text_field( $_GET['filter_category'] );
$formatted_args['post_filter']['bool']['filter'][] = [
'term' => [ 'terms.category.slug' => $category ],
];
}
return $formatted_args;
}, 15, 2 );
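With post_filter, Elasticsearch computes aggregations over everything the main query matches, then applies the post_filter only to the hits it returns. This is the standard pattern for e-commerce faceted navigation. Users can often select values in several facets at once. In that case, each facet’s aggregation usually also needs its own filter aggregation that applies the other facets’ selections, so every facet’s counts reflect the rest of the current selection.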
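The toy model below shows the post_filter split: facet counts come from everything the main query matched, while the returned hits are narrowed further. The helper searchWithPostFilter and the sample posts are illustrative. The "query" is a plain substring predicate standing in for a real search.

```javascript
// Toy model of post_filter: aggregations see every document the main query
// matched, while the returned hits are additionally narrowed by post_filter.
function searchWithPostFilter(docs, { query, postFilter = () => true, facetField }) {
  const matched = docs.filter(query);
  const facetCounts = {};
  for (const doc of matched) {
    facetCounts[doc[facetField]] = (facetCounts[doc[facetField]] ?? 0) + 1;
  }
  return { hits: matched.filter(postFilter), facetCounts };
}

const posts = [
  { id: 1, title: "Caching with Redis", category: "performance" },
  { id: 2, title: "Page caching basics", category: "performance" },
  { id: 3, title: "Caching and security headers", category: "security" },
  { id: 4, title: "Choosing a host", category: "hosting" },
];

// The user searched "caching" and clicked the "performance" facet
const result = searchWithPostFilter(posts, {
  query: (p) => p.title.toLowerCase().includes("caching"),
  postFilter: (p) => p.category === "performance",
  facetField: "category",
});
console.log(result.hits.map((p) => p.id), result.facetCounts);
```

The "security" count survives even though the hits are limited to "performance," which is exactly what keeps the other facet links visible.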
Autocomplete with Completion Suggesters
Autocomplete (search-as-you-type) is one of the most impactful search UX improvements you can make. Users expect instant suggestions after typing two or three characters. The two most common approaches in Elasticsearch are prefix or edge n-gram matching on analyzed fields, and dedicated completion suggesters.
The Completion Suggester Approach
Completion suggesters use a specialized data structure (an FST, or finite state transducer) that is held in memory. They are optimized for speed and return suggestions very quickly, but they have two requirements. They need a dedicated field in your mapping, and they match only from the beginning of each input string.
/**
* Add a completion suggester field to the mapping.
*/
add_filter( 'ep_post_mapping', function( $mapping ) {
$mapping['mappings']['properties']['title_suggest'] = [
'type' => 'completion',
'analyzer' => 'simple',
'preserve_separators' => true,
'preserve_position_increments' => true,
'max_input_length' => 50,
];
$mapping['mappings']['properties']['search_suggest'] = [
'type' => 'completion',
'analyzer' => 'simple',
'contexts' => [
[
'name' => 'post_type',
'type' => 'category',
],
],
];
return $mapping;
});
/**
* Populate the suggestion fields during indexing.
*/
add_filter( 'ep_post_sync_args_post_prepare_meta', function( $post_args, $post_id ) {
$post = get_post( $post_id );
$title = $post->post_title;
$inputs = [ $title ];
// Completion suggesters match only from the start of an input, so also index
// each word-boundary suffix of the title ("Database Queries", "Queries")
// to let users match words in the middle of it
$words = preg_split( '/\s+/u', $title, -1, PREG_SPLIT_NO_EMPTY ) ?: [];
for ( $i = 1, $n = count( $words ); $i < $n; $i++ ) {
$inputs[] = implode( ' ', array_slice( $words, $i ) );
}
// Add category names as inputs
$categories = wp_get_post_categories( $post_id, [ 'fields' => 'names' ] );
foreach ( $categories as $cat_name ) {
$inputs[] = $cat_name;
}
// Completion weights must be integers; a plain division by 100 can produce a float
$weight = max( 1, intdiv( (int) get_post_meta( $post_id, 'view_count', true ), 100 ) );
$post_args['title_suggest'] = [
'input' => $inputs,
'weight' => $weight,
];
$post_args['search_suggest'] = [
'input' => $inputs,
'weight' => $weight,
'contexts' => [
'post_type' => [ get_post_type( $post_id ) ],
],
];
return $post_args;
}, 10, 2 );
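The toy suggester below shows why the suffix inputs matter. It is a sketch with hypothetical helpers (buildInputs, suggestionWeight, suggest) that ignores fuzziness, contexts, and any analysis beyond lowercasing. An entry matches when any of its inputs starts with the typed prefix, and results are ranked by weight.

```javascript
// Mirrors the sync code: the full title plus each word-boundary suffix
function buildInputs(title) {
  const words = title.split(/\s+/).filter(Boolean);
  return words.map((_, i) => words.slice(i).join(" "));
}

// Completion weights must be integers, so round down before clamping to 1
function suggestionWeight(viewCount) {
  return Math.max(1, Math.floor(viewCount / 100));
}

// Prefix-match any input (case-insensitively), highest weight first
function suggest(entries, prefix, size = 8) {
  const p = prefix.toLowerCase();
  return entries
    .filter((e) => e.inputs.some((input) => input.toLowerCase().startsWith(p)))
    .sort((a, b) => b.weight - a.weight)
    .slice(0, size)
    .map((e) => e.title);
}

const entries = [
  { title: "Database Backups Explained", views: 90 },
  { title: "How to Optimize WordPress Database Queries", views: 1550 },
].map((e) => ({ ...e, inputs: buildInputs(e.title), weight: suggestionWeight(e.views) }));

console.log(suggest(entries, "datab")); // the suffix input lets the second title match
console.log(suggest(entries, "ptimize")); // mid-word fragments still miss
```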
The AJAX Endpoint for Autocomplete
On the frontend, you need an AJAX endpoint that accepts partial input and returns suggestions:
/**
* Register the autocomplete AJAX endpoint.
*/
add_action( 'wp_ajax_wpkite_autocomplete', 'wpkite_handle_autocomplete' );
add_action( 'wp_ajax_nopriv_wpkite_autocomplete', 'wpkite_handle_autocomplete' );
function wpkite_handle_autocomplete() {
$query = isset( $_GET['q'] ) ? sanitize_text_field( $_GET['q'] ) : '';
if ( strlen( $query ) < 2 ) {
wp_send_json_success( [] );
}
$post_type = isset( $_GET['type'] )
? sanitize_text_field( $_GET['type'] )
: null;
// Build the suggest query
$suggest_body = [
'suggest' => [
'title-suggestions' => [
'prefix' => $query,
'completion' => [
'field' => 'search_suggest',
'size' => 8,
'skip_duplicates' => true,
'fuzzy' => [
'fuzziness' => 'AUTO',
],
],
],
],
'_source' => [ 'post_id', 'post_title', 'post_type', 'permalink' ],
];
// Add context filtering if post type is specified
if ( $post_type ) {
$suggest_body['suggest']['title-suggestions']['completion']['contexts'] = [
'post_type' => [ $post_type ],
];
}
// Send directly to Elasticsearch
$response = ElasticPress\Elasticsearch::factory()->remote_request(
ElasticPress\Indexables::factory()->get( 'post' )->get_index_name() . '/_search',
[
'method' => 'POST',
'body' => wp_json_encode( $suggest_body ),
]
);
if ( is_wp_error( $response ) ) {
wp_send_json_error( 'Search service unavailable' );
}
$body = json_decode( wp_remote_retrieve_body( $response ), true );
$results = [];
if ( isset( $body['suggest']['title-suggestions'][0]['options'] ) ) {
foreach ( $body['suggest']['title-suggestions'][0]['options'] as $option ) {
$source = $option['_source'];
$results[] = [
'id' => $source['post_id'],
'title' => $source['post_title'],
'post_type' => $source['post_type'],
'url' => get_permalink( $source['post_id'] ),
];
}
}
wp_send_json_success( $results );
}
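The completion-suggest response nests matches in several layers: under suggest, then the suggester's name, then one entry per suggest text. Each entry has an options array whose items carry _source. This hypothetical parseSuggestResponse helper flattens that shape into the same objects the PHP endpoint returns. It reads the URL from the indexed permalink field, which the endpoint already requests in _source, instead of calling get_permalink(). The sample response is abridged and its values are illustrative.

```javascript
// Hypothetical helper: flatten a completion-suggest response into
// { id, title, post_type, url } objects, like the PHP endpoint returns.
function parseSuggestResponse(body, suggestName = "title-suggestions") {
  const entries = body?.suggest?.[suggestName] ?? [];
  // One entry per suggest text; each entry's options are the matched documents
  return entries.flatMap((entry) =>
    (entry.options ?? []).map((option) => ({
      id: option._source.post_id,
      title: option._source.post_title,
      post_type: option._source.post_type,
      url: option._source.permalink,
    }))
  );
}

const sampleResponse = {
  suggest: {
    "title-suggestions": [
      {
        text: "datab",
        offset: 0,
        length: 5,
        options: [
          {
            text: "Database Queries",
            _id: "1042",
            _score: 15,
            _source: {
              post_id: 1042,
              post_title: "How to Optimize WordPress Database Queries",
              post_type: "post",
              permalink: "https://example.com/optimize-wordpress-database-queries/",
            },
          },
        ],
      },
    ],
  },
};
console.log(parseSuggestResponse(sampleResponse));
```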
Frontend JavaScript for the Autocomplete UI
The frontend component sends keystrokes to the AJAX endpoint and renders a dropdown of suggestions. Here is a minimal implementation using vanilla JavaScript with a debounce to avoid hammering the server:
class WPKiteAutocomplete {
constructor( inputSelector, resultsSelector ) {
this.input = document.querySelector( inputSelector );
this.results = document.querySelector( resultsSelector );
this.debounceTimer = null;
this.minChars = 2;
this.debounceMs = 200;
this.selectedIndex = -1;
if ( ! this.input || ! this.results ) return;
this.input.addEventListener( 'input', () => this.onInput() );
this.input.addEventListener( 'keydown', ( e ) => this.onKeydown( e ) );
document.addEventListener( 'click', ( e ) => {
if ( ! this.input.contains( e.target ) && ! this.results.contains( e.target ) ) {
this.hideResults();
}
});
}
onInput() {
clearTimeout( this.debounceTimer );
const query = this.input.value.trim();
if ( query.length < this.minChars ) {
this.hideResults();
return;
}
this.debounceTimer = setTimeout( () => this.fetchSuggestions( query ), this.debounceMs );
}
async fetchSuggestions( query ) {
const url = `${wpkiteAjax.ajaxurl}?action=wpkite_autocomplete&q=${encodeURIComponent( query )}`;
try {
const response = await fetch( url );
const data = await response.json();
if ( data.success && data.data.length > 0 ) {
this.renderResults( data.data );
} else {
this.hideResults();
}
} catch ( err ) {
this.hideResults();
}
}
renderResults( items ) {
this.selectedIndex = -1;
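this.results.innerHTML = items.map( ( item, index ) =>
`<li class="autocomplete-item" data-index="${index}" data-url="${this.escapeHtml( item.url )}">
<span class="autocomplete-title">${this.escapeHtml( item.title )}</span>
<span class="autocomplete-type">${this.escapeHtml( item.post_type )}</span>
</li>`
).join( '' );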
this.results.style.display = 'block';
this.results.querySelectorAll( '.autocomplete-item' ).forEach( ( li ) => {
li.addEventListener( 'click', () => {
window.location.href = li.dataset.url;
});
});
}
onKeydown( e ) {
const items = this.results.querySelectorAll( '.autocomplete-item' );
if ( ! items.length ) return;
if ( e.key === 'ArrowDown' ) {
e.preventDefault();
this.selectedIndex = Math.min( this.selectedIndex + 1, items.length - 1 );
this.highlightItem( items );
} else if ( e.key === 'ArrowUp' ) {
e.preventDefault();
this.selectedIndex = Math.max( this.selectedIndex - 1, 0 );
this.highlightItem( items );
} else if ( e.key === 'Enter' && this.selectedIndex >= 0 ) {
e.preventDefault();
window.location.href = items[ this.selectedIndex ].dataset.url;
} else if ( e.key === 'Escape' ) {
this.hideResults();
}
}
highlightItem( items ) {
items.forEach( ( item, i ) => {
item.classList.toggle( 'highlighted', i === this.selectedIndex );
});
}
hideResults() {
this.results.style.display = 'none';
this.results.innerHTML = '';
}
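escapeHtml( str ) {
// Escape quotes too, since values are also placed inside HTML attributes
return String( str )
.replace( /&/g, '&amp;' )
.replace( /</g, '&lt;' )
.replace( />/g, '&gt;' )
.replace( /"/g, '&quot;' )
.replace( /'/g, '&#39;' );
}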
}
document.addEventListener( 'DOMContentLoaded', () => {
new WPKiteAutocomplete( '#search-input', '#autocomplete-results' );
});
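One gap in the class above: if an older request finishes after a newer one, fetchSuggestions() renders stale suggestions. This is one way to guard against that, sketched as a hypothetical createLatestGuard helper. Each request takes a ticket, and only the newest ticket may render. Aborting superseded requests with AbortController is an alternative.

```javascript
// Each request takes a ticket; only the newest ticket may render.
function createLatestGuard() {
  let latest = 0;
  return {
    issue() {
      latest += 1;
      return latest;
    },
    isLatest(ticket) {
      return ticket === latest;
    },
  };
}

// Sketch of how fetchSuggestions() could use it (assumes a `this.guard`
// created in the constructor with createLatestGuard()):
//
//   const ticket = this.guard.issue();
//   const data = await ( await fetch( url ) ).json();
//   if ( ! this.guard.isLatest( ticket ) ) return; // a newer keystroke won
//   this.renderResults( data.data );

const guard = createLatestGuard();
const first = guard.issue(); // request for "dat"
const second = guard.issue(); // request for "data"
console.log(guard.isLatest(first), guard.isLatest(second));
```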
Multilingual Search Strategies
Multilingual WordPress sites present a unique search challenge. Each language needs its own stemming rules and stop words, and some need different tokenization (Chinese and Japanese are written without spaces between words). One analyzer cannot serve every language well. There are three main approaches.
Per-Language Index Strategy
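The cleanest approach is to create a separate Elasticsearch index for each language. Each index uses the appropriate language analyzer. When a user searches, the query is routed to the index matching their current language. Indexing is the harder half. A WP-CLI sync has no current language the way a front-end request does. You therefore need to run the setup and sync once per language with that language set explicitly, and restrict each run to that language’s posts.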
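/**
* Create per-language index names.
* Works with WPML or Polylang. This resolves the language of the current
* request only; at sync time each post must also be sent to the index for
* its own language.
*/
add_filter( 'ep_index_name', function( $index_name, $blog_id ) {
$current_lang = function_exists( 'pll_current_language' )
? pll_current_language()
: apply_filters( 'wpml_current_language', 'en' );
// pll_current_language() returns false when no language is active (e.g. in WP-CLI).
$current_lang = $current_lang ?: 'en';
return $index_name . '-' . $current_lang;
}, 10, 2 );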
/**
* Set the correct analyzer for each language index.
*/
add_filter( 'ep_post_mapping', function( $mapping ) {
$current_lang = function_exists( 'pll_current_language' )
? pll_current_language()
: apply_filters( 'wpml_current_language', 'en' );
// The mapping is built when the index is created, so the language must be
// resolvable at that point; fall back to English if it is not.
$current_lang = $current_lang ?: 'en';
// Built-in language analyzers (english, german, ...) already combine
// lowercasing, stop words and stemming. kuromoji, smartcn and nori are
// provided by Elasticsearch analysis plugins.
$analyzers = [
'en' => 'english',
'de' => 'german',
'fr' => 'french',
'es' => 'spanish',
'pt' => 'portuguese',
'it' => 'italian',
'nl' => 'dutch',
'ja' => 'kuromoji',
'zh' => 'smartcn',
'ko' => 'nori',
'ar' => 'arabic',
];
$analyzer = isset( $analyzers[ $current_lang ] )
? $analyzers[ $current_lang ]
: 'standard';
// Reference the analyzer by name. This works for every entry above,
// including the plugin analyzers, and leaves the analysis settings that
// ElasticPress defines untouched.
$mapping['mappings']['properties']['post_content']['analyzer'] = $analyzer;
$mapping['mappings']['properties']['post_title']['fields']['analyzed'] = [
'type' => 'text',
'analyzer' => $analyzer,
];
return $mapping;
});
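The routing rule behind these two filters is small enough to sketch outside WordPress. The JavaScript below is a minimal illustration using the same language-to-analyzer table as the PHP. It shows the per-language suffix on the index name, the `standard` fallback for unmapped languages, and a fallback to English when no language is detected. `resolveLanguageIndex()` is a hypothetical helper, not part of ElasticPress.

```javascript
// Language code -> analyzer table mirroring the PHP mapping filter.
// kuromoji, smartcn and nori come from Elasticsearch analysis plugins.
const LANGUAGE_ANALYZERS = {
  en: 'english', de: 'german', fr: 'french', es: 'spanish',
  pt: 'portuguese', it: 'italian', nl: 'dutch',
  ja: 'kuromoji', zh: 'smartcn', ko: 'nori', ar: 'arabic',
};

// Hypothetical helper: pick the index name and analyzer for one language.
// Falls back to 'en' when no language is detected (e.g. a CLI context).
function resolveLanguageIndex(baseIndex, lang) {
  const code = lang || 'en';
  return {
    index: `${baseIndex}-${code}`,
    analyzer: LANGUAGE_ANALYZERS[code] || 'standard',
  };
}

console.log(resolveLanguageIndex('mysite-post-1', 'de'));
console.log(resolveLanguageIndex('mysite-post-1', 'sv')); // unmapped -> standard
```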
Single Index with Multi-Language Fields
If running separate indices per language is operationally complex, you can use a single index with language-specific sub-fields:
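add_filter( 'ep_post_mapping', function( $mapping ) {
$language_analyzers = [
'en' => 'english',
'de' => 'german',
'fr' => 'french',
'es' => 'spanish',
];
// Add a sub-field per language to every field the search query targets.
// Sub-fields ElasticPress already defines (such as the keyword "raw"
// field on post_title) are kept, because only new keys are written.
foreach ( [ 'post_title', 'post_content', 'post_excerpt' ] as $field ) {
foreach ( $language_analyzers as $lang => $analyzer ) {
$mapping['mappings']['properties'][ $field ]['fields'][ $lang ] = [
'type' => 'text',
'analyzer' => $analyzer,
];
}
}
return $mapping;
});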
At query time, you target the sub-field matching the user’s language:
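add_filter( 'ep_formatted_args', function( $formatted_args, $args ) {
if ( empty( $args['s'] ) ) {
return $formatted_args;
}
$lang = function_exists( 'pll_current_language' )
? pll_current_language()
: apply_filters( 'wpml_current_language', 'en' );
// Fall back to English for missing or unmapped languages so the query
// never targets a sub-field that does not exist.
if ( ! in_array( $lang, [ 'en', 'de', 'fr', 'es' ], true ) ) {
$lang = 'en';
}
// Replace default fields with language-specific ones. Where the should
// clauses sit can vary with the ElasticPress version and enabled features.
if ( isset( $formatted_args['query']['bool']['should'] ) ) {
foreach ( $formatted_args['query']['bool']['should'] as &$clause ) {
if ( isset( $clause['multi_match'] ) ) {
$clause['multi_match']['fields'] = [
"post_title.{$lang}^5",
"post_content.{$lang}",
"post_excerpt.{$lang}^2",
];
}
}
unset( $clause ); // break the reference left by foreach
}
return $formatted_args;
}, 20, 2 );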
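The field-targeting step in this filter can be expressed as a small pure function. This sketch assumes that `post_title`, `post_content`, and `post_excerpt` each carry `en`, `de`, `fr`, and `es` sub-fields, as in the mapping example. The boosts match the PHP. `buildLanguageMultiMatch()` is a hypothetical name.

```javascript
// Languages that have sub-fields in the mapping (assumed to match the PHP filter).
const SUPPORTED_LANGS = ['en', 'de', 'fr', 'es'];

// Hypothetical helper: build a multi_match clause aimed at language sub-fields.
// Unsupported or missing languages fall back to English so the query never
// targets a sub-field that does not exist.
function buildLanguageMultiMatch(searchTerm, lang) {
  const code = SUPPORTED_LANGS.includes(lang) ? lang : 'en';
  return {
    multi_match: {
      query: searchTerm,
      fields: [
        `post_title.${code}^5`,
        `post_content.${code}`,
        `post_excerpt.${code}^2`,
      ],
    },
  };
}

console.log(JSON.stringify(buildLanguageMultiMatch('Zwischenspeicher', 'de'), null, 2));
```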
The per-language index approach is better for large sites because each index has cleaner analysis and smaller shard sizes. The multi-field approach works well for smaller sites with two or three languages where operational simplicity is more important.
CJK Language Considerations
Chinese, Japanese, and Korean (CJK) require specialized tokenizers because these languages do not use whitespace to separate words. Elasticsearch provides the kuromoji analyzer for Japanese, nori for Korean, and smartcn or ik for Chinese. These must be installed as Elasticsearch plugins before they can be referenced in your mapping.
For Japanese specifically, the kuromoji analyzer performs morphological analysis. It segments text into dictionary words and, in its default search mode, splits long compound nouns into their components, so a query for one component can still match the compound. A simple bigram tokenizer instead indexes every overlapping pair of characters. That finds substrings regardless of word boundaries and therefore produces false matches that morphological analysis avoids.
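The false-match problem is easy to reproduce. The sketch below splits text into overlapping character pairs and treats a document as matching when it contains every query pair. This is a simplified stand-in for bigram tokenization, not Elasticsearch's `cjk_bigram` filter. 東京都 (Tokyo Metropolis) contains the pair 京都 (Kyoto) across its word boundary, so a bigram search for Kyoto matches a sentence about Tokyo. A morphological analyzer would typically segment 東京都 as 東京 + 都 and avoid that match. `bigrams()` and `bigramMatch()` are hypothetical helpers.

```javascript
// Split text into overlapping two-character pairs, the basic idea behind a
// CJK bigram tokenizer (illustrative only).
function bigrams(text) {
  const chars = Array.from(text); // iterate by code point, not UTF-16 unit
  if (chars.length < 2) return chars;
  const out = [];
  for (let i = 0; i < chars.length - 1; i++) {
    out.push(chars[i] + chars[i + 1]);
  }
  return out;
}

// A bigram index "matches" when every query bigram appears in the document.
function bigramMatch(query, doc) {
  const docGrams = new Set(bigrams(doc));
  return bigrams(query).every((g) => docGrams.has(g));
}

console.log(bigrams('東京都'));                         // the pairs 東京 and 京都
console.log(bigramMatch('京都', '東京都に住んでいます')); // true: a false positive
```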
Synonyms, Stemming, and Custom Dictionaries
Even with proper language analyzers, search quality often suffers from vocabulary mismatch. Users search for “laptop” but your products say “notebook computer.” They type “WP” expecting WordPress results. Custom synonyms and dictionaries bridge these gaps.
Configuring Synonyms in the Index Settings
Synonyms are defined as part of a custom analyzer in the index settings:
add_filter( 'ep_post_mapping', function( $mapping ) {
$mapping['settings']['analysis'] = array_merge_recursive(
$mapping['settings']['analysis'] ?? [],
[
'filter' => [
'wpkite_synonyms' => [
'type' => 'synonym',
'lenient' => true,
'synonyms' => [
'wp, wordpress',
'woo, woocommerce',
'seo, search engine optimization',
'plugin, extension, addon, add-on',
'theme, template, skin',
'hosting, server, web hosting',
'ssl, https, secure certificate',
'cdn, content delivery network',
'php, hypertext preprocessor',
'js, javascript',
'css, cascading style sheets, stylesheet',
'db, database',
'api, application programming interface',
'ui, user interface',
'ux, user experience',
'cms, content management system',
'ftp, file transfer protocol',
'dns, domain name system',
'gutenberg, block editor',
'classic editor, tinymce',
'cpanel, control panel',
'multisite, network, wpmu',
'ecommerce, e-commerce, online store, web store',
'laptop, notebook, notebook computer',
'cell phone, mobile phone, smartphone',
],
],
'wpkite_english_stemmer' => [
'type' => 'stemmer',
'language' => 'english',
],
'wpkite_english_stop' => [
'type' => 'stop',
'stopwords' => '_english_',
],
],
'analyzer' => [
'wpkite_content' => [
'type' => 'custom',
'tokenizer' => 'standard',
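// Index time: lowercase, stop words, stemming. No synonyms, so editing the
// synonym list never requires a reindex.
'filter' => [
'lowercase',
'wpkite_english_stop',
'wpkite_english_stemmer',
],
],
'wpkite_search' => [
'type' => 'custom',
'tokenizer' => 'standard',
// Query time: the same stop words and stemmer as the index, plus synonyms.
'filter' => [
'lowercase',
'wpkite_english_stop',
'wpkite_synonyms',
'wpkite_english_stemmer',
],
],
],
]
);
// Apply custom analyzers to text fields
$mapping['mappings']['properties']['post_content']['analyzer'] = 'wpkite_content';
$mapping['mappings']['properties']['post_content']['search_analyzer'] = 'wpkite_search';
return $mapping;
});
Note the separate wpkite_search analyzer for queries. Synonyms are expanded only at query time, so the index stays smaller and changes to the synonym list take effect without reindexing. Both analyzers apply the same stop words and stemmer, which keeps query tokens aligned with indexed tokens. If the search analyzer skipped stemming, a query for “plugins” would look for the literal token plugins while the index holds only the stem plugin, and the match would be missed.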
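To see why both analyzers need the same stemmer, here is a toy model of the two chains. It is not Elasticsearch's analysis code. The stemmer only strips a trailing “s” (the real English stemmer does far more), and the stop and synonym lists are tiny subsets. All names are hypothetical.

```javascript
// Toy analysis chains for illustration only.
const STOP_WORDS = new Set(['the', 'a', 'an', 'for', 'of']);
const SYNONYM_GROUPS = [['wp', 'wordpress'], ['plugin', 'extension', 'addon']];
// Map every member of a group to the whole group (equivalent synonyms).
const SYNONYMS = new Map(SYNONYM_GROUPS.flatMap((group) => group.map((t) => [t, group])));

const tokenize = (text) => text.toLowerCase().split(/[^a-z0-9-]+/).filter(Boolean);
const removeStops = (tokens) => tokens.filter((t) => !STOP_WORDS.has(t));
// Crude stemmer: strip a trailing "s" but leave "ss" endings alone.
const stem = (t) => (t.length > 3 && t.endsWith('s') && !t.endsWith('ss') ? t.slice(0, -1) : t);
const expand = (tokens) => tokens.flatMap((t) => SYNONYMS.get(t) || [t]);

// Index-time chain: lowercase, stop words, stemming (no synonyms).
const analyzeIndex = (text) => removeStops(tokenize(text)).map(stem);

// Search-time chain: lowercase, stop words, synonyms, then optional stemming.
function analyzeSearch(text, withStemming = true) {
  const tokens = expand(removeStops(tokenize(text)));
  return withStemming ? tokens.map(stem) : tokens;
}

const indexed = new Set(analyzeIndex('The best caching plugin for WP'));
const hits = (query, withStemming) =>
  analyzeSearch(query, withStemming).some((t) => indexed.has(t));

console.log(hits('plugins', true));   // true: "plugins" is stemmed to "plugin"
console.log(hits('plugins', false));  // false: the index holds "plugin", not "plugins"
console.log(hits('wordpress', true)); // true: expanded to "wp" by the synonym group
```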
Synonym File Management
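For large synonym sets, inline configuration becomes unwieldy. Elasticsearch can load synonyms from a file instead. Because this filter is marked updateable, Elasticsearch only allows it in a search-time analyzer such as wpkite_search, not in an index-time analyzer: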
'wpkite_synonyms' => [
'type' => 'synonym',
'synonyms_path' => 'analysis/wpkite_synonyms.txt',
'updateable' => true,
]
Place the file at {elasticsearch_config_dir}/analysis/wpkite_synonyms.txt on every node in the cluster. Each line defines a synonym rule:
# Explicit mappings (left side maps to right side)
wp => wordpress
woo => woocommerce
# Equivalent synonyms (all terms are interchangeable)
plugin, extension, addon
theme, template, skin
# Multi-word synonyms
search engine optimization, seo
content delivery network, cdn
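Before copying an updated file to the nodes, it can help to parse it and review which rules are one-way mappings and which are equivalence groups. The sketch below handles only the two rule forms shown above. It does not handle escaped commas or other advanced syntax. `parseSynonymRules()` is a hypothetical name.

```javascript
// Minimal parser for the two rule forms above. Terms are lowercased to mirror
// the lowercase filter that runs before the synonym filter.
function parseSynonymRules(text) {
  const splitTerms = (s) => s.split(',').map((t) => t.trim().toLowerCase()).filter(Boolean);
  const rules = [];
  for (const rawLine of text.split('\n')) {
    const line = rawLine.trim(); // also drops a trailing \r from Windows files
    if (line === '' || line.startsWith('#')) continue; // blank lines and comments
    if (line.includes('=>')) {
      // Explicit mapping: terms on the left are replaced by terms on the right.
      const [left, right = ''] = line.split('=>');
      rules.push({ type: 'explicit', from: splitTerms(left), to: splitTerms(right) });
    } else {
      // Equivalent synonyms: every term is interchangeable.
      rules.push({ type: 'equivalent', terms: splitTerms(line) });
    }
  }
  return rules;
}

const sample = [
  '# Explicit mappings',
  'wp => wordpress',
  '# Equivalent synonyms',
  'plugin, extension, addon',
  'search engine optimization, seo',
].join('\n');

for (const rule of parseSynonymRules(sample)) {
  console.log(JSON.stringify(rule));
}
```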
When you update the file, call the _reload_search_analyzers API endpoint to pick up changes without a full reindex:
// Reload analyzers after synonym file update
$response = ElasticPress\Elasticsearch::factory()->remote_request(
ElasticPress\Indexables::factory()->get( 'post' )->get_index_name() . '/_reload_search_analyzers',
[ 'method' => 'POST' ]
);
Custom Stop Words
Stop words are common terms that add noise to search results: “the,” “is,” “at,” “which.” The default English stop words list covers the basics, but you may want to add domain-specific stop words. On a WordPress support site, words like “wordpress,” “site,” and “page” appear in nearly every document and carry little discriminative value:
'wpkite_custom_stop' => [
'type' => 'stop',
'stopwords' => [ 'wordpress', 'site', 'page', 'website', 'click', 'button' ],
]
Be cautious with custom stop words. Removing too many terms degrades the search experience because users expect those terms to work.
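It is easy to check how a custom stop list affects individual queries. This sketch tokenizes on whitespace only (Elasticsearch's standard tokenizer is more involved) and uses the stop list above. `queryTerms()` is a hypothetical helper. When every term of a match query is removed, Elasticsearch returns no documents by default, because the `zero_terms_query` option defaults to `none`.

```javascript
// Illustrative only: the terms a query still searches for after custom stop words.
const CUSTOM_STOP = new Set(['wordpress', 'site', 'page', 'website', 'click', 'button']);

function queryTerms(query) {
  return query.toLowerCase().split(/\s+/).filter((t) => t && !CUSTOM_STOP.has(t));
}

console.log(queryTerms('page builder')); // [ 'builder' ]
console.log(queryTerms('page'));         // [] : nothing left to search for
```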
Performance: Query Optimization, Shard Configuration, and Caching
Elasticsearch is fast, but a poorly configured cluster or inefficient queries can negate its advantages. Performance tuning covers three areas: query structure, cluster topology, and caching layers.
Query Optimization
The most common performance mistake is using expensive query types when simpler alternatives exist. Here is a hierarchy from fastest to slowest:
term queries (exact keyword match): a direct lookup in the inverted index with no text analysis. Use for status fields, slugs, IDs.
bool filter clauses: Filters do not compute scores, and frequently reused filters are cached. Always prefer filters over must clauses for non-scoring conditions.
match queries: Single-field text search with analysis. Fast.
multi_match queries: Multi-field text search. Scales linearly with field count.
wildcard queries: Pattern matching. Can be slow on large indices if the pattern starts with a wildcard.
regexp queries: Full regular expression evaluation against the terms dictionary. Avoid in production search.
script queries: Arbitrary scripting. Slowest option. Use only when nothing else works.
// BAD: Using must for non-scoring filter conditions
{
"query": {
"bool": {
"must": [
{ "match": { "post_content": "caching" } },
{ "term": { "post_status": "publish" } },
{ "term": { "post_type": "post" } }
]
}
}
}
// GOOD: Move non-scoring conditions to filter
{
"query": {
"bool": {
"must": [
{ "match": { "post_content": "caching" } }
],
"filter": [
{ "term": { "post_status": "publish" } },
{ "term": { "post_type": "post" } }
]
}
}
}
Clauses in filter context skip scoring entirely, and Elasticsearch keeps frequently reused filters in its node query cache. Later queries that repeat the same filter can reuse the cached result instead of re-evaluating it.
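The rewrite from the first query to the second is mechanical for exact-match clauses. This sketch moves `term`, `terms`, `range`, and `exists` clauses from `must` to `filter` and leaves full-text clauses in place. `moveFiltersOutOfMust()` is a hypothetical helper. The moved clauses no longer contribute to the score, which is the point of the change.

```javascript
// Clause types treated as non-scoring conditions (an assumption; adjust as needed).
const NON_SCORING = ['term', 'terms', 'range', 'exists'];

function moveFiltersOutOfMust(body) {
  const bool = body.query.bool;
  const must = [];
  const filter = [...(bool.filter || [])];
  for (const clause of bool.must || []) {
    const type = Object.keys(clause)[0]; // e.g. "match" or "term"
    (NON_SCORING.includes(type) ? filter : must).push(clause);
  }
  // Return a new object; the input query is left untouched.
  return { ...body, query: { ...body.query, bool: { ...bool, must, filter } } };
}

const bad = {
  query: { bool: { must: [
    { match: { post_content: 'caching' } },
    { term: { post_status: 'publish' } },
    { term: { post_type: 'post' } },
  ] } },
};
console.log(JSON.stringify(moveFiltersOutOfMust(bad), null, 2));
```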
Shard and Replica Configuration
ElasticPress creates indices with default shard settings, which may not be optimal for your data volume. The general guidelines:
Each shard should hold between 10 GB and 50 GB of data. Smaller shards create overhead. Larger shards slow down recovery and relocations.
For a WordPress site with 10,000 posts (roughly 500 MB of index data), a single shard with one replica is sufficient. For a WooCommerce store with 500,000 products, three to five primary shards with one replica each is reasonable.
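The shard guideline is stated in gigabytes, while WordPress reports post counts, so a quick conversion helps. This sketch assumes about 50 KB per document, the ratio implied by the 10,000-post / 500 MB example, and uses decimal units. `estimatePrimaryShards()` is a hypothetical helper. Real document sizes vary widely, and product documents with many attributes can be much larger than blog posts. Measure actual index sizes (for example with the `_cat/indices` API) before settling on a shard count.

```javascript
// Back-of-the-envelope primary shard count from an estimated index size.
// avgDocKB and the 50 GB ceiling are assumptions based on the guideline above.
function estimatePrimaryShards(docCount, avgDocKB, maxShardGB = 50) {
  const indexGB = (docCount * avgDocKB) / 1e6; // KB -> GB, decimal units
  return {
    indexGB: Math.round(indexGB * 100) / 100,
    shards: Math.max(1, Math.ceil(indexGB / maxShardGB)),
  };
}

console.log(estimatePrimaryShards(10000, 50)); // { indexGB: 0.5, shards: 1 }
```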
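/**
* Configure shard and replica counts, using the published post count
* as a rough proxy for index size.
*/
add_filter( 'ep_post_mapping', function( $mapping ) {
// wp_count_posts() with no argument counts the 'post' type only; for a
// store, count 'product' (or sum every indexable post type) instead.
$post_count = wp_count_posts()->publish;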
if ( $post_count < 50000 ) {
$shards = 1;
$replicas = 1;
} elseif ( $post_count < 500000 ) {
$shards = 3;
$replicas = 1;
} else {
$shards = 5;
$replicas = 2;
}
$mapping['settings']['number_of_shards'] = $shards;
$mapping['settings']['number_of_replicas'] = $replicas;
// Refresh less often to reduce indexing overhead. New and updated posts
// can take up to 30 seconds to appear in search results.
$mapping['settings']['refresh_interval'] = '30s';
return $mapping;
});
Replicas serve two purposes: fault tolerance and read throughput. Each replica can serve search requests independently, so two replicas can roughly triple your read capacity, provided the cluster has enough data nodes to hold the extra copies. However, replicas also double (or triple) your storage requirements and indexing load.
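The cost side of replicas is simple multiplication. The sketch below is a hypothetical `replicaFootprint()` helper with illustrative numbers (20 GB of primary data in 3 shards). It counts shard copies and storage, plus the minimum number of data nodes. The node count follows from a rule Elasticsearch enforces: it never allocates two copies of the same shard to one node. Replicas without a valid node stay unassigned and the cluster reports yellow.

```javascript
// Rough capacity arithmetic for a given replica count (illustrative).
function replicaFootprint(primaryGB, primaryShards, replicas) {
  const copies = 1 + replicas; // each primary plus its replicas
  return {
    totalShards: primaryShards * copies,
    storageGB: primaryGB * copies,
    // Every copy of a shard needs its own data node for a green cluster.
    minDataNodesForGreen: copies,
  };
}

console.log(replicaFootprint(20, 3, 2));
// { totalShards: 9, storageGB: 60, minDataNodesForGreen: 3 }
```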
WordPress Object Cache Integration
Even though Elasticsearch responds in milliseconds, you can reduce load further by caching frequent queries in the WordPress object cache:
/**
* Cache Elasticsearch results in the WordPress object cache.
*/
add_filter( 'ep_formatted_args', function( $formatted_args, $args ) {
if ( empty( $args['s'] ) ) {
return $formatted_args;
}
// Key on the WP_Query args: ep_valid_response receives the same array as
// its third argument, so the read here and the write below use one key.
$cache_key = 'ep_search_' . md5( wp_json_encode( $args ) );
$cached = wp_cache_get( $cache_key, 'ep_search_results' );
if ( false !== $cached ) {
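// Store for retrieval in posts_pre_query. That callback is not shown here;
// until one returns these cached posts, the Elasticsearch request still runs.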
global $wpkite_cached_ep_results;
$wpkite_cached_ep_results = $cached;
}
return $formatted_args;
}, 5, 2 );
/**
* Store results in cache after successful ES query.
*/
add_action( 'ep_valid_response', function( $response, $query, $args ) {
if ( empty( $args['s'] ) ) {
return;
}
$cache_key = 'ep_search_' . md5( wp_json_encode( $args ) );
wp_cache_set( $cache_key, $response, 'ep_search_results', 300 ); // 5 minute TTL
}, 10, 3 );
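Use this sparingly. Caching works best for popular queries (the top 100 searches on your site), but it can serve stale results if content changes frequently. A 5-minute TTL is a reasonable balance for most sites. It also only pays off with a persistent object cache backend such as Redis or Memcached; without one, wp_cache_set() keeps data for the current request only.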
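Underneath, the caching idea is a key-value store with expiry. The sketch below stands in for a persistent object cache with a 300-second TTL. It uses an injectable clock so expiry can be tested. It also shows one optional refinement: normalizing whitespace and case in the search term before building the key, so trivially different queries share an entry. `TtlCache` and `cacheKey` are hypothetical. The PHP above hashes the full query args instead, which is safer when other args (post type, page number) change the results.

```javascript
// Minimal in-memory TTL cache; `now` is injectable so expiry can be tested.
class TtlCache {
  constructor(ttlSeconds, now = () => Date.now()) {
    this.ttlMs = ttlSeconds * 1000;
    this.now = now;
    this.entries = new Map();
  }
  set(key, value) {
    this.entries.set(key, { value, expiresAt: this.now() + this.ttlMs });
  }
  get(key) {
    const entry = this.entries.get(key);
    if (!entry) return undefined;
    if (this.now() >= entry.expiresAt) { // stale: drop it and report a miss
      this.entries.delete(key);
      return undefined;
    }
    return entry.value;
  }
}

// Normalize the term so "Caching ", "caching" and "CACHING" share one key.
const cacheKey = (term) => 'ep_search_' + term.trim().toLowerCase().replace(/\s+/g, ' ');
```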
Bulk Indexing Performance
When reindexing a large site, the default ElasticPress settings can be slow. Several adjustments speed up the process:
// Increase the bulk index batch size (default is 350)
add_filter( 'ep_bulk_items_per_page', function() {
return 1000;
});
/**
* Helpers for a faster bulk run (function names are illustrative).
* The ep_cli_before/after_set_search_algorithm_version actions belong to the
* set-search-algorithm-version command rather than to indexing, so these
* helpers are called explicitly around the index run instead:
*
*   wp eval 'wpkite_before_bulk_index();'
*   wp elasticpress index
*   wp eval 'wpkite_after_bulk_index();'
*
* Skip --setup here: it deletes and recreates the index, which would
* discard the temporary settings.
*/
function wpkite_put_index_settings( array $settings ) {
$index_name = ElasticPress\Indexables::factory()->get( 'post' )->get_index_name();
return ElasticPress\Elasticsearch::factory()->remote_request(
$index_name . '/_settings',
[
'method' => 'PUT',
'body' => wp_json_encode( [ 'index' => $settings ] ),
]
);
}
// Disable replicas and periodic refreshes for faster writes.
function wpkite_before_bulk_index() {
wpkite_put_index_settings( [
'number_of_replicas' => 0,
'refresh_interval' => '-1',
] );
}
// Re-enable replicas and refreshes (use the interval you normally run with).
function wpkite_after_bulk_index() {
wpkite_put_index_settings( [
'number_of_replicas' => 1,
'refresh_interval' => '1s',
] );
// Force a refresh to make all documents searchable
$index_name = ElasticPress\Indexables::factory()->get( 'post' )->get_index_name();
ElasticPress\Elasticsearch::factory()->remote_request(
$index_name . '/_refresh',
[ 'method' => 'POST' ]
);
}
Disabling replicas and the refresh interval during bulk indexing can reduce reindex time by 40% to 60% on large datasets.
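Raising `ep_bulk_items_per_page` trades fewer round trips for larger request bodies. The arithmetic below shows both sides, assuming a 500,000-document site and an average of 50 KB per document (both illustrative). Elasticsearch rejects request bodies above `http.max_content_length`, which defaults to 100 MB. Larger batches also raise PHP memory use during the sync. The helper names are hypothetical.

```javascript
// Bulk requests needed for a full sync at a given batch size.
const bulkRequests = (docCount, perPage) => Math.ceil(docCount / perPage);

// Approximate size of one bulk request body in MB (decimal units).
const bulkBodyMB = (perPage, avgDocKB) => (perPage * avgDocKB) / 1000;

const docs = 500000;
console.log(bulkRequests(docs, 350));  // 1429 at the default batch size
console.log(bulkRequests(docs, 1000)); // 500 with the filter above
console.log(bulkBodyMB(1000, 50));     // 50 MB per request
```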
Monitoring Search Quality
Deploying Elasticsearch is only half the battle. You need ongoing visibility into what users search for, what results they click, and where the search fails them. Without this data, relevance tuning is guesswork.
Logging Search Queries
Capture every search query, the result count, and whether the user clicked a result:
/**
* Log search queries for quality analysis.
*/
add_action( 'ep_valid_response', function( $response, $query, $args ) {
if ( empty( $args['s'] ) ) {
return;
}
global $wpdb;
$total_results = isset( $response['hits']['total']['value'] )
? (int) $response['hits']['total']['value']
: 0;
$top_results = [];
if ( isset( $response['hits']['hits'] ) ) {
foreach ( array_slice( $response['hits']['hits'], 0, 5 ) as $hit ) {
$top_results[] = [
'id' => $hit['_source']['post_id'],
'score' => $hit['_score'],
];
}
}
$wpdb->insert(
$wpdb->prefix . 'search_log',
[
'query_text' => sanitize_text_field( $args['s'] ),
'result_count' => $total_results,
'top_results' => wp_json_encode( $top_results ),
'user_id' => get_current_user_id(),
'search_date' => current_time( 'mysql' ),
'response_time' => isset( $response['took'] ) ? (int) $response['took'] : null,
],
[ '%s', '%d', '%s', '%d', '%s', '%d' ]
);
}, 10, 3 );
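/**
* Create the search log and click tables.
*/
function wpkite_create_search_log_table() {
global $wpdb;
$charset = $wpdb->get_charset_collate();
$log_table = $wpdb->prefix . 'search_log';
$click_table = $wpdb->prefix . 'search_clicks';
// dbDelta() expects a plain CREATE TABLE (no IF NOT EXISTS), one column per
// line, KEY rather than INDEX, and two spaces after PRIMARY KEY. The 191-char
// prefix keeps the query_text index within InnoDB limits for utf8mb4.
$sql = "CREATE TABLE {$log_table} (
id BIGINT UNSIGNED NOT NULL AUTO_INCREMENT,
query_text VARCHAR(255) NOT NULL,
result_count INT UNSIGNED NOT NULL DEFAULT 0,
top_results TEXT,
user_id BIGINT UNSIGNED DEFAULT 0,
clicked_result_id BIGINT UNSIGNED DEFAULT NULL,
search_date DATETIME NOT NULL,
response_time INT UNSIGNED DEFAULT NULL,
PRIMARY KEY  (id),
KEY idx_query_text (query_text(191)),
KEY idx_search_date (search_date),
KEY idx_result_count (result_count)
) {$charset};
CREATE TABLE {$click_table} (
id BIGINT UNSIGNED NOT NULL AUTO_INCREMENT,
query_text VARCHAR(255) NOT NULL,
post_id BIGINT UNSIGNED NOT NULL,
position INT UNSIGNED NOT NULL DEFAULT 0,
user_id BIGINT UNSIGNED DEFAULT 0,
click_date DATETIME NOT NULL,
PRIMARY KEY  (id),
KEY idx_click_date (click_date)
) {$charset};";
require_once ABSPATH . 'wp-admin/includes/upgrade.php';
dbDelta( $sql );
}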
register_activation_hook( __FILE__, 'wpkite_create_search_log_table' );
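The logger reads `hits.total.value`, which matches Elasticsearch 7 and later. In those versions the total becomes a lower bound (`relation: "gte"`) once it passes the `track_total_hits` threshold, 10,000 by default. As a result, `result_count` can undercount very broad searches, although zero-result detection is unaffected. The sketch mirrors the extraction and flags that case. It also accepts the plain-number total of older versions. `summarizeResponse()` is a hypothetical helper.

```javascript
// Pull the fields the logger stores from a search response body.
function summarizeResponse(body, topN = 5) {
  const total = body?.hits?.total;
  // Plain number in older versions; { value, relation } in Elasticsearch 7+.
  const count = typeof total === 'number' ? total : (total?.value ?? 0);
  const exact = typeof total === 'number' || total?.relation !== 'gte';
  const top = (body?.hits?.hits ?? []).slice(0, topN).map((hit) => ({
    id: hit._source?.post_id,
    score: hit._score,
  }));
  return { count, exact, top, tookMs: body?.took ?? null };
}
```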
Click-Through Tracking
Logging queries alone does not tell you whether the results were useful. You also need to track which results users actually click:
/**
* Track search result clicks via AJAX.
*/
add_action( 'wp_ajax_wpkite_track_search_click', 'wpkite_track_search_click' );
add_action( 'wp_ajax_nopriv_wpkite_track_search_click', 'wpkite_track_search_click' );
function wpkite_track_search_click() {
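// WordPress slashes $_POST values, so unslash before sanitizing; otherwise
// queries containing quotes will not match the logged query_text.
$search_query = isset( $_POST['search_query'] )
? sanitize_text_field( wp_unslash( $_POST['search_query'] ) )
: '';
$clicked_id = isset( $_POST['clicked_id'] )
? absint( $_POST['clicked_id'] )
: 0;
$position = isset( $_POST['position'] )
? absint( $_POST['position'] )
: 0;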
if ( empty( $search_query ) || empty( $clicked_id ) ) {
wp_send_json_error();
}
global $wpdb;
// Update the most recent search log entry for this query
$wpdb->query( $wpdb->prepare(
"UPDATE {$wpdb->prefix}search_log
SET clicked_result_id = %d
WHERE query_text = %s
AND user_id = %d
AND clicked_result_id IS NULL
ORDER BY search_date DESC
LIMIT 1",
$clicked_id,
$search_query,
get_current_user_id()
) );
// Log the click with position data
$wpdb->insert(
$wpdb->prefix . 'search_clicks',
[
'query_text' => $search_query,
'post_id' => $clicked_id,
'position' => $position,
'user_id' => get_current_user_id(),
'click_date' => current_time( 'mysql' ),
],
[ '%s', '%d', '%d', '%d', '%s' ]
);
wp_send_json_success();
}
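The PHP handler expects a POST with `action`, `search_query`, `clicked_id`, and `position`. The front-end half is not shown in the article, so here is one possible sketch. It assumes the admin-ajax.php URL is made available to the script (for example through `wp_localize_script()`) and that positions are 1-based, as the MRR calculation requires. Both function names are hypothetical.

```javascript
// Build the form body the wp_ajax_*wpkite_track_search_click hooks expect.
function buildClickPayload(searchQuery, postId, position) {
  const body = new URLSearchParams();
  body.set('action', 'wpkite_track_search_click');
  body.set('search_query', searchQuery);
  body.set('clicked_id', String(postId));
  body.set('position', String(position)); // 1-based rank of the clicked result
  return body;
}

// Send the click. sendBeacon survives the navigation that follows a click;
// fetch with keepalive is the fallback.
function trackSearchClick(ajaxUrl, searchQuery, postId, position) {
  const body = buildClickPayload(searchQuery, postId, position);
  if (typeof navigator !== 'undefined' && typeof navigator.sendBeacon === 'function') {
    navigator.sendBeacon(ajaxUrl, body);
  } else {
    fetch(ajaxUrl, { method: 'POST', body, keepalive: true });
  }
}
```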
Key Metrics to Track
With query and click data in your database, you can compute metrics that measure search quality:
Zero-result rate: The percentage of searches that return no results. A rate above 10% indicates gaps in your content or synonym coverage. Run a weekly report to identify the top zero-result queries and either create content for those topics or add synonyms.
/**
* Get zero-result queries from the last 30 days.
*/
function wpkite_get_zero_result_queries( $days = 30, $limit = 50 ) {
global $wpdb;
return $wpdb->get_results( $wpdb->prepare(
"SELECT query_text, COUNT(*) as search_count
FROM {$wpdb->prefix}search_log
WHERE result_count = 0
AND search_date > DATE_SUB(NOW(), INTERVAL %d DAY)
GROUP BY query_text
ORDER BY search_count DESC
LIMIT %d",
$days,
$limit
) );
}
Click-through rate (CTR): The percentage of searches where the user clicks a result. A low CTR suggests that results are not matching user intent. Compare CTR across different query patterns to identify weak spots.
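Both rates are simple ratios over the log. The sketch below assumes rows shaped like the `search_log` table, with `result_count` and `clicked_result_id` already converted to numbers or `null`. It reports percentages. Because the click handler marks only the most recent matching search, CTR here counts searches with at least one recorded click. `searchRates()` is a hypothetical helper.

```javascript
// Zero-result rate and click-through rate (as percentages) from search_log rows.
function searchRates(rows) {
  const total = rows.length;
  if (total === 0) return { zeroResultRate: 0, ctr: 0 };
  const zero = rows.filter((r) => r.result_count === 0).length;
  const clicked = rows.filter((r) => r.clicked_result_id != null).length;
  return {
    zeroResultRate: (zero / total) * 100,
    ctr: (clicked / total) * 100,
  };
}

const exampleRows = [
  { result_count: 12, clicked_result_id: 101 },
  { result_count: 0, clicked_result_id: null },
  { result_count: 7, clicked_result_id: null },
  { result_count: 3, clicked_result_id: 55 },
];
console.log(searchRates(exampleRows)); // { zeroResultRate: 25, ctr: 50 }
```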
Mean reciprocal rank (MRR): For each click, compute 1/position, where position is the 1-based rank of the clicked result, then average across all recorded clicks. An MRR close to 1.0 means users consistently click the first result. An MRR below 0.3 means the best result is often buried. Searches without any click are left out, so this click-based figure runs higher than an MRR that counts those searches as zero.
/**
* Calculate mean reciprocal rank for search quality assessment.
*/
function wpkite_calculate_mrr( $days = 30 ) {
global $wpdb;
$clicks = $wpdb->get_results( $wpdb->prepare(
"SELECT position
FROM {$wpdb->prefix}search_clicks
WHERE click_date > DATE_SUB(NOW(), INTERVAL %d DAY)
AND position > 0",
$days
) );
if ( empty( $clicks ) ) {
return 0;
}
$rr_sum = 0;
foreach ( $clicks as $click ) {
$rr_sum += 1 / (int) $click->position;
}
return $rr_sum / count( $clicks );
}
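The same calculation in JavaScript makes the two variants explicit. `wpkite_calculate_mrr()` averages over clicks only. Passing a count of searches without a click scores those as 0, which gives the stricter form. `meanReciprocalRank()` is a hypothetical helper, and the positions are illustrative.

```javascript
// Mean reciprocal rank from 1-based click positions.
function meanReciprocalRank(positions, searchesWithoutClick = 0) {
  const valid = positions.filter((p) => Number.isInteger(p) && p > 0);
  const n = valid.length + searchesWithoutClick;
  if (n === 0) return 0;
  const sum = valid.reduce((acc, p) => acc + 1 / p, 0);
  return sum / n;
}

console.log(meanReciprocalRank([1, 2, 4]));    // (1 + 0.5 + 0.25) / 3, about 0.583
console.log(meanReciprocalRank([1, 2, 4], 1)); // 1.75 / 4 = 0.4375
```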
Query latency: Track the took field from Elasticsearch responses. It measures time spent inside Elasticsearch only, so it excludes network round trips and PHP processing. P95 latency should stay under 200ms for a good user experience. If latency spikes, check for expensive queries (wildcards, scripts), insufficient cluster resources, or shard imbalance.
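The dashboard below reports an average, which hides the tail that a P95 target is about. This sketch uses the nearest-rank method: sort the values, then take the one at rank ceil(p/100 × n). The latency values are illustrative. Their average is 33.8 ms, while the P95 is the single slow query. `percentile()` is a hypothetical helper.

```javascript
// Nearest-rank percentile: sort, then take the value at rank ceil(p * n / 100).
function percentile(values, p) {
  if (values.length === 0) return null;
  const sorted = [...values].sort((a, b) => a - b);
  const rank = Math.ceil((p * sorted.length) / 100);
  return sorted[Math.max(0, rank - 1)];
}

const tookMs = [12, 15, 9, 220, 14, 18, 11, 13, 16, 10];
console.log(percentile(tookMs, 95)); // 220
```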
Building a Search Quality Dashboard
Combine these metrics into a WordPress admin page that your team can review weekly:
/**
* Register the search quality admin page.
*/
add_action( 'admin_menu', function() {
add_submenu_page(
'tools.php',
'Search Quality',
'Search Quality',
'manage_options',
'search-quality',
'wpkite_render_search_quality_page'
);
});
function wpkite_render_search_quality_page() {
global $wpdb;
$days = isset( $_GET['days'] ) ? max( 1, absint( $_GET['days'] ) ) : 30;
// Total searches
$total_searches = (int) $wpdb->get_var( $wpdb->prepare(
"SELECT COUNT(*) FROM {$wpdb->prefix}search_log
WHERE search_date > DATE_SUB(NOW(), INTERVAL %d DAY)",
$days
) );
// Zero result count
$zero_results = (int) $wpdb->get_var( $wpdb->prepare(
"SELECT COUNT(*) FROM {$wpdb->prefix}search_log
WHERE result_count = 0
AND search_date > DATE_SUB(NOW(), INTERVAL %d DAY)",
$days
) );
$zero_rate = $total_searches > 0
? round( ( $zero_results / $total_searches ) * 100, 1 )
: 0;
// Average response time
$avg_response = (float) $wpdb->get_var( $wpdb->prepare(
"SELECT AVG(response_time) FROM {$wpdb->prefix}search_log
WHERE response_time IS NOT NULL
AND search_date > DATE_SUB(NOW(), INTERVAL %d DAY)",
$days
) );
// MRR
$mrr = wpkite_calculate_mrr( $days );
// Top queries
$top_queries = $wpdb->get_results( $wpdb->prepare(
"SELECT query_text, COUNT(*) as count, AVG(result_count) as avg_results
FROM {$wpdb->prefix}search_log
WHERE search_date > DATE_SUB(NOW(), INTERVAL %d DAY)
GROUP BY query_text
ORDER BY count DESC
LIMIT 20",
$days
) );
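echo '<div class="wrap">';
printf( '<h1>Search Quality Dashboard (%d days)</h1>', $days );
// Summary metrics
echo '<table class="widefat striped"><tbody>';
printf( '<tr><th>Total Searches</th><td>%s</td></tr>', number_format( $total_searches ) );
printf( '<tr><th>Zero-Result Rate</th><td>%s%%</td></tr>', $zero_rate );
printf( '<tr><th>Avg Response Time</th><td>%sms</td></tr>', round( $avg_response ) );
printf( '<tr><th>Mean Reciprocal Rank</th><td>%s</td></tr>', round( $mrr, 3 ) );
echo '</tbody></table>';
// Render top queries table
if ( ! empty( $top_queries ) ) {
echo '<h2>Top Search Queries</h2>';
echo '<table class="widefat striped">';
echo '<thead><tr><th>Query</th><th>Count</th><th>Avg Results</th></tr></thead>';
echo '<tbody>';
foreach ( $top_queries as $row ) {
printf(
'<tr><td>%s</td><td>%d</td><td>%s</td></tr>',
esc_html( $row->query_text ),
(int) $row->count,
round( (float) $row->avg_results, 1 )
);
}
echo '</tbody></table>';
}
// Render zero-result queries
$zero_queries = wpkite_get_zero_result_queries( $days, 20 );
if ( ! empty( $zero_queries ) ) {
echo '<h2>Top Zero-Result Queries</h2>';
echo '<table class="widefat striped">';
echo '<thead><tr><th>Query</th><th>Count</th></tr></thead>';
echo '<tbody>';
foreach ( $zero_queries as $row ) {
printf(
'<tr><td>%s</td><td>%d</td></tr>',
esc_html( $row->query_text ),
(int) $row->search_count
);
}
echo '</tbody></table>';
}
echo '</div>';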
}
Putting It All Together: A Complete Search Architecture
Let me outline how all of these pieces fit into a real WordPress project. The architecture has four layers.
Layer 1: Index Configuration
Your functions.php or a dedicated plugin registers custom mappings, analyzers, and synonym lists. This defines the structure of your search index. Every time you change the mapping, you run wp elasticpress index --setup to rebuild. In production, schedule reindexes during low-traffic hours.
Layer 2: Data Sync
The ep_post_sync_args_post_prepare_meta filter populates your custom fields during indexing. This is where ACF fields, WooCommerce product data, and typed meta fields get written to Elasticsearch. Make sure this filter is performant because it runs once for every post during a full reindex. Avoid calling get_field() or wc_get_product() multiple times for the same post. Cache intermediate results where possible.
Layer 3: Query Modification
The ep_formatted_args filter is where relevance tuning, function scores, facet aggregations, and context-aware boosting happen. Stack your filters with clear priority levels. Use priority 10-15 for structural modifications (adding aggregations, moving filters to post_filter), priority 20-25 for relevance tuning (field weights, function scores), and priority 30+ for context-specific overrides.
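Priority ordering is what makes this layering predictable. WordPress runs filter callbacks in ascending priority order, and callbacks sharing a priority run in the order they were added. The toy model below is not WordPress's own hook implementation. It applies three hypothetical layers that were registered out of order.

```javascript
// Toy model of WordPress filter priorities: ascending priority, ties broken
// by registration order; each callback receives the previous result.
function applyFilters(callbacks, value) {
  return callbacks
    .map((cb, i) => ({ ...cb, i }))
    .sort((a, b) => a.priority - b.priority || a.i - b.i)
    .reduce((acc, cb) => cb.fn(acc), value);
}

const layers = [
  { priority: 30, fn: (q) => ({ ...q, steps: [...q.steps, 'context override'] }) },
  { priority: 10, fn: (q) => ({ ...q, steps: [...q.steps, 'aggregations'] }) },
  { priority: 20, fn: (q) => ({ ...q, steps: [...q.steps, 'field weights'] }) },
];
console.log(applyFilters(layers, { steps: [] }).steps);
// [ 'aggregations', 'field weights', 'context override' ]
```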
Layer 4: Frontend Presentation
The search results template renders scored results, facet panels, and the autocomplete UI. The JavaScript autocomplete class handles real-time suggestions. Click tracking feeds data back to the monitoring layer.
Deployment Checklist
Before going live with Elasticsearch, walk through this checklist:
Verify your Elasticsearch cluster is running on a supported version (7.x or 8.x for current ElasticPress releases). Confirm the cluster health is green, meaning all primary and replica shards are allocated.
Run a full index with wp elasticpress index --setup and verify the document count matches the number of published posts across all indexable post types. Check for indexing errors in the ElasticPress health screen or CLI output.
Test your synonym list by searching for each synonym pair and verifying matches appear. Test edge cases: single-character searches, very long queries, queries with special characters, and queries in non-English languages if applicable.
Load test the search endpoint with realistic query patterns. Use a tool like Apache JMeter or k6 to send 50 to 100 concurrent search requests and verify that P95 latency stays under 200ms.
Set up monitoring alerts for cluster health (yellow or red status), search latency spikes (P95 above 500ms), and zero-result rate increases (more than 5% change week over week).
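Plan retention for your search logs. The search_log and search_clicks tables above live in MySQL, so prune old rows on a schedule (for example, a daily WP-Cron job that deletes rows older than your retention period). If you ship search logs to Elasticsearch as time-based indices instead, ILM policies can automatically move older indices to cheaper storage tiers or delete them after a retention period.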
Common Pitfalls and How to Avoid Them
Mapping conflicts after reindex. If you change a field type (say, from text to keyword), Elasticsearch will reject the mapping update. You must delete the index and recreate it with --setup. In production, this means a brief period without search. Use an index alias and the blue-green deployment pattern: create a new index with the updated mapping, reindex into it, then swap the alias to point to the new index.
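The alias swap at the end of a blue-green rebuild is a single request to the `_aliases` endpoint. Elasticsearch applies its actions atomically, so searches never see a moment with no index behind the alias. The sketch only builds the request body. The index and alias names are hypothetical, and it assumes searches already go through the alias rather than a concrete index name.

```javascript
// Body for POST /_aliases that repoints an alias in one atomic request.
function aliasSwapActions(alias, oldIndex, newIndex) {
  return {
    actions: [
      { remove: { index: oldIndex, alias } },
      { add: { index: newIndex, alias } },
    ],
  };
}

console.log(JSON.stringify(aliasSwapActions('mysite-post', 'mysite-post-v1', 'mysite-post-v2')));
```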
Memory pressure from too many shards. Each shard consumes heap memory for metadata, regardless of whether it holds data. A cluster with hundreds of tiny shards will suffer from garbage collection pauses and slow query performance. Consolidate small indices and keep total shard count proportional to your cluster's heap.
Stale results after post updates. ElasticPress syncs posts to Elasticsearch on save, but if the sync fails silently (due to a timeout or network issue), your search index goes stale. Implement a health check that compares the Elasticsearch document count with the WordPress post count and alerts on discrepancies.
Over-reliance on fuzzy matching. Fuzzy queries correct typos by matching terms within an edit distance. But fuzziness on short terms produces bizarre results: a fuzzy search for "PHP" might match "PHP," "PH," and "PHS" but also "FHP" and "PAP." Restrict fuzziness to longer query terms (five or more characters) and use exact matching for short terms.
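Elasticsearch's `AUTO` fuzziness ties the allowed edit distance to term length. Its `AUTO:low,high` form moves the thresholds. For example, `AUTO:5,8` keeps terms shorter than five characters exact, which matches the advice above. The sketch reproduces that rule and a plain Levenshtein distance to show why “php” can reach “pap” under the default `AUTO` (equivalent to `AUTO:3,6`). By default Elasticsearch also counts a swap of adjacent characters as one edit (`fuzzy_transpositions`), which this sketch does not model. The function names are hypothetical.

```javascript
// Edits allowed by fuzziness "AUTO:low,high": under low -> 0, under high -> 1, else 2.
function autoFuzziness(term, low = 3, high = 6) {
  const len = Array.from(term).length;
  if (len < low) return 0;
  if (len < high) return 1;
  return 2;
}

// Levenshtein distance (insertions, deletions, substitutions) with one row of state.
function levenshtein(a, b) {
  const row = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    let diag = row[0]; // distance for prefixes a[0..i-2], b[0..j-2]
    row[0] = i;
    for (let j = 1; j <= b.length; j++) {
      const above = row[j];
      row[j] = Math.min(
        row[j] + 1,                              // delete a[i-1]
        row[j - 1] + 1,                          // insert b[j-1]
        diag + (a[i - 1] === b[j - 1] ? 0 : 1)   // substitute (or keep)
      );
      diag = above;
    }
  }
  return row[b.length];
}

console.log(autoFuzziness('php'), levenshtein('php', 'pap')); // 1 1
console.log(autoFuzziness('php', 5, 8));                       // 0: exact match only
```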
Ignoring the explain API. When a specific query returns unexpected results, use the _explain endpoint to see how Elasticsearch scored each document. This reveals whether a field weight, function score, or synonym expansion is responsible:
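// Debug why a specific document scored the way it did.
// $document_id is the Elasticsearch document ID (ElasticPress uses the post ID).
// The _explain endpoint accepts only a query in its body, so pass just that
// part of the search, e.g. [ 'query' => $formatted_args['query'] ].
$explain = ElasticPress\Elasticsearch::factory()->remote_request(
ElasticPress\Indexables::factory()->get( 'post' )->get_index_name()
. '/_explain/' . $document_id,
[
'method' => 'POST',
'body' => wp_json_encode( $query_dsl ),
]
);
if ( ! is_wp_error( $explain ) ) {
$explanation = json_decode( wp_remote_retrieve_body( $explain ), true );
// The explanation tree shows every scoring component
}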
Search quality is an ongoing discipline, not a one-time configuration. The sites that deliver the best search experiences are the ones that review their search analytics monthly, refine their synonym dictionaries quarterly, and revisit their relevance tuning whenever content strategy shifts. Elasticsearch gives you the engine. ElasticPress gives you the bridge to WordPress. The search quality monitoring gives you the feedback loop to make it all work.
Elena Vasquez
WooCommerce specialist and plugin developer with 8 years of experience. Built several popular WooCommerce extensions. Focuses on performance and scaling for online stores.