Jetpack Search is a powerful replacement for the search capability built into WordPress. It is a paid upgrade to the Jetpack plugin that provides higher quality results and an improved search experience. Upgrade today to get started.
This support article describes the fields available when customizing search with the search API and is intended for developers.
Each search result is a single Elasticsearch document. Currently there is only a single document type. The top-level code for building our index is open source and is the best place to look if you want the details. Below is a written description of all the fields that are safe to rely on. Certain fields (especially those for extracted post content) are likely to change due to Gutenberg indexing, so they have been left out.
- Post Info
- Post Language
- Post Author
- Post Content
- Post Tags and Categories
- Post Custom Taxonomies
- Post Interactions
- Post Dates
- Post Meta
Description of the Fields
We use a few specific terms to describe the fields below.
Data Type: description of the data that is stored
- number
- string
- boolean
- date
Type Details: details about the mapping for that field
- short, integer, long
- boolean
- text: An analyzed string. Tokenized into multiple terms.
- keyword: The string is treated as a single term. Keyword strings get truncated.
- token_count: A count of the number of tokens in the text. i.e. the word count.
- date: The date data object takes dates in ISO 8601 format either with times (yyyy-MM-dd HH:mm:ss) or without (yyyy-MM-dd).
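As a small illustration of the two date layouts listed above, the following Python sketch formats a value either with or without a time component. The helper name `format_index_date` is hypothetical and is not part of the search API.

```python
from datetime import date, datetime


def format_index_date(value):
    """Format a date or datetime in one of the two layouts described above."""
    # datetime is a subclass of date, so it has to be checked first.
    if isinstance(value, datetime):
        return value.strftime("%Y-%m-%d %H:%M:%S")  # yyyy-MM-dd HH:mm:ss
    return value.strftime("%Y-%m-%d")  # yyyy-MM-dd


print(format_index_date(datetime(2021, 3, 9, 14, 5, 0)))  # 2021-03-09 14:05:00
print(format_index_date(date(2021, 3, 9)))  # 2021-03-09
```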
Tokenization languages
There is a default language analysis used for text fields, and then also custom language analysis for 29 languages. The language analyzers are defined in this code.
Post Fields
Post Info
Field Name | Data Type | Type Details | Notes |
---|---|---|---|
site_id | number | short | 1 for WordPress.com, 2 for Jetpack |
blog_id | number | integer | |
post_id | number | long | |
parent_post_id | number | long | |
ancestor_post_ids | number | long | |
sticky | boolean | boolean | |
menu_order | number | integer | |
slug | string | keyword | |
permalink.url.analyzed | string | text | URL no protocol |
permalink.url.raw | string | keyword | URL no protocol |
permalink.host | string | keyword | |
permalink.reverse_host | string | keyword | |
post_type | string | keyword | |
post_format | string | keyword | |
post_status | string | keyword | |
has_password | boolean | boolean | |
public | boolean | boolean | |
featured_image | string | keyword | URL no protocol |
featured_image_url.url.analyzed | string | text | URL no protocol |
featured_image_url.url.raw | string | keyword | URL no protocol |
featured_image_url.host | string | keyword | |
featured_image_url.reverse_host | string | keyword | |
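The boolean and keyword fields in this table lend themselves to exact-match filters. The sketch below builds a standard Elasticsearch `bool` filter as a Python dictionary. The function name is hypothetical, and how the filter is passed to the search API is outside the scope of this article.

```python
import json


def public_posts_filter(post_types):
    """Build a filter for public, non-password-protected posts of given types."""
    return {
        "bool": {
            "filter": [
                {"term": {"public": True}},
                {"term": {"has_password": False}},
                # post_type is a keyword field, so values match exactly.
                {"terms": {"post_type": list(post_types)}},
            ]
        }
    }


print(json.dumps(public_posts_filter(["post", "page"]), indent=2))
```

Keyword fields are not analyzed, so `term` and `terms` clauses need the exact stored value.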
Post Language
The post language is determined dynamically by detecting the language in the post title, content, and excerpt fields. If it is not possible to detect the post language, the fallback is the blog’s configured language.
Field Name | Data Type | Type Details | Notes |
---|---|---|---|
lang | string | keyword | Two letter ISO 639 code |
Post Author
The post author is the WordPress.com user that authored the post. If it’s a Jetpack site and we are unable to determine the corresponding WordPress.com user, the author_id field will be set to 0.
Field Name | Data Type | Type Details | Notes |
---|---|---|---|
author | string | text | WordPress.com display name |
author.raw | string | keyword | WordPress.com display name |
author_login | string | keyword | WordPress.com username |
author_id | number | integer | WordPress.com user id |
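Because an `author_id` of 0 marks posts whose WordPress.com author could not be determined, a query may want to exclude them. A minimal sketch using standard Elasticsearch query clauses follows. Both function names are hypothetical.

```python
def known_author_filter():
    """Exclude posts whose WordPress.com author could not be determined."""
    return {"bool": {"must_not": [{"term": {"author_id": 0}}]}}


def author_login_filter(login):
    """Match posts by exact WordPress.com username (a keyword field)."""
    return {"term": {"author_login": login}}


print(known_author_filter())
print(author_login_filter("example_user"))  # "example_user" is a placeholder
```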
Post Content
All content in the post is processed in a similar way, with HTML and shortcodes stripped. The language of the content is detected and the content is then indexed into the appropriate [LANG] field. Content is also always indexed into the default field(s).
Field Name | Data Type | Type Details | Notes |
---|---|---|---|
all_content.default | string | text | default analyzer |
all_content.default.engram | string | text | search as you type analyzer |
all_content.default.word_count | number | token_count | count of words – works for whitespace delimited languages |
all_content.[LANG] | string | text | [LANG]_analyzer analyzer |
all_content.[LANG].engram | string | text | search as you type with lang specific analysis |
all_content.[LANG].word_count | number | token_count | count of words, but will exclude stop words. Good for ja, ko, zh |
title.default | string | text | |
title.default.word_count | number | token_count | |
title.[LANG] | string | text | |
title.[LANG].word_count | number | token_count | |
excerpt.default | string | text | |
excerpt.default.word_count | number | token_count | |
excerpt.[LANG] | string | text | |
excerpt.[LANG].word_count | number | token_count | |
content.default | string | text | |
content.default.word_count | number | token_count | |
content.[LANG] | string | text | |
content.[LANG].word_count | number | token_count | |
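Since content is always indexed into the default fields and additionally into a language-specific field, a query can target both. The sketch below assembles field names from the table and wraps them in a standard Elasticsearch `multi_match` query. The helper names are hypothetical, and the caller is assumed to pass a language code that has a custom analyzer.

```python
def content_fields(lang=None):
    """List the title, excerpt, and content fields to search."""
    bases = ["title", "excerpt", "content"]
    fields = [f"{base}.default" for base in bases]
    if lang:
        # Assumption: lang is one of the languages with custom analysis.
        fields += [f"{base}.{lang}" for base in bases]
    return fields


def text_query(text, lang=None):
    """Build a multi_match query over the default and language fields."""
    return {"multi_match": {"query": text, "fields": content_fields(lang)}}


print(text_query("search plugins", lang="en"))
```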
Post Tags and Categories
Field Name | Data Type | Type Details | Notes |
---|---|---|---|
tag_cat_count | number | short | Total number of tags and categories |
tag.name.default | string | text | default analyzer |
tag.name.[LANG] | string | text | [LANG] analyzer |
tag.slug_slash_name | string | keyword | combines slug and formatted name for displaying aggregations |
tag.slug | string | keyword | |
tag.term_id | number | long | |
category.name.default | string | text | default analyzer |
category.name.[LANG] | string | text | [LANG] analyzer |
category.slug_slash_name | string | keyword | combines slug and formatted name for displaying aggregations |
category.slug | string | keyword | |
category.term_id | number | long | |
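The `slug_slash_name` fields are meant for displaying aggregations. The following sketch builds a standard Elasticsearch `terms` aggregation on `tag.slug_slash_name` and splits a bucket key back into its parts. The exact key format is not specified in this article, so the `slug/name` layout used here is an assumption based on the field name, and both helper names are hypothetical.

```python
def tag_aggregation(size=10):
    """Build a terms aggregation over the combined slug and name field."""
    return {"aggs": {"tags": {"terms": {"field": "tag.slug_slash_name", "size": size}}}}


def split_slug_slash_name(key):
    """Split a bucket key into (slug, name).

    Assumption: the key looks like "slug/Formatted Name"; only the first
    slash is treated as the separator.
    """
    slug, _, name = key.partition("/")
    return slug, name


print(tag_aggregation(5))
print(split_slug_slash_name("wordpress/WordPress"))
```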
Post Custom Taxonomies
The taxonomy fields are dynamic and the [NAME] portion of the field name depends on the name of the post taxonomy. There is a hardcoded list of taxonomies that are indexed. This has not yet been synced to Jetpack.
Field Name | Data Type | Type Details | Notes |
---|---|---|---|
taxonomy.[NAME].name | string | text | default analyzer |
taxonomy.[NAME].name.slug_slash_name | string | keyword | combines slug and formatted name for displaying aggregations |
taxonomy.[NAME].slug | string | keyword | |
taxonomy.[NAME].term_id | number | long | |
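Because the [NAME] portion is dynamic, field names have to be assembled at query time. A minimal sketch follows. The helper names and the `genre` taxonomy are hypothetical, and a real taxonomy is only queryable if it is on the indexed list.

```python
TAXONOMY_PARTS = {"name", "name.slug_slash_name", "slug", "term_id"}


def taxonomy_field(name, part):
    """Return the field name for one part of a custom taxonomy."""
    if part not in TAXONOMY_PARTS:
        raise ValueError(f"unknown taxonomy field part: {part}")
    return f"taxonomy.{name}.{part}"


def taxonomy_slug_filter(name, slug):
    """Build an exact-match filter on a taxonomy term slug."""
    return {"term": {taxonomy_field(name, "slug"): slug}}


print(taxonomy_slug_filter("genre", "science-fiction"))
```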
Post Interactions
Field Name | Data Type | Type Details | Notes |
---|---|---|---|
like_count | number | short | |
liker_ids | number | integer | WordPress.com users that liked this post |
comment_count | number | integer | |
commenter_ids | number | integer | WordPress.com users that commented on this post |
is_reblogged | boolean | boolean | Post contains reblogged content from another site |
reblog_count | number | long | Number of times this post was reblogged elsewhere |
reblogger_ids | number | long | WordPress.com users that reblogged this post elsewhere |
Post Dates
Each of the dates associated with the post is stored as a date data type and is also broken out into token parts to make granular date-based searches easier. For example, finding all posts that were published on a Tuesday (date_token.day_of_week), or those that were modified in the second half of each hour (modified_token.seconds_from_hour).
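The two examples above could be expressed roughly as the following standard Elasticsearch `term` and `range` clauses. The function names are hypothetical. The weekday numbering (1 for Monday through 7 for Sunday) comes from the notes on the day_of_week fields.

```python
def published_on_weekday(weekday):
    """Match posts published on a weekday (1 = Monday ... 7 = Sunday)."""
    if not 1 <= weekday <= 7:
        raise ValueError("weekday must be between 1 and 7")
    return {"term": {"date_token.day_of_week": weekday}}


def modified_in_second_half_of_hour():
    """Match posts modified 1800 or more seconds after the start of the hour."""
    return {"range": {"modified_token.seconds_from_hour": {"gte": 1800}}}


print(published_on_weekday(2))  # Tuesday
print(modified_in_second_half_of_hour())
```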
Field Name | Data Type | Type Details | Notes |
---|---|---|---|
date | date | date | |
date_token.year | number | short | 4 digit year |
date_token.month | number | byte | |
date_token.day | number | byte | |
date_token.hour | number | byte | 24 hour format |
date_token.minute | number | byte | |
date_token.second | number | byte | |
date_token.day_of_year | number | short | The day of the year (starting from 0) |
date_token.day_of_week | number | byte | 1 for Monday through 7 for Sunday |
date_token.week_of_year | number | byte | Week number of year |
date_token.seconds_from_day | number | integer | Seconds since midnight of day |
date_token.seconds_from_hour | number | short | Seconds since start of hour |
date_gmt | date | date | |
date_gmt_token.year | number | short | 4 digit year |
date_gmt_token.month | number | byte | |
date_gmt_token.day | number | byte | |
date_gmt_token.hour | number | byte | 24 hour format |
date_gmt_token.minute | number | byte | |
date_gmt_token.second | number | byte | |
date_gmt_token.day_of_year | number | short | The day of the year (starting from 0) |
date_gmt_token.day_of_week | number | byte | 1 for Monday through 7 for Sunday |
date_gmt_token.week_of_year | number | byte | Week number of year |
date_gmt_token.seconds_from_day | number | integer | Seconds since midnight of day |
date_gmt_token.seconds_from_hour | number | short | Seconds since start of hour |
modified | date | date | |
modified_token.year | number | short | 4 digit year |
modified_token.month | number | byte | |
modified_token.day | number | byte | |
modified_token.hour | number | byte | 24 hour format |
modified_token.minute | number | byte | |
modified_token.second | number | byte | |
modified_token.day_of_year | number | short | The day of the year (starting from 0) |
modified_token.day_of_week | number | byte | 1 for Monday through 7 for Sunday |
modified_token.week_of_year | number | byte | Week number of year |
modified_token.seconds_from_day | number | integer | Seconds since midnight of day |
modified_token.seconds_from_hour | number | short | Seconds since start of hour |
modified_gmt | date | date | |
modified_gmt_token.year | number | short | 4 digit year |
modified_gmt_token.month | number | byte | |
modified_gmt_token.day | number | byte | |
modified_gmt_token.hour | number | byte | 24 hour format |
modified_gmt_token.minute | number | byte | |
modified_gmt_token.second | number | byte | |
modified_gmt_token.day_of_year | number | short | The day of the year (starting from 0) |
modified_gmt_token.day_of_week | number | byte | 1 for Monday through 7 for Sunday |
modified_gmt_token.week_of_year | number | byte | Week number of year |
modified_gmt_token.seconds_from_day | number | integer | Seconds since midnight of day |
modified_gmt_token.seconds_from_hour | number | short | Seconds since start of hour |
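To make the token parts concrete, the sketch below derives most of them from a Python `datetime` following the notes in the table. It is an illustration, not the indexing code. `week_of_year` is left out because the numbering convention is not stated here, and the 1-based month and day values are an assumption.

```python
from datetime import datetime


def date_tokens(dt):
    """Derive token parts from a datetime, following the table's notes."""
    return {
        "year": dt.year,
        "month": dt.month,  # assumption: 1-12
        "day": dt.day,  # assumption: 1-31
        "hour": dt.hour,  # 24 hour format
        "minute": dt.minute,
        "second": dt.second,
        "day_of_year": dt.timetuple().tm_yday - 1,  # starting from 0
        "day_of_week": dt.isoweekday(),  # 1 = Monday ... 7 = Sunday
        "seconds_from_day": dt.hour * 3600 + dt.minute * 60 + dt.second,
        "seconds_from_hour": dt.minute * 60 + dt.second,
    }


tokens = date_tokens(datetime(2021, 1, 5, 13, 45, 30))
print(tokens["day_of_week"], tokens["seconds_from_hour"])  # 2 2730
```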
Post Meta
There is a hardcoded list of post meta keys that are indexed. This list has not yet been synced to Jetpack.
The post meta fields are dynamic and the [KEY] portion of the field name depends on the name (key) of the post meta being indexed. To accommodate advanced querying, all post meta values are cast and indexed as numeric and boolean values in addition to being indexed as strings.
Field Name | Data Type | Type Details | Notes |
---|---|---|---|
meta.[KEY].value | string | text | default analyzer |
meta.[KEY].value.raw | string | keyword | |
meta.[KEY].date | date | date | If it looks like a date |
meta.[KEY].long | number | long | Value cast as 64bit integer (bigint) if it looks like a number |
meta.[KEY].double | number | double | Value cast as floating point number if it looks like a number |
meta.[KEY].boolean | boolean | boolean | Value cast as boolean |
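Because each meta value is indexed under several typed sub-fields, numeric comparisons can use the `long` or `double` variant rather than the string value. A minimal sketch using a standard Elasticsearch `range` clause follows. The helper names and the `price` key are hypothetical, and a real key must be on the indexed list.

```python
META_SUFFIXES = {"value", "value.raw", "date", "long", "double", "boolean"}


def meta_field(key, suffix):
    """Return the field name for one typed variant of a post meta key."""
    if suffix not in META_SUFFIXES:
        raise ValueError(f"unknown meta field suffix: {suffix}")
    return f"meta.{key}.{suffix}"


def meta_number_range(key, minimum=None, maximum=None):
    """Build an inclusive numeric range filter on a post meta value."""
    bounds = {}
    if minimum is not None:
        bounds["gte"] = minimum
    if maximum is not None:
        bounds["lte"] = maximum
    return {"range": {meta_field(key, "double"): bounds}}


print(meta_number_range("price", minimum=10, maximum=50))
```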
* Elasticsearch is a trademark of Elasticsearch BV, registered in the U.S. and in other countries.