ElasticSearch Query DSL
Latest revision as of 13:17, 23 January 2013
ES's Query DSL is a language for specifying queries in JSON.
This is far from exhaustive documentation; it covers just the stuff I use the most. See the official documentation for more. In particular, the boosting and scoring functionality is not documented here to any proper extent.
Queries
match, multi_match
The match queries accept text/numeric/date values, analyze them, and construct a query out of them. The match family of queries does not go through a "query parsing" process. It does not support field name prefixes, wildcard characters, or other "advanced" features.
Here, message is the name of the field to match in (it can also be _all):
{ "match" : { "message" : "this is a test" } }
By default, terms are OR'ed; to AND them:
{ "match" : { "message" : { "query" : "this is a test", "operator" : "and" } } }
To match a phrase:
{ "match_phrase" : { "message" : "this is a test" } }
or, using the last word as a prefix (for "search as you type"):
{ "match_phrase_prefix" : { "message" : "this is a test" } }
To match in multiple fields, with optional boosting, use:
{ "multi_match" : { "query" : "this is a test", "fields" : [ "subject^2", "message" ] } }
where matches in subject are "twice as important" as matches in message.
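To make the variants above easier to generate, here is a minimal Python sketch that builds the same match and multi_match bodies as plain dicts and serializes them with the standard json module. The helper names match_query and multi_match_query are hypothetical. They are not part of any ES client, and the sketch only produces request bodies without contacting a cluster.

```python
import json

def match_query(field, text, operator=None):
    """Build a match query body; operator="and" requires all terms to match."""
    if operator is None:
        # Short form: the analyzed terms are OR'ed by default.
        return {"match": {field: text}}
    # Long form: the text goes under "query", next to "operator".
    return {"match": {field: {"query": text, "operator": operator}}}

def multi_match_query(text, fields, boosts=None):
    """Build a multi_match body; boosts maps a field name to its factor (field^factor)."""
    boosts = boosts or {}
    field_specs = [f"{name}^{boosts[name]}" if name in boosts else name for name in fields]
    return {"multi_match": {"query": text, "fields": field_specs}}

print(json.dumps(match_query("message", "this is a test", operator="and")))
print(json.dumps(multi_match_query("this is a test", ["subject", "message"], {"subject": 2})))
```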
bool
The bool query provides a Boolean combination of queries with typed occurrence:
- must - clause must appear in matching documents,
- should - clause should appear; if no must clause is provided, at least one should clause must match; you can also specify the minimum_number_should_match parameter,
- must_not - clause must not appear in matching documents.
{
  "bool" : {
    "must" : {
      "term" : { "user" : "kimchy" }
    },
    "must_not" : {
      "range" : {
        "age" : { "from" : 10, "to" : 20 }
      }
    },
    "should" : [
      { "term" : { "tag" : "wow" } },
      { "term" : { "tag" : "elasticsearch" } }
    ],
    "minimum_number_should_match" : 1,
    "boost" : 1.0
  }
}
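The occurrence rules can be checked against a local, simplified model. The Python sketch below decides only whether a document would match a bool query, with clauses written as plain predicates over a dict. It ignores scoring and analysis and is not how ES/Lucene executes the query. The names term, in_range, and bool_matches are hypothetical.

```python
from typing import Callable, Dict, Optional, Sequence

Doc = Dict[str, object]
Clause = Callable[[Doc], bool]

def term(field: str, value: object) -> Clause:
    """Clause that matches when the field equals value (or contains it, for lists)."""
    def check(doc: Doc) -> bool:
        v = doc.get(field)
        return value in v if isinstance(v, list) else v == value
    return check

def in_range(field: str, lo: float, hi: float) -> Clause:
    """Inclusive range clause; documents without the field never match."""
    return lambda doc: field in doc and lo <= doc[field] <= hi

def bool_matches(doc: Doc,
                 must: Sequence[Clause] = (),
                 should: Sequence[Clause] = (),
                 must_not: Sequence[Clause] = (),
                 minimum_number_should_match: Optional[int] = None) -> bool:
    """Simplified model of which documents a bool query matches (no scoring)."""
    if not all(clause(doc) for clause in must):
        return False
    if any(clause(doc) for clause in must_not):
        return False
    should_hits = sum(1 for clause in should if clause(doc))
    if minimum_number_should_match is not None:
        return should_hits >= minimum_number_should_match
    if not must and should:
        # No must clause: at least one should clause has to match.
        return should_hits >= 1
    # Otherwise should clauses are optional and would only affect scoring.
    return True

doc = {"user": "kimchy", "age": 35, "tag": ["wow"]}
print(bool_matches(doc,
                   must=[term("user", "kimchy")],
                   must_not=[in_range("age", 10, 20)],
                   should=[term("tag", "wow"), term("tag", "elasticsearch")],
                   minimum_number_should_match=1))
```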
boosting
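Boosting can be used to promote or demote search results. Documents matching the positive query are returned, and those that also match the negative query are demoted rather than excluded (their score is multiplied by negative_boost):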
{
  "boosting" : {
    "positive" : {
      "term" : { "field1" : "value1" }
    },
    "negative" : {
      "term" : { "field2" : "value2" }
    },
    "negative_boost" : 0.2
  }
}
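The demotion can be modelled as a rescoring step over the hits of the positive query. In this minimal Python sketch, the function name boosting_scores and the input scores are hypothetical and made up for illustration. Real scores come from Lucene's relevance scoring, which is not modelled here.

```python
from typing import Callable, Dict

def boosting_scores(positive_scores: Dict[str, float],
                    matches_negative: Callable[[str], bool],
                    negative_boost: float) -> Dict[str, float]:
    """Rescore the hits of the positive query: hits that also match the
    negative query stay in the results, with score * negative_boost."""
    return {doc_id: score * negative_boost if matches_negative(doc_id) else score
            for doc_id, score in positive_scores.items()}

# Hypothetical positive-query scores; doc "a" also matches the negative query.
rescored = boosting_scores({"a": 1.0, "b": 0.8}, lambda doc_id: doc_id == "a", 0.2)
print(rescored)
print(sorted(rescored, key=rescored.get, reverse=True))
```

Document "a" still appears in the results; it just drops below "b".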
ids
Match by ID:
{ "ids" : { "type" : "my_type", "values" : ["1", "4", "100"] } }
Note: the type field is optional, and may contain an array of values.
field
Queries only a specified field (equivalent to query_string with default_field):
{ "field" : { "name.first" : "+something -else" } }
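The stated equivalence can be written as a mechanical rewrite. This Python sketch handles only the short form {"field": {name: text}} shown above. The function name field_to_query_string is hypothetical.

```python
def field_to_query_string(field_query: dict) -> dict:
    """Rewrite {"field": {name: text}} as the equivalent query_string query."""
    inner = field_query["field"]
    if len(inner) != 1:
        raise ValueError("a field query targets exactly one field")
    name, text = next(iter(inner.items()))
    return {"query_string": {"default_field": name, "query": text}}

print(field_to_query_string({"field": {"name.first": "+something -else"}}))
```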
filtered
Filters the results of a query; this may be much faster than querying, as filters do no scoring and may be cached:
{
  "filtered" : {
    "query" : {
      "term" : { "tag" : "wow" }
    },
    "filter" : {
      "range" : {
        "age" : { "from" : 10, "to" : 20 }
      }
    }
  }
}
query_string
Uses a query parser to parse its content.
{ "query_string" : { "default_field" : "content", "query" : "this AND that OR thus" } }
Parameters
- query - the actual query to be parsed,
- default_field - the default field for query terms (if no field prefix is specified); defaults to the index.query.default_field setting, which in turn defaults to _all,
- fields - run query against multiple fields (provided as array):
- "fields" : ["content", "name"],
- optionally with boosting: "fields" : ["content", "name^5"],
- wildcards may be used for fields: "fields" : ["city.*"] if the document contains an object city,
- to check for the existence or nonexistence of fields, use _exists_:field1 and _missing_:field1,
- default_operator - default operator used (if none explicitly specified); e.g. with default operator OR, the query "capital of Hungary" is translated to "capital OR of OR Hungary"; default is OR,
- allow_leading_wildcard - whether * or ? is allowed as the first character; default true,
- lowercase_expanded_terms - should terms of wildcard, prefix, fuzzy, and range queries be automatically lower-cased? (since they are not analyzed); default true,
- boost - boost value of the query; default 1.0,
- minimum_should_match - percent value ("20%") controlling how many "should" clauses in the resulting boolean query should match,
- lenient - if true, format-based failures (like providing text to a numeric field) are ignored.
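The default_operator example ("capital of Hungary") can be reproduced with a deliberately naive Python sketch that joins whitespace-separated terms with the default operator. The real query parser also handles quoting, explicit operators, field prefixes, and analysis, and none of that is modelled here. The function name expand_default_operator is hypothetical.

```python
def expand_default_operator(query: str, default_operator: str = "OR") -> str:
    """Naive model: join whitespace-separated terms with the default operator."""
    op = default_operator.upper()
    if op not in ("OR", "AND"):
        raise ValueError("default_operator must be OR or AND")
    return f" {op} ".join(query.split())

print(expand_default_operator("capital of Hungary"))         # capital OR of OR Hungary
print(expand_default_operator("capital of Hungary", "AND"))  # capital AND of AND Hungary
```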
range
Matches documents by a provided range. For string fields, the TermRangeQuery is used, while for number/date fields, the query is a NumericRangeQuery.
{ "range" : { "age" : { "from" : 10, "to" : 20, "include_lower" : true, "include_upper": false, "boost" : 2.0 } } }
You can also use the following abbreviations:
- gt = from + include_lower=false,
- gte = from + include_lower=true,
- lt = to + include_upper=false,
- lte = to + include_upper=true.
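The abbreviations map mechanically onto from/to plus the include flags. This Python sketch performs that rewrite on the inner range parameters. The helper name expand_range is hypothetical. If two abbreviations set the same bound (for example both gt and gte), the later one wins.

```python
# Abbreviation -> (bound key, inclusion flag key, flag value), as listed above.
ABBREVIATIONS = {
    "gt": ("from", "include_lower", False),
    "gte": ("from", "include_lower", True),
    "lt": ("to", "include_upper", False),
    "lte": ("to", "include_upper", True),
}

def expand_range(params: dict) -> dict:
    """Rewrite gt/gte/lt/lte into from/to + include_lower/include_upper.
    Other keys (e.g. boost) are copied unchanged."""
    expanded = {}
    for key, value in params.items():
        if key in ABBREVIATIONS:
            bound, flag, inclusive = ABBREVIATIONS[key]
            expanded[bound] = value
            expanded[flag] = inclusive
        else:
            expanded[key] = value
    return expanded

print({"range": {"age": expand_range({"gte": 10, "lt": 20, "boost": 2.0})}})
```

The printed body has the same parameters as the first range example (from 10, to 20, include_lower true, include_upper false, boost 2.0).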
term, terms
Matches documents that have fields that contain a term (not analyzed).
{ "term" : { "user" : "kimchy" } }
{ "term" : { "user" : { "term" : "kimchy", "boost" : 2.0 } } }
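Because the term value is not analyzed, it matches only the exact token that was indexed. The Python sketch below illustrates this under a stated assumption: the field was indexed by a stand-in analyzer that lowercases and splits on whitespace. The actual analyzer depends on the mapping, and the helper names are hypothetical.

```python
def analyze(text: str) -> list:
    """Stand-in analyzer (assumption): lowercase, then split on whitespace."""
    return text.lower().split()

def term_matches(indexed_text: str, term: str) -> bool:
    # term: the query value is used as-is, without analysis.
    return term in analyze(indexed_text)

def match_matches(indexed_text: str, query: str) -> bool:
    # match (default OR): the query text goes through the same analyzer.
    indexed_tokens = set(analyze(indexed_text))
    return any(token in indexed_tokens for token in analyze(query))

field_value = "Shay Banon"
print(term_matches(field_value, "Banon"))   # False: the indexed token is "banon"
print(term_matches(field_value, "banon"))   # True
print(match_matches(field_value, "Banon"))  # True
```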
Filters
Filters are great candidates for caching. Caching the result of a filter does not require much memory, and makes other queries executing against the same filter (same parameters) blazingly fast. In particular, the term, terms, prefix, and range filters are cached by default, and are recommended over their equivalent query versions.
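The benefit of reusing a filter with the same parameters can be illustrated with a toy cache. In this Python model, term filter results are stored under a canonical JSON serialization of the filter, so a repeated filter is answered without rescanning the documents. The real ES filter cache is internal and works differently. The class TermFilterCache is hypothetical and supports only term filters.

```python
import json
from typing import Dict, FrozenSet

class TermFilterCache:
    """Toy cache for term filters: canonical filter JSON -> IDs of matching docs."""

    def __init__(self, docs: Dict[str, dict]):
        self.docs = docs
        self.cache: Dict[str, FrozenSet[str]] = {}
        self.evaluations = 0  # how many times a filter was computed from scratch

    def apply(self, filter_body: dict) -> FrozenSet[str]:
        # Same parameters -> same key, independent of dict key order.
        key = json.dumps(filter_body, sort_keys=True)
        if key not in self.cache:
            self.evaluations += 1
            field, value = next(iter(filter_body["term"].items()))
            self.cache[key] = frozenset(
                doc_id for doc_id, doc in self.docs.items() if doc.get(field) == value)
        return self.cache[key]

docs = {"1": {"user": "kimchy"}, "2": {"user": "shay"}, "3": {"user": "kimchy"}}
cache = TermFilterCache(docs)
first = cache.apply({"term": {"user": "kimchy"}})
second = cache.apply({"term": {"user": "kimchy"}})  # answered from the cache
print(sorted(second), cache.evaluations)
```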
and, or, not
Matches documents by combining other filters with the AND, OR, or NOT operator, respectively; these filters are more performant than the bool filter.
These filters are not cached by default.
{
  "filtered" : {
    "query" : {
      "term" : { "name.first" : "shay" }
    },
    "filter" : {
      "and" : [
        {
          "range" : {
            "postDate" : {
              "from" : "2010-03-01",
              "to" : "2010-04-01"
            }
          }
        },
        {
          "prefix" : { "name.second" : "ba" }
        }
      ]
    }
  }
}
To cache the results of the filter, use the object form with a filters array and set _cache to true (shown here for or):
{
  "filtered" : {
    "query" : {
      "term" : { "name.first" : "shay" }
    },
    "filter" : {
      "or" : {
        "filters" : [
          {
            "term" : { "name.second" : "banon" }
          },
          {
            "term" : { "name.nick" : "kimchy" }
          }
        ],
        "_cache" : true
      }
    }
  }
}
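The Python helpers below build the two and/or shapes: the plain array form, and the object form with filters and _cache when caching is requested. The sketch assumes, as the examples suggest, that both forms work for both and and or. The names combine_filters and filtered are hypothetical. not is left out because it wraps a single filter rather than a list.

```python
import json

def combine_filters(op: str, filters: list, cache: bool = False) -> dict:
    """Build an "and"/"or" filter: plain array form, or object form when _cache is wanted."""
    if op not in ("and", "or"):
        raise ValueError("op must be 'and' or 'or'")
    if cache:
        return {op: {"filters": filters, "_cache": True}}
    return {op: filters}

def filtered(query: dict, filter_body: dict) -> dict:
    """Wrap a query and a filter into a filtered query."""
    return {"filtered": {"query": query, "filter": filter_body}}

body = filtered({"term": {"name.first": "shay"}},
                combine_filters("or", [{"term": {"name.second": "banon"}},
                                       {"term": {"name.nick": "kimchy"}}], cache=True))
print(json.dumps(body, indent=2))
```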
bool
Matches documents matching Boolean combinations of other filters. Similar to the bool query, but the clauses are filters:
{
  "filtered" : {
    "query" : {
      "query_string" : {
        "default_field" : "message",
        "query" : "elasticsearch"
      }
    },
    "filter" : {
      "bool" : {
        "must" : {
          "term" : { "tag" : "wow" }
        },
        "must_not" : {
          "range" : {
            "age" : { "from" : 10, "to" : 20 }
          }
        },
        "should" : [
          {
            "term" : { "tag" : "sometag" }
          },
          {
            "term" : { "tag" : "sometagtag" }
          }
        ]
      }
    }
  }
}
exists, missing
Filters documents where a specific field has a value (exists), or has no value (missing).
{ "constant_score" : { "filter" : { "exists" : { "field" : "user" } } } }
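This simplified Python model shows the two filters side by side over plain dicts. Treating None, an empty list, and a list of only None as "no value" is an assumption made for this sketch. The function names are hypothetical.

```python
def has_value(doc: dict, field: str) -> bool:
    """Assumption for this sketch: None, [] and lists of only None count as no value."""
    value = doc.get(field)
    if value is None:
        return False
    if isinstance(value, list):
        return any(item is not None for item in value)
    return True

def exists_filter(docs: list, field: str) -> list:
    return [doc for doc in docs if has_value(doc, field)]

def missing_filter(docs: list, field: str) -> list:
    return [doc for doc in docs if not has_value(doc, field)]

docs = [{"user": "kimchy"}, {"user": None}, {"name": "shay"}, {"user": []}]
print(exists_filter(docs, "user"))        # [{'user': 'kimchy'}]
print(len(missing_filter(docs, "user")))  # 3
```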
range, numeric_range
Unlike the range query, the range filter is cached; see the range query for available parameters.
{
  "constant_score" : {
    "filter" : {
      "range" : {
        "age" : {
          "from" : "10",
          "to" : "20",
          "include_lower" : true,
          "include_upper" : false
        }
      }
    }
  }
}
The numeric_range filter loads the relevant field into memory and checks the numeric range there; this requires more memory, but may be significantly faster.
Unlike range filter results, numeric_range filter results are not cached by default; set _cache to true to cache them. However, if the filter is reused, it is advisable to simply use the range filter.
query
Wraps a query to be used as a filter:
{ "constant_score" : { "filter" : { "query" : { "query_string" : { "query" : "this AND that OR thus" } } } } }
This filter is not cached by default. To allow caching, use fquery instead (note that its format differs a bit from other filters):
{
  "constant_score" : {
    "filter" : {
      "fquery" : {
        "query" : {
          "query_string" : {
            "query" : "this AND that OR thus"
          }
        },
        "_cache" : true
      }
    }
  }
}
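The format difference is easier to see when both shapes come from one helper. This Python sketch uses the plain query wrapper by default and switches to fquery, which nests the query under "query" next to "_cache", when caching is requested. The names query_filter and constant_score are hypothetical helpers, not an ES client API.

```python
import json

def query_filter(query: dict, cache: bool = False) -> dict:
    """Wrap a query as a filter; the cacheable fquery form nests it under "query"."""
    if cache:
        return {"fquery": {"query": query, "_cache": True}}
    return {"query": query}

def constant_score(filter_body: dict) -> dict:
    """Wrap a filter in a constant_score query."""
    return {"constant_score": {"filter": filter_body}}

qs = {"query_string": {"query": "this AND that OR thus"}}
print(json.dumps(constant_score(query_filter(qs)), indent=2))
print(json.dumps(constant_score(query_filter(qs, cache=True)), indent=2))
```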
term, terms
Filters documents that have fields that contain a term (not analyzed); this filter is cached by default:
{ "constant_score" : { "filter" : { "term" : { "user" : "kimchy"} } } }
The terms filter simply accepts multiple terms, matching documents containing any of the terms; if you want to match all of the terms, use execution=and:
{ "constant_score" : { "filter" : { "terms" : { "user" : ["kimchy", "elasticsearch"], "execution" : "and" } } } }
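This simplified Python model contrasts the two behaviours. By default a document matches if its field contains any of the terms; with require_all=True, which mimics execution=and, it must contain all of them. Fields are modelled as lists of already-indexed terms. The function name terms_filter and its require_all parameter are hypothetical.

```python
from typing import Dict, Iterable, List

def terms_filter(docs: List[Dict[str, list]], field: str,
                 terms: Iterable[str], require_all: bool = False) -> List[dict]:
    """Model of the terms filter: any term matches; require_all mimics execution=and."""
    wanted = set(terms)
    if require_all:
        return [doc for doc in docs if wanted <= set(doc.get(field, []))]
    return [doc for doc in docs if wanted & set(doc.get(field, []))]

docs = [{"user": ["kimchy"]}, {"user": ["kimchy", "elasticsearch"]}, {"user": ["shay"]}]
print(len(terms_filter(docs, "user", ["kimchy", "elasticsearch"])))                    # 2
print(len(terms_filter(docs, "user", ["kimchy", "elasticsearch"], require_all=True)))  # 1
```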