ElasticSearch Query DSL
Revision as of 12:40, 23 January 2013
ES's Query DSL is a language for specifying queries in JSON.
This is by no means exhaustive documentation; it only covers the parts I use most, so see the official documentation for more. In particular, the boosting and scoring functionality is not covered here in proper depth.
Queries
match, multi_match
The match queries accept text, numeric, or date values, analyze them, and construct a query out of the result. The match family of queries does not go through a "query parsing" process, so it does not support field name prefixes, wildcard characters, or other "advanced" features.
Here, message is the name of the field to match against (it can also be _all):
{ "match" : { "message" : "this is a test" } }
By default, terms are OR'ed; to AND them:
{ "match" : { "message" : { "query" : "this is a test", "operator" : "and" } } }
To match a phrase:
{ "match_phrase" : { "message" : "this is a test" } }
or using the last word as prefix (the "as you type" search):
{ "match_phrase_prefix" : { "message" : "this is a test" } }
To match in multiple fields, with optional boosting, use:
{ "multi_match" : { "query" : "this is a test", "fields" : [ "subject^2", "message" ] } }
where matches in subject are "twice as important" as matches in message.
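The match variants above differ only in the query name and in whether the short or long form is used, so they can be produced by one small helper. This is a minimal sketch in plain Python that only builds the JSON bodies; the function names match_query and multi_match_query are my own and not part of any ES client, and sending the result to a cluster is left out.

```python
import json


def match_query(field, text, operator=None, kind="match"):
    """Build a match-family query body for a single field.

    kind is one of "match", "match_phrase", "match_phrase_prefix".
    (Hypothetical helper, not an ES client API.)
    """
    if operator is None:
        # Short form: {"match": {"message": "this is a test"}}
        return {kind: {field: text}}
    # Long form, needed to pass extra options such as "operator".
    return {kind: {field: {"query": text, "operator": operator}}}


def multi_match_query(text, boosts):
    """Build a multi_match query from a {field: boost} mapping.

    A boost of None leaves the field unboosted; otherwise "field^boost" is emitted.
    """
    fields = [
        name if boost is None else f"{name}^{boost}"
        for name, boost in boosts.items()
    ]
    return {"multi_match": {"query": text, "fields": fields}}


if __name__ == "__main__":
    print(json.dumps(match_query("message", "this is a test", operator="and")))
    print(json.dumps(multi_match_query("this is a test", {"subject": 2, "message": None})))
```

Passing kind="match_phrase_prefix" gives the "as you type" variant with the same field and text.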
bool
The bool query provides a Boolean combination of queries with typed occurrence:
- must - clause must appear in matching documents,
- should - the clause should appear; if no must clause is provided, at least one should clause must match; you can also specify the minimum_number_should_match parameter,
- must_not - the clause must not appear in matching documents.
{ "bool" : { "must" : { "term" : { "user" : "kimchy" } }, "must_not" : { "range" : { "age" : { "from" : 10, "to" : 20 } } }, "should" : [ { "term" : { "tag" : "wow" } }, { "term" : { "tag" : "elasticsearch" } } ], "minimum_number_should_match" : 1, "boost" : 1.0 } }
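When a bool query is assembled programmatically, it helps to leave out the occurrence types that have no clauses. The following is a rough sketch under that assumption; bool_query and term are hypothetical helper names of mine, and the parameter name minimum_number_should_match is taken from the example above.

```python
import json


def term(field, value):
    """Build a single term clause."""
    return {"term": {field: value}}


def bool_query(must=None, should=None, must_not=None, min_should=None):
    """Assemble a bool query, leaving out occurrence types that were not given."""
    body = {}
    if must:
        body["must"] = must
    if must_not:
        body["must_not"] = must_not
    if should:
        body["should"] = should
    if min_should is not None:
        body["minimum_number_should_match"] = min_should
    return {"bool": body}


if __name__ == "__main__":
    query = bool_query(
        must=term("user", "kimchy"),
        should=[term("tag", "wow"), term("tag", "elasticsearch")],
        min_should=1,
    )
    print(json.dumps(query, indent=2))
```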
boosting
Boosting can be used to promote or demote search results:
{ "boosting" : { "positive" : { "term" : { "field1" : "value1" } }, "negative" : { "term" : { "field2" : "value2" } }, "negative_boost" : 0.2 } }
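As far as I understand it, documents matching the positive query are returned, and those that also match the negative query have their score multiplied by negative_boost. The sketch below builds the body and models that arithmetic in a toy function; demoted_score is purely illustrative and is not how ES exposes scoring.

```python
import json


def boosting_query(positive, negative, negative_boost):
    """Build a boosting query body (hypothetical helper, not an ES client API)."""
    return {
        "boosting": {
            "positive": positive,
            "negative": negative,
            "negative_boost": negative_boost,
        }
    }


def demoted_score(score, matches_negative, negative_boost):
    """Toy model: scale the score only when the negative query also matches."""
    return score * negative_boost if matches_negative else score


if __name__ == "__main__":
    body = boosting_query(
        {"term": {"field1": "value1"}},
        {"term": {"field2": "value2"}},
        0.2,
    )
    print(json.dumps(body))
    # A hit that also matches the negative query keeps only 20% of its score.
    print(demoted_score(2.0, True, 0.2))
```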
ids
Match by ID:
{ "ids" : { "type" : "my_type", "values" : ["1", "4", "100"] } }
Note: the type field is optional and may contain an array of values.
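Since type is optional, a builder can simply omit the key when no type is given. This is a small sketch; ids_query is a made-up name, and converting the IDs to strings is my assumption based on the example above.

```python
import json


def ids_query(values, doc_type=None):
    """Build an ids query; doc_type may be omitted, a string, or a list of strings."""
    # Assumption: IDs are sent as strings, as in the example above.
    body = {"values": [str(v) for v in values]}
    if doc_type is not None:
        body["type"] = doc_type
    return {"ids": body}


if __name__ == "__main__":
    print(json.dumps(ids_query([1, 4, 100], doc_type="my_type")))
    print(json.dumps(ids_query(["1", "4"], doc_type=["my_type", "other_type"])))
```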
field
Query only on a specified field (equivalent of query_string with default_field):
{ "field" : { "name.first" : "+something -else" } }
filtered
Filters the results of a query. Filtering may be much faster than querying, since filters do no scoring and their results may be cached:
{ "filtered" : { "query" : { "term" : { "tag" : "wow" } }, "filter" : { "range" : { "age" : { "from" : 10, "to" : 20 } } } } }
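The split between the scoring part and the non-scoring part can be kept explicit in code. The helpers below are a sketch with names of my own choosing (range_filter, filtered_query); they only produce the JSON body and use the from/to range keys from the example above.

```python
import json


def range_filter(field, lower, upper):
    """Range filter using the from/to keys shown in the example above."""
    return {"range": {field: {"from": lower, "to": upper}}}


def filtered_query(query, filter_):
    """Wrap a scoring query and a non-scoring filter into one filtered query."""
    return {"filtered": {"query": query, "filter": filter_}}


if __name__ == "__main__":
    body = filtered_query({"term": {"tag": "wow"}}, range_filter("age", 10, 20))
    print(json.dumps(body, indent=2))
```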
query_string
Uses a query parser to parse its content.
{ "query_string" : { "default_field" : "content", "query" : "this AND that OR thus" } }
Parameters
- query - the actual query to be parsed,
- default_field - default field for query terms (if no field prefix is specified); defaults to the index.query.default_field setting, which in turn defaults to _all,
- fields - run the query against multiple fields (provided as an array):
  - "fields" : ["content", "name"],
  - optionally with boosting: "fields" : ["content", "name^5"],
  - wildcards may be used for fields: "fields" : ["city.*"] if the document contains an object city,
  - to check for the existence or nonexistence of a field, use _exists_:field1 and _missing_:field in the query text,
- default_operator - default operator used (if none explicitly specified); e.g. with default operator OR, the query "capital of Hungary" is translated to "capital OR of OR Hungary"; default is OR,
- allow_leading_wildcard - are * or ? allowed as the first character? default true,
- lowercase_expanded_terms - should terms of wildcard, prefix, fuzzy, and range queries be automatically lower-cased? (since they are not analyzed); default true,
- boost - boost value of the query; default 1.0,
- minimum_should_match - percent value ("20%") controlling how many "should" clauses in the resulting boolean query should match,
- lenient - if true, format-based failures (like providing text to a numeric field) are ignored.
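The parameters listed above can be combined freely, which makes a keyword-argument builder convenient. This is only a sketch: query_string here is a hypothetical helper of mine that copies options into the body without validating them, so a misspelled parameter name would pass through unnoticed.

```python
import json


def query_string(query, default_field=None, fields=None, **options):
    """Build a query_string body.

    Extra keyword arguments (default_operator, lenient, boost, ...) are
    copied into the body unchanged; nothing is validated here.
    """
    body = {"query": query}
    if default_field is not None:
        body["default_field"] = default_field
    if fields is not None:
        body["fields"] = list(fields)
    body.update(options)
    return {"query_string": body}


if __name__ == "__main__":
    # With default_operator AND, "capital of Hungary" requires all three terms.
    body = query_string(
        "capital of Hungary",
        fields=["content", "name^5"],
        default_operator="AND",
        lenient=True,
    )
    print(json.dumps(body, indent=2))
```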