ElasticSearch Query DSL

From PaskvilWiki

Revision as of 12:45, 23 January 2013

ES's Query DSL is a language for specifying queries in JSON.

This is by no means exhaustive documentation, just the stuff I use the most; see the official documentation for more. In particular, the boosting and scoring functionality is not covered here in any depth.

Queries

match, multi_match

The match queries accept text, numeric, and date values, analyze them, and construct a query out of them. The match family of queries does not go through a "query parsing" process: it does not support field name prefixes, wildcard characters, or other "advanced" features.

Here, message is the name of the field to match in (it can also be _all):

{
    "match" : {
        "message" : "this is a test"
    }
}

By default, terms are OR'ed; to AND them:

{
    "match" : {
        "message" : {
            "query" : "this is a test",
            "operator" : "and"
        }
    }
}
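The OR/AND difference can be made concrete with a small Python simulation. This is only a sketch. The analyze function is a hypothetical stand-in for ElasticSearch's real analyzer: it just lowercases the text and splits it on non-word characters. No cluster is involved.

```python
import re

def analyze(text):
    # Hypothetical, heavily simplified analyzer: lowercase, split on non-word chars.
    # Real ES analyzers (tokenizers, stop words, stemming) do much more.
    return [t for t in re.split(r"\W+", text.lower()) if t]

def match(doc_text, query_text, operator="or"):
    """Simulate how a match query combines its analyzed terms."""
    doc_terms = set(analyze(doc_text))
    hits = [term in doc_terms for term in analyze(query_text)]
    # "or" (the default): any term suffices; "and": every term must be present.
    return all(hits) if operator == "and" else any(hits)

doc = "This is only a TEST message"
print(match(doc, "a test of something"))         # True  ("a", "test" found)
print(match(doc, "a test of something", "and"))  # False ("of", "something" missing)
```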

To match a phrase:

{
    "match_phrase" : {
        "message" : "this is a test"
    }
}

or, to use the last word as a prefix (the "search as you type" case):

{
    "match_phrase_prefix" : {
        "message" : "this is a test"
    }
}
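Here is a rough Python sketch of the two phrase variants, using the same hypothetical simplified analyzer. match_phrase needs the terms to appear consecutively and in order. match_phrase_prefix additionally lets the last term match as a prefix. The real queries also support slop and cap prefix expansion with max_expansions, both of which this sketch ignores.

```python
import re

def analyze(text):
    # Same hypothetical simplified analyzer: lowercase + split on non-word chars.
    return [t for t in re.split(r"\W+", text.lower()) if t]

def match_phrase(doc_text, phrase, prefix_last=False):
    """True if the phrase's terms occur consecutively, in order, in the document.

    With prefix_last=True (match_phrase_prefix), the last query term only has to be
    a prefix of the corresponding document term. Slop and max_expansions are ignored."""
    doc, q = analyze(doc_text), analyze(phrase)
    if not q:
        return False
    for start in range(len(doc) - len(q) + 1):
        window = doc[start:start + len(q)]
        last = window[-1].startswith(q[-1]) if prefix_last else window[-1] == q[-1]
        if window[:-1] == q[:-1] and last:
            return True
    return False

doc = "this is a testing ground"
print(match_phrase(doc, "this is a test"))                    # False
print(match_phrase(doc, "this is a test", prefix_last=True))  # True
```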

To match in multiple fields, with optional boosting, use:

{
    "multi_match" : {
        "query" : "this is a test",
        "fields" : [ "subject^2", "message" ]
    }
}

where matches in subject are "twice as important" as matches in message.
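The field^boost notation can be unpacked with a small Python helper. parse_field_spec and multi_match are hypothetical names used only for illustration. The builder simply produces the JSON body shown above as a Python dict.

```python
def parse_field_spec(spec):
    """Split "subject^2" into ("subject", 2.0); a field without ^ gets boost 1.0."""
    field, sep, boost = spec.partition("^")
    return field, (float(boost) if sep else 1.0)

def multi_match(query, fields):
    """Build the multi_match body shown above as a plain dict."""
    return {"multi_match": {"query": query, "fields": list(fields)}}

print([parse_field_spec(f) for f in ["subject^2", "message"]])
# [('subject', 2.0), ('message', 1.0)]
```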

bool

The bool query provides a Boolean combination of queries with typed occurrence:

  • must - clause must appear in matching documents,
  • should - clause should appear; if no must clause is provided, at least one should clause must match; you can also set the minimum_number_should_match parameter,
  • must_not - clause must not appear in matching documents.

{
    "bool" : {
        "must" : {
            "term" : { "user" : "kimchy" }
        },
        "must_not" : {
            "range" : {
                "age" : { "from" : 10, "to" : 20 }
            }
        },
        "should" : [
            {
                "term" : { "tag" : "wow" }
            },
            {
                "term" : { "tag" : "elasticsearch" }
            }
        ],
        "minimum_number_should_match" : 1,
        "boost" : 1.0
    }
}
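The selection logic of bool, ignoring scoring, can be sketched in Python. Each clause is modelled as a predicate over a document dict. term and age_range are hypothetical stand-ins for real term and range queries. The example reproduces the query above.

```python
def term(field, value):
    # Hypothetical stand-in for a term query: exact value, or membership for lists.
    def pred(doc):
        v = doc.get(field)
        return value in v if isinstance(v, list) else v == value
    return pred

def age_range(lo, hi):
    # Hypothetical stand-in for {"range": {"age": {"from": lo, "to": hi}}}, bounds inclusive.
    return lambda doc: "age" in doc and lo <= doc["age"] <= hi

def bool_matches(doc, must=(), must_not=(), should=(), minimum_number_should_match=None):
    """Which documents a bool query selects; scoring is ignored."""
    if not all(clause(doc) for clause in must):
        return False
    if any(clause(doc) for clause in must_not):
        return False
    if minimum_number_should_match is None:
        # With no must clause, at least one should clause has to match;
        # with must clauses present, should clauses only affect scoring.
        minimum_number_should_match = 1 if (should and not must) else 0
    return sum(1 for clause in should if clause(doc)) >= minimum_number_should_match

query = dict(
    must=[term("user", "kimchy")],
    must_not=[age_range(10, 20)],
    should=[term("tag", "wow"), term("tag", "elasticsearch")],
    minimum_number_should_match=1,
)
docs = [
    {"user": "kimchy", "age": 15, "tag": ["wow"]},            # excluded by must_not
    {"user": "kimchy", "age": 30, "tag": ["elasticsearch"]},  # matches
    {"user": "kimchy", "age": 30, "tag": []},                 # no should clause matches
    {"user": "other", "age": 30, "tag": ["wow"]},             # fails must
]
print([bool_matches(d, **query) for d in docs])  # [False, True, False, False]
```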

boosting

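The boosting query returns documents matching the positive query, but demotes those that also match the negative query by multiplying their score by negative_boost (unlike must_not in a bool query, which excludes them entirely):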

{
    "boosting" : {
        "positive" : {
            "term" : {
                "field1" : "value1"
            }
        },
        "negative" : {
            "term" : {
                "field2" : "value2"
            }
        },
        "negative_boost" : 0.2
    }
}
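Here is a minimal Python sketch of the scoring effect. It assumes the document's score for the positive query is simply given: base_score is a hypothetical stand-in for the real relevance score. Documents that match the negative query are kept, but their score is multiplied by negative_boost.

```python
def boosting_score(doc, positive, negative, negative_boost, base_score=1.0):
    """Sketch of boosting semantics.

    base_score is a stand-in for the positive query's real relevance score.
    Returns None when the document does not match the positive query at all."""
    if not positive(doc):
        return None
    # Matching the negative query demotes the document instead of excluding it.
    return base_score * negative_boost if negative(doc) else base_score

positive = lambda d: d.get("field1") == "value1"
negative = lambda d: d.get("field2") == "value2"
for doc in [{"field1": "value1"},
            {"field1": "value1", "field2": "value2"},
            {"field2": "value2"}]:
    print(boosting_score(doc, positive, negative, negative_boost=0.2))
# 1.0, then 0.2, then None
```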

ids

Match by ID:

{
    "ids" : {
        "type" : "my_type",
        "values" : ["1", "4", "100"]
    }
}

Note: the type field is optional, and may also contain an array of types.
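Here is a small Python sketch using hypothetical helper names. It builds the ids query body and simulates which documents it selects. type may be left out, given as a single name, or given as a list.

```python
def ids_query(values, types=None):
    """Build an ids query body; types may be omitted, a single name, or a list."""
    body = {"values": [str(v) for v in values]}
    if types is not None:
        body["type"] = types
    return {"ids": body}

def ids_match(doc_id, doc_type, query):
    """Local simulation: ID must be listed and, if types are given, the type must be one of them."""
    body = query["ids"]
    types = body.get("type")
    if isinstance(types, str):
        types = [types]
    return doc_id in body["values"] and (types is None or doc_type in types)

q = ids_query([1, 4, 100], "my_type")
print(q)  # {'ids': {'values': ['1', '4', '100'], 'type': 'my_type'}}
print(ids_match("4", "my_type", q), ids_match("4", "other_type", q))  # True False
```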

field

Query only a specified field (equivalent to query_string with default_field):

{
    "field" : {
        "name.first" : "+something -else"
    }
}
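The equivalence stated above can be written as a rewrite from one query body to the other. field_to_query_string is a hypothetical helper. It handles only the simple string form shown above.

```python
def field_to_query_string(field_query):
    """Rewrite {"field": {name: text}} as {"query_string": {"default_field": name, "query": text}}.

    Only the simple string form is handled; exactly one field is expected."""
    (name, text), = field_query["field"].items()  # raises ValueError if not exactly one field
    if not isinstance(text, str):
        raise ValueError("only the simple string form is handled in this sketch")
    return {"query_string": {"default_field": name, "query": text}}

print(field_to_query_string({"field": {"name.first": "+something -else"}}))
```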

filtered

Filters the results of a query; this can be much faster than querying alone, since filters do no scoring and their results may be cached:

{
    "filtered" : {
        "query" : {
            "term" : { "tag" : "wow" }
        },
        "filter" : {
            "range" : {
                "age" : { "from" : 10, "to" : 20 }
            }
        }
    }
}
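The following Python sketch shows the evaluation order that makes filtered cheap. The filter is a plain yes/no check applied first, and only the documents that pass it are scored. filtered_search, score and keep are hypothetical names, and real scoring is replaced by a fixed value.

```python
def filtered_search(docs, score, keep):
    """Sketch of filtered semantics.

    keep(doc) is a yes/no filter (no scoring; ES may also cache its result);
    score(doc) returns a relevance score, or None if the query does not match.
    Only documents that pass the filter are ever scored."""
    results = []
    for doc in docs:
        if not keep(doc):
            continue  # rejected before any scoring work is done
        s = score(doc)
        if s is not None:
            results.append((s, doc))
    return sorted(results, key=lambda pair: pair[0], reverse=True)

docs = [
    {"id": 1, "tag": ["wow"], "age": 15},
    {"id": 2, "tag": ["wow"], "age": 30},  # filtered out by the age range
    {"id": 3, "tag": [], "age": 12},       # passes the filter, but the query fails
]
score = lambda d: 1.0 if "wow" in d["tag"] else None  # stand-in for term tag:wow
keep = lambda d: 10 <= d["age"] <= 20                 # stand-in for the range filter
print([d["id"] for _, d in filtered_search(docs, score, keep)])  # [1]
```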

query_string

Uses a query parser to parse its content.

{
    "query_string" : {
        "default_field" : "content",
        "query" : "this AND that OR thus"
    }
}

Parameters

  • query - the actual query to be parsed,
  • default_field - the default field for query terms that have no field prefix; defaults to the index.query.default_field setting, which in turn defaults to _all,
  • fields - run the query against multiple fields (provided as an array):
    • "fields" : ["content", "name"],
    • optionally with boosting: "fields" : ["content", "name^5"],
    • wildcards may be used in field names, e.g. "fields" : ["city.*"] if the document contains an object city,
    • to check for the existence or nonexistence of a field, use _exists_:field1 and _missing_:field in the query,
  • default_operator - the operator used when none is specified explicitly; e.g. with default operator OR, the query "capital of Hungary" is translated to "capital OR of OR Hungary"; default is OR,
  • allow_leading_wildcard - whether * or ? are allowed as the first character; default true,
  • lowercase_expanded_terms - whether terms of wildcard, prefix, fuzzy, and range queries are automatically lower-cased (since they are not analyzed); default true,
  • boost - boost value of the query; default 1.0,
  • minimum_should_match - a percentage ("20%") controlling how many "should" clauses in the resulting boolean query must match,
  • lenient - if true, format-based failures (like providing text to a numeric field) are ignored.
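Two small Python helpers illustrate the parameters above. The first is a naive expansion of the default operator. It assumes the query consists only of whitespace-separated terms, with no explicit operators, quotes, or grouping, all of which the real parser handles. The second is a hypothetical builder that passes any of the listed parameters straight into the query body.

```python
def expand_default_operator(query, default_operator="OR"):
    """Naively join bare terms with the default operator.

    Assumes the query is only whitespace-separated terms; the real query parser also
    handles explicit operators, quotes, grouping, field prefixes, and so on."""
    return f" {default_operator.upper()} ".join(query.split())

def query_string(query, **params):
    """Hypothetical builder: any parameter listed above is passed through as-is."""
    return {"query_string": {"query": query, **params}}

print(expand_default_operator("capital of Hungary"))         # capital OR of OR Hungary
print(expand_default_operator("capital of Hungary", "and"))  # capital AND of AND Hungary
body = query_string("capital of Hungary", fields=["content", "name^5"], default_operator="AND")
```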

range

Matches documents by a provided range. For string fields, the TermRangeQuery is used, while for number/date fields, the query is a NumericRangeQuery.

{
    "range" : {
        "age" : {
            "from" : 10,
            "to" : 20,
            "include_lower" : true,
            "include_upper" : false,
            "boost" : 2.0
        }
    }
}

You can also use the following abbreviations:

  • gt = from + include_lower=false,
  • gte = from + include_lower=true,
  • lt = to + include_upper=false,
  • lte = to + include_upper=true.
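The following Python sketch covers the abbreviations and the resulting bounds check. normalize_range and in_range are hypothetical helpers. include_lower and include_upper are assumed to default to true when omitted.

```python
def normalize_range(spec):
    """Rewrite gt/gte/lt/lte into the explicit from/to/include_* form.

    Other keys (e.g. boost) pass through unchanged."""
    out = {k: v for k, v in spec.items() if k not in ("gt", "gte", "lt", "lte")}
    if "gt" in spec:
        out.update({"from": spec["gt"], "include_lower": False})
    if "gte" in spec:
        out.update({"from": spec["gte"], "include_lower": True})
    if "lt" in spec:
        out.update({"to": spec["lt"], "include_upper": False})
    if "lte" in spec:
        out.update({"to": spec["lte"], "include_upper": True})
    return out

def in_range(value, spec):
    """Check a value against a range spec; from/to are optional, include_* assumed true if absent."""
    r = normalize_range(spec)
    lo, hi = r.get("from"), r.get("to")
    if lo is not None and (value < lo or (value == lo and not r.get("include_lower", True))):
        return False
    if hi is not None and (value > hi or (value == hi and not r.get("include_upper", True))):
        return False
    return True

print(normalize_range({"gte": 10, "lt": 20}))
# {'from': 10, 'include_lower': True, 'to': 20, 'include_upper': False}
print(in_range(20, {"gte": 10, "lt": 20}))  # False
```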

Filters

and, or, not

bool

exists, missing

ids

limit