Difference between revisions of "ElasticSearch Search"

From PaskvilWiki

Revision as of 17:18, 23 January 2013

official documentation

Search can be executed across indices and types, either with a query string passed as a parameter or with a request body.

Search is broadcast to all shards of the targeted index or indices; to limit the scope, use the routing parameter, e.g. the user ID when searching through tweets by a given user. The routing parameter may be multi-valued, given as a comma-separated list.
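
As a rough sketch of the routing parameter, the Python snippet below builds a search URL restricted by one or more routing values. The helper name routed_search_url and the host are illustrative assumptions; only the routing query-string parameter and its comma-separated form come from the text above.

```python
from urllib.parse import urlencode

def routed_search_url(base, index, doc_type, query, routing_values):
    """Build a URI search URL limited to the shards for the given routing values.

    Hypothetical helper: several routing values are joined with commas.
    """
    params = {"q": query, "routing": ",".join(routing_values)}
    # safe=":," leaves the field separator and the CSV commas unescaped
    return "%s/%s/%s/_search?%s" % (base, index, doc_type,
                                    urlencode(params, safe=":,"))

url = routed_search_url("http://localhost:9200", "twitter", "tweet",
                        "user:kimchy", ["kimchy"])
print(url)
```

The resulting URL can be passed to curl -XGET exactly like the other examples on this page.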

See also highlighting (http://www.elasticsearch.org/guide/reference/api/search/highlighting.html) and other topics not covered here.

Request Body

The request body uses the Query DSL.

$ curl -XGET 'http://localhost:9200/twitter/tweet/_search' -d '{
    "query" : {
        "term" : { "user" : "kimchy" }
    }
}'
{
    "_shards":{
        "total" : 5,
        "successful" : 5,
        "failed" : 0
    },
    "hits":{
        "total" : 1,
        "hits" : [
            {
                "_index" : "twitter",
                "_type" : "tweet",
                "_id" : "1", 
                "_source" : {
                    "user" : "kimchy",
                    "postDate" : "2009-11-15T14:12:12",
                    "message" : "trying out Elastic Search"
                }
            }
        ]
    }
}
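
A minimal sketch of reading such a response in Python, assuming the JSON text has already been received; the summarize helper is a hypothetical name, and the sample text is copied from the response above.

```python
import json

# Response text copied from the example above (normally read from the HTTP reply)
raw = """
{
    "_shards": {"total": 5, "successful": 5, "failed": 0},
    "hits": {
        "total": 1,
        "hits": [
            {
                "_index": "twitter",
                "_type": "tweet",
                "_id": "1",
                "_source": {
                    "user": "kimchy",
                    "postDate": "2009-11-15T14:12:12",
                    "message": "trying out Elastic Search"
                }
            }
        ]
    }
}
"""

def summarize(response):
    """Return (total hit count, list of _source documents) from a search response."""
    shards = response["_shards"]
    if shards["failed"]:
        # Some shards reported a failure, so the hits may be incomplete
        print("warning: %d of %d shards failed" % (shards["failed"], shards["total"]))
    hits = response["hits"]
    return hits["total"], [hit["_source"] for hit in hits["hits"]]

total, docs = summarize(json.loads(raw))
print(total, docs[0]["message"])
```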

Parameters

  • timeout - search timeout, bounding the time within which the search request must execute; default no timeout,
  • from - index of the first hit to return; default 0,
  • size - the number of hits to return; default 10,
  • search_type - the type of search operation to perform: dfs_query_then_fetch, dfs_query_and_fetch, query_then_fetch, or query_and_fetch; default query_then_fetch; see Search Type for more details on the different types of search that can be performed.

URI Request

A search request can be executed purely using a URI by providing request parameters.

$ curl -XGET 'http://localhost:9200/twitter/tweet/_search?q=user:kimchy'

Parameters

  • q - query string (maps to the query_string query, see Query String Query for more details),
  • df - default field to use when no field prefix is defined,
  • default_operator - default operator to be used - AND or OR, default OR,
  • explain - include an explanation of how the score of each hit was computed,
  • fields - comma-separated list of document fields to return; defaults to the internal _source field; an empty value returns no fields,
  • sort - sorting to perform - fieldName, fieldName:asc, or fieldName:desc; fieldName can either be an actual field, or _score; there can be several sort parameters (CSV, order is important),
  • track_scores - when sorting, set to true in order to return score as part of each hit,
  • timeout - search timeout, limiting the execution time; all results accumulated up to the timeout are returned; default no timeout,
  • from - index of the first hit to return; default 0,
  • size - number of hits to return; default 10,
  • lowercase_expanded_terms - should terms be automatically lowercased or not; default true,
  • analyze_wildcard - should wildcard and prefix queries be analyzed or not; default false.
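
The parameters above can be combined in one URI request. The sketch below assembles such a URL in Python; uri_search_url is a hypothetical helper and the parameter values are only examples.

```python
from urllib.parse import urlencode

def uri_search_url(base, path, params):
    """Assemble a URI search request from a dict of the parameters listed above."""
    # Skip parameters set to None so that the server-side defaults apply
    clean = {name: value for name, value in params.items() if value is not None}
    # safe=":," keeps "field:value" and comma-separated lists readable
    return "%s/%s/_search?%s" % (base, path, urlencode(clean, safe=":,"))

search_url = uri_search_url("http://localhost:9200", "twitter/tweet", {
    "q": "user:kimchy",
    "default_operator": "AND",
    "sort": "postDate:desc",
    "from": 0,
    "size": 5,
    "timeout": None,  # left out of the URL
})
print(search_url)
```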

Query Element

See Query DSL for details.

{
    "query" : {
        "term" : { "user" : "kimchy" }
    }
}

Filter Element

With faceted navigation, you sometimes want only the hits to be filtered by the chosen facet, while all the facets continue to be calculated from the original query. The filter element within the search request accomplishes this.

Note that this differs from a filtered query, whose filter also restricts the documents that the facets are computed over.

In other words, using

{
    "query" : { "term" : { "message" : "something" } },
    "filter" : { "term" : { "tag" : "green" } },
    "facets" : { "tag" : { "terms" : { "field" : "tag" } } }
}

the filter will not change the facets (the results of facets will be the same as without the filter element), while the results set will be different.

But using a filtered query:

{
    "query" : {
        "filtered" : {
            "query" : { "term" : { "message" : "something" } },
            "filter" : { "term" : { "tag" : "green" } }
        }
    },
    "facets" : { "tag" : { "terms" : { "field" : "tag" } } }
}

the filter within the filtered query element changes both the result set and the facets.

To filter the facets, use the facet_filter element.
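
To make the structural difference concrete, here is a small Python sketch that builds both request bodies from the same query, filter and facets. The function names are hypothetical; the field names are taken from the examples above.

```python
import json

query = {"term": {"message": "something"}}
tag_filter = {"term": {"tag": "green"}}
facets = {"tag": {"terms": {"field": "tag"}}}

def with_top_level_filter(query, flt, facets):
    """Filter sits next to the query: hits are filtered, facets are not."""
    return {"query": query, "filter": flt, "facets": facets}

def with_filtered_query(query, flt, facets):
    """Filter sits inside the query: both hits and facets see filtered results."""
    return {"query": {"filtered": {"query": query, "filter": flt}},
            "facets": facets}

top_level = with_top_level_filter(query, tag_filter, facets)
filtered = with_filtered_query(query, tag_filter, facets)
print(json.dumps(top_level, indent=4))
print(json.dumps(filtered, indent=4))
```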

From and Size, Pagination

{
    "from" : 0, "size" : 10,
    "query" : {
        "term" : { "user" : "kimchy" }
    }
}
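
A minimal sketch of page-based pagination on top of from and size, assuming 1-based page numbers; the page_request helper is a hypothetical name.

```python
import json

def page_request(query, page, per_page=10):
    """Return a request body for the given 1-based page of results."""
    if page < 1 or per_page < 1:
        raise ValueError("page and per_page must be positive")
    return {
        "from": (page - 1) * per_page,  # number of hits to skip
        "size": per_page,               # number of hits to return
        "query": query,
    }

body = page_request({"term": {"user": "kimchy"}}, page=3)
print(json.dumps(body))
```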

Indices and Types

  • specific index and type
$ curl -XGET 'http://localhost:9200/twitter/user/_search?q=user:kimchy'
  • specific index, multiple types
$ curl -XGET 'http://localhost:9200/twitter/user,tweet/_search?q=user:kimchy'
  • multiple indices, all types
$ curl -XGET 'http://localhost:9200/twitter,facebook/_search?q=user:kimchy'
  • all indices, specific type
$ curl -XGET 'http://localhost:9200/_all/tweet/_search?q=user:kimchy'
  • all indices and types
$ curl -XGET 'http://localhost:9200/_search?q=user:kimchy'
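
The path patterns above follow a simple rule, which the Python sketch below reproduces. The search_path helper is hypothetical; it joins multiple names with commas and uses _all when types are given without indices.

```python
def search_path(indices=None, types=None):
    """Build the _search path for optional lists of index and type names."""
    if types:
        # A type segment needs an index segment in front of it; _all means every index
        index_part = ",".join(indices) if indices else "_all"
        return "/%s/%s/_search" % (index_part, ",".join(types))
    if indices:
        return "/%s/_search" % ",".join(indices)
    return "/_search"

print(search_path(["twitter"], ["user", "tweet"]))
print(search_path(["twitter", "facebook"]))
print(search_path(types=["tweet"]))
print(search_path())
```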

Sorting

Sorting is defined per field; the special field name _score sorts by score:

{
    "sort" : [
        { "post_date" : {"order" : "asc"} },
        "user",
        { "name" : "desc" },
        { "age" : "desc" },
        "_score"
    ],
    "query" : {
        "term" : { "user" : "kimchy" }
    }
}

The sort values for each document returned are also returned as part of the response.
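
As an illustration, the sketch below converts the fieldName, fieldName:asc and fieldName:desc notation of the URI sort parameter into the request-body forms used above. The sort_clause helper is hypothetical and covers only these simple forms.

```python
import json

def sort_clause(spec):
    """Turn "field", "field:asc" or "field:desc" into a request-body sort entry."""
    field, _, order = spec.partition(":")
    if not order:
        return field  # plain field name, e.g. "user" or "_score"
    if order not in ("asc", "desc"):
        raise ValueError("unknown sort order: %r" % order)
    return {field: order}

specs = "post_date:asc,user,name:desc,_score".split(",")
sort_body = {
    "sort": [sort_clause(s) for s in specs],  # order of entries is significant
    "query": {"term": {"user": "kimchy"}},
}
print(json.dumps(sort_body, indent=4))
```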

Missing Numeric Fields

Numeric fields support specific handling for documents that are missing the field. The missing value can be _last, _first, or a custom value (used as the sort value for documents that lack the field). For example:

{
    "sort" : [
        { "price" : {"missing" : "_last"} }
    ],
    "query" : {
        "term" : { "user" : "kimchy" }
    }
}

Missing Mapping for Field

By default, the search request fails if there is no mapping associated with a field. The ignore_unmapped option lets you ignore fields that have no mapping and skip sorting by them. Here is an example of how it can be used:

{
    "sort" : [
        { "price" : {"ignore_unmapped" : true} }
    ],
    "query" : {
        "term" : { "user" : "kimchy" }
    }
}

GeoDistance

You can also sort by _geo_distance:

{
    "sort" : [
        {
            "_geo_distance" : {
                "pin.location" : [-70, 40],
                "order" : "asc",
                "unit" : "km"
            }
        }
    ],
    "query" : {
        "term" : { "user" : "kimchy" }
    }
}

The _geo_distance pin may be provided as:

properties:         "pin.location" : { "lat" : 40, "lon" : -70 }
string (lat,lon):   "pin.location" : "40,-70"
geohash:            "pin.location" : "drm3btev3e86"
array ([lon, lat]): "pin.location" : [-70, 40]
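
The coordinate order differs between these notations, which is easy to get wrong. The Python sketch below derives three of them from one latitude/longitude pair, assuming the string form is lat,lon and the array form is [lon, lat] (GeoJSON order). The pin_formats helper is hypothetical, and the geohash form is omitted because it needs an encoder.

```python
def pin_formats(lat, lon):
    """Return the same point in the properties, string and array notations."""
    return {
        "properties": {"lat": lat, "lon": lon},
        "string": "%s,%s" % (lat, lon),  # assumed order: lat,lon
        "array": [lon, lat],             # assumed order: [lon, lat]
    }

pin = pin_formats(40, -70)
for name, value in pin.items():
    print(name, value)
```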

Script Based Sorting

{
    "query" : {
        ....
    },
    "sort" : {
        "_script" : { 
            "script" : "doc['field_name'].value * factor",
            "type" : "number",
            "params" : {
                "factor" : 1.1
            },
            "order" : "asc"
        }
    }
}

Note: when sorting on a single field-based script, use the custom_score query instead; it's faster.

Scores

When sorting on a field, scores are not computed. By setting track_scores to true, scores will still be computed and tracked.

{
    "track_scores": true,
    "sort" : [
        { "post_date" : {"reverse" : true} },
        { "name" : "desc" },
        { "age" : "desc" }
    ],
    "query" : {
        "term" : { "user" : "kimchy" }
    }
}

Note

Beware that all relevant fields used for sorting have to be loaded into memory.

When sorting on string fields, the field sorted on should not be analyzed/tokenized. For numeric types, it is recommended to set the type explicitly (short, integer, float, ...).