Searching¶
- Overview
- All about S:
S
- Where to search
- By default, S does a Match All
- Queries:
query
- Advanced queries:
Q
andquery_raw
- Filters:
filter
- Advanced filters:
F
andfilter_raw
- Query-time field boosting:
boost
- Ordering:
order_by
- Demoting:
demote
- Highlighting:
highlight
- Suggestions:
suggest
- Facets
- Scores and explanations
All about S: S
¶
What is S?¶
elasticutils.S
helps you define an Elasticsearch
search.
searcher = S()
This creates an untyped elasticutils.S
using the
defaults:
- uses an
elasticsearch.client.Elasticsearch
instance configured to connect tolocalhost
– callelasticutils.S.es()
to specify connection parameters - searches across all indexes – call
elasticutils.S.indexes()
to specify indexes - searches across all doctypes – call
elasticutils.S.doctypes()
to specify doctypes
S is chainable¶
elasticutils.S
has methods that return a new S instance
with the additional specified criteria. In this way S is chainable and
you can reuse S objects for your searches.
For example:
s1 = S()
s2 = s1.query(content__text='tabs')
s3 = s2.filter(awesome=True)
s4 = s2.filter(awesome=False)
s1, s2, and s3 are all different S objects. s1 is a match all.
s2 has a query.
s3 has everything in s2 with a awesome=True
filter.
s4 has everything in s2 with a awesome=False
filter.
S can be typed and untyped¶
When you create an elasticutils.S
with no type, it’s
called an untyped S. By default, search results for a untyped S
are returned in the form of a sequence of
elasticutils.DefaultMappingType
instances. You can
explicitly state that you want a sequence of dicts or lists, too. See
S results can be returned in many shapes for more details on how to return results in
various formats.
You can also construct a typed S which is an S with a
elasticutils.MappingType
subclass. By default, search
results for a typed S are returned in the form of a sequence of
instances of that type. See Mapping types and Indexables for more about
MappingTypes.
S can be sliced to return the results you want¶
By default Elasticsearch gives you the first 10 results.
If you want something different than that, elasticutils.S
supports slicing allowing you to get back the specific results you’re
looking for.
For example:
some_s = S()
results = list(some_s) # returns first 10 results (default)
results = list(some_s[:10]) # returns first 10 results
results = list(some_s[10:20]) # returns results 10 through 19
The slicing is chainable, too:
some_s = S()[:10]
first_ten_pitchers = some_s.filter(position='pitcher')
Note
The slicing happens on the Elasticsearch side—it doesn’t pull all the results back and then slice them in Python. Ew.
Note
Unlike slicing other things in Python, if you choose a start, but no end, then you get 10 results starting with the start.
In other words, this:
some_s = S()[10:]
does not give you all the results from index 10 onwards. Instead it gives you results 10 through 19.
If you want “all the results from index 10 onwards”, then you could do something like this:
SOME_LARGE_NUMBER = 1000000
some_s = S()[10:SOME_LARGE_NUMBER]
If you know you have fewer results than SOME_LARGE_NUMBER
or you
could do this which will kick off two Elasticsearch queries:
some_s = S()[10:some_s.count()]
Note that doing open-ended queries like this has the same
ramifications as calling
elasticutils.S.everything()
. Refer to that documentation
for the fearsome details.
See also
- http://www.elasticsearch.org/guide/reference/api/search/from-size.html
- Elasticsearch from / size documentation
S is lazy¶
The search won’t execute until you do one of the following:
- use the
elasticutils.S
in an iterable context - call
len()
on aelasticutils.S
- call the
elasticutils.S.execute()
,elasticutils.S.everything()
,elasticutils.S.count()
,elasticutils.S.suggestions()
orelasticutils.S.facet_counts()
methods
Once you execute the search, then it will cache the results and
further executions of that elasticutils.S
won’t result in
another roundtrip to your Elasticsearch cluster.
S results can be returned in many shapes¶
An untyped S (e.g. S()
) will return instances of
elasticutils.DefaultMappingType
by default.
A typed S (e.g. S(FooMappingType)
), will return instances of
that type (e.g. type FooMappingType
) by default.
elasticutils.S.values_list()
gives you a list of
tuples. See documentation for more details.
elasticutils.S.values_dict()
gives you a list of dicts. See
documentation for more details.
If you use elasticutils.S.execute()
, you get back a
elasticutils.SearchResults
instance which has additional
useful bits including the raw response from Elasticsearch. See
documentation for details.
Where to search¶
Specifying connection parameters: es
¶
elasticutils.S
will generate an
elasticsearch.client.Elasticsearch
object that connects
to localhost
by default. That’s usually not what you want. You can use the
elasticutils.S.es()
method to specify the arguments used to create the
elasticsearch-py Elasticsearch object.
Examples:
q = S().es(urls=['localhost'])
q = S().es(urls=['localhost:9200'], timeout=10)
See elasticutils.get_es()
for the list of arguments you
can pass in.
Specifying indexes to search: indexes
¶
An untyped S will search all indexes by default.
A typed S will search the index returned by the
elasticutils.MappingType.get_index()
method.
If that’s not what you want, use the
elasticutils.S.indexes()
method.
For example, this searches all indexes:
q = S()
This searches just “someindex”:
q = S().indexes('someindex')
This searches “thisindex” and “thatindex”:
q = S().indexes('thisindex', 'thatindex')
This searches whatever FooMappingType.get_index()
returns:
q = S(FooMappingType)
Specifying doctypes to search: doctypes
¶
An untyped S will search all doctypes by default.
A typed S will search the doctype returned by the
elasticutils.MappingType.get_mapping_type_name()
method.
If that’s not what you want, then you should use the
elasticutils.S.doctypes()
method.
For example, this searches all doctypes:
q = S()
This searches just the “sometype” doctype:
q = S().doctypes('sometype')
This searches “thistype” and “thattype”:
q = S().doctypes('thistype', 'thattype')
By default, S does a Match All¶
By default, elasticutils.S
with no filters or queries
specified will do a match_all
query in Elasticsearch.
See also
- http://www.elasticsearch.org/guide/reference/query-dsl/match-all-query.html
- Elasticsearch match_all documentation
Queries: query
¶
Queries are specified using the elasticutils.S.query()
method. See those docs for API details.
ElasticUtils uses this syntax for specifying queries:
fieldname__fieldaction=value
- fieldname: the field the query applies to
- fieldaction: the kind of query it is
- value: the value to query for
The fieldname and fieldaction are separated by __
(that’s two
underscores).
For example:
q = S().query(title__match='taco trucks')
will do an Elasticsearch match query on the title field for “taco trucks”.
There are many different field actions to choose from:
field action | Elasticsearch query type |
---|---|
(no action specified) | Term query |
term | Term query |
terms | Terms query |
in | Terms query |
text | Text query (DEPRECATED) |
match | Match query [1] |
prefix | Prefix query [2] |
gt, gte, lt, lte | Range query |
range | Range query [4] |
fuzzy | Fuzzy query |
wildcard | Wildcard query |
text_phrase | Text phrase query (DEPRECATED) |
match_phrase | Match phrase query [1] |
query_string | Querystring query [3] |
distance | Geo distance query [5] |
[1] | (1, 2) Elasticsearch 0.19.9 renamed text queries to match queries. If you’re using Elasticsearch 0.19.9 or later, you should use match and match_phrase. If you’re using a version prior to 0.19.9 use text and text_phrase. |
[2] | You can also use startswith , but that’s deprecated. |
[3] | When doing query_string queries, if the query text is malformed
it’ll raise a SearchPhaseExecutionException exception. |
[4] | The range field action is a shortcut for defining both sides of
the range at once. The range is inclusive on both sides and accepts
a tuple with the lower value first and upper value second. |
[5] | The distance field need accepts a tuple with distance, latitude and longitude
where distance is a string like 5km. |
See also
- http://www.elasticsearch.org/guide/reference/query-dsl/
- Elasticsearch docs for query dsl
- http://www.elasticsearch.org/guide/reference/query-dsl/term-query.html
- Elasticsearch docs on term queries
- http://www.elasticsearch.org/guide/reference/query-dsl/terms-query.html
- Elasticsearch docs on terms queries
- http://www.elasticsearch.org/guide/reference/query-dsl/text-query.html
- Elasticsearch docs on text and text_phrase queries
- http://www.elasticsearch.org/guide/reference/query-dsl/match-query.html
- Elasticsearch docs on match and match_phrase queries
- http://www.elasticsearch.org/guide/reference/query-dsl/prefix-query.html
- Elasticsearch docs on prefix queries
- http://www.elasticsearch.org/guide/reference/query-dsl/range-query.html
- Elasticsearch docs on range queries
- http://www.elasticsearch.org/guide/reference/query-dsl/fuzzy-query.html
- Elasticsearch docs on fuzzy queries
- http://www.elasticsearch.org/guide/reference/query-dsl/wildcard-query.html
- Elasticsearch docs on wildcard queries
- http://www.elasticsearch.org/guide/reference/query-dsl/query-string-query.html
- Elasticsearch docs on query_string queries
- http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/query-dsl-geo-distance-filter.html
- Elasticsearch docs on geo_distance queries
Advanced queries: Q
and query_raw
¶
calling .query() multiple times¶
Calling elasticutils.S.query()
multiple times will combine
all the queries together.
should, must and must_not¶
By default all queries must match a document in order for the document to show up in the search results.
You can alter this behavior by flagging your queries with should
,
must
, and must_not
flags.
should
A query added with
should=True
affects the score for a result, but it won’t prevent the document from being in the result set.Example:
qs = S().query(title__text='castle', summary__text='castle', should=True)If the document matches either the
title__text
or thesummary__text
then it’s included in the results set. It doesn’t have to match both.
must
This is the default, so if you don’t specify, then it’s a must.
A query added with
must=True
must match in order for the document to be in the result set.Example:
qs = S().query(title__text='castle', summary__text='castle') qs = S().query(title__text='castle', summary__text='castle', must=True)These two are equivalent. The document must match both the
title__text
andsummary__text
queries in order to be included in the result set. If it doesn’t match one of them, then it’s not included.
must_not
A query added with
must_not=True
must NOT match in order for the document to be in the result set.Example:
qs = (S().query(title__text='castle') .query(author='castle', must_not=True))For a document to be included in the result set, it must match the
title__text
query and must NOT match theauthor
query. I.e. The title must have “castle”, but the document can’t have been written by someone with “castle” in their name.
The Q class¶
You can manipulate query units with the elasticutils.Q
class. For example, you can incrementally build your query:
q = Q()
if search_authors:
q += Q(author_name=search_text, should=True)
if search_keywords:
q += Q(keyword=search_text, should=True)
q += Q(title__text=search_text, summary__text=search_text,
should=True)
The +
Python operator will combine two Q instances together and
return a new instance.
You can then use one or more Q classes in a query call:
if search_authors:
q += Q(author_name=search_text, should=True)
if search_keywords:
q += Q(keyword=search_text, should=True)
q += Q(title__text=search_text, summary__text=search_text,
should=True)
s = S().query(q)
query_raw¶
elasticutils.S.query_raw()
lets you explicitly define the
query portion of an Elasticsearch search.
For example:
q = S().query_raw({'match': {'title': 'example'}})
This will override all .query()
calls you’ve made in your
elasticutils.S
before and after the .query_raw call.
This is helpful if ElasticUtils is missing functionality you need.
adding new query actions¶
You can subclass elasticutils.S
and add handling for
additional query actions. This is helpful in two circumstances:
- ElasticUtils doesn’t have support for that query type
- ElasticUtils doesn’t support that query type in a way you need—for example, ElasticUtils uses different argument values
See elasticutils.S
for more details on how to do this.
Filters: filter
¶
Filters are specified using the elasticutils.S.filter()
method. See those docs for API details.
q = S().filter(language='korean')
will do a search and only return results where the language is Korean.
elasticutils.S.filter()
uses the same syntax for specifying
fields, actions and values as elasticutils.S.query()
.
field action | elasticsearch filter |
---|---|
in | Terms filter |
gt, gte, lt, lte | Range filter |
range | Range filter [6] |
prefix, startswith | Prefix filter |
(no action) | Term filter |
[6] | The range field action is a shortcut for defining both sides of
the range at once. The range is inclusive on both sides and accepts
a tuple with the lower value first and upper value second. |
You can also filter on fields that have None
as a value or have no
value:
q = S().filter(language=None)
This uses the Elasticsearch Missing filter.
Note
In order to filter on fields that have None
as a value, you
have to tell Elasticsearch that the field can have null values. To
do this, you have to add null_value: True
to the mapping for
that field.
http://www.elasticsearch.org/guide/reference/mapping/core-types.html
See also
- http://www.elasticsearch.org/guide/reference/query-dsl/
- Elasticsearch docs for query dsl
- http://www.elasticsearch.org/guide/reference/query-dsl/terms-filter.html
- Elasticsearch docs for terms filter
- http://www.elasticsearch.org/guide/reference/query-dsl/range-filter.html
- Elasticsearch docs for range filter
- http://www.elasticsearch.org/guide/reference/query-dsl/prefix-filter.html
- Elasticsearch docs for prefix filter
- http://www.elasticsearch.org/guide/reference/query-dsl/term-filter.html
- Elasticsearch docs for term filter
- http://www.elasticsearch.org/guide/reference/query-dsl/missing-filter.html
- Elasticsearch docs for missing filter
Advanced filters: F
and filter_raw
¶
and vs. or¶
Calling filter multiple times is equivalent to an “and”ing of the filters.
For example:
q = (S().filter(style='korean')
.filter(price='FREE'))
will do a query for style ‘korean’ AND price ‘FREE’. Anything that has a style other than ‘korean’ or a price other than ‘FREE’ is removed from the result set.
You can do the same thing by putting both filters in the same
elasticutils.S.filter()
call.
For example:
q = S().filter(style='korean', price='FREE')
The F class¶
Suppose you want either Korean or Mexican food. For that, you need an “or”. You can do something like this:
q = S().filter(or_={'style': 'korean', 'style':'mexican'})
But, wow—that’s icky looking and not particularly helpful!
So, we’ve also got an elasticutils.F()
class that makes this
sort of thing easier.
You can do the previous example with F
like this:
q = S().filter(F(style='korean') | F(style='mexican'))
will get you all the search results that are either “korean” or “mexican” style.
What if you want Mexican food, but only if it’s FREE, otherwise you want Korean?:
q = S().filter(F(style='mexican', price='FREE') | F(style='korean'))
F supports &
(and), |
(or) and ~
(not) operations.
Additionally, you can create an empty F and build it incrementally:
qs = S()
f = F()
if some_crazy_thing:
f &= F(price='FREE')
if some_other_crazy_thing:
f |= F(style='mexican')
qs = qs.filter(f)
If neither some_crazy_thing or some_other_crazy_thing are
True
, then F will be empty. That’s ok because empty filters are
ignored.
filter_raw¶
elasticutils.S.filter_raw()
lets you explicitly define
the filter portion of an Elasticsearch search.
For example:
qs = S().filter_raw({'term': {'title': 'foo'}})
This will override all .filter()
calls you’ve made in your
elasticutils.S
before and after the .filter_raw call.
This is helpful if ElasticUtils is missing functionality you need.
adding new filteractions¶
You can subclass elasticutils.S
and add handling for
additional filter actions. This is helpful in two circumstances:
- ElasticUtils doesn’t have support for that filter type
- ElasticUtils doesn’t support that filter type in a way you need—for example, ElasticUtils uses different argument values
See elasticutils.S
for more details on how to do this.
Query-time field boosting: boost
¶
ElasticUtils allows you to specify query-time field boosts with
elasticutils.S.boost()
.
These boosts take effect at the time the query is executing. After the query has executed, then the boost is applied and that becomes the final score for the query.
This is a useful way to weight queries for some fields over others.
See elasticutils.S.boost()
for more details.
Note
Boosts are ignored if you use query_raw.
Ordering: order_by
¶
ElasticUtils elasticutils.S.order_by()
lets you change the
order of the search results.
See elasticutils.S.order_by()
for more details.
See also
- http://www.elasticsearch.org/guide/reference/api/search/sort.html
- Elasticsearch docs on sort parameter in the Search API
Demoting: demote
¶
You can demote documents that match query criteria:
q = (S().query(title='trucks')
.demote(0.5, description__text='gross'))
This does a query for trucks, but demotes any that have “gross” in the description with a fraction boost of 0.5.
Note
You can only call elasticutils.S.demote()
once. Calling it
again overwrites previous calls.
This is implemented using the boosting query in Elasticsearch.
Anything you specify with elasticutils.S.query()
goes into
the positive section. The negative query and negative boost
portions are specified as the first and second arguments to
elasticutils.S.demote()
.
Note
Order doesn’t matter. So:
q = (S().query(title='trucks')
.demote(0.5, description__text='gross'))
does the same thing as:
q = (S().demote(0.5, description__text='gross')
.query(title='trucks'))
See also
- http://www.elasticsearch.org/guide/reference/query-dsl/boosting-query.html
- Elasticsearch docs on boosting query (which are as clear as mud)
Highlighting: highlight
¶
ElasticUtils can highlight excerpts for search results.
See elasticutils.S.highlight()
for more details.
See also
- http://www.elasticsearch.org/guide/reference/api/search/highlighting.html
- Elasticsearch docs for highlight
Suggestions: suggest
¶
Spelling suggestions can be asked for by using the
elasticutils.S.suggest()
method, and then retrieved
in elasticutils.S.suggestions()
:
q = S().query(text='Aice').suggest('mysuggest', 'Alice', field='text')
print q.suggestions()['mysuggest'][0]['options']
Note
Spelling suggestions require Elasticsearch 0.90 or later.
See also
- http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search-suggesters.html
- Elasticsearch docs for suggesters
Facets¶
Basic facets: facet
¶
q = (S().query(title='taco trucks')
.facet('style', 'location'))
will do a query for “taco trucks” and return terms facets for the
style
and location
fields.
Note that the fieldname you provide in the
elasticutils.S.facet()
call becomes the facet name as well.
The facet counts are available through
elasticutils.S.facet_counts()
. For example:
q = (S().query(title='taco trucks')
.facet('style', 'location'))
counts = q.facet_counts()
Also, you can get them with the facets
attribute of the search results:
q = (S().query(title='taco trucks')
.facet('style', 'location'))
results = q.execute()
counts = results.facets
You can also restrict the number of terms returned per facet by
passing a size
keyword argument to
elasticutils.S.facet()
:
q = S().query(title='taco trucks')
.facet('style', 'location', size=5)
See also
- http://www.elasticsearch.org/guide/reference/api/search/facets/
- Elasticsearch docs on facets
- http://www.elasticsearch.org/guide/reference/api/search/facets/terms-facet.html
- Elasticsearch docs on terms facet
Facet Results¶
The execution methods elasticutils.S.facet_counts()
and
elasticutils.S.execute()
will return a dictionary containing
the named parameter and a elasticutils.FacetResult
object.
For example:
>>> facet_counts = S().facet('primary_country_id').facet_counts()
>>> facet_counts
{u'primary_country_id': <elasticutils.FacetResult at 0x45f12d0>}
The FacetResult object contains all of the information returned in the facet stanza.
In the above case, we faceted on primary_country_id as a terms facet. To see the facet results simply iterate over the FacetResult object:
>>> for facet_result in facet_counts['primary_country_id']:
... print facet_result
...
{u'count': 187293, u'term': 41}
{u'count': 24177, u'term': 9}
{u'count': 17200, u'term': 50}
{u'count': 13015, u'term': 15}
{u'count': 10296, u'term': 30}
{u'count': 8824, u'term': 32}
{u'count': 7703, u'term': 6}
{u'count': 7502, u'term': 23}
{u'count': 5614, u'term': 2}
{u'count': 5214, u'term': 33}
And to get the “other”, “missing” and “total” information from the facetresult:
>>> facet_counts['primary_country_id'].missing
3475
>>> facet_counts['primary_country_id'].other
25273
>>> facet_counts['primary_country_id'].total
312111
FacetResult is backwords compatible with older versions of ElasticUtils, so you shouldn’t need to change anything when upgrading:
>>> some_s = S().facet_raw(primary_country_id={'statistical':{"field":"primary_country_id"}})
>>> facet_counts = some_s.facet_counts()
>>> facet_counts['primary_country_id'].max == facet_counts['primary_country_id']['max']
True
Facets and scope (filters and global)¶
What happens if your search includes filters?
Here’s an example:
q = (S().query(title='taco trucks')
.filter(style='korean')
.facet('style', 'location'))
The “style” and “location” facets here ONLY apply to the results of the query and are not affected at all by the filters.
If you want your filters to apply to your facets as well, pass in the filtered flag.
For example:
q = (S().query(title='taco trucks')
.filter(style='korean')
.facet('style', 'location', filtered=True))
What if you want the filters to apply just to one of the facets and not the other? You need to add them incrementally.
For example:
q = (S().query(title='taco trucks')
.filter(style='korean')
.facet('style', filtered=True)
.facet('location'))
What if you want the facets to apply to the entire corpus and not just the results from the query? Use the global_ flag.
For example:
q = (S().query(title='taco trucks')
.filter(style='korean')
.facet('style', 'location', global_=True))
Note
The flag name is global_ with an underscore at the end. Why? Because global with no underscore is a Python keyword.
See also
- http://www.elasticsearch.org/guide/reference/api/search/facets/
- Elasticsearch docs on facets, facet_filter, and global
- http://www.elasticsearch.org/guide/reference/api/search/facets/terms-facet.html
- Elasticsearch docs on terms facet
Facets... RAW!: facet_raw
¶
Elasticsearch facets can do a lot of other things. Because of this,
there exists elasticutils.S.facet_raw()
which will do
whatever you need it to. Specify key/value args by facet name.
You could do the first facet example with:
q = (S().query(title='taco trucks')
.facet_raw(style={'terms': {'field': 'style'}}))
One of the things this lets you do is scripted facets.
For example:
q = (S().query(title='taco trucks')
.facet_raw(styles={
'field': 'style',
'script': 'term == korean ? true : false'
}))
Warning
If for some reason you have specified a facet with the same name
using both elasticutils.S.facet()
and
elasticutils.S.facet_raw()
, the facet_raw stuff will
override the facet stuff.
See also
- http://www.elasticsearch.org/guide/reference/modules/scripting.html
- Elasticsearch docs on scripting
Filter and query facets¶
You can also define arbitrary facets for queries and facets as documented in Elasticsearch’s docs.
For example:
q = (S().query(title='taco trucks')
.facet_raw(korean_or_mexican={
'filter': {
'or': [
{'term': {'style': 'korean'}},
{'term': {'style': 'mexican'}},
]
}
}))
Then access the custom facet via the name you passed into facet_raw
:
counts = q.facet_counts()
korean_or_mexican_count = counts['korean_or_mexican']['count']
The same can be done with queries:
q = (S().query(title='taco trucks')
.facet_raw(korean={
'query': {
'term': {'style': 'korean'},
}
}))
See also
- http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search-facets-query-facet.html
- Elasticsearch docs on query facets
- http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search-facets-filter-facet.html
- Elasticsearch docs on filter facets
Scores and explanations¶
Seeing the score¶
Wondering what the score for a document was? ElasticUtils puts that in
the score
attribute of the es_meta
object of the search result.
For example, let’s search an index that holds knowledge base articles
for ones with the word “crash” in them and print out the scores:
q = S().query(title__text='crash', content__text='crash')
for result in q:
print result.es_meta.score
This works regardless of what form the search results are in.
Getting an explanation: explain
¶
Wondering why one document shows up higher in the results than another
that should have shown up higher? Wonder how that score was computed?
You can set the search to pass the explain
flag to Elasticsearch
with elasticutils.S.explain()
.
ElasticUtils puts the explanation in the explanation
attribute
of the es_meta
object of the search result.
For example, let’s do a query on a search corpus of knowledge base articles for articles with the word “crash” in them:
q = (S().query(title__text='crash', content__text='crash')
.explain())
for result in q:
print result.es_meta.explanation
This works regardless of what form the search results are in.
See also
- http://www.elasticsearch.org/guide/reference/api/search/explain.html
- Elasticsearch docs on explain (which are pretty bereft of details).