API docs

Functions

elasticutils.get_es(urls=None, timeout=5, force_new=False, **settings)

Create a pyelasticsearch ElasticSearch object and return it.

This will aggressively re-use ElasticSearch objects with the following rules:

  1. if you pass the same argument values to get_es(), then it will return the same ElasticSearch object
  2. if you pass different argument values to get_es(), then it will return a different ElasticSearch object
  3. it caches each ElasticSearch object that gets created
  4. if you pass in force_new=True, then you are guaranteed to get a fresh ElasticSearch object AND that object will not be cached

Parameters:
  • urls – list of uris; Elasticsearch hosts to connect to, defaults to ['http://localhost:9200']
  • timeout – int; the timeout in seconds, defaults to 5
  • force_new – Forces get_es() to generate a new ElasticSearch object rather than pulling it from cache.
  • settings – other settings to pass into ElasticSearch constructor; See http://pyelasticsearch.readthedocs.org/en/latest/api/ for more details.

Examples:

# Returns cached ElasticSearch object
es = get_es()

# Returns a new ElasticSearch object
es = get_es(force_new=True)

es = get_es(urls=['http://localhost:9200'])

es = get_es(urls=['http://localhost:9200'], timeout=10,
            max_retries=3)
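The caching rules above can be sketched with a small in-memory cache. This is a toy model, not the elasticutils source: FakeClient and get_client are hypothetical stand-ins, so the snippet runs without Elasticsearch or pyelasticsearch installed.

```python
# Toy model of the caching rules; NOT the elasticutils implementation.
_cache = {}


class FakeClient(object):
    """Hypothetical stand-in for a pyelasticsearch ElasticSearch object."""

    def __init__(self, urls, timeout, **settings):
        self.urls = urls
        self.timeout = timeout
        self.settings = settings


def get_client(urls=None, timeout=5, force_new=False, **settings):
    urls = urls or ['http://localhost:9200']
    if force_new:
        # Rule 4: always a fresh object, and it is never cached.
        return FakeClient(urls, timeout, **settings)
    # Build a hashable key from every argument that shapes the client
    # (this sketch assumes the settings values are hashable).
    key = (tuple(urls), timeout, tuple(sorted(settings.items())))
    if key not in _cache:
        # Rule 3: cache each object that gets created.
        _cache[key] = FakeClient(urls, timeout, **settings)
    return _cache[key]


a = get_client()
b = get_client()                 # same arguments, same object (rule 1)
c = get_client(timeout=10)       # different arguments, different object (rule 2)
d = get_client(force_new=True)   # fresh and uncached (rule 4)
```

Because d was never stored, a later get_client() call in this sketch still hands back a.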

The S class

class elasticutils.S(type_=None)

Represents a lazy Elasticsearch Search API request.

The API for S takes inspiration from Django’s QuerySet.

S can be either typed or untyped. An untyped S returns dict results by default.

An S is lazy in the sense that it doesn’t do an Elasticsearch search request until it’s forced to evaluate by iterating over it, doing len(s), or calling one of the methods that force evaluation, such as .count() or .facet_counts().
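To make the laziness concrete, here is a minimal sketch that runs without Elasticsearch. LazySearch is a hypothetical stand-in for S that returns canned results. Caching the results on the instance is an assumption of this sketch, not a statement about how S is implemented.

```python
class LazySearch(object):
    """Hypothetical stand-in for S; it records steps instead of searching."""

    requests_made = 0  # class-wide counter of simulated search requests

    def __init__(self, steps=()):
        self.steps = tuple(steps)
        self._results = None

    def query(self, **kw):
        # Chaining returns a new object and never contacts the "server".
        step = ('query', tuple(sorted(kw.items())))
        return LazySearch(self.steps + (step,))

    def _execute(self):
        if self._results is None:
            LazySearch.requests_made += 1
            self._results = [{'id': 1}, {'id': 2}]  # canned, fake results
        return self._results

    def __iter__(self):
        return iter(self._execute())

    def __len__(self):
        return len(self._execute())


s = LazySearch().query(title='trucks').query(body='tacos')
before = LazySearch.requests_made   # nothing has been evaluated yet
ids = [doc['id'] for doc in s]      # iterating forces evaluation
after = LazySearch.requests_made
```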

Adding support for other queries

You can add support for queries that S doesn’t have support for by subclassing S with a method called process_query_ACTION. This method takes a key, value and an action.

For example:

class FunkyS(S):
    def process_query_funkyquery(self, key, val, action):
        return {'funkyquery': {'field': key, 'value': val}}

Then you can use that just like other actions:

s = FunkyS().query(Q(foo__funkyquery='bar'))
s = FunkyS().query(foo__funkyquery='bar')

Many Elasticsearch queries take additional arguments. A custom handler is a good way to pass them: make the value a tuple and unpack it in the handler. For example, if you wanted to write a handler for fuzzy queries on dates, you could do:

class FunkyS(S):
    def process_query_fuzzy(self, key, val, action):
        # val here is a (value, min_similarity) tuple
        return {
            'fuzzy': {
                key: {
                    'value': val[0],
                    'min_similarity': val[1]
                }
            }
        }

Used:

s = FunkyS().query(created__fuzzy=(created_date, '1d'))
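One way to picture how a field__action keyword could reach a process_query_ACTION method is the sketch below. MiniS, build_query_part and the default term action are hypothetical and only illustrate name-based dispatch; the real S does more than this.

```python
class MiniS(object):
    """Hypothetical, heavily simplified stand-in for S."""

    def process_query_term(self, key, val, action):
        return {'term': {key: val}}

    def build_query_part(self, name, val):
        # 'created__fuzzy' splits into field 'created' and action 'fuzzy'.
        field, _, action = name.partition('__')
        action = action or 'term'   # assumed default for this sketch
        # Look the handler up by name; unknown actions raise AttributeError.
        handler = getattr(self, 'process_query_' + action)
        return handler(field, val, action)


class FunkyMiniS(MiniS):
    def process_query_fuzzy(self, key, val, action):
        # val here is a (value, min_similarity) tuple
        value, min_similarity = val
        return {'fuzzy': {key: {'value': value,
                                'min_similarity': min_similarity}}}


part = FunkyMiniS().build_query_part('created__fuzzy', ('2013-01-01', '1d'))
plain = FunkyMiniS().build_query_part('title', 'trucks')
```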

Adding support for other filters

You can add support for filters that S doesn’t have support for by subclassing S with a method called process_filter_ACTION. This method takes a key, value and an action.

For example:

class FunkyS(S):
    def process_filter_funkyfilter(self, key, val, action):
        return {'funkyfilter': {'field': key, 'value': val}}

Then you can use that just like other actions:

s = FunkyS().filter(F(foo__funkyfilter='bar'))
s = FunkyS().filter(foo__funkyfilter='bar')

__init__(type_=None)

Create and return an S.

Parameters:type_ – class; the model that this S is based on

Chaining transforms

query(*queries, **kw)

Return a new S instance with query args combined with existing set in a must boolean query.

Parameters:
  • queries – instances of Q
  • kw – queries in the form of field__action=value

There are three special flags you can use:

  • must=True: Specifies that the queries and kw queries must match in order for a document to be in the result.

    If you don’t specify a special flag, this is the default.

  • should=True: Specifies that the queries and kw queries should match in order for a document to be in the result.

  • must_not=True: Specifies the queries and kw queries must not match in order for a document to be in the result.

These flags work by putting those queries in the appropriate clause of an Elasticsearch boolean query.

Examples:

>>> s = S().query(foo='bar')
>>> s = S().query(Q(foo='bar'))
>>> s = S().query(foo='bar', bat__text='baz')
>>> s = S().query(foo='bar', should=True)
>>> s = S().query(foo='bar', should=True).query(baz='bat', must=True)
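As a rough illustration of where the flags end up, the sketch below builds an Elasticsearch bool query by hand. The bool_query helper is hypothetical and the term clauses are placeholders. The exact dict that S generates may differ.

```python
def bool_query(must=(), should=(), must_not=()):
    """Build a bool query dict, leaving out clauses that are empty."""
    clauses = {}
    pairs = (('must', must), ('should', should), ('must_not', must_not))
    for name, parts in pairs:
        if parts:
            clauses[name] = list(parts)
    return {'bool': clauses}


# Roughly the shape behind:
#   S().query(foo='bar', should=True).query(baz='bat', must=True)
query = bool_query(
    must=[{'term': {'baz': 'bat'}}],
    should=[{'term': {'foo': 'bar'}}],
)
```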

Notes:

  1. Don’t specify multiple special flags, but if you did, should takes precedence.
  2. If you don’t specify any, it defaults to must.
  3. You can specify special flags in the elasticutils.Q, too. If you’re building your query incrementally, using elasticutils.Q helps a lot.

See the documentation on elasticutils.Q for more details on composing queries with Q.

See the documentation on elasticutils.S for more details on adding support for more query types.

query_raw(query)

Return a new S instance with a query_raw.

Parameters:query – Python dict specifying the complete query to send to Elasticsearch

Example:

S().query_raw({'match': {'title': 'example'}})

Note

If there’s a query_raw in your S, then that’s your query. All .query(), .demote(), .boost() and anything else that affects the query clause is ignored.

filter(*filters, **kw)

Return a new S instance with filter args combined with existing set with AND.

Parameters:
  • filters – this will be instances of F
  • kw – this will be in the form of field__action=value

Examples:

>>> s = S().filter(foo='bar')
>>> s = S().filter(F(foo='bar'))
>>> s = S().filter(foo='bar', bat='baz')
>>> s = S().filter(foo='bar').filter(bat='baz')

By default, everything is combined using AND. If you provide multiple filters in a single filter call, those are ANDed together. If you provide multiple filters in multiple filter calls, those are ANDed together.

If you want something different, use the F class which supports & (and), | (or) and ~ (not) operators. Then call filter once with the resulting F instance.

See the documentation on elasticutils.F for more details on composing filters with F.

See the documentation on elasticutils.S for more details on adding support for new filter types.

filter_raw(filter_)

Return a new S instance with a filter_raw.

Parameters:filter_ – Python dict specifying the complete filter to send to Elasticsearch

Example:

S().filter_raw({'term': {'title': 'example'}})

Note

If there’s a filter_raw in your S, then that’s your filter. All .filter() and anything else that affects the filter clause is ignored.

order_by(*fields)

Return a new S instance with results ordered as specified.

You can order search results by the specified fields:

q = (S().query(title='trucks')
        .order_by('title'))

This orders search results by the title field in ascending order.

If you want to sort by descending order, prepend a -:

q = (S().query(title='trucks')
        .order_by('-title'))

You can also sort by the computed field _score or pass a dict as a sort field in order to use more advanced sort options. Read the Elasticsearch documentation for details.

Note

Calling this again will overwrite previous .order_by() calls.
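For orientation, the field names map onto an Elasticsearch sort specification along these lines. The to_sort_spec helper is a hypothetical sketch of that translation, not the code that .order_by() runs.

```python
def to_sort_spec(*fields):
    """Translate order_by-style arguments into an Elasticsearch sort list."""
    spec = []
    for field in fields:
        if isinstance(field, dict):
            spec.append(field)               # advanced options pass through
        elif field.startswith('-'):
            spec.append({field[1:]: 'desc'})  # leading "-" means descending
        else:
            spec.append(field)               # plain name sorts ascending
    return spec


sort = to_sort_spec('title', '-created', '_score')
```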

boost(**kw)

Return a new S instance with field boosts.

ElasticUtils allows you to specify query-time field boosts with .boost(). It takes a set of arguments where the keys are either field names or field name + __ + field action.

Examples:

q = (S().query(title='taco trucks',
               description__text='awesome')
        .boost(title=4.0, description__text=2.0))

If the key is a field name, then the boost will apply to all query bits that have that field name. For example:

q = (S().query(title='trucks',
               title__prefix='trucks',
               title__fuzzy='trucks')
        .boost(title=4.0))

applies a 4.0 boost to all three query bits because all three query bits are for the title field name.

If the key is a field name and field action, then the boost will apply only to that field name and field action. For example:

q = (S().query(title='trucks',
               title__prefix='trucks',
               title__fuzzy='trucks')
        .boost(title__prefix=4.0))

will only apply the 4.0 boost to title__prefix.

Boosts are relative to one another and all boosts default to 1.0.

For example, if you had:

qs = (S().boost(title=4.0, summary=2.0)
         .query(title__text=value,
                summary__text=value,
                content__text=value,
                should=True))

title__text would be boosted twice as much as summary__text and summary__text twice as much as content__text.
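The lookup described above can be sketched as a small function. boost_for is hypothetical. Letting a field__action key win over a plain field key is an assumption made for this sketch.

```python
def boost_for(boosts, field, action=None):
    """Return the boost that applies to one query bit."""
    if action is not None:
        specific = '%s__%s' % (field, action)
        if specific in boosts:
            # Assumed precedence: the more specific key wins.
            return boosts[specific]
    # Fall back to the field-wide boost, then to the default of 1.0.
    return boosts.get(field, 1.0)


boosts = {'title': 4.0, 'summary': 2.0}
title_boost = boost_for(boosts, 'title', 'text')
summary_boost = boost_for(boosts, 'summary', 'text')
content_boost = boost_for(boosts, 'content', 'text')
```

With these numbers the title bit weighs twice as much as the summary bit, which in turn weighs twice as much as the unboosted content bit.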

demote(amount_, *queries, **kw)

Returns a new S instance with boosting query and demotion.

You can demote documents that match query criteria:

q = (S().query(title='trucks')
        .demote(0.5, description__text='gross'))

q = (S().query(title='trucks')
        .demote(0.5, Q(description__text='gross')))

This is implemented using the boosting query in Elasticsearch. Anything you specify with .query() goes into the positive section. The negative boost is the first argument to .demote() and the negative query is built from the remaining arguments.

Note

Calling this again will overwrite previous .demote() calls.
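The Elasticsearch boosting query has positive, negative and negative_boost parts. A hand-built sketch of that shape follows. The boosting_query helper is hypothetical, and the inner term and match clauses are placeholders rather than what S emits.

```python
def boosting_query(positive, negative, negative_boost):
    """Assemble the three parts of an Elasticsearch boosting query."""
    return {
        'boosting': {
            'positive': positive,
            'negative': negative,
            'negative_boost': negative_boost,
        }
    }


# Roughly the shape behind:
#   S().query(title='trucks').demote(0.5, description__text='gross')
query = boosting_query(
    positive={'term': {'title': 'trucks'}},
    negative={'match': {'description': 'gross'}},
    negative_boost=0.5,
)
```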

facet(*args, **kw)

Return a new S instance with facet args combined with existing set.

facet_raw(**kw)

Return a new S instance with raw facet args combined with existing set.

highlight(*fields, **kwargs)

Set highlight/excerpting with specified options.

Parameters:fields – The list of fields to highlight. If the field is None, then the highlight is cleared.

Additional keyword options:

  • pre_tags – List of tags before highlighted portion
  • post_tags – List of tags after highlighted portion

Results will have a _highlight property which contains the highlighted field excerpts.

For example:

q = (S().query(title__text='crash', content__text='crash')
        .highlight('title', 'content'))

for result in q:
    print(result._highlight['title'])
    print(result._highlight['content'])

If you pass in None, it will clear the highlight.

For example, this search won’t highlight anything:

q = (S().query(title__text='crash')
        .highlight('title')          # highlights 'title' field
        .highlight(None))            # clears highlight

Note

Calling this again will overwrite previous .highlight() calls.

Note

Make sure the fields you’re highlighting are indexed correctly. Read the Elasticsearch documentation for details.

values_list(*fields)

Return a new S instance that returns ListSearchResults.

Parameters:fields – the list of fields to have in the results.

With no arguments, returns a list of tuples of all the data for that document.

With arguments, returns a list of tuples where the fields in the tuple are in the order specified.

For example:

>>> list(S().values_list())
[(1, 'fred', 40), (2, 'brian', 30), (3, 'james', 45)]
>>> list(S().values_list('id', 'name'))
[(1, 'fred'), (2, 'brian'), (3, 'james')]
>>> list(S().values_list('name', 'id'))
[('fred', 1), ('brian', 2), ('james', 3)]

Note

If you don’t specify fields, the data comes back in an arbitrary order. It’s probably best to specify fields or use values_dict.
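The difference between tuple-shaped and dict-shaped results can be sketched on plain Python data. as_tuples and as_dicts are hypothetical helpers that mimic the shaping. They assume every document has every requested field.

```python
docs = [
    {'id': 1, 'name': 'fred', 'age': 40},
    {'id': 2, 'name': 'brian', 'age': 30},
    {'id': 3, 'name': 'james', 'age': 45},
]


def as_tuples(docs, *fields):
    # Tuple members follow the order in which the fields were requested.
    return [tuple(doc[field] for field in fields) for doc in docs]


def as_dicts(docs, *fields):
    # Dicts keep the field names, so the order does not matter.
    return [dict((field, doc[field]) for field in fields) for doc in docs]


by_id = as_tuples(docs, 'id', 'name')
by_name = as_tuples(docs, 'name', 'id')
named = as_dicts(docs, 'id', 'name')
```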

values_dict(*fields)

Return a new S instance that returns DictSearchResults.

Parameters:fields – the list of fields to have in the results.

With no arguments, this returns a list of dicts with all the fields.

With arguments, it returns a list of dicts with the specified fields.

For example:

>>> list(S().values_dict())
[{'id': 1, 'name': 'fred', 'age': 40}, ...]
>>> list(S().values_dict('id', 'name'))
[{'id': 1, 'name': 'fred'}, ...]

es(**settings)

Return a new S with specified ElasticSearch settings.

This allows you to configure the ElasticSearch object that gets used to execute the search.

Parameters:settings – the settings you’d use to build the ElasticSearch; same as what you’d pass to get_es().

indexes(*indexes)

Return a new S instance that will search specified indexes.

doctypes(*doctypes)

Return a new S instance that will search specified doctypes.

Note

Elasticsearch calls these “mapping types”. It’s the name associated with a mapping.

explain(value=True)

Return a new S instance with explain set.

Methods to override if you need different behavior

get_es(default_builder=get_es)

Returns the ElasticSearch object to use.

Parameters:default_builder – The function that takes a bunch of arguments and generates a pyelasticsearch ElasticSearch object.

Note

If you desire special behavior regarding building the ElasticSearch object for this S, subclass S and override this method.

get_indexes(default_indexes=None)

Returns the list of indexes to act on.

get_doctypes(default_doctypes=None)

Returns the list of doctypes to use.

to_python(obj)

Converts strings in a data structure to Python types

It converts datetime-ish things to Python datetimes.

Override if you want something different.

Parameters:obj – Python datastructure
Returns:Python datastructure with strings converted to Python types

Note

This does the conversion in-place!
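If you do override it, an in-place conversion might look like the sketch below. The single datetime format it recognizes is an assumption made for illustration. The real method may accept other formats.

```python
import datetime
import re

# Assumed format for this sketch: ISO 8601 without timezone or fractions.
DATETIME_RE = re.compile(r'^\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}$')


def to_python(obj):
    """Convert datetime-looking strings inside obj, modifying it in place."""
    if isinstance(obj, dict):
        for key, val in list(obj.items()):
            obj[key] = to_python(val)
    elif isinstance(obj, list):
        for i, val in enumerate(obj):
            obj[i] = to_python(val)
    elif isinstance(obj, str) and DATETIME_RE.match(obj):
        return datetime.datetime.strptime(obj, '%Y-%m-%dT%H:%M:%S')
    return obj


doc = {
    'name': 'fred',
    'created': '2013-02-03T10:20:30',
    'edits': ['2013-01-01T00:00:00', 'not a date'],
}
result = to_python(doc)   # result is the very same dict object as doc
```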

Methods that force evaluation

__iter__()

Executes search and returns an iterator of results.

Returns:iterator of results

For example:

>>> s = S().query(name__prefix='Jimmy')
>>> for obj in s:
...     print(obj['id'])
...

__len__()

Executes search and returns the number of results you’d get, as an integer.

Returns:integer

For example:

>>> s = S().query(name__prefix='Jimmy')
>>> count = len(s)
>>> results = s.execute()
>>> count == len(results)
True

Note

This is very different from calling .count(). If you call .count() you get the total number of results that Elasticsearch thinks match your search. If you call len(s), then you get the number of results you’d get if you executed the search. This factors in slices and default from and size values.
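The arithmetic behind that difference can be sketched as follows. result_length is a hypothetical helper, and the default page size of 10 is the Elasticsearch default for size.

```python
def result_length(total_hits, start=0, size=10):
    """Number of results one request returns for a given from/size window."""
    return max(0, min(size, total_hits - start))


total = 25                                 # what .count() would report
first_page = result_length(total)          # len(s) with the default window
last_page = result_length(total, start=20)
past_end = result_length(total, start=30)
```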

all()

Executes search and returns ALL search results.

Returns:SearchResults instance

For example:

>>> s = S().query(name__prefix='Jimmy')
>>> all_results = s.all()

Warning

This returns ALL search results. The way it does this is by calling .count() first to figure out how many to return, then by slicing by that size and returning a list of ALL search results.

Don’t use this if you’ve got 1000s of results!

count()

Executes search and returns number of results as an integer.

Returns:integer

For example:

>>> s = S().query(name__prefix='Jimmy')
>>> count = s.count()

execute()

Executes search and returns a SearchResults object.

Returns:SearchResults instance

For example:

>>> s = S().query(name__prefix='Jimmy')
>>> results = s.execute()

facet_counts()

Executes search and returns facet counts.

Example:

>>> s = S().query(name__prefix='Jimmy')
>>> facet_counts = s.facet_counts()

The F class

class elasticutils.F(**filters)

Filter objects.

Makes it easier to create filters cumulatively using & (and), | (or) and ~ (not) operations.

For example:

f = F()
f &= F(price='Free')
f |= F(style='Mexican')

creates a filter “price = ‘Free’ or style = ‘Mexican’”.
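The sketch below shows why starting from an empty F() works as a base for cumulative building. MiniF is a hypothetical toy that renders a readable string instead of an Elasticsearch filter. Treating the empty filter as the identity for both operators is the assumption it illustrates.

```python
class MiniF(object):
    """Hypothetical toy version of F that renders a readable string."""

    def __init__(self, **filters):
        # Sort for deterministic output; several kwargs are ANDed.
        parts = ['%s = %r' % item for item in sorted(filters.items())]
        self.expr = ' and '.join(parts)

    def _combine(self, other, op):
        result = MiniF()
        if not self.expr or not other.expr:
            # An empty filter leaves the other side unchanged.
            result.expr = self.expr or other.expr
        else:
            result.expr = '(%s) %s (%s)' % (self.expr, op, other.expr)
        return result

    def __and__(self, other):
        return self._combine(other, 'and')

    def __or__(self, other):
        return self._combine(other, 'or')

    def __invert__(self):
        result = MiniF()
        result.expr = 'not (%s)' % self.expr
        return result


f = MiniF()
f &= MiniF(price='Free')
f |= MiniF(style='Mexican')
negated = ~MiniF(price='Free')
```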

The Q class

class elasticutils.Q(**queries)

Query objects.

Makes it easier to create queries cumulatively.

If there’s more than one query part, they’re combined under a BooleanQuery. By default, they’re combined in the must clause.

You can combine two Q classes using the + operator. For example:

q = Q()
q += Q(title__text='shoes')
q += Q(summary__text='shoes')

creates a BooleanQuery with two must clauses.

Example 2:

q = Q()
q += Q(title__text='shoes', should=True)
q += Q(summary__text='shoes')
q += Q(description__text='shoes', must=True)

creates a BooleanQuery with one should clause (title) and two must clauses (summary and description).
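Example 2 can be traced with a toy class that only tracks which clause each query part lands in. MiniQ is hypothetical. It mirrors the documented defaults (must unless flagged, should wins if several flags are set) and nothing else.

```python
class MiniQ(object):
    """Hypothetical toy version of Q that only tracks clause membership."""

    def __init__(self, **queries):
        self.must, self.should, self.must_not = [], [], []
        # Pop the special flags so only field__action pairs remain.
        should = queries.pop('should', False)
        must_not = queries.pop('must_not', False)
        queries.pop('must', None)
        if should:
            target = self.should      # should wins if several flags are set
        elif must_not:
            target = self.must_not
        else:
            target = self.must        # the default clause
        target.extend(sorted(queries.items()))

    def __add__(self, other):
        combined = MiniQ()
        combined.must = self.must + other.must
        combined.should = self.should + other.should
        combined.must_not = self.must_not + other.must_not
        return combined


q = MiniQ()
q += MiniQ(title__text='shoes', should=True)
q += MiniQ(summary__text='shoes')
q += MiniQ(description__text='shoes', must=True)
```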

The SearchResults class

class elasticutils.SearchResults(type, response, results, fields)

After executing a search, this is the class that manages the results.

Property type:the mapping type of the S that created this SearchResults instance
Property took:the amount of time the search took
Property count:the total results
Property response:the raw Elasticsearch search response
Property results:the search results from the response if any
Property fields:the list of fields specified by values_list or values_dict

When you iterate over this object, it returns the individual search results in the shape you asked for (object, tuple, dict, etc) in the order returned by Elasticsearch.

Example:

s = S().query(bio__text='archaeologist')
results = s.execute()

# Shows how long the search took
print(results.took)

# Shows the raw Elasticsearch response
print(results.response)

The MappingType class

class elasticutils.MappingType

Base class for mapping types.

To extend this class:

  1. implement get_index.
  2. implement get_mapping_type_name.
  3. if this ties back to a model, implement get_model and possibly also get_object.

For example:

class ContactType(MappingType):
    @classmethod
    def get_index(cls):
        return 'contacts_index'

    @classmethod
    def get_mapping_type_name(cls):
        return 'contact_type'

    @classmethod
    def get_model(cls):
        return ContactModel

    def get_object(self):
        return self.get_model().get(id=self._id)

classmethod from_results(results_dict)

get_object()

Returns the model instance

This gets called when someone uses the .object attribute which triggers lazy-loading of the object this document is based on.

By default, this calls:

self.get_model().get(id=self._id)

where self._id is the Elasticsearch document id.

Override it to do something different.
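The lazy-loading idea can be sketched without a database. FakeModel and MiniMappingType are hypothetical. Remembering the loaded object after the first access is an assumption of this sketch, not a documented guarantee.

```python
class FakeModel(object):
    """Hypothetical model with a Django-like .get() lookup."""

    _rows = {1: 'Fred', 2: 'Brian'}
    lookups = 0   # counts how often the "database" was hit

    @classmethod
    def get(cls, id):
        cls.lookups += 1
        return cls._rows[id]


class MiniMappingType(object):
    """Hypothetical stand-in showing a lazily loaded .object attribute."""

    def __init__(self, doc_id):
        self._id = doc_id
        self._object = None

    @classmethod
    def get_model(cls):
        return FakeModel

    def get_object(self):
        return self.get_model().get(id=self._id)

    @property
    def object(self):
        # Load on first access only (assumed behavior for this sketch).
        if self._object is None:
            self._object = self.get_object()
        return self._object


doc = MiniMappingType(2)
before = FakeModel.lookups   # nothing loaded yet
first = doc.object           # triggers get_object()
second = doc.object          # served from the remembered value
```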

classmethod get_index()

Returns the index to use for this mapping type.

You can specify the index to use for this mapping type. This affects S built with this type.

By default, raises NotImplementedError.

Override this to return the index this mapping type should be indexed and searched in.

classmethod get_mapping_type_name()

Returns the mapping type name.

You can specify the mapping type name (also sometimes called the document type) with this method.

By default, raises NotImplementedError.

Override this to return the mapping type name.

classmethod get_model()

Return the model class related to this MappingType.

This can be any class that has an instance related to this MappingType by id.

By default, raises NoModelError.

Override this to return a class that works with .get_object() to return the instance of the model that is related to this document.

The Indexable class

class elasticutils.Indexable

Mixin for mapping types with all the indexing hoo-hah.

Add this mixin to your MappingType subclass and it gives you super indexing power.

classmethod bulk_index(documents, id_field='id', es=None, index=None)

Adds or updates a batch of documents.

Parameters:
  • documents

    List of Python dicts representing individual documents to be added to the index

    Note

    This must be serializable into JSON.

  • id_field – The name of the field to use as the document id. This defaults to ‘id’.
  • es – The ElasticSearch to use. If you don’t specify an ElasticSearch, it’ll use cls.get_es().
  • index – The name of the index to use. If you don’t specify one it’ll use cls.get_index().

Note

If you need the documents available for searches immediately, make sure to refresh the index by calling refresh_index().

classmethod extract_document(obj_id, obj=None)

Extracts the Elasticsearch index document for this instance

This must be implemented.

Note

The resulting dict must be JSON serializable.

Parameters:
  • obj_id – the object id for the object to extract from
  • obj – if this is not None, use this as the object to extract from; this allows you to fetch a bunch of items at once and extract them one at a time
Returns:dict of key/value pairs representing the document
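A standalone sketch of such an extractor is shown below. The CONTACTS dict is a hypothetical stand-in for a model layer, and the function is written at module level here rather than as a classmethod to keep the snippet self-contained.

```python
import datetime
import json

# Hypothetical in-memory "database" standing in for a real model layer.
CONTACTS = {
    1: {'name': 'fred', 'joined': datetime.date(2012, 5, 1)},
    2: {'name': 'brian', 'joined': datetime.date(2013, 1, 15)},
}


def extract_document(obj_id, obj=None):
    # Only hit the "database" when the caller did not pass the object in.
    if obj is None:
        obj = CONTACTS[obj_id]
    return {
        'id': obj_id,
        'name': obj['name'],
        # date objects are not JSON serializable, so store an ISO string
        'joined': obj['joined'].isoformat(),
    }


# Fetch everything once, then extract one document at a time.
documents = [extract_document(obj_id, obj)
             for obj_id, obj in sorted(CONTACTS.items())]
payload = json.dumps(documents)   # would raise TypeError if not serializable
```

A list like documents is the kind of input that bulk_index() expects.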

classmethod get_es()

Returns an ElasticSearch object

Override this if you need special functionality.

Returns:a pyelasticsearch ElasticSearch instance

classmethod get_indexable()

Returns an iterable of things to index.

Returns:iterable of things to index

classmethod get_mapping()

Returns the mapping for this mapping type.

Example:

@classmethod
def get_mapping(cls):
    return {
        'properties': {
            'id': {'type': 'integer'},
            'name': {'type': 'string'}
        }
    }

See the docs for more details on how to specify a mapping.

Override this to return a mapping for this doctype.

Returns:dict representing the Elasticsearch mapping or None if you want Elasticsearch to infer it. Defaults to None.

classmethod index(document, id_=None, overwrite_existing=True, es=None, index=None)

Adds or updates a document in the index.

Parameters:
  • document

    Python dict of key/value pairs representing the document

    Note

    This must be serializable into JSON.

  • id_

    the id of the document

    Note

    If you don’t provide an id_, then Elasticsearch will make up an id for your document and it’ll look like a character name from a Lovecraft novel.

  • overwrite_existing – if True overwrites existing documents of the same ID and doctype
  • es – The ElasticSearch to use. If you don’t specify an ElasticSearch, it’ll use cls.get_es().
  • index – The name of the index to use. If you don’t specify one it’ll use cls.get_index().

Note

If you need the documents available for searches immediately, make sure to refresh the index by calling refresh_index().

classmethod refresh_index(es=None, index=None)

Refreshes the index.

Elasticsearch will update the index periodically automatically. If you need to see the documents you just indexed in your search results right now, you should call refresh_index as soon as you’re done indexing. This is particularly helpful for unit tests.

Parameters:
  • es – The ElasticSearch to use. If you don’t specify an ElasticSearch, it’ll use cls.get_es().
  • index – The name of the index to use. If you don’t specify one it’ll use cls.get_index().

classmethod unindex(id_, es=None, index=None)

Removes a particular item from the search index.

Parameters:
  • id_ – The Elasticsearch id for the document to remove from the index.
  • es – The ElasticSearch to use. If you don’t specify an ElasticSearch, it’ll use cls.get_es().
  • index – The name of the index to use. If you don’t specify one it’ll use cls.get_index().

The DefaultMappingType class

class elasticutils.DefaultMappingType

This is the default mapping type for S.

The MLT class

class elasticutils.MLT(id_, s=None, mlt_fields=None, index=None, doctype=None, es=None, **query_params)

Represents a lazy Elasticsearch More Like This API request.

This is lazy in the sense that it doesn’t evaluate and execute the Elasticsearch request unless you force it to by iterating over it or getting the length of the search results.

For example:

>>> mlt = MLT(2034, index='addons_index', doctype='addon')
>>> num_related_documents = len(mlt)
>>> related_documents = list(mlt)

__init__(id_, s=None, mlt_fields=None, index=None, doctype=None, es=None, **query_params)

When the MLT is evaluated, it generates a list of dict results.

Parameters:
  • id_ – The id of the document we want to find more like.
  • s – An instance of an S. Allows you to pass in a query which will be used as the body of the more-like-this request.
  • mlt_fields – A list of fields to look at for more like this.
  • index – The index to use. Falls back to the first index listed in s.get_indexes().
  • doctype – The doctype to use. Falls back to the first doctype listed in s.get_doctypes().
  • es – The ElasticSearch object to use. If you don’t provide one, then it will create one for you.
  • query_params – Any additional query parameters for the more like this call.

Note

You must specify either an s or the index and doctype arguments. Omitting them will result in a ValueError.
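The fallback and validation rules can be sketched as one small function. FakeS and resolve_target are hypothetical names used only for this illustration. The error message is made up.

```python
class FakeS(object):
    """Hypothetical stand-in for an S that knows its indexes and doctypes."""

    def get_indexes(self):
        return ['addons_index', 'other_index']

    def get_doctypes(self):
        return ['addon']


def resolve_target(s=None, index=None, doctype=None):
    """Pick the index and doctype for a more-like-this request."""
    if s is None and (index is None or doctype is None):
        raise ValueError('Provide an s, or both index and doctype')
    if index is None:
        index = s.get_indexes()[0]     # fall back to the first index
    if doctype is None:
        doctype = s.get_doctypes()[0]  # fall back to the first doctype
    return index, doctype


from_s = resolve_target(s=FakeS())
explicit = resolve_target(index='addons_index', doctype='addon')
try:
    resolve_target(index='addons_index')   # no s and no doctype
    error = None
except ValueError as exc:
    error = str(exc)
```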

to_python(obj)

Converts strings in a data structure to Python types

It converts datetime-ish things to Python datetimes.

Override if you want something different.

Parameters:obj – Python datastructure
Returns:Python datastructure with strings converted to Python types

Note

This does the conversion in-place!

get_es()

Returns an ElasticSearch.

  • If there’s an s, then it returns the ElasticSearch that s uses.
  • If the es was provided in the constructor, then it returns that ElasticSearch.
  • Otherwise, it creates a new ElasticSearch and returns that.

Override this if that behavior isn’t correct for you.

raw()

Builds the query, passes it to Elasticsearch, and returns the raw response.