elasticutils.contrib.django: Using with Django

Summary

Django helpers are all located in elasticutils.contrib.django.

This chapter covers using the ElasticUtils Django helpers.

Configuration

ElasticUtils depends on the following settings in your Django settings file:

django.conf.settings.ES_DISABLED

If ES_DISABLED = True, then any method wrapped with es_required will return None and log a warning. This is useful while developing, so you don’t have to have ElasticSearch running.

django.conf.settings.ES_DUMP_CURL

If set to a file path, all the requests that ElasticUtils makes will be dumped into the designated file.

If set to a class instance, its .write() method is called with the curl equivalents.

See Debugging for more details.

django.conf.settings.ES_HOSTS

This is a list of ES hosts. In development this will look like:

ES_HOSTS = ['127.0.0.1:9200']

django.conf.settings.ES_INDEXES

This is a mapping of doctypes to indexes. A default mapping is required for types that don’t have a specific index.

When ElasticUtils queries the index for a model, by default it derives the doctype from Model._meta.db_table. When you build your indexes and mapping types, make sure their names match the indexes in ES_INDEXES and the mapping type names ElasticUtils derives.

Example 1:

ES_INDEXES = {'default': 'main_index'}

This only has a default, so all ElasticUtils queries will look in main_index for all mapping types.

Example 2:

ES_INDEXES = {'default': 'main_index',
              'splugs': 'splugs_index'}

Assuming you have a Splug model which has a Splug._meta.db_table value of splugs, then ElasticUtils will run queries for Splug in the splugs_index. ElasticUtils will run queries for other models in main_index because that’s the default.
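The lookup in these examples boils down to "use the mapping type’s own entry if there is one, else 'default'". The sketch below illustrates that rule with a hypothetical resolve_index() helper (not part of ElasticUtils) and a plain dict standing in for settings.ES_INDEXES.

```python
# A plain dict standing in for settings.ES_INDEXES (Example 2 above).
ES_INDEXES = {'default': 'main_index',
              'splugs': 'splugs_index'}


def resolve_index(es_indexes, mapping_type_name):
    """Hypothetical helper: return the index configured for
    mapping_type_name, falling back to the 'default' entry."""
    if mapping_type_name in es_indexes:
        return es_indexes[mapping_type_name]
    # In this sketch a missing 'default' entry raises KeyError here.
    return es_indexes['default']


# 'splugs' is Splug._meta.db_table, so it gets its own index;
# any other mapping type name falls back to the default.
print(resolve_index(ES_INDEXES, 'splugs'))   # splugs_index
print(resolve_index(ES_INDEXES, 'widgets'))  # main_index
```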

Example 3:

ES_INDEXES = {'default': ['main_index'],
              'splugs': ['splugs_index']}

FIXME: The API allows for this. Pretty sure it should query multiple indexes, but we have no tests for that and I haven’t tested it, either.

django.conf.settings.ES_TIMEOUT

Defines the timeout for the ES connection. This defaults to 5 seconds.
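Putting the settings above together, a development settings.py might contain a fragment like the one below. The values are the examples used in this chapter; the dump path is illustrative, so adjust everything for your project.

```python
# settings.py (fragment): ElasticUtils-related settings for local development.

# Set to True to make es_required-wrapped code return early without touching ES.
ES_DISABLED = False

# Where ElasticSearch is listening.
ES_HOSTS = ['127.0.0.1:9200']

# Mapping type name (the model's _meta.db_table by default) -> index name.
ES_INDEXES = {'default': 'main_index',
              'splugs': 'splugs_index'}

# Connection timeout in seconds; 5 is the documented default.
ES_TIMEOUT = 5

# Optional: dump curl equivalents of every request to this file.
ES_DUMP_CURL = '/var/log/es_curl.log'
```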

ES

get_es() in the Django contrib helpfully caches your ES objects in thread-local storage, so each thread reuses its own ES object.

It builds them from the settings in your django.conf.settings.

Note

get_es() only caches the ES if you don’t pass in any override arguments. If you pass in override arguments, it doesn’t cache it and instead creates a new one.
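The caching rule in this note can be sketched with threading.local. This is an illustration under stated assumptions, not ElasticUtils’ source: FakeES stands in for a real connection object, and the DEFAULTS keys ('urls', 'timeout') are hypothetical placeholders for values read from settings.

```python
import threading


class FakeES(object):
    """Stand-in for a real ES connection; just remembers its options."""

    def __init__(self, **options):
        self.options = options


# Hypothetical defaults; the real helper reads ES_HOSTS, ES_TIMEOUT, etc.
DEFAULTS = {'urls': ['127.0.0.1:9200'], 'timeout': 5}
_local = threading.local()


def get_es(**overrides):
    """Sketch of the documented caching: one cached object per thread,
    but a fresh, uncached object whenever overrides are passed."""
    if overrides:
        return FakeES(**dict(DEFAULTS, **overrides))
    if not hasattr(_local, 'es'):
        _local.es = FakeES(**DEFAULTS)
    return _local.es
```

Calling get_es() twice in the same thread returns the same object, while get_es(timeout=30) always builds a new one, so a one-off connection with different options never replaces the shared one.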

Using with Django ORM models

Requirements: Django

The elasticutils.contrib.django.S class takes a MappingType in the constructor. That allows you to tie Django ORM models to ElasticSearch index search results.

elasticutils.contrib.django.models contains DjangoMappingType, which adds Django ORM-specific code to make this easier.

Define a DjangoMappingType subclass for your model. At minimum, you need to define get_model().

You can also use the Indexable mixin, which adds a set of helpful indexing-related methods.

For example, here’s a minimal DjangoMappingType subclass:

from django.db.models import Model
from elasticutils.contrib.django.models import DjangoMappingType


class MyModel(Model):
    ...


class MyMappingType(DjangoMappingType):
    @classmethod
    def get_model(cls):
        return MyModel

searcher = MyMappingType.search()

Here’s one that uses Indexable and handles indexing:

from django.db.models import Model
from elasticutils.contrib.django.models import DjangoMappingType, Indexable


class MyModel(Model):
    ...


class MyMappingType(DjangoMappingType, Indexable):
    @classmethod
    def get_model(cls):
        return MyModel

    @classmethod
    def extract_document(cls, obj_id, obj=None):
        if obj is None:
            obj = cls.get_model().objects.get(pk=obj_id)

        return {
            'id': obj.id,
            'name': obj.name,
            'bio': obj.bio,
            'age': obj.age
            }


searcher = MyMappingType.search()

This example doesn’t specify a mapping. That’s OK because ElasticSearch infers from the shape of the data how it should analyze and store it.

If you want to specify the mapping explicitly (and I suggest you do for anything that involves strings), you also need to override .get_mapping(). Let’s refine the above example by explicitly specifying .get_mapping().

from django.db.models import Model
from elasticutils.contrib.django.models import DjangoMappingType, Indexable


class MyModel(Model):
    ...


class MyMappingType(DjangoMappingType, Indexable):
    @classmethod
    def get_model(cls):
        return MyModel

    @classmethod
    def get_mapping(cls):
        """Returns an ElasticSearch mapping."""
        return {
            # The id is an integer, so store it as such. ES would have
            # inferred this just fine.
            'id': {'type': 'integer'},

            # The name is a proper name, so we shouldn't analyze it
            # (de-stem, tokenize, parse, etc.).
            'name': {'type': 'string', 'index': 'not_analyzed'},

            # The bio has free-form text in it, so analyze it with
            # snowball.
            'bio': {'type': 'string', 'analyzer': 'snowball'},

            # Age is an integer
            'age': {'type': 'integer'}
            }

    @classmethod
    def extract_document(cls, obj_id, obj=None):
        if obj is None:
            obj = cls.get_model().objects.get(pk=obj_id)

        return {
            'id': obj.id,
            'name': obj.name,
            'bio': obj.bio,
            'age': obj.age
            }


searcher = MyMappingType.search()

DjangoMappingType

class elasticutils.contrib.django.models.DjangoMappingType

This has most of the pieces you need to tie back to a Django ORM model.

Subclass this and override at least get_model.

classmethod get_index()

Gets the index for this model.

The index for this model is specified in settings.ES_INDEXES which is a dict of mapping type -> index name.

By default, this uses .get_mapping_type_name() to determine the mapping type name and returns the value in settings.ES_INDEXES for that or settings.ES_INDEXES['default'].

Override this to compute it differently.

Returns: index name to use

classmethod get_mapping_type_name()

Returns the name of the mapping.

By default, this is cls.get_model()._meta.db_table.

Override this if you want to compute the mapping type name differently.

Returns: mapping type string

classmethod get_model()

Return the model related to this DjangoMappingType.

This can be any class that has an instance related to this DjangoMappingType by id.

Override this to return a model class.

Returns: model class

classmethod search()

Returns a typed S for this class.

Returns: an S

Indexable

class elasticutils.contrib.django.models.Indexable

Mixin for mapping types with all the indexing hoo-hah.

Add this mixin to your DjangoMappingType subclass and it gives you super indexing power.

classmethod extract_document(obj_id, obj=None)

Extracts the ES index document for this instance.

This must be implemented.

Note

The resulting dict must be JSON serializable.

Parameters:
  • obj_id – the object id for the instance to extract from
  • obj – if this is not None, use this as the object to extract from; this allows you to fetch a bunch of items at once and extract them one at a time

Returns: dict of key/value pairs representing the document

classmethod get_indexable()

Returns the queryset of ids of all things to be indexed.

Defaults to:

cls.get_model().objects.order_by('id').values_list('id', flat=True)

Returns: iterable of ids of objects to be indexed

classmethod get_mapping()

Returns the mapping for this mapping type.

See the ElasticSearch mapping documentation listed under See also for details on how to specify a mapping.

Override this to return a mapping for this doctype.

Returns: dict representing the ES mapping, or None if you want ES to infer it. Defaults to None.

classmethod index(document, id_=None, bulk=False, force_insert=False, es=None)

Adds a document to the index or updates an existing one.

Parameters:
  • document

    Python dict of key/value pairs representing the document

    Note

    This must be serializable into JSON.

  • id_

    the Django ORM model instance id. This is used to convert an ES search result back to the Django ORM model instance from the db. It should be an integer.

    Note

    If you don’t provide an id_, then ElasticSearch will make up an id for your document and it’ll look like a character name from a Lovecraft novel.

  • bulk – Whether or not this is part of a bulk indexing. If this is, you must provide an ES with the es argument, too.
  • force_insert – TODO
  • es – The ES to use. If you don’t specify an ES, it’ll use elasticutils.contrib.django.get_es().
Raises ValueError: if bulk is True but es is None.

Note

After you add things to the index, make sure to refresh the index by calling refresh_index(); it doesn’t happen automatically.
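To see why the refresh matters: ElasticSearch makes newly indexed documents visible to searches only after the index is refreshed. The toy in-memory class below is hypothetical (not part of ElasticUtils or PyES) and models only that behavior; its index(), refresh() and search() methods loosely correspond to index(), refresh_index() and an S search.

```python
class ToyIndex(object):
    """Toy model of near-real-time search: writes land in a pending
    buffer and only become searchable after refresh()."""

    def __init__(self):
        self._pending = {}
        self._searchable = {}

    def index(self, document, id_):
        # Add or update a document; searches can't see it yet.
        self._pending[id_] = document

    def refresh(self):
        # Make every pending write visible to searches.
        self._searchable.update(self._pending)
        self._pending.clear()

    def search(self, **filters):
        # Return sorted ids of searchable documents matching all filters.
        return sorted(id_ for id_, doc in self._searchable.items()
                      if all(doc.get(k) == v for k, v in filters.items()))


idx = ToyIndex()
idx.index({'id': 1, 'name': 'splug'}, id_=1)
print(idx.search(name='splug'))  # [] because the index isn't refreshed yet
idx.refresh()
print(idx.search(name='splug'))  # [1]
```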

TODO: add example.

classmethod refresh_index(timesleep=0, es=None)

Refreshes the index.

TODO: document this better.

classmethod unindex(id_, es=None)

Removes a particular item from the search index.

TODO: document this better.

See also

http://www.elasticsearch.org/guide/reference/mapping/
The ElasticSearch guide on mapping types.
http://www.elasticsearch.org/guide/reference/mapping/core-types.html
The ElasticSearch guide on mapping type field types.

Other helpers

Requirements: Django, Celery

You can use helpers such as elasticutils.contrib.django.tasks.index_objects() to index new items automatically.

View decorators

elasticutils.contrib.django.es_required(fun)

Wrap a callable and return None if ES_DISABLED is True.

This also adds an additional es argument to the callable giving you an ES to use.
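The behavior described for es_required can be sketched as a plain decorator. This is an illustration under stated assumptions, not ElasticUtils’ source: a SETTINGS dict stands in for django.conf.settings, and get_es() returns a placeholder string instead of a real ES object.

```python
import functools
import logging

log = logging.getLogger(__name__)

# Stand-ins so the sketch runs without Django or ElasticSearch (assumptions).
SETTINGS = {'ES_DISABLED': False}


def get_es():
    return 'es-connection'  # placeholder for a real ES object


def es_required(fun):
    """Skip the call when ES is disabled; otherwise pass an ES object
    in as the es keyword argument."""
    @functools.wraps(fun)
    def wrapper(*args, **kwargs):
        if SETTINGS.get('ES_DISABLED', False):
            log.warning('ES_DISABLED is set; not calling %s', fun.__name__)
            return None
        return fun(*args, es=get_es(), **kwargs)
    return wrapper


@es_required
def describe(doctype, es):
    # es is supplied by the decorator, not by the caller.
    return '%s via %s' % (doctype, es)
```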

elasticutils.contrib.django.es_required_or_50x(disabled_template='elasticutils/501.html', error_template='elasticutils/503.html')

Wrap a Django view and handle ElasticSearch errors.

This wraps a Django view and returns 501 or 503 status codes and pages if things go awry.

HTTP 501
Returned when ES_DISABLED is True.
HTTP 503

Returned when any of the following exceptions are thrown:

  • pyes.urllib3.MaxRetryError: Connection problems with ES.
  • pyes.exceptions.IndexMissingException: When the index is missing.
  • pyes.exceptions.ElasticSearchException: Various other ElasticSearch related errors.

Template variables:

  • error: A string version of the exception thrown.

Parameters:
  • disabled_template

    The template to use when ES_DISABLED is True.

    Defaults to elasticutils/501.html.

  • error_template

    The template to use when ElasticSearch isn’t working properly, is missing an index, or something along those lines.

    Defaults to elasticutils/503.html.

Examples:

# This creates a home_view and decorates it to use the
# default templates.

@es_required_or_50x()
def home_view(request):
    ...


# This creates a search_view and overrides the templates

@es_required_or_50x(disabled_template='search/es_disabled.html',
                    error_template='search/es_down.html')
def search_view(request):
    ...
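The control flow of es_required_or_50x can be sketched as follows, assuming simplified stand-ins: the three exception classes only borrow the pyes names, SETTINGS replaces django.conf.settings, and a hypothetical render() returns a (status, template, context) tuple instead of a Django response.

```python
import functools


class MaxRetryError(Exception):
    """Stand-in for pyes.urllib3.MaxRetryError."""


class IndexMissingException(Exception):
    """Stand-in for pyes.exceptions.IndexMissingException."""


class ElasticSearchException(Exception):
    """Stand-in for pyes.exceptions.ElasticSearchException."""


SETTINGS = {'ES_DISABLED': False}


def render(status, template, context=None):
    # Hypothetical: a "response" is just a tuple in this sketch.
    return (status, template, context or {})


def es_required_or_50x(disabled_template='elasticutils/501.html',
                       error_template='elasticutils/503.html'):
    def decorator(view):
        @functools.wraps(view)
        def wrapper(request, *args, **kwargs):
            if SETTINGS['ES_DISABLED']:
                return render(501, disabled_template)
            try:
                return view(request, *args, **kwargs)
            except (MaxRetryError, IndexMissingException,
                    ElasticSearchException) as exc:
                # Expose the error to the template as a string.
                return render(503, error_template, {'error': str(exc)})
        return wrapper
    return decorator


@es_required_or_50x(error_template='search/es_down.html')
def search_view(request):
    raise IndexMissingException('main_index is missing')
```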

Tasks

elasticutils.contrib.django.tasks.index_objects(model, ids=[...])

Models can asynchronously update their ES index.

If a model extends SearchMixin, it can add a post_save hook like so:

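from django.db.models import signals as dbsignals
from django.dispatch import receiver


@receiver(dbsignals.post_save, sender=MyModel)
def update_search_index(sender, instance, **kw):
    from elasticutils.contrib.django import tasks
    tasks.index_objects.delay(sender, [instance.id])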

Cron

elasticutils.contrib.django.cron.reindex_objects(model, chunk_size[=150])

Creates methods that reindex all the objects in a model.

For example, in your myapp/cron.py you can do:

index_all_mymodels = cronjobs.register(reindex_objects(mymodel))

and it will create a command-line callable task for you, e.g.:

./manage.py cron index_all_mymodels
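How reindex_objects works internally isn’t documented here, so the sketch below shows only the general chunking pattern, under stated assumptions: reindex_all(), fetch_many and index_document are hypothetical names, and plain dicts stand in for the model table and the index. It also shows what extract_document’s obj argument is for: fetch a whole chunk in one query, then extract each object without another lookup.

```python
def chunked(ids, size):
    """Yield successive lists of at most `size` ids."""
    for start in range(0, len(ids), size):
        yield ids[start:start + size]


def reindex_all(ids, fetch_many, extract_document, index_document,
                chunk_size=150):
    """Hypothetical chunked reindex: one bulk fetch per chunk, then
    extract and index each object. Returns the number of chunks."""
    chunks = 0
    for chunk in chunked(list(ids), chunk_size):
        objs = fetch_many(chunk)  # {id: obj}, ideally from a single query
        for obj_id in chunk:
            # Passing obj avoids one database query per document.
            doc = extract_document(obj_id, objs[obj_id])
            index_document(doc, obj_id)
        chunks += 1
    return chunks


# Usage with in-memory stand-ins for the table and the index.
table = dict((i, {'id': i, 'name': 'splug %d' % i}) for i in range(1, 401))
indexed = {}
n = reindex_all(
    sorted(table),
    fetch_many=lambda chunk: dict((i, table[i]) for i in chunk),
    extract_document=lambda obj_id, obj: {'id': obj['id'], 'name': obj['name']},
    index_document=lambda doc, id_: indexed.__setitem__(id_, doc))
print(n)  # 3: chunks of 150, 150 and 100 ids
```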

Writing tests

Requirements: Django, test_utils, nose

elasticutils.contrib.django.estestcase contains ESTestCase, which you can subclass in your app’s test cases.

It does the following:

  • If ES_HOSTS is empty it raises a SkipTest.
  • self.es is available from the ESTestCase class and any subclasses.
  • At the end of the test case the index is wiped.

Example:

from elasticutils.contrib.django.estestcase import ESTestCase


class TestQueries(ESTestCase):
    def test_query(self):
        ...

    def test_locked_filters(self):
        ...
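The three behaviors listed for ESTestCase can be sketched with the standard unittest module. This is a hypothetical stand-in, not the real class: ES_HOSTS is a module-level list instead of a Django setting, unittest.SkipTest replaces nose’s, and a dict plays the role of the index that gets wiped.

```python
import unittest

# Stand-in for settings.ES_HOSTS (assumption for this sketch).
ES_HOSTS = []

# Stand-in for an ES index: a plain dict of id -> document.
FAKE_INDEX = {}


class ESTestCaseSketch(unittest.TestCase):
    """Hypothetical base class mimicking the behaviors listed above."""

    @classmethod
    def setUpClass(cls):
        super(ESTestCaseSketch, cls).setUpClass()
        if not ES_HOSTS:
            # No hosts configured: skip every test in the class.
            raise unittest.SkipTest('ES_HOSTS is empty')
        # Make the "connection" available as self.es in every test.
        cls.es = FAKE_INDEX

    @classmethod
    def tearDownClass(cls):
        # Wipe the index once the test case is done.
        FAKE_INDEX.clear()
        super(ESTestCaseSketch, cls).tearDownClass()


class QueriesSketch(ESTestCaseSketch):
    def test_query(self):
        self.es[1] = {'name': 'splug'}
        self.assertEqual(self.es[1]['name'], 'splug')
```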

Debugging

You can set settings.ES_DUMP_CURL to a few different things, all of which can help when debugging ElasticUtils.

  1. a file path

    This will cause PyES to write the curl equivalents of the commands it’s sending to ElasticSearch to a file.

    Example setting:

    ES_DUMP_CURL = '/var/log/es_curl.log'
    

    Note

    The file is not closed until the process ends. Because of that, you don’t see much in the file until it’s done.

  2. a class instance that has a .write() method

    PyES will call the .write() method with the curl equivalent and then you can do whatever you want with it.

    For example, this writes curl equivalent output to stdout:

    class CurlDumper(object):
        def write(self, s):
            print(s)
    ES_DUMP_CURL = CurlDumper()
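If the note about the dump file only filling in at process exit gets in the way, a .write() class can write through to a file instead. FlushingCurlDumper is a hypothetical example, not part of ElasticUtils or PyES; it appends each string it receives to a file, adds a trailing newline when one is missing, and closes (and therefore flushes) the file on every call.

```python
import os
import tempfile


class FlushingCurlDumper(object):
    """Hypothetical ES_DUMP_CURL target that writes through to a file."""

    def __init__(self, path):
        self.path = path

    def write(self, s):
        # Reopen in append mode on each call; leaving the with-block
        # closes the file, so every command is on disk immediately.
        with open(self.path, 'a') as f:
            f.write(s if s.endswith('\n') else s + '\n')


ES_DUMP_CURL = FlushingCurlDumper(
    os.path.join(tempfile.gettempdir(), 'es_curl.log'))
```

Reopening the file on every call costs a little per request, which is usually acceptable while debugging.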
    
