elasticutils.contrib.django: Using with Django

Summary

Django helpers are all located in elasticutils.contrib.django.

This chapter covers using the ElasticUtils Django bits.

Configuration

ElasticUtils depends on the following settings in your Django settings file:

django.conf.settings.ES_DISABLED

If ES_DISABLED = True, then any method wrapped with es_required will return without running and log a warning. This is useful while developing, so you don’t have to have ElasticSearch running.
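As a rough illustration of that behavior, here is a minimal sketch of a decorator that skips the wrapped function when a flag is set. The names es_required_sketch, ping and the module-level ES_DISABLED are stand-ins invented for this example; the real es_required may differ in what it logs and in its signature.

```python
import functools
import logging

log = logging.getLogger('elasticutils.sketch')

# Hypothetical stand-in for django.conf.settings.ES_DISABLED.
ES_DISABLED = True


def es_required_sketch(fun):
    """Skip the wrapped function and log a warning when ES is disabled."""
    @functools.wraps(fun)
    def wrapper(*args, **kwargs):
        if ES_DISABLED:
            log.warning('Search is disabled; %s was not called.', fun.__name__)
            return None
        return fun(*args, **kwargs)
    return wrapper


@es_required_sketch
def ping():
    # In real code this would talk to ElasticSearch.
    return 'pong'
```

With the flag set to True, calling ping() returns None and only a warning is logged.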

django.conf.settings.ES_DUMP_CURL

If set to a file path, all the requests that ElasticUtils makes will be dumped into the designated file.

If set to a class instance, its .write() method is called with the curl equivalents.

See Debugging for more details.

django.conf.settings.ES_HOSTS

This is a list of ES hosts. In development this will look like:

ES_HOSTS = ['127.0.0.1:9200']

django.conf.settings.ES_INDEXES

This is a mapping of doctypes to indexes. A default mapping is required for types that don’t have a specific index.

When ElasticUtils queries the index for a model, by default it derives the doctype from Model._meta.db_table. When you build your indexes and mapping types, make sure their names match the index names in ES_INDEXES and the doctypes that ElasticUtils derives.

Example 1:

ES_INDEXES = {'default': 'main_index'}

This only has a default, so all ElasticUtils queries will look in main_index for all mapping types.

Example 2:

ES_INDEXES = {'default': 'main_index',
              'splugs': 'splugs_index'}

Assuming you have a Splug model which has a Splug._meta.db_table value of splugs, then ElasticUtils will run queries for Splug in the splugs_index. ElasticUtils will run queries for other models in main_index because that’s the default.
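The lookup described above can be sketched in plain Python. The helper name index_for is hypothetical and is not part of ElasticUtils; it only illustrates the fallback to the default entry.

```python
# Same shape as Example 2 above.
ES_INDEXES = {'default': 'main_index',
              'splugs': 'splugs_index'}


def index_for(doctype, indexes=ES_INDEXES):
    """Return the index for a doctype, falling back to 'default'."""
    return indexes.get(doctype, indexes['default'])


splug_index = index_for('splugs')      # has its own entry
other_index = index_for('contacts')    # no entry, so the default is used
```

Here splug_index is 'splugs_index' and other_index is 'main_index'.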

django.conf.settings.ES_TIMEOUT

Defines the timeout for the ES connection. This defaults to 5 seconds.

ES

The get_es() in the Django contrib caches your ES object in thread-local storage, so each thread reuses its own instance.

It is built with the settings from your django.conf.settings.

Note

get_es() only caches the ES if you don’t pass in any override arguments. If you pass in override arguments, it doesn’t cache it and instead creates a new one.
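To make the caching rule concrete, here is a small sketch of the pattern, assuming nothing about how get_es() is actually implemented. FakeES, DEFAULTS and get_es_sketch are hypothetical names used only for this illustration.

```python
import threading

_local = threading.local()

# Hypothetical defaults standing in for values read from Django settings.
DEFAULTS = {'hosts': ['127.0.0.1:9200'], 'timeout': 5}


class FakeES(object):
    """Stand-in for a real ES connection object."""
    def __init__(self, **settings):
        self.settings = settings


def get_es_sketch(**overrides):
    if overrides:
        # Overrides were passed: build a fresh object and do not cache it.
        merged = dict(DEFAULTS)
        merged.update(overrides)
        return FakeES(**merged)
    if not hasattr(_local, 'es'):
        # First call in this thread without overrides: create and cache.
        _local.es = FakeES(**DEFAULTS)
    return _local.es
```

Two calls without arguments in the same thread return the same object, while each call with overrides returns a new one.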

Using with Django ORM models

Requirements: Django

The elasticutils.contrib.django.S class takes a model in the constructor. That model is a Django ORM model class. For example:

from elasticutils.contrib.django import S
from myapp.models import MyModel

searcher = S(MyModel)

Further, you can have your model extend elasticutils.contrib.django.models.SearchMixin and get a bunch of functionality that makes indexing data easier.

Two things to know:

  1. The doctype for the model is cls._meta.db_table by default.
  2. The index that’s searched is settings.ES_INDEXES[doctype] and, if that doesn’t exist, settings.ES_INDEXES['default'].

For example, here’s a minimal use of the SearchMixin:

from datetime import datetime

from django.db import models

from elasticutils.contrib.django.models import SearchMixin


class Contact(models.Model, SearchMixin):
    name = models.CharField(max_length=50)
    bio = models.TextField(blank=True)
    age = models.IntegerField()
    website = models.URLField(blank=True)
    last_updated = models.DateTimeField(default=datetime.now)

    @classmethod
    def extract_document(cls, obj_id, obj=None):
        """Takes an object id for this class, returns dict."""
        if obj is None:
            obj = cls.objects.get(pk=obj_id)

        return {
            'id': obj.id,
            'name': obj.name,
            'bio': obj.bio,
            'age': obj.age,
            'website': obj.website,
            'last_updated': obj.last_updated
            }

This example doesn’t specify a mapping. That’s ok because ElasticSearch will infer from the shape of the data how it should analyze and store the data.

If you want to specify this explicitly (and I suggest you do for anything that involves strings), then you want to additionally override .get_mapping(). Let’s refine the above example by explicitly specifying .get_mapping().

from datetime import datetime

from django.db import models

from elasticutils.contrib.django.models import SearchMixin


class Contact(models.Model, SearchMixin):
    name = models.CharField(max_length=50)
    bio = models.TextField(blank=True)
    age = models.IntegerField()
    website = models.URLField(blank=True)
    last_updated = models.DateTimeField(default=datetime.now)

    @classmethod
    def get_mapping(cls):
        """Returns an ElasticSearch mapping."""
        return {
            # The id is an integer, so store it as such. ES would have
            # inferred this just fine.
            'id': {'type': 'integer'},

            # The name is a name---so we shouldn't analyze it
            # (de-stem, tokenize, parse, etc).
            'name': {'type': 'string', 'index': 'not_analyzed'},

            # The bio has free-form text in it, so analyze it with
            # snowball.
            'bio': {'type': 'string', 'analyzer': 'snowball'},

            # The website also shouldn't be analyzed.
            'website': {'type': 'string', 'index': 'not_analyzed'},

            # The last_updated field is a date.
            'last_updated': {'type': 'date'}
            }

    @classmethod
    def extract_document(cls, obj_id, obj=None):
        """Takes an object id for this class, returns dict."""
        if obj is None:
            obj = cls.objects.get(pk=obj_id)

        return {
            'id': obj.id,
            'name': obj.name,
            'bio': obj.bio,
            'age': obj.age,
            'website': obj.website,
            'last_updated': obj.last_updated
            }

SearchMixin

class elasticutils.contrib.django.models.SearchMixin

Mixin for indexing Django model instances

Add this mixin to your Django ORM model class and it gives you super indexing power. This correlates an ES mapping type to a Django ORM model. Using this allows you to get Django model instances as ES search results.

classmethod extract_document(obj_id, obj=None)

Extracts the ES index document for this instance

This must be implemented.

Note

The resulting dict must be JSON serializable.

Parameters:
  • obj_id – the object id for the instance to extract from
  • obj – if this is not None, use this as the object to extract from; this allows you to fetch a bunch of items at once and extract them one at a time
Returns: dict of key/value pairs representing the document
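One way to check the JSON requirement before indexing is to run the document through json.dumps. The sketch below converts datetime values to ISO 8601 strings first, because the standard json module cannot encode them. Whether ElasticUtils or PyES already handles datetimes for you is not stated here, so treat this as a defensive example; to_json_safe is a hypothetical helper.

```python
import json
from datetime import datetime


def to_json_safe(document):
    """Return a copy of the document with datetimes as ISO 8601 strings."""
    safe = {}
    for key, value in document.items():
        if isinstance(value, datetime):
            value = value.isoformat()
        safe[key] = value
    return safe


doc = {'id': 1, 'name': 'Ada', 'last_updated': datetime(2012, 5, 1, 12, 30)}

# json.dumps raises TypeError if anything in the dict is not serializable.
encoded = json.dumps(to_json_safe(doc), sort_keys=True)
```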

classmethod get_index()

Gets the index for this model.

The index for this model is specified in settings.ES_INDEXES which is a dict of mapping type -> index name.

By default, this uses .get_mapping_type() to determine the mapping and returns the value in settings.ES_INDEXES for that or settings.ES_INDEXES['default'].

Override this to compute it differently.

Returns: index name to use

classmethod get_indexable()

Returns the queryset of ids of all things to be indexed.

Defaults to:

cls.objects.order_by('id').values_list('id', flat=True)

Returns: iterable of ids of objects to be indexed

classmethod get_mapping()

Returns the mapping for this mapping type.

See the docs for details on how to specify a mapping.

Override this to return a mapping for this doctype.

Returns: dict representing the ES mapping, or None if you want ES to infer it. Defaults to None.

classmethod get_mapping_type()

Returns the name of the mapping.

By default, this is cls._meta.db_table.

Override this if you want to compute the mapping type name differently.

Returns: mapping type string

classmethod index(document, id_=None, bulk=False, force_insert=False, es=None)

Adds or updates a document to the index

Parameters:
  • document

    Python dict of key/value pairs representing the document

    Note

    This must be serializable into JSON.

  • id_ – the Django ORM model instance id. This is used to convert an ES search result back to the Django ORM model instance from the db. It should be an integer.
  • bulk – Whether or not this is part of a bulk indexing. If this is, you must provide an ES with the es argument, too.
  • force_insert – TODO
  • es – The ES to use. If you don’t specify an ES, it’ll use elasticutils.contrib.django.get_es().
Raises ValueError: if bulk is True, but es is None.

TODO: add example.

classmethod refresh_index(timesleep=0, es=None)

Refreshes the index.

TODO: document this better.

classmethod search()

Returns a typed S for this class.

classmethod unindex(id, es=None)

Removes a particular item from the search index.

TODO: document this better.

See also

http://www.elasticsearch.org/guide/reference/mapping/
The ElasticSearch guide on mapping types.
http://www.elasticsearch.org/guide/reference/mapping/core-types.html
The ElasticSearch guide on mapping type field types.

Other helpers

Requirements: Django, Celery

If your models extend SearchMixin, you can use helpers such as elasticutils.contrib.django.tasks.index_objects() to automatically index all new items.

Tasks

elasticutils.contrib.django.tasks.index_objects(model, ids=[...])

Models can asynchronously update their ES index.

If a model extends SearchMixin, it can add a post_save hook like so:

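from django.db.models import signals as dbsignals
from django.dispatch import receiver

@receiver(dbsignals.post_save, sender=MyModel)
def update_search_index(sender, instance, **kw):
    # Imported here so the tasks module loads only when the signal fires.
    from elasticutils.contrib.django import tasks
    tasks.index_objects.delay(sender, [instance.id])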

Cron

elasticutils.contrib.django.cron.reindex_objects(model, chunk_size[=150])

Creates methods that reindex all the objects in a model.

For example, in your myapp/cron.py you can do:

index_all_mymodels = cronjobs.register(reindex_objects(mymodel))

and it will create a commandline callable task for you, e.g.:

./manage.py cron index_all_mymodels
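How reindex_objects batches its work is not documented here. Assuming chunk_size is the number of ids handled per batch, the splitting step could look like the sketch below; chunked is a hypothetical helper, not an ElasticUtils function.

```python
def chunked(ids, chunk_size=150):
    """Yield successive lists of at most chunk_size ids."""
    ids = list(ids)
    for start in range(0, len(ids), chunk_size):
        yield ids[start:start + chunk_size]


# 400 ids split into batches of 150, 150 and 100.
batches = list(chunked(range(1, 401)))
```

Each batch could then be handed to something like index_objects, so that one oversized task is replaced by several smaller ones.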

Writing tests

Requirements: Django, test_utils, nose

elasticutils.contrib.django.estestcase provides ESTestCase, which you can subclass in your app’s test cases.

It does the following:

  • If ES_HOSTS is empty it raises a SkipTest.
  • self.es is available from the ESTestCase class and any subclasses.
  • At the end of the test case the index is wiped.

Example:

from elasticutils.contrib.django.estestcase import ESTestCase


class TestQueries(ESTestCase):
    def test_query(self):
        ...

    def test_locked_filters(self):
        ...
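The skip-when-unconfigured behavior can be sketched with the standard library alone. This is not the ESTestCase implementation (which relies on nose and test_utils); SketchESTestCase and the module-level ES_HOSTS are hypothetical stand-ins that show the pattern.

```python
import unittest

# Hypothetical stand-in for settings.ES_HOSTS; empty means "no ES available".
ES_HOSTS = []


class SketchESTestCase(unittest.TestCase):
    @classmethod
    def setUpClass(cls):
        # Skip every test in the class when no hosts are configured.
        if not ES_HOSTS:
            raise unittest.SkipTest('ES_HOSTS is empty; skipping ES tests')

    def test_placeholder(self):
        self.assertTrue(True)
```

Raising SkipTest in setUpClass marks the whole class as skipped rather than failed, so a machine without ElasticSearch still gets a passing run.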

Debugging

You can set settings.ES_DUMP_CURL to a few different things, all of which can be helpful in debugging ElasticUtils.

  1. a file path

    This will cause PyES to write the curl equivalents of the commands it’s sending to ElasticSearch to a file.

    Example setting:

    ES_DUMP_CURL = '/var/log/es_curl.log'
    

    Note

    The file is not closed until the process ends. Because of that, you don’t see much in the file until it’s done.

  2. a class instance that has a .write() method

    PyES will call the .write() method with the curl equivalent and then you can do whatever you want with it.

    For example, this writes curl equivalent output to stdout:

    class CurlDumper(object):
        def write(self, s):
            print(s)
    ES_DUMP_CURL = CurlDumper()
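Since a plain file path leaves the file open until the process ends, a small .write() object that reopens and closes the file on every call lets you watch the output as it arrives. This is a sketch: FlushingCurlDumper is a hypothetical name, and appending a newline assumes the strings passed to .write() do not already end with one.

```python
class FlushingCurlDumper(object):
    """Append each curl equivalent to a file, closing it after every write."""

    def __init__(self, path):
        # Only the path is stored; the file is opened on each write.
        self.path = path

    def write(self, s):
        with open(self.path, 'a') as fp:
            fp.write(s)
            fp.write('\n')


ES_DUMP_CURL = FlushingCurlDumper('/var/log/es_curl.log')
```

Opening the file for every call is slower than keeping it open, which is usually acceptable for a debugging aid.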
    
