Using ElasticUtils with Django
Summary
Django-specific code is all located in elasticutils.contrib.django.
This chapter covers using the Django-specific parts of ElasticUtils. For API documentation, see the Django API docs.
How to integrate ElasticUtils with Django
- add ElasticUtils configuration settings to your project’s settings file
- write one or more MappingType classes
- write code to create the Elasticsearch index and populate it with documents based on your MappingType subclasses
- use elasticutils.contrib.django.S to search and return results
- use elasticutils.contrib.django.estestcase.ESTestCase to write tests
That’s the gist of it. You can deviate on any of these depending on your needs, of course.
Configuration
ElasticUtils depends on the following settings in your Django settings file:
- django.conf.settings.ES_DISABLED
  If ES_DISABLED = True, then any method wrapped with es_required will return and log a warning. This is useful while developing, so you don’t have to have Elasticsearch running.
- django.conf.settings.ES_URLS
  This is a list of Elasticsearch URLs. In development this will look like:
  ES_URLS = ['http://localhost:9200']
- django.conf.settings.ES_INDEXES
  This is a mapping of doctypes to indexes. A default mapping is required for types that don’t have a specific index.
  When ElasticUtils queries the index for a model, by default it derives the doctype from Model._meta.db_table. When you build your indexes and mapping types, make sure the doctypes and index names you use match the ones in this setting.
  Example 1:
  ES_INDEXES = {'default': 'main_index'}
  This only has a default, so all ElasticUtils queries will look in main_index for all mapping types.
  Example 2:
  ES_INDEXES = {'default': 'main_index', 'splugs': 'splugs_index'}
  Assuming you have a Splug model which has a Splug._meta.db_table value of splugs, ElasticUtils will run queries for Splug in splugs_index. ElasticUtils will run queries for other models in main_index because that’s the default.
  Example 3:
  ES_INDEXES = {'default': ['main_index'], 'splugs': ['splugs_index']}
  FIXME: The API allows for this. Pretty sure it should query multiple indexes, but we have no tests for that and I haven’t tested it, either.
- django.conf.settings.ES_TIMEOUT
  Default: 5
  The timeout in seconds for creating the Elasticsearch connection.
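Taken together, a development settings file might look like the sketch below. The setting names come from the list above and the index names are the ones from Example 2. index_for_doctype is a hypothetical helper, not part of ElasticUtils; it only illustrates the rule that doctypes without their own entry fall back to the default index.

```python
# settings.py (development); all values are illustrative.
ES_DISABLED = False
ES_URLS = ['http://localhost:9200']
ES_INDEXES = {'default': 'main_index', 'splugs': 'splugs_index'}
ES_TIMEOUT = 5


def index_for_doctype(doctype, es_indexes=ES_INDEXES):
    """Hypothetical helper: look up the index for a doctype.

    This is not ElasticUtils code. It only mirrors the documented rule
    that a doctype without its own entry uses the 'default' index.
    """
    return es_indexes.get(doctype, es_indexes['default'])


splug_index = index_for_doctype('splugs')    # has its own entry
other_index = index_for_doctype('articles')  # falls back to 'default'
```

With these settings, a doctype of splugs resolves to splugs_index and any other doctype resolves to main_index.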
Elasticsearch
The get_es() function in elasticutils.contrib.django uses the Django settings listed above to build the elasticsearch-py Elasticsearch object.
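As a rough illustration of how settings could feed the connection, the sketch below collects connection arguments from a settings object. It is not the ElasticUtils source: es_connection_kwargs, the keyword names urls and timeout, and the rule that explicit arguments override settings are all assumptions made for this example. A SimpleNamespace stands in for django.conf.settings so the code runs without Django.

```python
from types import SimpleNamespace

# Stand-in for django.conf.settings; ES_TIMEOUT is deliberately left unset.
settings = SimpleNamespace(ES_URLS=['http://localhost:9200'])


def es_connection_kwargs(settings, **overrides):
    """Hypothetical helper: gather connection arguments from settings.

    Assumes ES_URLS is always set and ES_TIMEOUT falls back to 5 seconds,
    matching the defaults described in the Configuration section.
    """
    kwargs = {
        'urls': settings.ES_URLS,
        'timeout': getattr(settings, 'ES_TIMEOUT', 5),
    }
    kwargs.update(overrides)  # explicit arguments win over settings
    return kwargs


default_kwargs = es_connection_kwargs(settings)
slow_kwargs = es_connection_kwargs(settings, timeout=30)
```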
Using with Django ORM models
Requirements: Django
The elasticutils.contrib.django.S class takes a MappingType in the constructor. That allows you to tie Django ORM models to Elasticsearch index search results.
elasticutils.contrib.django provides a MappingType that has some additional Django ORM-specific code in it to make this easier.
Define a MappingType subclass for your model. At a minimum, you need to define get_model.
Further, you can use the Indexable mixin to get a bunch of helpful indexing-related code.
For example, here’s a minimal MappingType subclass:
from django.db.models import Model
from elasticutils.contrib.django import MappingType

class MyModel(Model):
    # Django model ...
    pass

class MyMappingType(MappingType):
    @classmethod
    def get_model(cls):
        return MyModel

searcher = MyMappingType.search()
Here’s one that uses Indexable and handles indexing:
from django.db.models import Model
from elasticutils.contrib.django import Indexable, MappingType

class MyModel(Model):
    # Django model ...
    pass

class MyMappingType(MappingType, Indexable):
    @classmethod
    def get_model(cls):
        """Returns the Django model this MappingType relates to"""
        return MyModel

    @classmethod
    def get_mapping(cls):
        """Returns an Elasticsearch mapping for this MappingType"""
        return {
            'properties': {
                # The id is an integer, so store it as such. Elasticsearch
                # would have inferred this just fine.
                'id': {'type': 'integer'},
                # The name is a name, so we shouldn't analyze it
                # (de-stem, tokenize, parse, etc).
                'name': {'type': 'string', 'index': 'not_analyzed'},
                # The bio has free-form text in it, so analyze it with
                # snowball.
                'bio': {'type': 'string', 'analyzer': 'snowball'},
                # Age is an integer
                'age': {'type': 'integer'}
            }
        }

    @classmethod
    def extract_document(cls, obj_id, obj=None):
        """Converts this instance into an Elasticsearch document"""
        if obj is None:
            obj = cls.get_model().objects.get(pk=obj_id)
        return {
            'id': obj.id,
            'name': obj.name,
            'bio': obj.bio,
            'age': obj.age
        }

searcher = MyMappingType.search()
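To see the shape of the document that extract_document returns without a database or an Elasticsearch server, the sketch below applies the same field mapping to a stand-in object. FakeProfile, to_document, and the sample values are hypothetical and exist only for this illustration.

```python
from collections import namedtuple

# Hypothetical stand-in for a Django model instance with the same fields.
FakeProfile = namedtuple('FakeProfile', ['id', 'name', 'bio', 'age'])

# Field names declared under 'properties' in get_mapping above.
MAPPED_FIELDS = {'id', 'name', 'bio', 'age'}


def to_document(obj):
    """Apply the same field mapping as extract_document in the example above."""
    return {
        'id': obj.id,
        'name': obj.name,
        'bio': obj.bio,
        'age': obj.age,
    }


profile = FakeProfile(id=1, name='Example Person', bio='Free-form text.', age=36)
document = to_document(profile)

# Every key in the document has a matching entry in the mapping.
keys_match_mapping = set(document) == MAPPED_FIELDS
```

Keeping the document keys in line with the mapping's properties means every field is indexed with the type and analyzer you declared rather than one Elasticsearch infers.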
See also
- http://www.elasticsearch.org/guide/reference/mapping/
  The Elasticsearch guide on mapping types.
- http://www.elasticsearch.org/guide/reference/mapping/core-types.html
  The Elasticsearch guide on mapping type field types.
Celery tasks
Requirements: Django, Celery
You can use tasks such as elasticutils.contrib.django.tasks.index_objects() to automatically index all new items.
Middleware
Requirements: Django
There’s a middleware that catches all Elasticsearch-related exceptions and shows a 501/503 template accordingly. See elasticutils.contrib.django.ESExceptionMiddleware for details.
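Enabling the middleware would typically mean adding its dotted path to your settings. The fragment below is a sketch that assumes the old-style MIDDLEWARE_CLASSES setting used by older Django versions; the CommonMiddleware entry is just a placeholder for whatever middleware your project already lists.

```python
# settings.py fragment; assumes the old-style MIDDLEWARE_CLASSES setting.
MIDDLEWARE_CLASSES = (
    # Placeholder for the middleware your project already uses.
    'django.middleware.common.CommonMiddleware',
    # Catches Elasticsearch-related exceptions raised while handling a request.
    'elasticutils.contrib.django.ESExceptionMiddleware',
)
```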
Writing tests
Requirements: Django
When writing test cases for your ElasticUtils-using code, you’ll want to do a few things:
- Default ES_DISABLED to True. This way, the tests that kick off creating data but aren’t testing search-specific things don’t additionally index stuff. That’ll save you a bunch of test time.
- When testing ElasticUtils things, override the settings and set ES_DISABLED to False.
- Use an ESTestCase that sets up the indexes before tests run and tears them down after they run.
- When testing, make sure you use an index name that’s unique. You don’t want to run your tests and have them affect your production index.
You can use elasticutils.contrib.django.estestcase.ESTestCase for your app’s tests. It’s pretty basic but does all of the above except the first item, which you’ll need to do in your test settings.
Example usage:
from elasticutils.contrib.django.estestcase import ESTestCase

class TestQueries(ESTestCase):
    # This class holds tests that do elasticsearch things

    def test_query(self):
        # Test code ...
        pass

    def test_locked_filters(self):
        # Test code ...
        pass
ElasticUtils uses this for its Django tests. Look at the test code for more examples of usage:
https://github.com/mozilla/elasticutils/
If it’s not what you want, you could subclass it and override behavior or just write your own.
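Defaulting ES_DISABLED to True is the one step left to your test settings. A minimal sketch of such a settings fragment follows. The prefixed_indexes helper, the test_ prefix, and the PRODUCTION_ES_INDEXES name are hypothetical; the prefixing matters mainly if you write your own test case and need to keep test indexes apart from production ones yourself.

```python
# Test settings fragment; a sketch, not a required layout.
ES_DISABLED = True  # tests that exercise search override this to False

# Hypothetical production mapping, using the names from Example 2.
PRODUCTION_ES_INDEXES = {'default': 'main_index', 'splugs': 'splugs_index'}


def prefixed_indexes(es_indexes, prefix='test_'):
    """Return a copy of an ES_INDEXES mapping with each index name prefixed.

    Only handles the form where each value is a single index name (a string),
    not the list form shown in Example 3.
    """
    return {doctype: prefix + index for doctype, index in es_indexes.items()}


ES_INDEXES = prefixed_indexes(PRODUCTION_ES_INDEXES)
```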
Helpful things to know
Indexing and reset_queries
If you are:
- indexing a lot of data pulled out with the Django ORM, and
- have DEBUG = True (i.e. development environments)
then you’ll probably want to call django.db.reset_queries() periodically.
What’s going on is that when DEBUG = True (i.e. a development environment), Django helpfully stores every query that is made. When you’re indexing a lot of data, that adds up to a lot of stored queries. Calling django.db.reset_queries() periodically flushes the stored queries so they don’t steadily eat all your memory before the indexing is done.
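One way to structure this is to index in chunks and reset the query log after each chunk. The sketch below is an assumption about how you might organize your own indexing loop, not ElasticUtils code: index_in_chunks and its parameters are hypothetical, and the indexing and reset functions are passed in so the example runs without Django. In a real project you would pass django.db.reset_queries as reset_queries.

```python
def index_in_chunks(ids, index_chunk, reset_queries, chunk_size=100):
    """Index ids chunk by chunk, calling reset_queries after each chunk.

    index_chunk receives a list of ids and is expected to index them.
    reset_queries takes no arguments; with Django this would be
    django.db.reset_queries. Returns the number of ids processed.
    """
    indexed = 0
    for start in range(0, len(ids), chunk_size):
        chunk = ids[start:start + chunk_size]
        index_chunk(chunk)
        reset_queries()  # drop the query log Django keeps when DEBUG = True
        indexed += len(chunk)
    return indexed


# Stand-ins so the sketch runs on its own: record chunks, reset does nothing.
chunks_seen = []
total = index_in_chunks(list(range(250)), chunks_seen.append, lambda: None)
```

Resetting after every chunk keeps the stored query list bounded by roughly one chunk’s worth of queries instead of growing for the whole run.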