This chapter covers the Django-specific parts of ElasticUtils.
The Django helpers all live in elasticutils.contrib.django.
ElasticUtils depends on the following settings in your Django settings file:
If ES_DISABLED = True, then any method wrapped with es_required will return and log a warning. This is useful during development, so you don’t have to have ElasticSearch running.
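As a rough illustration of the behaviour described above, a decorator of this kind could look like the sketch below. This is not ElasticUtils’ actual implementation: the FakeSettings stand-in for django.conf.settings, the logger name, the es_required_sketch name, and the index_thing function are all hypothetical.

```python
import functools
import logging

log = logging.getLogger("es_sketch")  # hypothetical logger name


class FakeSettings(object):
    """Stand-in for django.conf.settings (assumption, for illustration)."""
    ES_DISABLED = True


settings = FakeSettings()


def es_required_sketch(fun):
    """Skip the wrapped call and log a warning when ES is disabled."""
    @functools.wraps(fun)
    def wrapper(*args, **kwargs):
        if getattr(settings, "ES_DISABLED", False):
            log.warning("ES_DISABLED is set; skipping %s", fun.__name__)
            return None
        return fun(*args, **kwargs)
    return wrapper


@es_required_sketch
def index_thing(doc):
    # Hypothetical function that would normally talk to ElasticSearch.
    return "indexed %s" % doc
```

With ES_DISABLED set, index_thing() returns None without running its body; once the flag is cleared, the call goes through as usual.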
If ES_DUMP_CURL is set to a file path, all the requests that ElasticUtils makes will be dumped into the designated file.
If it is set to a class instance, that instance’s .write() method is called with the curl equivalents.
See Debugging for more details.
ES_HOSTS is a list of ES hosts. In development this will look like:
ES_HOSTS = ['127.0.0.1:9200']
ES_INDEXES is a mapping of doctypes to indexes. A default entry is required for types that don’t have a specific index.
When ElasticUtils queries the index for a model, by default it derives the doctype from Model._meta.db_table. When you build your indexes and mapping types, make sure to match the indexes and mapping types you’re using.
Example 1:
ES_INDEXES = {'default': 'main_index'}
This only has a default, so all ElasticUtils queries will look in main_index for all mapping types.
Example 2:
ES_INDEXES = {'default': 'main_index',
              'splugs': 'splugs_index'}
Assuming you have a Splug model which has a Splug._meta.db_table value of splugs, then ElasticUtils will run queries for Splug in the splugs_index. ElasticUtils will run queries for other models in main_index because that’s the default.
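The lookup rule from the two examples can be sketched in plain Python. The index_for helper and the contacts mapping type are hypothetical names used only for illustration; they are not part of ElasticUtils.

```python
def index_for(mapping_type, es_indexes):
    """Return the index for a mapping type, falling back to 'default'."""
    # es_indexes["default"] is evaluated up front, so a missing default
    # entry raises KeyError even when the mapping type has its own index.
    return es_indexes.get(mapping_type, es_indexes["default"])


ES_INDEXES = {"default": "main_index", "splugs": "splugs_index"}

splug_index = index_for("splugs", ES_INDEXES)    # has its own entry
other_index = index_for("contacts", ES_INDEXES)  # falls back to default
```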
ES_TIMEOUT defines the timeout for the ES connection. This defaults to 5 seconds.
get_es() in the Django contrib caches your ES objects in thread-local storage.
Each object is built from the settings in your django.conf.settings.
Note
get_es() only caches the ES if you don’t pass in any override arguments. If you pass in override arguments, it doesn’t cache it and instead creates a new one.
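The caching rule in the note can be pictured with a small thread-local sketch. It is not the real get_es(): make_connection is a hypothetical factory that returns a plain dict instead of an ES object, and the default values are assumptions taken from the settings described earlier.

```python
import threading

_local = threading.local()


def make_connection(**settings):
    """Hypothetical factory standing in for building an ES object."""
    return dict(settings)


def get_es_sketch(**overrides):
    """Return a cached connection unless override arguments are given."""
    defaults = {"hosts": ["127.0.0.1:9200"], "timeout": 5}
    if overrides:
        # Overrides bypass the cache entirely: build a fresh object.
        defaults.update(overrides)
        return make_connection(**defaults)
    if not hasattr(_local, "es"):
        # First call in this thread without overrides: build and remember.
        _local.es = make_connection(**defaults)
    return _local.es
```

Calling get_es_sketch() twice in the same thread returns the same object, while get_es_sketch(timeout=30) builds a new one on every call.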
Requirements: Django
The elasticutils.contrib.django.S class takes a model in the constructor. That model is a Django ORM model class. For example:
from elasticutils.contrib.django import S
from myapp.models import MyModel
searcher = S(MyModel)
Further, you can have your model extend elasticutils.contrib.django.models.SearchMixin and get a bunch of functionality that makes indexing data easier.
Two things to know:
For example, here’s a minimal use of the SearchMixin:
from datetime import datetime

from django.db import models
from elasticutils.contrib.django.models import SearchMixin

class Contact(models.Model, SearchMixin):
    name = models.CharField(max_length=50)
    bio = models.TextField(blank=True)
    age = models.IntegerField()
    website = models.URLField(blank=True)
    last_updated = models.DateTimeField(default=datetime.now)

    @classmethod
    def extract_document(cls, obj_id, obj=None):
        """Takes an object id for this class, returns dict."""
        if obj is None:
            obj = cls.objects.get(pk=obj_id)

        return {
            'id': obj.id,
            'name': obj.name,
            'bio': obj.bio,
            'age': obj.age,
            'website': obj.website,
            'last_updated': obj.last_updated
        }
This example doesn’t specify a mapping. That’s ok because ElasticSearch will infer from the shape of the data how it should analyze and store the data.
If you want to specify this explicitly (and I suggest you do for anything that involves strings), then you want to additionally override .get_mapping(). Let’s refine the above example by explicitly specifying .get_mapping().
from datetime import datetime

from django.db import models
from elasticutils.contrib.django.models import SearchMixin

class Contact(models.Model, SearchMixin):
    name = models.CharField(max_length=50)
    bio = models.TextField(blank=True)
    age = models.IntegerField()
    website = models.URLField(blank=True)
    last_updated = models.DateTimeField(default=datetime.now)

    @classmethod
    def get_mapping(cls):
        """Returns an ElasticSearch mapping."""
        return {
            # The id is an integer, so store it as such. ES would have
            # inferred this just fine.
            'id': {'type': 'integer'},

            # The name is a name---so we shouldn't analyze it
            # (de-stem, tokenize, parse, etc).
            'name': {'type': 'string', 'index': 'not_analyzed'},

            # The bio has free-form text in it, so analyze it with
            # snowball.
            'bio': {'type': 'string', 'analyzer': 'snowball'},

            # The website also shouldn't be analyzed.
            'website': {'type': 'string', 'index': 'not_analyzed'},

            # The last_updated field is a date.
            'last_updated': {'type': 'date'}
        }

    @classmethod
    def extract_document(cls, obj_id, obj=None):
        """Takes an object id for this class, returns dict."""
        if obj is None:
            obj = cls.objects.get(pk=obj_id)

        return {
            'id': obj.id,
            'name': obj.name,
            'bio': obj.bio,
            'age': obj.age,
            'website': obj.website,
            'last_updated': obj.last_updated
        }
Mixin for indexing Django model instances
Add this mixin to your Django ORM model class and it gives you super indexing power. This correlates an ES mapping type to a Django ORM model. Using this allows you to get Django model instances as ES search results.
Extracts the ES index document for this instance
This must be implemented.
Note
The resulting dict must be JSON serializable.
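Parameters: obj_id (the id of the object to extract a document from); obj (optional, the object itself; looked up by obj_id when omitted)
Returns: dict of key/value pairs representing the document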
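One way to check the JSON requirement is to run the extracted dict through the standard library’s json module, which rejects datetime objects. The sketch below converts datetimes to ISO 8601 strings first. The to_serializable helper and the sample document are hypothetical, and whether you need such a step depends on the JSON encoder your ES client uses.

```python
import json
from datetime import datetime


def to_serializable(document):
    """Return a copy of document with datetimes as ISO 8601 strings."""
    cleaned = {}
    for key, value in document.items():
        if isinstance(value, datetime):
            value = value.isoformat()
        cleaned[key] = value
    return cleaned


# Hypothetical document, shaped like the Contact example.
doc = {"id": 1, "name": "Ada", "last_updated": datetime(2012, 5, 1, 12, 30)}

# json.dumps(doc) would raise TypeError because of the datetime value;
# the converted copy encodes cleanly.
encoded = json.dumps(to_serializable(doc), sort_keys=True)
```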
Gets the index for this model.
The index for this model is specified in settings.ES_INDEXES which is a dict of mapping type -> index name.
By default, this uses .get_mapping_type() to determine the mapping and returns the value in settings.ES_INDEXES for that or settings.ES_INDEXES['default'].
Override this to compute it differently.
Returns: index name to use
Returns the queryset of ids of all things to be indexed.
Defaults to:
cls.objects.order_by('id').values_list('id', flat=True)
Returns: iterable of ids of objects to be indexed
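Because the return value is just an iterable of ids, a caller can walk it in batches instead of handling every id at once. A minimal sketch, assuming a hypothetical chunked helper and batch size; plain integers stand in for the queryset of ids.

```python
def chunked(ids, size):
    """Yield lists of at most `size` ids from any iterable of ids."""
    batch = []
    for obj_id in ids:
        batch.append(obj_id)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        # Emit the final, possibly shorter, batch.
        yield batch


# Integers 1..7 stand in for the ids of the objects to be indexed.
batches = list(chunked(range(1, 8), 3))
```

Each batch is a plain list of ids, which is the shape the index_objects task takes in the post_save example elsewhere in this chapter.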
Returns the mapping for this mapping type.
See the docs for details on how to specify a mapping.
Override this to return a mapping for this doctype.
Returns: dict representing the ES mapping, or None if you want ES to infer it. Defaults to None.
Returns the name of the mapping.
By default, this is cls._meta.db_table.
Override this if you want to compute the mapping type name differently.
Returns: mapping type string
Adds or updates a document to the index
Raises ValueError: if bulk is True, but es is None.
TODO: add example.
Refreshes the index.
TODO: document this better.
Returns a typed S for this class.
Removes a particular item from the search index.
TODO: document this better.
Requirements: Django, Celery
You can then utilize things such as elasticutils.contrib.django.tasks.index_objects() to automatically index all new items.
Models can asynchronously update their ES index.
If a model extends SearchMixin, it can add a post_save hook like so:
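from django.db.models import signals as dbsignals
from django.dispatch import receiver

@receiver(dbsignals.post_save, sender=MyModel)
def update_search_index(sender, instance, **kw):
    from elasticutils.contrib.django import tasks
    tasks.index_objects.delay(sender, [instance.id])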
Creates methods that reindex all the objects in a model.
For example, in your myapp.cron.py you can do:
index_all_mymodels = cronjobs.register(reindex_objects(mymodel))
and it will create a task you can call from the command line, e.g.:
./manage.py cron index_all_mymodels
Requirements: Django, test_utils, nose
elasticutils.contrib.django.estestcase provides ESTestCase, which you can subclass in your app’s test cases.
It does the following:
Example:
from elasticutils.contrib.django.estestcase import ESTestCase

class TestQueries(ESTestCase):
    def test_query(self):
        ...

    def test_locked_filters(self):
        ...
You can set settings.ES_DUMP_CURL to a few different things, all of which can help when debugging ElasticUtils.
a file path
This will cause PyES to write the curl equivalents of the commands it’s sending to ElasticSearch to a file.
Example setting:
ES_DUMP_CURL = '/var/log/es_curl.log'
Note
The file is not closed until the process ends. Because of that, you don’t see much in the file until it’s done.
a class instance that has a .write() method
PyES will call the .write() method with the curl equivalent and then you can do whatever you want with it.
For example, this writes curl equivalent output to stdout:
class CurlDumper(object):
    def write(self, s):
        print(s)

ES_DUMP_CURL = CurlDumper()
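The same .write() hook can also work around the file-path caveat noted above, where output only shows up once the process ends. The sketch below appends each entry to a file and closes it again straight away, so the log can be read while the process is still running. The AppendingCurlDumper name is hypothetical, and the sketch assumes only that .write() receives one string per call; it adds a trailing newline when one is missing.

```python
class AppendingCurlDumper(object):
    """Append each curl line to a file, closing the file after every write."""

    def __init__(self, path):
        self.path = path

    def write(self, s):
        # Opening in append mode per call keeps the file closed between
        # writes, so each entry is on disk as soon as write() returns.
        with open(self.path, "a") as fp:
            fp.write(s)
            if not s.endswith("\n"):
                fp.write("\n")
```

In a settings file this would be wired up as ES_DUMP_CURL = AppendingCurlDumper('/var/log/es_curl.log'). Reopening the file on every call is slower than keeping it open, which should be acceptable for a debugging aid.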