Indexing¶
Overview¶
ElasticUtils is primarily an API for searching. However, before you can search, you need to create an index and index your documents.
This chapter covers the indexing side of things. It does so lightly—for more details, read through the elasticsearch-py documentation and the Elasticsearch guide.
Getting an Elasticsearch object¶
ElasticUtils uses elasticsearch-py which comes with a handy Elasticsearch object. This lets you:
- create indexes
- create mappings
- apply settings
- check status
- etc.
To access this, you use elasticutils.get_es()
which creates
an Elasticsearch object for you.
See elasticutils.get_es()
for more details.
See also
- http://elasticsearch-py.readthedocs.org/en/latest/api.html#elasticsearch
- elasticsearch-py Elasticsearch documentation.
Indexes¶
An index is a collection of documents.
Before you do anything, you need to have an index. You can create one with .indices.create().
For example:
es = get_es()
es.indices.create(index='blog-index')
You can pass in settings, too. For example, you can set the refresh interval when creating the index:
es.indices.create(index='blog-index', body={'refresh_interval': '5s'})
See also
- http://elasticsearch-py.readthedocs.org/en/latest/api.html#elasticsearch.client.IndicesClient.create
- elasticsearch-py indices.create API documentation
- http://www.elasticsearch.org/guide/reference/api/admin-indices-create-index/
- Elasticsearch create index API documentation
Types and Mappings¶
A type is a set of fields. A document is of a given type if it has those fields. Whenever you index a document, you specify which type the document is. This is sometimes called a “doctype”, “document type” or “doc type”.
A mapping is the definition of fields and how they should be indexed for a type. In ElasticUtils, we call a document type that has a defined mapping a “mapping type” mostly as a shorthand for “document type with a defined mapping” because that’s a mouthful.
Elasticsearch can infer mappings to some degree, but you get a lot more value by specifying mappings explicitly.
To define a mapping, you use .indices.put_mapping().
For example:
es = get_es()
es.indices.put_mapping(
index='blog-index',
doc_type='blog-entry-type',
body={
'blog-entry-type': {
'properties': {
'id': {'type': 'integer'},
'title': {'type': 'string'},
'content': {'type': 'string'},
'tags': {'type': 'string'},
'created': {'type': 'date'}
}
}
}
)
You can also define mappings when you create the index:
es = get_es()
es.indices.create(
index='blog-index',
body={
'mappings': {
'blog-entry-type': {
'properties': {
'id': {'type': 'integer'},
'title': {'type': 'string'},
'content': {'type': 'string'},
'tags': {'type': 'string'},
'created': {'type': 'date'}
}
}
}
}
)
Note
If there’s a possibility of a race condition between creating the index and defining the mapping and some document getting indexed, then it’s good to create the index and define the mappings at the same time.
See also
- http://elasticsearch-py.readthedocs.org/en/latest/api.html#elasticsearch.client.IndicesClient.put_mapping
- elasticsearch-py indices.put_mapping API documentation
- http://www.elasticsearch.org/guide/reference/api/admin-indices-put-mapping/
- Elasticsearch put_mapping API documentation
- http://www.elasticsearch.org/guide/reference/mapping/
- Elasticsearch mapping documentation
Indexing documents¶
Use .index() to index a document.
For example:
es = get_es()
entry = {'id': 1,
'title': 'First post!',
'content': '<p>First post!</p>',
'tags': ['status', 'blog'],
'created': '20130423T16:50:22'
}
es.index(index='blog-index', doc_type='blog-entry-type', body=entry, id=1)
If you’re indexing a bunch of documents at the same time, you should use elasticsearch.helpers.bulk_index().
For example:
from elasticsearch.helpers import bulk_index
es = get_es()
entries = [{ '_id': 42, ... }, { '_id': 47, ... }]
bulk_index(es, entries, index='blog-index', doc_type='blog-entry-type')
See also
- http://elasticsearch-py.readthedocs.org/en/latest/api.html#elasticsearch.Elasticsearch.index
- elasticsearch-py index API documentation
- http://elasticsearch-py.readthedocs.org/en/latest/helpers.html#elasticsearch.helpers.bulk_index
- elasticsearch-py bulk_index API documentation
- http://www.elasticsearch.org/guide/reference/api/index_/
- Elasticsearch index API documentation
- http://www.elasticsearch.org/guide/reference/api/bulk/
- Elasticsearch bulk index API documentation
Deleting documents¶
You can delete documents with .delete().
For example:
es = get_es()
es.delete(index='blog-index', doc_type='blog-entry-type', id=1)
See also
- http://elasticsearch-py.readthedocs.org/en/latest/api.html#elasticsearch.Elasticsearch.delete
- elasticsearch-py delete API documentation
- http://www.elasticsearch.org/guide/reference/api/delete/
- Elasticsearch delete API documentation
Refreshing¶
After you index documents, they’re not available for searches until after the index is refreshed. By default, the index refreshes every second. If you need the documents to show up in searches before that, call indices.refresh().
For example:
es = get_es()
es.indices.refresh(index='blog-index')
See also
- http://elasticsearch-py.readthedocs.org/en/latest/api.html#elasticsearch.client.IndicesClient.refresh
- elasticsearch-py indices.refresh API documentation
- http://www.elasticsearch.org/guide/reference/api/admin-indices-refresh/
- Elasticsearch refresh API documentation
Delete indexes¶
You can delete indexes with .indices.delete().
For example:
es = get_es()
es.indices.delete(index='blog-index')
See also
- http://elasticsearch-py.readthedocs.org/en/latest/api.html#elasticsearch.client.IndicesClient.delete
- elasticsearch-py indices.delete API documentation
- http://www.elasticsearch.org/guide/reference/api/admin-indices-delete-index/
- Elasticsearch delete index API documentation
Doing all of this with MappingTypes and Indexables¶
If you’re using MappingTypes, then you can do much of the above using
methods and classmethods on MappingType
and
Indexable
classes. See Mapping types and Indexables for
more details.