Migrating from Elasticsearch 0.90 to 1.x with ElasticUtils
Note
This is a work in progress and probably doesn’t cover everything.
Summary
There are a bunch of API-breaking changes between Elasticsearch 0.90 and 1.x. Because of this, it’s tricky to get over this hump without incurring downtime.
This document is a high-level walkthrough for upgrading from Elasticsearch 0.90 to 1.x and covers the steps you should take to reduce your downtime.
Note
“1.x” covers 1.0, 1.1 and 1.2.
Steps
Each of these steps should result in a working system. Do them one at a time and test everything in between.
1. Upgrade to ElasticUtils 0.9.1
   You must use elasticsearch-py version 0.4.5; don’t use a later version! (A quick way to check which versions you have installed is sketched after this list.)
2. Upgrade your Elasticsearch cluster to version 0.90.13
3. Upgrade to ElasticUtils 0.10.1
   You will need to update elasticsearch-py past 0.4.5. The latest version should work fine.
4. Make any changes to your code so that it works with both Elasticsearch 0.90 and 1.x
   There are some tricky things here; see the Tricky things section.
5. Upgrade to Elasticsearch 1.x
At that point, you should be using a recent version of the elasticsearch-py library and a recent version of Elasticsearch, and you should be all set.
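One way to keep track of where you are between steps is to check the installed package versions as you go. This is a minimal sketch using pkg_resources; the distribution names here ("elasticutils" for ElasticUtils and "elasticsearch" for elasticsearch-py) are the PyPI names, so adjust them if your environment packages things differently:
import pkg_resources

# Print the installed version of each package so you can confirm you are on
# the combination the current step expects (for example, elasticsearch-py
# 0.4.5 alongside ElasticUtils 0.9.1 after the first step).
for name in ('elasticutils', 'elasticsearch'):
    version = pkg_resources.get_distribution(name).version
    print('{0} {1}'.format(name, version))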
Resources
See also
- Breaking changes when migrating to Elasticsearch 1.0: http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/breaking-changes.html
- Deprecated features when migrating to Elasticsearch 1.0: http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/_deprecations.html
Tricky things
There are a few tricky differences between Elasticsearch 0.90 and 1.0 that will affect your code.
Changes with .values_dict() and .values_list()
Explanation
In Elasticsearch 1.x, you get back different shapes of things depending on whether you specify “fields”. To smooth this out and normalize the differences between Elasticsearch 0.90 and 1.x, ElasticUtils now always passes in the fields property when you use elasticutils.S.values_list() and elasticutils.S.values_dict().
Let’s show some code to illustrate the new behavior.
First, a bunch of setup:
>>> from elasticutils import get_es, S
>>> from elasticsearch.helpers import bulk_index
>>> URL = 'localhost'
>>> INDEX = 'fooindex'
>>> BOOK_DOCTYPE = 'book'
>>> PERSON_DOCTYPE = 'person'
>>> es = get_es(urls=[URL])
>>> es.indices.delete(index=INDEX, ignore=404)
Now define the two document mappings we’re going to use: book and person. Book has no stored fields. Person has two.
>>> mapping = {
... BOOK_DOCTYPE: {
... 'properties': {
... 'id': {'type': 'integer'},
... 'title': {'type': 'string'},
... 'tags': {'type': 'string'},
... }
... },
... PERSON_DOCTYPE: {
... 'properties': {
... 'id': {'type': 'integer', 'store': True},
... 'name': {'type': 'string', 'store': True},
... 'weight': {'type': 'integer'}
... }
... }
... }
Create the index with the mappings, add some books and add some people.
>>> es.indices.create(INDEX, body={'mappings': mapping})
>>> books = [
... {'_id': 1, 'id': 1, 'title': '10 Balloons', 'tags': ['kids', 'hardcover']},
... {'_id': 2, 'id': 2, 'title': 'Puppies', 'tags': ['animals']},
... {'_id': 3, 'id': 3, 'title': 'Dictionary', 'tags': ['reference']},
... ]
>>> bulk_index(es, books, index=INDEX, doc_type=BOOK_DOCTYPE)
(3, [])
>>> people = [
... {'_id': 1, 'id': 1, 'name': 'Bob', 'weight': 40},
... {'_id': 2, 'id': 2, 'name': 'Jim', 'weight': 44},
... {'_id': 3, 'id': 3, 'name': 'Jim Bob', 'weight': 42},
... ]
>>> bulk_index(es, people, index=INDEX, doc_type=PERSON_DOCTYPE)
[...]
>>> es.indices.refresh(index=INDEX)
[...]
Now let’s do some queries so we can see how things work now.
Let’s build a basic_s that looks at our Elasticsearch cluster and the index. Also a book_s and a person_s.
>>> basic_s = S().es(urls=[URL]).indexes(INDEX)
>>> book_s = basic_s.doctypes(BOOK_DOCTYPE)
>>> person_s = basic_s.doctypes(PERSON_DOCTYPE)
How many documents are in our index?
>>> basic_s.count()
6
Call .values_list() on books, which has no stored fields, so we get back the _id and _type for each document returned and all values are lists:
>>> list(book_s.values_list())
[([u'1'], [u'book']), ([u'2'], [u'book']), ([u'3'], [u'book'])]
.values_list('id') on books, so we get id returned and all values are lists:
>>> list(book_s.values_list('id'))
[([1],), ([2],), ([3],)]
.values_list() on persons, which does have stored fields (id and name, but not weight), so we get the stored fields returned and all values are lists:
>>> list(person_s.values_list())
[([1], [u'Bob']), ([2], [u'Jim']), ([3], [u'Jim Bob'])]
.values_list('id') on persons, which works just like books because we’ve specified which fields we want back:
>>> list(person_s.values_list('id'))
[([1],), ([2],), ([3],)]
The same goes for .values_dict().
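For instance, building on the person_s above, consuming .values_dict() results might look like this. This is a sketch that assumes each result comes back as a dict keyed by field name, with every value wrapped in a list:
# Ask for specific fields; each result should be a dict of
# field name -> list of values.
results = list(person_s.values_dict('id', 'name'))

# Because every value is a list, take the first element when you
# want the scalar, e.g. each person's id and name.
for result in results:
    print('{0} {1}'.format(result['id'][0], result['name'][0]))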
What you need to do
- If you have calls to .values_list() and .values_dict() that don’t specify any fields, then you either need to change the mapping and store the fields you want back, or change the calls so they specify the fields you want back.
- Every time you use results from a .values_list() or .values_dict() call, you need to change it to always treat the values as lists. (There’s a short sketch of both changes after this list.)
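A sketch of both changes for the book doctype above, assuming the tuples come back with values in the same order as the fields you ask for:
# Before, a call like list(book_s.values_list()) relied on stored fields.
# Now, name the fields you want back explicitly...
rows = list(book_s.values_list('id', 'title'))

# ...and treat every value as a list when you consume the results.
titles = [title_values[0] for id_values, title_values in rows]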