In most languages types are your friends. You want types to describe the entity you're dealing with as clearly as possible. You make a base class Animal, your Dog inherits from it, your SheepDog inherits from Dog, and finally BorderCollie gets you something specific. (This is a terrible example, please don't quote me on this.)

Let's use a more realistic example. You have 50 different types of MyBaseEvent. When you index your types to ElasticSearch you think "I'll explicitly say these are type MyFirstEvent and MySecondEvent and so on." Don't.

This may actually work for you. But I am pretty confident in saying ElasticSearch (or, more specifically in this case, Lucene, the underlying search engine) is not doing what you think it's doing.

So What is it Doing?

Let's imagine we are storing events. Events in this imaginary system abide by these rules:

  • Fields that have the same name are the same type
  • Events all inherit, in code, from some base event model

When ElasticSearch detects a new type, the default behavior is to add it to the mapping:

{
     "mappings": {
          "my_first_event_type": { properties... }
     }           
}

When it detects a second new type it adds that to the mapping again:

{
     "mappings": {
          "my_first_event_type": { properties... },
          "my_second_event_type": { properties... }
     }           
}

All of the mapping definitions are now effectively duplicated for all of the shared properties. In a lot of event-driven systems the shared properties are the bulk of the event: they all contain identifiers, names, dates, history, etcetera. The properties that are unique to an event type are usually the smaller part of the event.
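To make the duplication concrete, here is a back-of-the-envelope sketch (plain Python arithmetic, not ElasticSearch code; the field names and counts are hypothetical) comparing how many field definitions the mapping carries when every event is its own type versus one shared type:

```python
# Hypothetical numbers: 50 event types, 4 shared fields, ~1 unique field each.
SHARED_FIELDS = ["id", "name", "created_at", "history"]
UNIQUE_FIELDS_PER_TYPE = 1
NUM_EVENT_TYPES = 50

# One mapping entry per type: the shared fields are repeated in every entry.
per_type_total = NUM_EVENT_TYPES * (len(SHARED_FIELDS) + UNIQUE_FIELDS_PER_TYPE)

# A single shared type: the shared fields appear exactly once.
single_type_total = len(SHARED_FIELDS) + NUM_EVENT_TYPES * UNIQUE_FIELDS_PER_TYPE

print(per_type_total)     # 250 field definitions
print(single_type_total)  # 54 field definitions
```

Real mappings carry far more per-field metadata than a single count suggests, so the gap in actual bytes is much wider than this toy ratio.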

This can actually bring your system down.

How it Crashes

Every time a node sees that the type of an incoming document is not in the mapping, it must stop processing until this is resolved. The same is true if the type is found but a new field has been detected.

To resolve this, the master node is notified of the requested change to the mapping. The master then resolves the changes if it can (if it can't, due to a field mapping conflict, an error is returned to the ingest node and the original request fails). Once the changes are resolved, the master notifies all nodes in the cluster to pause ingest and update their mappings.

As an occasional operation this is OK. However, if you receive a flood of new types, the cluster will grind to a halt as all of this occurs.

A more catastrophic failure mode is a mapping that is simply too large. The mapping has to be published around the cluster on every update, so if you treat 50 events as individual types it can grow very large; we saw mappings of a couple hundred megabytes. Once that happened, between the frequent type changes and the sheer size, our cluster actually went down. Nodes became unhealthy because of how long they sat waiting on the master, and the master had so many updates to apply (and the mapping took so long to ship) that it was taken out of rotation as master, at which point the entire process started over again.
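The flood of updates is easy to see in a toy simulation (my own sketch of the behavior described above, not ElasticSearch internals): treat every previously unseen (type, field) pair as forcing a cluster-wide mapping update, then compare many types against one shared type.

```python
def count_mapping_updates(events):
    """events is a list of (type_name, field_names) tuples, in arrival order."""
    seen = set()
    updates = 0
    for type_name, fields in events:
        new = {(type_name, f) for f in fields} - seen
        if new:
            updates += 1  # master must publish a new mapping to every node
            seen |= new
    return updates

shared = ["id", "name", "created_at"]
# 50 distinct types: each type's first event re-registers the shared fields.
many_types = [("event_%d" % i, shared) for i in range(50)]
# One shared type: only the very first event changes the mapping.
one_type = [("event_data", shared) for _ in range(50)]

print(count_mapping_updates(many_types))  # 50 updates
print(count_mapping_updates(one_type))    # 1 update
```

Every one of those updates pauses ingest cluster-wide, which is why the per-type approach drags everything down.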

How to Prevent This

Preventing this just means thinking of types differently. If two things are essentially the same "thing," they should be the same type. Say you have a system with articles, blogs, authors and users. You can make a strong argument that articles and blogs are basically the same thing, differing only in some metadata, and the same goes for users and authors. The key is to store the exact type in the metadata so that you can still do filtering and searching easily.
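A minimal sketch of that consolidation (field names like doc_type are hypothetical, not anything ElasticSearch prescribes): articles and blogs live in one type, and the exact type is just another field you filter on.

```python
# All documents share one mapping; the exact type is stored as metadata.
documents = [
    {"doc_type": "article", "title": "On Types",   "author": "jane"},
    {"doc_type": "blog",    "title": "My Week",    "author": "sam"},
    {"doc_type": "article", "title": "More Types", "author": "sam"},
]

# Filtering on the stored type replaces having separate article/blog types.
articles = [d for d in documents if d["doc_type"] == "article"]
print(len(articles))  # 2
```

In ElasticSearch this filter would be an ordinary term query on the doc_type field, which is cheap, instead of a structural split of the index's mapping.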

Going back to the event example what this would mean is that your mapping would only contain a single item:

{
     "mappings": {
          "event_data": { properties... }
     }           
}

Where one of the properties would hold the event's exact type, something like event_name: my_first_event_type.
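A hypothetical document indexed under this scheme might look like the following (the field names are illustrative and match the mapping discussed here):

{
     "event_name": "my_first_event_type",
     "event_time": "2016-01-01T00:00:00Z",
     "some_value": 12.5
}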

Why Those Rules Matter

Earlier I mentioned that the events abide by two rules. The first, that fields sharing the same name are of the same type, is the important one if you're going to stash all the events into a single type, because the mapping records each field's type:

{
    "mappings": {
        "event_data": { 
            "properties": {
                "event_name": {
                    "type": "string"
                },
                "event_time": {
                    "type": "date"
                },
                "some_value": {
                    "type": "decimal"
                }
            }
        }
    }           
}

So if the first event you index uses some_value as a decimal number, you'll get the above mapping. If you then try to index some_value as a string, you'll receive an error.
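The conflict behaves roughly like this sketch (a mimic of the behavior described above, not ElasticSearch's actual implementation or error text):

```python
mapping = {}  # field name -> first type seen; mimics dynamic mapping

def index_field(name, value):
    """Lock a field to the type of its first value; reject later conflicts."""
    kind = type(value).__name__
    locked = mapping.setdefault(name, kind)
    if locked != kind:
        raise ValueError(
            "field [%s] is mapped as [%s], cannot index as [%s]"
            % (name, locked, kind))

index_field("some_value", 1.5)         # first write locks the field as a number
try:
    index_field("some_value", "oops")  # a later string write conflicts
except ValueError as e:
    print("rejected:", e)
```

The practical consequence is the second rule from earlier: if two event types want the same field name, they must agree on its type before they can safely share a mapping.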

Further Reading:

Documentation: https://www.elastic.co/guide/en/elasticsearch/guide/current/mapping.html

About Author

Siva Katir

Senior Software Engineer working at PlayFab in Seattle, Washington.