Carefully Assess the Cost when Splitting Data into Multiple Elasticsearch Clusters

I'd say that everyone who uses Elasticsearch should at some point undertake the exercise of determining whether or not the data placed into a single Elasticsearch cluster can instead be split into multiple clusters, separated by performance characteristics, degree of rules processing required, frequency of updates, and so forth. For one, it can be a step towards breaking up a monolithic application into smaller services that can be managed independently, and this is usually a desirable direction for longer-term operations. Further, it may be possible to realize performance gains or a reduction in the hardware needed: (a) direct simpler queries to their own dedicated Elasticsearch cluster; (b) segregate frequently updated data from infrequently updated data, given that most of the tougher challenges in Elasticsearch operations tend to materialize in force only in an environment of frequent updates; and (c) corral all of the heaviest rules processing into its own cluster.

Thinking about this for any given use of Elasticsearch can, at least in the abstract, look like the usual common-sense application of separation of concerns at the data layer. However, while it is certainly worth investigating, having done this I can't say that I would recommend it unless it is absolutely necessary, for example as a part of breaking up a monolith. If performance is the driving concern, then in most cases similar performance gains can be realized by increasing the processing power or memory available to the cluster, such as by adding nodes. In the normal run of things, this will cost less than splitting the data by different characteristics and running multiple clusters. If lowering the cost of the Elasticsearch data layer is the goal, then it is unlikely to be met by creating more clusters, even if they contain fewer nodes overall.

Elasticsearch clusters require a considerable investment in time and expertise, and every use of Elasticsearch has its own character and pitfalls. A single cluster has a certain minimum level of required maintenance, monitoring, planning, and learning on the part of a development and operations team, irrespective of its size. Clusters will need to be rebuilt every now and again, and this is especially true if an application is under active development, and thus rules and indexes change. If that cluster is split into two, then this component of the total cost of ownership doubles. Rebuilding one cluster is an annoying chore that can be accomplished in short intervals, mixed in with the rest of a single day's work. Rebuilding five clusters is another story, and can easily take a week's worth of dedicated attention away from other tasks.

It has been my experience that the operational costs incurred by each additional Elasticsearch cluster are usually somewhat larger than the savings generated by splitting out data into multiple clusters, at least putting aside the scenario of breaking up a monolithic application. In this age of cheap computation and storage, the relative cost of developer time becomes ever higher, and that is before we consider the difficulty of hiring people experienced in the use of Elasticsearch. It is almost always desirable to spend incrementally more on infrastructure rather than incrementally more on staffing if the expectation of benefits is in the same ballpark.
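
To make that trade-off concrete, here is a back-of-the-envelope sketch in Python. Every number in it is a hypothetical placeholder rather than a measurement (node price, operations hours per cluster, hourly rate, and both layouts are assumptions), and the model is deliberately crude: infrastructure cost scales with node count, while operations cost scales with cluster count regardless of cluster size.

```python
from dataclasses import dataclass


@dataclass
class Layout:
    clusters: int  # number of separate Elasticsearch clusters
    nodes: int     # total nodes across all clusters


def annual_cost(layout, node_cost, ops_hours_per_cluster, hourly_rate):
    """Rough yearly cost: per-node infrastructure plus per-cluster operations."""
    infrastructure = layout.nodes * node_cost
    operations = layout.clusters * ops_hours_per_cluster * hourly_rate
    return infrastructure + operations


# Hypothetical figures, chosen only to illustrate the shape of the argument.
NODE_COST = 6000             # assumed yearly cost of one node
OPS_HOURS_PER_CLUSTER = 200  # assumed yearly upkeep hours for any one cluster
HOURLY_RATE = 100            # assumed loaded cost of an engineer-hour

single = Layout(clusters=1, nodes=12)  # one larger cluster
split = Layout(clusters=3, nodes=9)    # three clusters, fewer nodes overall

single_cost = annual_cost(single, NODE_COST, OPS_HOURS_PER_CLUSTER, HOURLY_RATE)
split_cost = annual_cost(split, NODE_COST, OPS_HOURS_PER_CLUSTER, HOURLY_RATE)

print(f"single cluster: {single_cost}")
print(f"split clusters: {split_cost}")
```

With these made-up inputs the split layout saves three nodes of infrastructure but adds two clusters' worth of upkeep, and comes out more expensive overall. Plug in your own figures; the point is only that the per-cluster term does not shrink when the clusters do.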

So all things considered, the first recourse for Elasticsearch performance enhancement should usually be anything other than splitting out data into multiple clusters with different characteristics, rules, and sizes. Adding nodes, scaling up the size of nodes, and sensibly adjusting rules and data structure within a single cluster are much better choices given the relative costs of all of these options. If cost reduction is the goal, then investigating options other than Elasticsearch for the data layer might be a good place to start; it is certainly the case that many of the groups using it do not need to use it, and many less costly technologies would do just as well. If Elasticsearch must be used, then be careful about changes that aim to reduce the number and size of nodes needed: you will probably just end up spending far more on operations and maintenance than was saved.