Autosharding of data streams in Elasticsearch Serverless

In Elastic Cloud Serverless we spare our users from the need to fiddle with sharding by automatically configuring the optimal number of shards for data streams based on the indexing load.

Traditionally, users change the sharding configuration of data streams to deal with varying workloads and make the best use of the available resources. In Elastic Cloud Serverless we've introduced autosharding of data streams, which are managed and scaled automatically based on indexing load. This post explores the mechanics of autosharding, its benefits, and its implications for users dealing with variable workloads. The autosharding philosophy is to increase the number of shards aggressively and reduce them very conservatively, so that a short period of reduced workload does not prematurely undo a recent increase in shards.


Imagine you have a large pizza that needs to be shared among your friends at a party. If you cut the pizza into only two slices for a group of six friends, each slice will need to serve multiple people. This will create a bottleneck, where one person hogs a whole slice while others wait, leading to a slow sharing process. Additionally, not everyone can enjoy the pizza at the same time; you can practically hear the sighs from the friends left waiting. If more friends show up unexpectedly, you’ll struggle to feed them with just two slices and find yourself scrambling to reshape those slices on the spot.

On the other hand, if you cut the pizza into 36 tiny slices for those same six friends, managing the sharing becomes tricky. Instead of enjoying the pizza, everyone spends more time figuring out how to grab their tiny portions. If the slices are too small, the pizza might even fall apart.

To ensure everyone enjoys the pizza efficiently, you’d aim to cut it into a number of slices that matches the number of friends. If you have six friends, cutting the pizza into 6 or 12 slices allows everyone to grab a slice without long waits. By finding the right balance in slicing your pizza, you’ll keep the party running smoothly and everyone happy.

You know it’s a good analogy when you immediately have to follow up with the explanation: the pizza represents the data, the slices represent the index shards, and the friends are the Elasticsearch nodes in your cluster.

Traditionally, users of Elasticsearch had to anticipate their indexing throughput and manually configure the number of shards for each data stream. This approach relied heavily on predictive heuristics and required ongoing adjustments based on workload characteristics whilst also balancing data storage, search analytics, and application performance.

Businesses with seasonal traffic, like retail, often deal with spikes in data demands, while IoT applications can experience rapid load increases at specific times. Development and testing environments typically run only a few hours a week, making fixed shard configurations inefficient. New applications might struggle to estimate workload needs accurately, leading to potential over- or under-provisioning.

We've introduced autosharding of data streams in Elastic Cloud Serverless. Data streams in Serverless are managed and scaled automatically based on indexing load - automatically slicing your pizza as friends arrive at your party or finish eating.

The promise of autosharding

Autosharding addresses these challenges by automatically adjusting the number of shards in response to the current indexing load. This means that instead of users having to manually tweak configurations, Elasticsearch will dynamically manage shard counts for the data streams in your project based on real-time data traffic.

Elasticsearch keeps track of the indexing load of every index in a metric named write load, and exposes it for on-prem and ESS deployments through the index stats API, under the indexing section.
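As a rough sketch of what that looks like in practice, the snippet below pulls the write load out of a stats response such as the one returned by `GET /my-data-stream/_stats`. The payload here is abridged and hypothetical; only the `indexing` section and the `write_load` field follow the description above, and real responses contain many more sections.

```python
import json

# Abridged, illustrative example of an index stats response; a real
# GET /my-data-stream/_stats payload has many more sections and fields.
sample_stats = json.loads("""
{
  "_all": {
    "total": {
      "indexing": {
        "index_total": 128000,
        "write_load": 1.7
      }
    }
  }
}
""")

# The write load lives under the "indexing" section of the stats response.
write_load = sample_stats["_all"]["total"]["indexing"]["write_load"]
print(f"current write load: {write_load}")
```

This is the same signal autosharding consumes internally in Serverless; on on-prem and ESS deployments you can inspect it yourself via the stats API to understand how heavily a data stream is being written to.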