Thought leadership from the most innovative tech companies, all in one place.

How to Change Sharding of Existing Indices on an Elasticsearch Cluster

image

Introduction

As a best practice, it is recommended to maintain the shards' size between 10 and 50 GiB to strike a balance between having too many shards causing an overloaded cluster and having large shards that make cluster recovery difficult. In addition to this, keeping the shard size even and shard count a multiple of nodes would help with the even distribution of shards, reducing storage and performance skew.

Resolution

The primary shard count of an index can only be configured at the time of index creation and cannot be changed afterward. In order to change the sharding, you would have to create a new index with updated sharding and use _reindex API to copy all indices from existing indices to the new index.

For the purpose of this article, let us consider the following indices on an Elasticsearch cluster.

$ curl -XGET 'http://localhost:9200/_cat/indices?s=index&v=true'
health status index        uuid                   pri rep docs.count docs.deleted store.size pri.store.size
green  open   index-000001 8Dmwh8s-T2Csy8atxzK_8Q   5   1          6            0     32.7kb          9.5kb
green  open   index-000002 SadtrQ8ZRZWMmjhWrRoaUA   5   1          4            0     19.7kb            5kb
green  open   index-000003 wUw8StUKTEKExeaoOKSUEg   5   1          5            0     46.2kb         15.6kb
green  open   index-000004 yPXHeD9nQomucMedA6_Hzg   5   1          6            0     28.3kb         14.1kb
green  open   index-000005 J8aPYx9eRxCPseYvihz-Pg   5   1          8            0     27.7kb         18.5kb

Aggregating indices using index pattern

Often, indices are rolled-over/created on a daily basis to make it easily searchable. This could possibly result in many shards that are also small in size. These indices often have a pattern which can be used to aggregate these indices into less number of indices and thereby reducing shards. For example, the indices above have the pattern “index-*”. In such cases, you can create a new index as your aggregation destination. If you wish to change the sharding (primary shard count), you would have to do the same when creating a new index as below.

$ curl -XPUT 'http://localhost:9200/agg-index-1-10?pretty' -H 'Content-Type: application/json' -d '
> {
>   "settings": {
>     "index": {
>       "number_of_shards": 3,
>       "number_of_replicas": 1
>     }
>   }
> }'
{
  "acknowledged" : true,
  "shards_acknowledged" : true,
  "index" : "agg-index-1-10"
}

Then, you can use the _reindex API to copy all documents from the existing indices to the newly created index.

$ curl -XPOST 'http://localhost:9200/_reindex?wait_for_completion=false&pretty' -H 'Content-Type: application/json' -d '
> {
>   "source": {
>     "index": "index-*"
>   },
>   "dest": {
>     "index": "agg-index-1-10"
>   }
> }'
{
  "task" : "y2N1gjCZTcCUkYXaNTqqzQ:19969987"
}

You can monitor the above task using the _tasks API, until you notice “completed” : true.

$ curl -XGET 'http://localhost:9200/_tasks/sFCzlAnzSnya1X07eN3V6w:14282529?pretty'

Then, verify whether the document count matches. In this case, the sum of documents in “index-*” indices should be equal to the documents “agg-index-1–10” index.

$ curl -XGET 'http://localhost:9200/_cat/indices?s=index&v=true'
health status index          uuid                   pri rep docs.count docs.deleted store.size pri.store.size
green  open   agg-index-1-10 YhwfRgMbQ8e_AfX9sy2UqQ   3   1         50            0       35kb         17.5kb
green  open   index-000001   8Dmwh8s-T2Csy8atxzK_8Q   5   1         10            0     46.4kb         23.2kb
green  open   index-000002   SadtrQ8ZRZWMmjhWrRoaUA   5   1         10            0     29.7kb         14.8kb
green  open   index-000003   wUw8StUKTEKExeaoOKSUEg   5   1         10            0     55.8kb           25kb
green  open   index-000004   yPXHeD9nQomucMedA6_Hzg   5   1         10            0       38kb           19kb
green  open   index-000005   J8aPYx9eRxCPseYvihz-Pg   5   1         10            0     46.4kb         23.2kb

Once document count matches and your applications/pipelines/services are pointing towards the new indices, you can delete the old indices as follows.

$ curl -XDELETE 'http://localhost:9200/indices-*?pretty'

Aggregating specific indices

Sometimes, you may have to aggregate or reduce the shards of a specific set of indices. In such cases, you can use the _reindex API as follows.

$ curl -XPOST 'http://localhost:9200/_reindex?wait_for_completion=false&pretty' -H 'Content-Type: application/json' -d '
> {
>   "source": {
>     "index": ["index-000001", "index-000002"]
>   },
>   "dest": {
>     "index": "agg-index-1-2"
>   }
> }'
{
  "task" : "y2N1gjCZTcCUkYXaNTqqzQ:19974516"
}

You can monitor the task as described above. Once you observe “completed”: true in the _tasks API. Ensure that the document count matches and delete unwanted indices.

$ curl -XGET 'http://localhost:9200/_cat/indices?s=index&v=true'
health status index         uuid                   pri rep docs.count docs.deleted store.size pri.store.size
green  open   agg-index-1-2 DofJgv0FRMiLi5XimLMgnw   5   1         20            0     10.8kb           624b
green  open   index-000001  8Dmwh8s-T2Csy8atxzK_8Q   5   1         10            0     46.4kb         23.2kb
green  open   index-000002  SadtrQ8ZRZWMmjhWrRoaUA   5   1         10            0     29.7kb         14.8kb
green  open   index-000003  wUw8StUKTEKExeaoOKSUEg   5   1         10            0     55.8kb           25kb
green  open   index-000004  yPXHeD9nQomucMedA6_Hzg   5   1         10            0       38kb           19kb
green  open   index-000005  J8aPYx9eRxCPseYvihz-Pg   5   1         10            0     46.4kb         23.2kb

Decreasing shard size

Sometimes, your shard size might be too large. This can impact cluster recovery as large shards make it difficult. It can also slow down blue/green deployments that are initiated when configuration changes are triggered on your Amazon Elasticsearch Service domain. In this case, you can increase shard count per index when creating new index as below.

$ curl -XPUT 'http://localhost:9200/inc-index-1-2?pretty' -H 'Content-Type: application/json' -d '
> {
>   "settings": {
>     "index": {
>       "number_of_shards": 10,
>       "number_of_replicas": 1
>     }
>   }
> }'
{
  "acknowledged" : true,
  "shards_acknowledged" : true,
  "index" : "inc-index-1-2"
}

Then, similar to above, use the _reindex API to copy all documents from existing to the new index.

$ curl -XPOST 'http://localhost:9200/_reindex?wait_for_completion=false&pretty' -H 'Content-Type: application/json' -d '
{
  "source": {
    "index": "agg-index-1-2"
  },
  "dest": {
    "index": "inc-index-1-2"
  }
}'
{
  "task" : "sFCzlAnzSnya1X07eN3V6w:14282529"
}

Once the task is complete, you should see the document count matching before going ahead to delete the old index.

$ curl -XGET 'http://localhost:9200/_cat/indices?s=index&v=true'
health status index         uuid                   pri rep docs.count docs.deleted store.size pri.store.size
green  open   agg-index-1-2 DofJgv0FRMiLi5XimLMgnw   5   1         20            0     48.8kb         24.4kb
green  open   inc-index-1-2 7d_26FMXSTqPtvfxXVSBcg  10   1         20            0     74.4kb         37.2kb
green  open   index-000001  8Dmwh8s-T2Csy8atxzK_8Q   5   1         10            0     46.4kb         23.2kb
green  open   index-000002  SadtrQ8ZRZWMmjhWrRoaUA   5   1         10            0     29.7kb         14.8kb
green  open   index-000003  wUw8StUKTEKExeaoOKSUEg   5   1         10            0     55.8kb           25kb
green  open   index-000004  yPXHeD9nQomucMedA6_Hzg   5   1         10            0       38kb           19kb
green  open   index-000005  J8aPYx9eRxCPseYvihz-Pg   5   1         10            0     46.4kb         23.2kb



Continue Learning