| Oct 21, 2021

7 MIN READ

Written by Rhythm Gupta

Elasticsearch Cluster Setup: 10 Best Practice Tips

When setting up Elasticsearch in production, users often wonder which parameters matter most, which can be left for later consideration, which settings need to be changed, and how to squeeze more performance out of an existing cluster. In this article, I have tried to bundle together all of these important settings, drawn from my experience of building multiple Elasticsearch clusters. Hopefully, this can put your life at ease.

Here are 10 Tips that can help you with your Elasticsearch cluster setup:

  • Avoid split brain:

In a cluster with multiple master-eligible nodes, there is always a concern that if the network becomes partitioned or unstable, the cluster could accidentally elect more than one master, which is referred to as the “split brain” scenario. To avoid this, a candidate must receive votes from a minimum number of master-eligible nodes to win a master election. To determine that minimum, we use the formula N/2 + 1, where N is the number of master-eligible nodes.

For a production cluster with 3 master-eligible nodes, just add the following to your elasticsearch.yml file:

discovery.zen.minimum_master_nodes: 2
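As a fuller sketch, the relevant part of elasticsearch.yml on one of the three master-eligible nodes might look like the following (cluster, node, and host names are illustrative; note that from Elasticsearch 7.0 onward this setting is deprecated and the cluster manages the voting quorum automatically):

cluster.name: production-cluster
node.name: node-1
node.master: true
discovery.zen.ping.unicast.hosts: ["node-1", "node-2", "node-3"]
discovery.zen.minimum_master_nodes: 2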

  • Infrastructure cost reduction:

Pay less and get the maximum benefit. When handling massive amounts of data, we worry about huge infrastructure costs, and here is how Elasticsearch can come to the rescue. It provides Index Lifecycle Management (ILM) policies to manage indices automatically. We can shift our data to compute-, storage-, or memory-optimized systems based on our performance, resiliency, and retention requirements, using the hot-warm-cold architecture in Elasticsearch, which can be set up from the Kibana GUI.

Hot Nodes: Used for indexing, as indexing is a CPU- and IO-intensive operation. The hot tier holds your most recent, most frequently searched time-series data. Nodes in the hot tier need to be fast for both reads and writes, which requires more hardware resources and faster storage (SSDs).

Warm Nodes: Used for older and read-only indices. Indices can be shrunk using the _shrink API to occupy less space. Data can move to the warm tier once it is queried less frequently. For resiliency, indices in the warm tier should be configured to use one or more replicas, and this tier tends to use large attached disks (usually spinning disks).

Cold Nodes: Used for frozen indices. Once data is no longer being updated, it can move from the warm tier to the cold tier. A frozen index has almost no overhead on the cluster, as its shard memory is moved to persistent storage; we can still search it, but queries take longer. For resiliency, the cold tier can use fully mounted indices of searchable snapshots, eliminating the need for replicas.

The good part about this feature is that it is available in the Elasticsearch basic subscription as well.
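As an illustration, a hot-warm-cold lifecycle can also be defined through the ILM API; in the sketch below the policy name, ages, and rollover thresholds are only examples and should be tuned to your retention requirements:

PUT _ilm/policy/logs-policy
{
  "policy": {
    "phases": {
      "hot": {
        "actions": {
          "rollover": { "max_size": "50gb", "max_age": "7d" }
        }
      },
      "warm": {
        "min_age": "30d",
        "actions": {
          "shrink": { "number_of_shards": 1 },
          "allocate": { "number_of_replicas": 1 }
        }
      },
      "cold": {
        "min_age": "60d",
        "actions": {
          "freeze": {}
        }
      }
    }
  }
}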

  • JVM heap size settings:

As Elasticsearch and Lucene are written in Java, we may need to adjust the maximum heap space and other JVM settings. It is important to note that the more heap available to Elasticsearch, the more memory it can use for filtering, caching, and other operations that improve query performance. At the same time, too much heap space can lead to long garbage-collection pauses.

By default, Elasticsearch automatically sets the JVM heap size based on a node’s role and total memory. Using the default sizing is recommended for most production environments.

If you override the default heap size, note that the heap should be set to at most 50% of your RAM, but no more than 32GB (beyond which the JVM can no longer use compressed object pointers and uses memory less efficiently), and both the minimum and maximum heap size settings must be identical.

These values can be configured using the Xmx and Xms settings in the jvm.options file.
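For instance, on a node with 64 GB of RAM you might pin both values to 26 GB (the exact figure is illustrative; it just needs to stay at or below 50% of RAM and under the 32 GB ceiling):

# jvm.options (or a file under jvm.options.d/ in recent versions)
-Xms26g
-Xmx26g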

  • Disable swapping:

Operating systems try to use as much memory as possible for file-system caches and eagerly swap out unused application memory. Elasticsearch performance can suffer drastically when the OS does this, as it may even swap Elasticsearch’s executable pages out to disk. Disabling OS-level swapping and enabling a memory lock helps us avoid such scenarios. Just add the below in your elasticsearch.yml file.

Set bootstrap.memory_lock: true
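In addition to the elasticsearch.yml setting, the user running Elasticsearch usually needs permission to lock memory. A minimal sketch, assuming the service user is named elasticsearch and the host uses /etc/security/limits.conf rather than systemd overrides:

# /etc/security/limits.conf
elasticsearch soft memlock unlimited
elasticsearch hard memlock unlimited

# verify that the memory lock took effect after restarting the node
GET _nodes?filter_path=**.mlockall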

  • Virtual memory adjustment:

Elasticsearch uses an mmapfs directory by default to store its indices. The default operating system limit on mmap counts is likely to be too low, which may result in out-of-memory exceptions. So, to avoid running out of virtual memory, increase the limit on mmap counts in the /etc/sysctl.conf file.

Set vm.max_map_count=262144

On DEB/RPM, this setting is configured automatically. No further configuration is required.
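On other installations, the setting can be applied and verified as root, for example:

# apply immediately without a reboot
sysctl -w vm.max_map_count=262144

# persist across reboots
echo "vm.max_map_count=262144" >> /etc/sysctl.conf

# check the current value
sysctl vm.max_map_count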

  • Increase Open file descriptor limit:

Elasticsearch uses a lot of file descriptors (file handles). Running out of file descriptors can be disastrous and will most probably lead to data loss. Make sure to increase the limit on the number of open file descriptors for the user running Elasticsearch to 65,536 or higher. On DEB/RPM the default settings are already configured to suit this requirement, but you can of course fine-tune it in the /etc/security/limits.conf file by setting nofile to 65536.
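A minimal sketch of the limits.conf entry, assuming the service user is named elasticsearch, together with an API call to confirm the limit on each node:

# /etc/security/limits.conf
elasticsearch  -  nofile  65536

# confirm the limit picked up by each node
GET _nodes/stats/process?filter_path=**.max_file_descriptors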

  • Disable wildcard:

Data deleted from an Elasticsearch cluster cannot be recovered, so to ensure that nobody can issue a DELETE operation against all indices (* or _all), disable deleting indices by wildcard.

Set action.destructive_requires_name to true
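The setting can live in elasticsearch.yml, or be applied dynamically through the cluster settings API, for example:

PUT _cluster/settings
{
  "persistent": {
    "action.destructive_requires_name": true
  }
}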

  • Cluster sizing:

Since Elasticsearch is resource intensive, there are multiple factors we should consider before sizing the infrastructure for an Elasticsearch cluster: the size of the data, the frequency of data ingestion, retention requirements, the number of active users, HA requirements for Kibana or Logstash, the number of replicas required, search and reporting requirements, and many more.

For our ease, we can categorize these into two factors: volume and throughput requirements. Volume covers the storage and memory resources required to hold the expected amount of data and shards in each tier of the cluster. Throughput covers the memory, compute, and network resources required to process the expected operations in each tier of the cluster.

There is some simple math behind Elasticsearch volume sizing. For example, the actual space requirement for 100 GB of raw log data is roughly 100 GB × JSON conversion factor (1.2) × compression factor (0.7) × replica factor.
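So, assuming a single replica (a replica factor of 2), those 100 GB of raw logs would need roughly 100 × 1.2 × 0.7 × 2 ≈ 168 GB of storage across the data nodes.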

For throughput, standard log processing may require relatively little CPU, say 8 to 16 cores with roughly an 8x memory factor. If there is a need for more analytics computation in the form of DSL queries, then more CPU cores may be needed. Local SSD drives such as NVMe, or other disks with high IOPS, should be used rather than remote file systems such as NFS or SMB. Ultimately, the goal is to prevent the thread pool queues from growing faster than they are consumed; with insufficient computing resources, there is always a risk of search requests being dropped.

To arrive at an adequate memory estimate, we should observe our cluster’s memory usage from the beginning, and if required we can scale the cluster in and out based on actual need. This is a great advantage of this technology.

  • Field mapping:

When we ingest data into an Elasticsearch cluster, it automatically creates a mapping with a specific type for each field value. This seems quite easy and straightforward, but we should define proper field mapping types based on our data, because if the wrong field type is inferred we will get indexing errors.
For example, suppose the first document is indexed like this, and Elasticsearch maps the Payload field as a date type:

POST index-sample/_doc
{
  "Payment": "In process",
  "Payload": "2021-10-20"
}

And the next document looks like this:

POST index-sample/_doc
{
  "Payment": "In process",
  "Payload": "user_locked"
}

Here the Payload value is not a date, so Elasticsearch will return an error, as it has already mapped the Payload field as a date. To avoid this error, we can set dynamic date detection to false.

PUT index-sample
{
  "mappings": {
    "date_detection": false
  }
}

 

PUT index-sample/_doc/1
{
  "Payload": "2021-10-20"
}

Also, if you define a very large mapping, you might face issues syncing the template across the nodes of the cluster, and it still does not remove the need to update the template as the data model changes. To avoid this, we should use dynamic templates, which automatically add field mappings for new fields based on predefined rules for specific types and field names. The important thing is to keep the templates as small as possible.
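A small sketch of a dynamic template, here mapping every newly seen string field as a keyword (the index and template names and the rule itself are illustrative):

PUT index-sample-dynamic
{
  "mappings": {
    "dynamic_templates": [
      {
        "strings_as_keywords": {
          "match_mapping_type": "string",
          "mapping": { "type": "keyword" }
        }
      }
    ]
  }
}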

  • Speeding indexing:

To increase the indexing performance of an Elasticsearch cluster, consider the points below.

Use bulk requests: Rather than indexing documents one at a time, use bulk requests. Larger requests usually perform better, but we should always run a benchmark to find the breaking point, which can be identified by trying out multiple request sizes. The cluster can also come under memory pressure when a bulk request is too large, typically beyond a couple of tens of megabytes per request. Bulk ingestion is useful when we write our own code to ingest custom data in bulk, or when we index data streams that can be queued up and indexed in batches of hundreds or thousands of documents, such as logs.
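A minimal sketch of a bulk request, reusing the index-sample index from the mapping example above (the documents themselves are illustrative):

POST _bulk
{ "index": { "_index": "index-sample" } }
{ "Payment": "In process", "Payload": "2021-10-20" }
{ "index": { "_index": "index-sample" } }
{ "Payment": "Completed", "Payload": "2021-10-21" }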

Use multiple threads/workers to ingest data into the Elasticsearch cluster: To use our resources to the fullest, send data to the cluster from multiple threads or workers. If Elasticsearch cannot keep up with the current indexing rate, it indicates this with TOO_MANY_REQUESTS (429) response codes (EsRejectedExecutionException with the Java client). As with bulk sizing, testing with various values tells us the optimal number of threads.

Increase the refresh interval: In use cases with little or no search traffic, increasing the refresh interval can make the indexing process more efficient. We need to set the refresh interval explicitly to get this behavior.
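For example, the refresh interval can be relaxed on an existing index like this (the 30s value is illustrative; the default is 1s):

PUT index-sample/_settings
{
  "index": { "refresh_interval": "30s" }
}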

More general recommendations:

  • Configure access permissions on items such as clusters, indices, and fields using the role-based access control (RBAC) mechanism from the Kibana GUI. This provides fine-grained access control to your cluster.
  • Use TLS encryption for security: SSL/TLS encryption helps prevent threats. With encryption disabled, Elasticsearch sends data between nodes and clients in plain text, and that data may contain sensitive information and credentials such as passwords. This opens the door to attacks in which attackers create malicious nodes, attempt to join the cluster, and replicate data to gain access to it.
  • Eliminate unnecessary data before indexing into Elasticsearch. This helps optimize the space utilization of the cluster and gives a cleaner base for your production data.
  • Use the secure settings keystore instead of plain-text passwords in the configuration setup, as sketched below. This provides a better level of security for the Elasticsearch cluster.
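For instance, sensitive values can be stored with the elasticsearch-keystore tool instead of being written into elasticsearch.yml (the setting name below is just one example of a secure setting):

bin/elasticsearch-keystore create
bin/elasticsearch-keystore add xpack.security.http.ssl.keystore.secure_password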

 

