Cross Cluster in Elastic Stack – Use case & Improvement recommendations
Ajit Gadge | Senior Database Consultant
I wanted to share some recommendations for setting up an Elastic Stack cluster to achieve high availability / DR.
Recently, I came across a situation where a large organization is building the ELK stack (Elastic Stack) across data centers, with the architecture shown below.
If you are a System Admin or a DBA, there is nothing wrong with the above architecture as a DC/DR deployment. But there are some limitations when a single Elastic Stack cluster spans data centers. Let’s look at this in detail.
Typically, the architecture and design of an Elastic Stack solution assume a single data center, with all Elasticsearch nodes in that one DC. In the solution above, the ES nodes are spread across two DCs. This architecture creates some challenges, as described below:
1. Network Latency:
Network disruption is very common over a WAN, especially when DCs/servers are physically far apart. Most large organizations have dedicated WAN links with good bandwidth and low latency, and Elasticsearch is built to be resilient to network disconnects, but that resiliency is intended to handle the exception, not the norm. Latency is a very common problem on WAN links, and high network latency may slow indexing in Elasticsearch. Elasticsearch indexes data into the primary shard first and then replicates it to the replicas, so when a replica sits behind a slow link, indexing slows down, and missing replica shards also become a very common problem.
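To get a feel for the effect described above, here is a rough back-of-the-envelope sketch. It assumes a single client thread sending bulk requests one after another, and that each request is acknowledged only after the primary has processed it and the replica has confirmed it. The function name and the timing values are illustrative assumptions, not measurements.

```python
def bulk_requests_per_second(local_ms: float, replica_rtt_ms: float) -> float:
    """Upper bound on sequential bulk requests per second for one client thread.

    Assumes every request costs the local processing time on the primary
    plus one network round trip to the replica before it is acknowledged.
    """
    total_ms = local_ms + replica_rtt_ms
    return 1000.0 / total_ms


# Illustrative numbers only: 20 ms of local work per bulk request,
# 0.5 ms round trip inside one DC versus 60 ms across a WAN link.
same_dc = bulk_requests_per_second(local_ms=20.0, replica_rtt_ms=0.5)
cross_dc = bulk_requests_per_second(local_ms=20.0, replica_rtt_ms=60.0)

print(f"replica in same DC : {same_dc:.1f} requests/s")
print(f"replica across WAN : {cross_dc:.1f} requests/s")
```

Under these assumed numbers the round trip, not the indexing work itself, dominates the cost of each request once the replica lives in the other DC.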
2. Unreliable connectivity:
If the DCs lose network connectivity or are isolated from each other, even for a few milliseconds, a remote shard is likely to go missing and come back in a disconnected state.
To recover, the replica must be synced with the primary again before it can provide consistent results.
3. Master Availability:
In the above eventuality, assume that the DC1 to DC2 network is down for a few milliseconds and master node 1 in DC1 is the currently elected master. The master-eligible node in DC2 may then start electing a new master within DC2, because it can no longer ping the existing master in DC1 over the unreliable network. Even though Elastic Stack uses the Zen protocol to track node availability, this can still cause problems: DC2 may start indexing new data that is inconsistent with DC1. When the link is restored, these nodes will also be pushing data and documents across the network while still handling the full indexing and request load. This necessitates larger or more powerful clusters to ensure enough CPU and IOPS to maintain acceptable performance during such events.
Now, let’s look at possible solutions for running Elastic Stack across DCs.
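The election problem above comes down to quorum arithmetic. The sketch below is a simplified model, not Elasticsearch code: it uses the usual majority rule (half of the master-eligible nodes plus one, the value commonly recommended for the `discovery.zen.minimum_master_nodes` setting in Zen-based versions) and checks which side of a partition could still elect a master. The node counts are assumptions chosen for illustration.

```python
def quorum(master_eligible: int) -> int:
    """Majority of master-eligible nodes: floor(n / 2) + 1."""
    return master_eligible // 2 + 1


def can_elect_master(reachable: int, master_eligible: int) -> bool:
    """A partition may elect a master only if it can see a majority."""
    return reachable >= quorum(master_eligible)


# Assumed layout: 3 master-eligible nodes, 2 in DC1 and 1 in DC2.
total = 3
print("quorum:", quorum(total))
print("DC1 can elect:", can_elect_master(2, total))
print("DC2 can elect:", can_elect_master(1, total))

# Assumed layout: 4 master-eligible nodes split evenly, 2 per DC.
# Neither side sees a majority, so a partition leaves no master at all.
print("even split, either DC can elect:", can_elect_master(2, 4))
```

With the majority rule enforced, at most one side keeps a master during a partition, which avoids two masters indexing divergent data but means the other DC stops accepting writes until the link returns. An even split across two DCs is the worst case, since neither side can elect.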
1. Data in both DCs with 2 clusters.
One can configure Elastic Stack with a message queue such as Kafka, Redis, etc. Beats send data to the message queue, which is replicated to both DCs. In each DC, a local Logstash reads from the local queue, processes the data, and sends it to the local Elasticsearch cluster. Each DC therefore has its own ES cluster, and documents from the relevant queue are indexed into the local ES cluster only.
If the network between the DCs goes down, processing resumes from where it left off once the link is restored and continues indexing data into the local ES cluster.
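The resume behaviour described above can be illustrated with a small in-memory simulation. This is a toy model, not Kafka or Logstash code: `MirroredQueue` and `LocalIndexer` are hypothetical names, the queue is a plain list, and the "offset" is simply the position of the next unread event.

```python
from typing import List


class MirroredQueue:
    """DC2's copy of the queue; it only grows while the inter-DC link is up."""

    def __init__(self) -> None:
        self.events: List[str] = []

    def replicate_from(self, source: List[str], link_up: bool) -> None:
        if link_up:
            # Copy only the events this mirror has not received yet.
            self.events.extend(source[len(self.events):])


class LocalIndexer:
    """Stands in for Logstash plus the local ES cluster in one DC."""

    def __init__(self) -> None:
        self.offset = 0  # position of the next unread event
        self.indexed: List[str] = []

    def drain(self, queue: MirroredQueue) -> None:
        for event in queue.events[self.offset:]:
            self.indexed.append(event)
            self.offset += 1


dc1_queue = ["e1", "e2"]
dc2_queue, dc2_indexer = MirroredQueue(), LocalIndexer()

dc2_queue.replicate_from(dc1_queue, link_up=True)
dc2_indexer.drain(dc2_queue)

dc1_queue += ["e3", "e4"]  # produced while the link is down
dc2_queue.replicate_from(dc1_queue, link_up=False)
dc2_indexer.drain(dc2_queue)
during_outage = list(dc2_indexer.indexed)

dc2_queue.replicate_from(dc1_queue, link_up=True)  # link restored
dc2_indexer.drain(dc2_queue)
after_restore = list(dc2_indexer.indexed)

print("indexed in DC2 during outage:", during_outage)
print("indexed in DC2 after restore:", after_restore)
```

The point of the design is that the queue, not Elasticsearch, absorbs the outage: DC2 falls behind while the link is down and catches up afterwards without any shard-level resynchronisation between the two clusters.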
2. Snapshot and Restore using Curator:
If you do not really want an “Active-Active” cluster in both DCs, you can use the Curator tool to take snapshots on a schedule and restore them at the other end. Configure Curator to take a snapshot from DC1 at a regular interval and restore it at DC2 on a matching schedule. Each time Curator restores the data, only the incremental data is restored at the other end, where it is then available for search or DR.
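For readers who want to see what a snapshot and restore cycle looks like at the REST level, the sketch below builds the two requests of Elasticsearch's snapshot API (`PUT /_snapshot/<repository>/<snapshot>` and `POST /_snapshot/<repository>/<snapshot>/_restore`). It only constructs the method, path, and body and does not send anything. The repository name `dr_repo`, the `snap-` naming scheme, and the index pattern are assumptions for illustration, and the repository is assumed to be registered already and readable from both DCs.

```python
import json
from datetime import datetime
from typing import Tuple


def snapshot_request(repo: str, when: datetime) -> Tuple[str, str, str]:
    """Return (method, path, snapshot_name) for taking a snapshot in DC1."""
    name = "snap-" + when.strftime("%Y%m%d-%H%M")  # hypothetical naming scheme
    path = f"/_snapshot/{repo}/{name}?wait_for_completion=true"
    return ("PUT", path, name)


def restore_request(repo: str, name: str, indices: str) -> Tuple[str, str, str]:
    """Return (method, path, json_body) for restoring that snapshot in DC2."""
    body = {"indices": indices, "include_global_state": False}
    return ("POST", f"/_snapshot/{repo}/{name}/_restore", json.dumps(body))


method, path, snap_name = snapshot_request("dr_repo", datetime(2020, 1, 2, 3, 4))
print(method, path)

method, path, body = restore_request("dr_repo", snap_name, "logs-*")
print(method, path, body)
```

Note that an index which already exists on the DR side generally has to be closed or deleted before it can be restored over, so a scheduled restore job needs a step for that as well.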
3. Cross-Cluster Search:
Cross-Cluster Search has recently started receiving support. More details are here. You can search both DCs as a single big cluster, while data is still indexed locally in each DC.
These are some solutions that can help achieve high availability / DR in Elastic Stack. You can get in touch with us at email@example.com for further queries.
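As a small illustration of how such a search is addressed, cross-cluster search targets a remote index with the `<cluster_alias>:<index>` syntax, combined with local indices in one comma-separated list. The helper below only builds the request path. The alias `dc2` and the `logs-*` pattern are assumptions, and the remote cluster is assumed to be registered under that alias on the cluster receiving the search.

```python
from typing import List


def search_path(index_pattern: str, remote_aliases: List[str]) -> str:
    """Build a _search path covering the local cluster and each remote alias."""
    targets = [index_pattern]  # local indices need no prefix
    targets += [f"{alias}:{index_pattern}" for alias in remote_aliases]
    return "/" + ",".join(targets) + "/_search"


# Search logs-* locally (DC1) and on the remote cluster registered as "dc2".
print(search_path("logs-*", ["dc2"]))
```

A query sent to this path from DC1 is fanned out to both clusters and the results are merged, while each cluster keeps indexing only its own local data.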
- Ajit brings over 16 years of solid experience in solution architecting and implementation. He has successfully delivered solutions on various database technologies including PostgreSQL, MySQL, and SQL Server. He derives his strength from his passion for database technologies and love of pre-sales engagement with customers. Ajit has successfully engaged with customers from across the globe, including South East Asia, the USA, and India.