
Cross Cluster Replication in Elasticsearch

Written by Arun Kumar | Sep 18, 2023


Explore the game-changing Cross-Cluster Replication (CCR) feature in Elasticsearch, enabling high availability, disaster recovery, load balancing, analytics, and more. Learn the prerequisites and setup steps for seamless data replication between clusters.

Key Highlights

  • Disaster Recovery: CCR keeps a near real-time copy of your data on a secondary cluster, so operations can continue when your primary cluster goes down.
  • Load Balancing: Distribute read traffic across clusters for improved performance and scalability.
  • Global Data Analytics Made Easy: Analyze data from multiple regions in one place by replicating it to a central cluster.
  • Elastic Scaling: Expand your Elasticsearch footprint across clusters while CCR keeps follower indices in sync with their leaders.

Data replication is one of the most important capabilities demanded of data platforms today. Replicating data across clusters used to be a complicated and time-consuming process: companies had to use external tools or write custom scripts to move data from one cluster to another. Elasticsearch addresses this with its Cross-Cluster Replication feature.

Cross-Cluster Replication (CCR) is a built-in feature in Elasticsearch that allows for the replication of data from one cluster to another in near real-time.


This feature holds significant importance in Elasticsearch for various reasons:

1. High Availability: Cross-cluster replication enhances the high availability of data in Elasticsearch. By replicating data to multiple clusters, if one cluster becomes unavailable due to hardware failures, network issues, or other problems, the data remains accessible from other clusters. This ensures uninterrupted access to data for search and analytics.

2. Disaster Recovery: Disaster can strike at any moment, and it’s crucial to have a disaster recovery plan in place. Cross-Cluster Replication can help in disaster recovery scenarios by replicating data from a primary cluster to a secondary cluster in a different geographical location. If the primary cluster goes down, the secondary cluster can take over, and business operations can continue with minimal disruption.

3. Load Balancing: Large companies with massive amounts of data may find that a single cluster cannot handle the load. Cross-Cluster Replication allows for load balancing by replicating data to multiple clusters, improving performance and scalability. Because follower indices can serve searches, read traffic can be directed to any cluster that holds a copy of the data.

4. Analytics: Data analysis across geographical regions can be challenging, especially when the data is stored in separate clusters. With Cross-Cluster Replication, users can replicate data from multiple clusters to a central cluster, making it easier to perform analytics across all data sources. This enables companies to make informed decisions based on a complete view of their data.

5. Scaling and Elasticity: Elasticsearch clusters can be scaled horizontally by adding more nodes or clusters as needed. CCR keeps follower indices in sync with their leader indices in near real-time, supporting the elastic scaling of Elasticsearch infrastructure.

6. Data Locality: Replicating data closer to users or applications in different regions improves query response times and user experiences. CCR allows you to store data in proximity to where it’s needed.

7. Testing and Development: CCR can be useful in testing and development environments where you need to replicate production data to a separate cluster for testing without affecting the production environment.

In summary, cross-cluster replication is a crucial feature in Elasticsearch that addresses key concerns such as high availability, disaster recovery, load balancing, centralized analytics, scalability, data locality, and safe testing against production data. It helps Elasticsearch meet the demands of modern, distributed, and highly available search and analytics applications.

To set up cross-cluster replication in Elasticsearch, you need to ensure that you have the following prerequisites in place:

Prerequisites:

  • To use CCR, local and remote clusters must be version 6.7.x or higher.
  • Local and remote clusters must run compatible versions.
  • Local and remote clusters must trust each other.
  • If your clusters use Elasticsearch’s security features, ensure that the security configurations are in place.
  • Both clusters should be properly configured and operational.
  • Ensure that the transport and HTTP settings on both clusters allow communication between them.
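The minimum-version prerequisite can be checked with a few lines of Python. This is a minimal sketch: the helper names are hypothetical, and it assumes you have already read each cluster's version string (reported in the `version.number` field of the `GET /` response). It checks only the 6.7 minimum stated above, not whether two specific versions are compatible with each other.

```python
from typing import Tuple

# Minimum version taken from the prerequisite list above.
CCR_MINIMUM = (6, 7)


def parse_version(version: str) -> Tuple[int, int]:
    """Return (major, minor) from a string such as '7.17.3' or '8.11.0-SNAPSHOT'."""
    core = version.split("-", 1)[0]      # drop any '-SNAPSHOT' style suffix
    major, minor = core.split(".")[:2]   # the patch level is not needed here
    return int(major), int(minor)


def meets_ccr_minimum(version: str) -> bool:
    """True when the version is 6.7 or higher."""
    return parse_version(version) >= CCR_MINIMUM


# Hypothetical version strings, used only to exercise the helper.
for v in ("6.6.2", "6.7.0", "8.11.0-SNAPSHOT"):
    print(v, meets_ccr_minimum(v))
```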

Limitations:

Cross-cluster replication is designed to replicate user-generated indices only, and doesn’t currently replicate any of the following:

  • System indices
  • Machine learning jobs
  • Index templates
  • Index lifecycle management and snapshot lifecycle management policies
  • User permissions and role mappings
  • Snapshot repository settings
  • Cluster settings
  • Searchable snapshots

If you want to replicate any of this data, you must replicate it to a remote cluster manually.

Set up cross-cluster replication:

Setting up cross-cluster replication involves the following steps:

  • Connect a local cluster to a remote cluster
  • Create a leader index in a remote cluster
  • Create a follower index that replicates a leader index
  • Automatically create follower indices

An index in one Elasticsearch cluster can be configured to replicate changes from an index in another Elasticsearch cluster. The index that is replicating changes is termed a “follower index” and the index being replicated from is termed the “leader index”. The follower index is passive: it can serve read requests and searches but cannot accept direct writes. Only the leader index accepts direct writes.


Define Remote Clusters

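When setting up CCR, an Elasticsearch cluster must know about the clusters it replicates from. Replication in CCR is pull-based: the DR cluster, which holds the follower indices, pulls changes from the DC cluster, which holds the leader indices. We therefore don’t need to specify a connection from the DC cluster to the DR cluster; we only need to define the DC cluster as a remote cluster on the DR cluster.
Let’s define the DC cluster via an API call on the DR cluster.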


(For API-based commands, we recommend using the Dev Tools console within Kibana. It can be found via Kibana -> Dev Tools -> Console.)

This API call, a cluster settings update, defines a remote cluster with the alias “DC-cluster” that can be accessed at “127.0.0.1:9300”. One or more seeds can be specified, and it is generally recommended to specify more than one in case a seed is not available during the handshake phase.

Note that the seed uses port 9300. The “DC-cluster” listens for HTTP requests on port 9200, but replication uses the Elasticsearch transport protocol (the one used for node-to-node communication), whose default port is 9300.
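As a sketch of what that call looks like, the snippet below builds the request in Python and prints it in a form that can be pasted into the Dev Tools console. The alias and seed come from the example above; the helper name is hypothetical, and the snippet only constructs the request, it does not send it.

```python
import json


def remote_cluster_request(alias, seeds):
    """Build the cluster settings update that registers a remote cluster."""
    seeds = list(seeds)
    if not seeds:
        raise ValueError("at least one seed (host:transport_port) is required")
    body = {
        "persistent": {
            "cluster": {
                "remote": {
                    # Seeds point at the transport port (9300 by default), not HTTP 9200.
                    alias: {"seeds": seeds}
                }
            }
        }
    }
    return "PUT", "/_cluster/settings", body


# Run this request on the DR cluster so that it learns about the DC cluster.
method, path, body = remote_cluster_request("DC-cluster", ["127.0.0.1:9300"])
print(f"{method} {path}")
print(json.dumps(body, indent=2))
```

Listing more than one seed in the `seeds` array follows the recommendation above: if one node is unreachable during the handshake, another can be used.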

We can also define the DC cluster in the Kibana UI: click “Management” (gear icon) in the left navigation panel, then navigate to “Remote Clusters” under Stack Management, within the Elasticsearch section.


GET /_remote/info
This API retrieves information about all configured remote clusters. It returns connection and endpoint information keyed by the alias of each configured remote cluster.
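A quick way to use that response is to check which remotes are actually connected. The sketch below works on a response that has already been parsed into a Python dictionary. The sample values are illustrative rather than captured from a real cluster, and the helper name is hypothetical.

```python
def disconnected_remotes(remote_info):
    """Return the aliases of remote clusters that report connected == false."""
    return sorted(
        alias
        for alias, details in remote_info.items()
        if not details.get("connected", False)
    )


# Illustrative response shape for a single remote in the default (sniff) mode.
sample_response = {
    "DC-cluster": {
        "connected": True,
        "mode": "sniff",
        "seeds": ["127.0.0.1:9300"],
        "num_nodes_connected": 1,
        "max_connections_per_cluster": 3,
        "initial_connect_timeout": "30s",
        "skip_unavailable": False,
    }
}

problems = disconnected_remotes(sample_response)
print("Disconnected remotes:", problems if problems else "none")
```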

Add follower index

Next, we add a follower index for the leader index. Follower indices are created with the create follower API. When you create a follower index, you must reference the remote cluster and the leader index that you created in the remote cluster.


In this example, syslog-copy is the name of the follower index on the DR cluster. It replicates from the DC cluster we defined previously, where the leader index is named syslog.
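The follower request for this example can be sketched as follows. The index names and cluster alias come from the text above; the helper name is hypothetical, and the snippet only prints the request for use in the Dev Tools console on the DR cluster.

```python
import json


def create_follower_request(follower_index, remote_cluster, leader_index):
    """Build a create follower request: PUT /<follower_index>/_ccr/follow."""
    body = {
        "remote_cluster": remote_cluster,  # alias registered on the local (DR) cluster
        "leader_index": leader_index,      # index to replicate from the remote cluster
    }
    return "PUT", f"/{follower_index}/_ccr/follow", body


method, path, body = create_follower_request("syslog-copy", "DC-cluster", "syslog")
print(f"{method} {path}")
print(json.dumps(body, indent=2))
```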

We’ve configured an index to replicate from one Elasticsearch cluster to another!

Add Auto follow pattern

Note that following a single index, as in the example above, doesn’t work well for time-based use cases, where a new index is created every day. For those cases, CCR provides an API for defining auto-follow patterns.


The auto-follow pattern in this example matches every index whose name begins with syslog, so each new matching index created on the DC cluster is replicated automatically.
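An auto-follow request along these lines can be sketched in Python. The pattern name `syslog-pattern` and the `-copy` suffix for follower names are assumptions made for illustration. The two small helpers only approximate locally how a wildcard pattern and the `{{leader_index}}` placeholder behave; they are not Elasticsearch's own implementation.

```python
import json
from fnmatch import fnmatchcase


def auto_follow_request(pattern_name, remote_cluster, leader_patterns,
                        follow_pattern="{{leader_index}}-copy"):
    """Build an auto-follow request: PUT /_ccr/auto_follow/<pattern_name>."""
    body = {
        "remote_cluster": remote_cluster,
        "leader_index_patterns": list(leader_patterns),
        "follow_index_pattern": follow_pattern,
    }
    return "PUT", f"/_ccr/auto_follow/{pattern_name}", body


def matches(leader_patterns, index_name):
    """Local approximation of wildcard matching against leader index patterns."""
    return any(fnmatchcase(index_name, p) for p in leader_patterns)


def follower_name(follow_pattern, leader_index):
    """Local illustration of how the {{leader_index}} placeholder is filled in."""
    return follow_pattern.replace("{{leader_index}}", leader_index)


method, path, body = auto_follow_request("syslog-pattern", "DC-cluster", ["syslog*"])
print(f"{method} {path}")
print(json.dumps(body, indent=2))

# Hypothetical daily index names, to show which ones the pattern would pick up.
for name in ("syslog-2023.09.18", "metrics-2023.09.18"):
    if matches(body["leader_index_patterns"], name):
        print(name, "->", follower_name(body["follow_index_pattern"], name))
    else:
        print(name, "-> not followed")
```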

We can also use the CCR UI in Kibana for defining an auto-follow pattern.


Conclusion

Cross-Cluster Replication (CCR) in Elasticsearch is a game-changer, offering unparalleled benefits such as disaster recovery, load balancing, global data analytics, and elastic scaling. To harness the full potential of Elasticsearch and its CCR feature, partner with Ashnik. Our expert team can help you set up and optimize CCR for your specific needs. Reach out to us today for a seamless Elasticsearch experience.

