Cross Cluster Replication In Elasticsearch

Observability Platform | Sep 18, 2023

Explore the game-changing Cross-Cluster Replication (CCR) feature in Elasticsearch, enabling high availability, disaster recovery, load balancing, analytics, and more. Learn the prerequisites and setup steps for seamless data replication between clusters.

Key Highlights

Disaster Recovery: With CCR, a secondary cluster holds a near real-time copy of your data, so operations can continue if your primary cluster goes down.
Load Balancing: Replicate data across clusters so read traffic can be spread out for better performance and scalability.
Global Data Analytics: Replicate data from multiple regions into one cluster to analyze it in a single place.
Elastic Scaling: Expand your Elasticsearch footprint across clusters while CCR keeps replicated indices in sync.

Data replication is one of the most important and widely demanded features today. Replicating data across clusters used to be a complicated and time-consuming process: companies had to use external tools or write custom scripts to move data from one cluster to another. Elasticsearch addresses this with its Cross-Cluster Replication feature.

Cross-Cluster Replication (CCR) is a built-in feature in Elasticsearch that allows for the replication of data from one cluster to another in near real-time.

This feature holds significant importance in Elasticsearch for various reasons:

1. High Availability: Cross-cluster replication enhances the high availability of data in Elasticsearch. By replicating data to multiple clusters, if one cluster becomes unavailable due to hardware failures, network issues, or other problems, the data remains accessible from other clusters. This ensures uninterrupted access to data for search and analytics.

2. Disaster Recovery: Disaster can strike at any moment, and it’s crucial to have a disaster recovery plan in place. Cross-Cluster Replication can help in disaster recovery scenarios by replicating data from a primary cluster to a secondary cluster in a different geographical location. This ensures that if the primary cluster goes down, the secondary cluster can take over, and business operations can continue without disruption.

3. Load Balancing: Large companies with massive amounts of data may find that a single cluster cannot handle the load. Cross-Cluster Replication supports load balancing by replicating data to multiple clusters, improving performance and scalability. Because follower indices can serve searches, read requests can be directed to whichever cluster holds a copy of the data.

4. Analytics: Data analysis across geographical regions can be challenging, especially when the data is stored in separate clusters. With Cross-Cluster Replication, users can replicate data from multiple clusters to a central cluster, making it easier to perform analytics across all data sources. This enables companies to make informed decisions based on a complete view of their data.

5. Scaling and Elasticity: Elasticsearch clusters can be scaled horizontally by adding more nodes or clusters as needed. CCR keeps follower indices in sync with their leader indices in near real-time, supporting the elastic scaling of Elasticsearch infrastructure.

6. Data Locality: Replicating data closer to users or applications in different regions improves query response times and user experiences. CCR allows you to store data in proximity to where it’s needed.

7. Testing and Development: CCR can be useful in testing and development environments where you need to replicate production data to a separate cluster for testing without affecting the production environment.

In summary, cross-cluster replication is a crucial feature in Elasticsearch that addresses key concerns such as high availability, disaster recovery, load balancing, centralized analytics, scalability, and data locality. It helps Elasticsearch meet the demands of modern, distributed, and highly available search and analytics applications.

To set up cross-cluster replication in Elasticsearch, you need to ensure that you have the following prerequisites in place:

Prerequisites:

Local and remote clusters must run version 6.7.x or higher.
Local and remote clusters must run compatible versions.
Local and remote clusters must trust each other.
If your clusters use Elasticsearch’s security features, the required security configuration must be in place.
Both clusters must be properly configured and operational.
The transport and HTTP settings on both clusters must allow communication between them.
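
The minimum-version prerequisite above can be checked with a few lines of code. The following is a minimal sketch, assuming the version string has the form returned in the `version.number` field of a cluster's root endpoint (for example "8.10.2"). The helper names `parse_version` and `meets_ccr_minimum` are hypothetical, and the sketch only checks the 6.7 minimum, not full compatibility between the two clusters.

```python
def parse_version(number):
    """Turn a version string such as '7.17.0-SNAPSHOT' into (major, minor)."""
    core = number.split("-", 1)[0]        # drop any suffix such as -SNAPSHOT
    major, minor = core.split(".")[:2]    # ignore the patch component
    return int(major), int(minor)


def meets_ccr_minimum(number, minimum=(6, 7)):
    """Return True if the version is at least the 6.7 minimum named above."""
    return parse_version(number) >= minimum


# Illustrative version strings, not taken from any real cluster.
for candidate in ["6.6.2", "6.7.0", "8.10.2"]:
    print(candidate, meets_ccr_minimum(candidate))
```

Run the check against both the local and the remote cluster, since each must meet the minimum.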

Limitations:

Cross-cluster replication is designed to replicate user-generated indices only, and doesn’t currently replicate any of the following:

System indices
Machine learning jobs
Index templates
Index lifecycle management and snapshot lifecycle management policies
User permissions and role mappings
Snapshot repository settings
Cluster settings
Searchable snapshots

If you want to replicate any of this data, you must replicate it to a remote cluster manually.

Set up cross-cluster replication:

Follow the steps below to set up cross-cluster replication:

Connect a local cluster to a remote cluster
Create a leader index in a remote cluster
Create a follower index that replicates a leader index
Automatically create follower indices

An index in one Elasticsearch cluster can be configured to replicate changes from an index in another Elasticsearch cluster. The index that replicates changes is termed the “follower index”, and the index being replicated from is termed the “leader index”. The follower index is passive: it can serve read requests and searches but cannot accept direct writes. Only the leader index accepts direct writes.

Define Remote Clusters

When setting up CCR, an Elasticsearch cluster must know about the clusters it replicates from. In this walkthrough, the DC cluster holds the leader index and the DR (disaster recovery) cluster holds the follower index. Replication in CCR is pull-based: the DR cluster pulls changes from the DC cluster, so we don’t need to specify a connection from the DC cluster to the DR cluster. We only need to define the DC cluster as a remote cluster on the DR cluster.
Let’s define the DC cluster via an API call on the DR cluster.

(For API-based commands, we recommend using the Dev Tools console within Kibana, found via Kibana -> Dev Tools -> Console.)

In this walkthrough, the remote cluster is registered under the alias “DC-cluster” and can be reached at “127.0.0.1:9300”. One or more seeds can be specified, and it is generally recommended to specify more than one in case a seed is unavailable during the handshake phase.

Note the use of port 9300 when connecting to “DC-cluster”. The DC cluster listens for HTTP on port 9200, but replication uses the Elasticsearch transport protocol (the one used for node-to-node communication), whose default port is 9300.
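
The request that registers the remote cluster is not shown in this article, so the following is a minimal sketch of what it could look like, assuming the cluster settings API (`PUT /_cluster/settings`) with a persistent `cluster.remote.<alias>.seeds` setting. The helper name `remote_cluster_request` is hypothetical. It only builds the request and does not send it.

```python
import json


def remote_cluster_request(alias, seeds):
    """Build the method, path, and body that register a remote cluster."""
    body = {
        "persistent": {
            "cluster": {
                "remote": {
                    # Seeds are transport addresses (port 9300 by default),
                    # not HTTP addresses (port 9200).
                    alias: {"seeds": list(seeds)}
                }
            }
        }
    }
    return "PUT", "/_cluster/settings", body


# Alias and seed address are the ones used in this walkthrough.
method, path, body = remote_cluster_request("DC-cluster", ["127.0.0.1:9300"])
print(method, path)
print(json.dumps(body, indent=2))
```

The printed method, path, and JSON body can be pasted into the Kibana Dev Tools console on the DR cluster. Passing more than one address in `seeds` follows the recommendation above.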

We can also define the DC cluster in the Kibana UI: click “Management” (gear icon) in the left navigation panel, then open “Remote Clusters” in the Elasticsearch section of Stack Management.

GET /_remote/info
This API retrieves information about every configured remote cluster. It returns connection and endpoint information for each cluster, keyed by its configured alias.
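
As a hedged illustration of how that response might be used, the sketch below scans a remote-info response for clusters that are not connected. The `sample_response` dictionary is made up for illustration and shows only a few fields (`connected`, `seeds`, `num_nodes_connected`). The alias “DC2-cluster” and the helper name `disconnected_remotes` are hypothetical.

```python
def disconnected_remotes(remote_info):
    """Return the aliases of remote clusters that do not report connected: true."""
    return sorted(
        alias
        for alias, info in remote_info.items()
        if not info.get("connected", False)
    )


# Illustrative response shape only; a real response contains more fields.
sample_response = {
    "DC-cluster": {
        "connected": True,
        "seeds": ["127.0.0.1:9300"],
        "num_nodes_connected": 1,
    },
    "DC2-cluster": {  # hypothetical second remote that cannot be reached
        "connected": False,
        "seeds": ["127.0.0.1:9301"],
        "num_nodes_connected": 0,
    },
}

print(disconnected_remotes(sample_response))
```

An empty list means every configured remote cluster reports an active connection.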

Add follower index

Next, we add a follower index for the leader index. Follower indices are created with the create follower API. When you create a follower index, you must reference the remote cluster and the leader index that you created in the remote cluster.

In this example, syslog-copy is the name of the follower index on the DR cluster. It replicates from the DC cluster we defined previously, and the leader index on the DC cluster is called syslog.
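
The create follower request itself is not shown in this article. A minimal sketch, assuming the create follower API takes the form `PUT /<follower_index>/_ccr/follow` with `remote_cluster` and `leader_index` in the body, might look like this. The helper name `follow_request` is hypothetical, and the request is only built, not sent.

```python
import json


def follow_request(follower_index, remote_cluster, leader_index):
    """Build the method, path, and body that create a follower index."""
    body = {
        "remote_cluster": remote_cluster,  # alias registered on the DR cluster
        "leader_index": leader_index,      # index to replicate from
    }
    return "PUT", f"/{follower_index}/_ccr/follow", body


# Index names and alias are the ones used in this walkthrough.
method, path, body = follow_request("syslog-copy", "DC-cluster", "syslog")
print(method, path)
print(json.dumps(body, indent=2))
```

This request is issued on the DR cluster, because the follower pulls changes from the leader.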

We’ve configured an index to replicate from one Elasticsearch cluster to another!

Add Auto follow pattern

Note that following a single index doesn’t work well for time-based use cases, where a new index is created each day. For those cases, the CCR API lets us define auto-follow patterns.

An auto-follow pattern such as syslog* replicates any newly created index on the DC cluster whose name begins with syslog.
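
The auto-follow request is not shown in this article. The sketch below builds one under the assumption that the API takes the form `PUT /_ccr/auto_follow/<pattern_name>` with `remote_cluster`, `leader_index_patterns`, and `follow_index_pattern` in the body. The pattern name “syslog-pattern”, the “-copy” suffix, and the helper name `auto_follow_request` are hypothetical. The `fnmatchcase` check is only a local approximation of simple wildcard matching. Elasticsearch does the real matching itself.

```python
from fnmatch import fnmatchcase


def auto_follow_request(pattern_name, remote_cluster, leader_index_patterns,
                        follow_index_pattern):
    """Build the method, path, and body that create an auto-follow pattern."""
    body = {
        "remote_cluster": remote_cluster,
        "leader_index_patterns": list(leader_index_patterns),
        "follow_index_pattern": follow_index_pattern,
    }
    return "PUT", f"/_ccr/auto_follow/{pattern_name}", body


# "{{leader_index}}" is a literal placeholder sent to Elasticsearch as-is;
# this is a plain string, not an f-string, so the braces are preserved.
method, path, body = auto_follow_request(
    "syslog-pattern", "DC-cluster", ["syslog*"], "{{leader_index}}-copy"
)

# Local illustration: which example index names the wildcard would cover.
candidates = ["syslog-2023.09.18", "syslog-2023.09.19", "metrics-2023.09.18"]
matched = [name for name in candidates if fnmatchcase(name, "syslog*")]
print(method, path)
print(matched)
```

With a follow index pattern like this one, each matching daily index on the DC cluster would get a follower on the DR cluster named after the leader index with “-copy” appended.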

We can also use the CCR UI in Kibana for defining an auto-follow pattern.

Conclusion

Cross-Cluster Replication (CCR) in Elasticsearch is a game-changer, offering unparalleled benefits such as disaster recovery, load balancing, global data analytics, and elastic scaling. To harness the full potential of Elasticsearch and its CCR feature, partner with Ashnik. Our expert team can help you set up and optimize CCR for your specific needs. Reach out to us today for a seamless Elasticsearch experience.
