Elasticsearch has become indispensable for data management, but ensuring its reliability is crucial, and Cross-Cluster Replication (CCR) is the solution. CCR replicates data between Elasticsearch clusters, providing redundancy and disaster recovery. This step-by-step guide walks you through setting up CCR Elasticsearch replication so you can harness its benefits effectively. By the end of this article, you’ll have the knowledge and confidence to implement CCR in your Elasticsearch environment, keeping your data available and resilient in the face of failures.
Whether you run Elasticsearch in a single data center or across multiple geographical locations, CCR can be a game-changer for your Elasticsearch infrastructure. So, let’s get started!
Steps To Setting Up CCR Elasticsearch Replication
Implementing Cross-Cluster Replication (CCR) in Elasticsearch involves the following key steps:
Step 1: Connecting to a Remote Cluster
To initiate Cross-Cluster Replication (CCR) in Elasticsearch, the first crucial step is connecting to a remote cluster. This allows you to replicate data from a remote Elasticsearch cluster (Cluster A) to your local cluster (Cluster B). By configuring Cluster A as a remote cluster in Cluster B, you establish the foundation for data replication.
Here’s how you can configure a remote cluster in Elasticsearch:
Method 1: Using Stack Management in Kibana
- Open Kibana and navigate to Stack Management from the side navigation panel.
- Select “Remote Clusters” to access the remote cluster configuration settings.
- Specify the host name or IP address of the remote cluster (Cluster A), followed by its transport port (9300 by default). For example, you can enter “cluster.es.eastus2.staging.azure.foundit.no:9400” or “192.168.1.1:9300” as the remote cluster’s connection details.
Method 2: Using the Cluster Update Settings API
- Alternatively, you can use the Elasticsearch Cluster Update Settings API to add a remote cluster programmatically. Here’s an example:
PUT /_cluster/settings
{
  "persistent" : {
    "cluster" : {
      "remote" : {
        "leader" : {
          "seeds" : [
            "127.0.0.1:9300"
          ]
        }
      }
    }
  }
}
- In this API request, you specify the hostname and transport port of a seed node in the remote cluster.
- To verify that your local cluster is successfully connected to the remote cluster, you can issue the following API request:
GET /_remote/info
The API response will confirm the connection status and provide additional details, such as the number of nodes in the remote cluster that your local cluster is connected to.
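Depending on your Elasticsearch version, the response is keyed by the remote cluster alias and might look roughly like the following (the values shown here are purely illustrative):
{
  "leader" : {
    "connected" : true,
    "mode" : "sniff",
    "seeds" : [
      "127.0.0.1:9300"
    ],
    "num_nodes_connected" : 1,
    "max_connections_per_cluster" : 3,
    "initial_connect_timeout" : "30s",
    "skip_unavailable" : false
  }
}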
By completing this step, you establish the essential link between your local and remote clusters, paving the way for data replication from Cluster A to Cluster B.
Step 2: Configuring Privileges for Cross-Cluster Replication
To facilitate Cross-Cluster Replication (CCR), specific privileges must be configured for the cross-cluster replication user on both the remote and local clusters. This ensures that the replication process operates smoothly.
Remote Cluster
On the remote cluster, which houses the leader index, the cross-cluster replication role necessitates the following privileges:
- The read_ccr cluster privilege, which is required for cross-cluster replication.
- The monitor and read index privileges on the leader index.
- If requests are authenticated using an API key, these privileges must be granted on the local cluster instead of the remote one. If requests are made on behalf of other users, the authenticating user must possess the run_as privilege on the remote cluster.
- To set up these privileges on the remote cluster, use the following request as an example:
POST /_security/role/remote-replication
{
  "cluster": [
    "read_ccr"
  ],
  "indices": [
    {
      "names": [
        "leader-index-name"
      ],
      "privileges": [
        "monitor",
        "read"
      ]
    }
  ]
}
- This request creates a “remote-replication” role on the remote cluster with the privileges needed for cross-cluster replication.
Next, configure the corresponding role on the local cluster.
Local Cluster
On the local cluster, which contains the follower index, the remote replication role should include the following privileges:
- The manage_ccr cluster privilege.
- The monitor, read, write, and manage_follow_index privileges on the follower index.
- To create the necessary role on the local cluster, execute the following request:
POST /_security/role/remote-replication
{
  "cluster": [
    "manage_ccr"
  ],
  "indices": [
    {
      "names": [
        "follower-index-name"
      ],
      "privileges": [
        "monitor",
        "read",
        "write",
        "manage_follow_index"
      ]
    }
  ]
}
- Once you’ve defined the remote-replication role on both clusters, create a user on the local cluster and assign the remote-replication role to this user. Here’s an example:
POST /_security/user/cross-cluster-user
{
  "password" : "l0ng-r4nd0m-p@ssw0rd",
  "roles" : [ "remote-replication" ]
}
- Please note that you only need to create this user on the local cluster.
- Configuring these privileges ensures that your clusters can communicate securely and perform cross-cluster replication tasks effectively.
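If you want to double-check this setup, you can retrieve the role and user with the get roles and get users APIs (run the role request on whichever cluster you want to inspect; the user exists only on the local cluster):
GET /_security/role/remote-replication
GET /_security/user/cross-cluster-user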
Step 3: Creating a Follower Index for Specific Replication
To replicate a specific index from a remote cluster (Cluster A) to your local cluster, you must create a follower index. This process involves referencing the remote cluster and the leader index within that remote cluster.
Using Kibana
- Open Kibana and navigate to Cross-Cluster Replication in the side navigation.
- Select the “Follower Indices” tab.
- Choose the cluster (Cluster A) that contains the leader index you want to replicate.
- Enter the name of the leader index. In this tutorial, it’s “kibana_sample_data_ecommerce.”
- Provide a name for your follower index, such as “follower-kibana-sample-data.”
Elasticsearch will initialize the follower index using the remote recovery process. This process transfers existing Lucene segment files from the leader index to the follower index. Initially, the index status will be “Paused.” Once the remote recovery process completes, the index status changes to “Active.”
Whenever you index documents into your leader index, Elasticsearch will automatically replicate those documents into the follower index.
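As a quick sanity check, you can index a test document into the leader index on Cluster A and then search for it in the follower index on Cluster B. The document below is purely illustrative; the customer_full_name field comes from the sample e-commerce data set used in this tutorial:
# On Cluster A: index a test document into the leader index
POST /kibana_sample_data_ecommerce/_doc
{
  "customer_full_name" : "Replication Test"
}
# On Cluster B: after a short delay, the document shows up in the follower index
GET /follower-kibana-sample-data/_search
{
  "query" : {
    "match" : { "customer_full_name" : "Replication Test" }
  }
}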
Using the Create Follower API
Alternatively, you can use the create follower API to create follower indices programmatically. When initiating the follower request, you must reference the remote cluster and the leader index from the remote cluster.
Example API request:
PUT /server-metrics-follower/_ccr/follow?wait_for_active_shards=1
{
  "remote_cluster" : "leader",
  "leader_index" : "server-metrics"
}
Please note that the create follower API returns before the remote recovery process completes. To wait for the process to finish, add the wait_for_active_shards parameter to your request, as in the example above.
You can use the get follower stats API to inspect the replication status.
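For example, to check on the follower index created above, you can call the follower stats and follower info APIs:
GET /server-metrics-follower/_ccr/stats
GET /server-metrics-follower/_ccr/info
The stats response reports replication progress for each shard, while the info response shows whether the follower is active or paused and which replication parameters it uses.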
Creating a follower index establishes a mechanism for replicating specific data from the leader index in the remote cluster to the follower index in your local cluster, ensuring data synchronization.
Step 4: Creating an Auto-Follow Pattern for Time Series Indices Replication
Auto-follow patterns streamline the replication of rolling time-series indices by automatically creating follower indices for new ones. When a new index with a matching name is created in the remote cluster, an equivalent follower index is generated in the local cluster. Only indices created on the remote cluster after the auto-follow pattern is created will be auto-followed; existing indices on the remote cluster are not auto-followed, even if they match the pattern.
To set up an auto-follow pattern using Kibana:
- Access Kibana and navigate to Cross-Cluster Replication in the side navigation.
- Select the “Auto-follow patterns” tab.
- Provide a name for your auto-follow pattern, for instance, “beats.”
- Choose the remote cluster (Cluster A) that contains the index you want to replicate.
- Define one or more index patterns that identify the indices you want to replicate from the remote cluster. For example, you can enter “metricbeat-*” and “packetbeat-*” to automatically create followers for Metricbeat and Packetbeat indices.
- To make it easier to identify replicated indices, add “follower-” as a prefix to the names of the follower indices.
With this setup, whenever new indices that match the specified patterns are created on the remote cluster, Elasticsearch automatically initiates their replication to local follower indices.
Using the Create Auto-Follow Pattern API
You can also configure auto-follow patterns programmatically using the create auto-follow pattern API. Here’s an example API request:
PUT /_ccr/auto_follow/beats
{
  "remote_cluster" : "leader",
  "leader_index_patterns" : [
    "metricbeat-*",
    "packetbeat-*"
  ],
  "follow_index_pattern" : "{{leader_index}}-copy"
}
In this API request, you specify the remote cluster, define leader index patterns (e.g., “metricbeat-*” and “packetbeat-*”), and set a follow index pattern that derives the follower index name by appending the suffix “-copy” to the leader index’s name.
This automated setup ensures that new indices matching your specified patterns on the remote cluster are seamlessly replicated to local follower indices, simplifying the management of time series data replication.
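To confirm that the pattern was registered, or to inspect it later, you can retrieve it by name (omit the name to list all auto-follow patterns configured on the local cluster):
GET /_ccr/auto_follow/beats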
Conclusion
Cross-cluster replication (CCR) for Elasticsearch bolsters data reliability and availability. This guide has given you the essential knowledge to establish CCR, enhancing data management and disaster recovery capabilities.
For organizations like TRIOTECH SYSTEMS, which specialize in data-intensive solutions, implementing CCR becomes even more critical. CCR ensures that TRIOTECH SYSTEMS can maintain data resilience, making their Elasticsearch clusters robust and adaptable to meet the demands of their data-driven applications.
FAQs
What Is The Primary Use Case For Cross-Cluster Replication (CCR) In Elasticsearch?
CCR is primarily used for creating a replica of data from one Elasticsearch cluster to another. The most common use cases include data redundancy, disaster recovery, and data distribution across different geographical locations.
Is CCR Suitable For Real-Time Data Replication, Or Is There A Delay In Synchronization?
CCR provides near-real-time replication capabilities. While there might be a slight delay in data synchronization due to network latency and other factors, CCR aims to keep the data in the destination cluster as up-to-date as possible.
Are There Any Limitations Or Considerations When Using CCR With Highly Dynamic Data?
When dealing with rapidly changing data, it’s essential to consider the potential impact on resource utilization, especially if you have a high volume of data changes. Monitoring and optimizing resource allocation become crucial in such scenarios.
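A practical starting point for that monitoring is the cross-cluster replication stats API, which summarizes replication activity for all follower indices on the local cluster:
GET /_ccr/stats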
Can CCR Be Used To Replicate Data Between Clusters With Different Elasticsearch Versions?
In most cases, CCR requires that the Elasticsearch clusters have the same major version for compatibility. However, Elasticsearch continually evolves, so it’s essential to consult the official documentation for version-specific compatibility details.
How Does CCR Handle Conflicts Between Clusters During Replication?
CCR follows a leader-follower model, so write conflicts between clusters generally don’t arise: follower indices are read-only, all writes go to the leader index, and changes are then replicated to the follower. In bi-directional setups, each cluster writes to its own leader indices and follows the other cluster’s indices, keeping the write paths separate rather than resolving conflicting writes after the fact.