Step-by-Step Guide on Elasticsearch Snapshot and Restore

Observability Platform | May 08, 2023

Introduction

Elasticsearch is a powerful, distributed search and analytics engine designed for handling large amounts of data quickly and efficiently. One of its essential features is the ability to create and restore snapshots, which are incremental backups of the cluster data. In this article, we will explore Elasticsearch snapshot and restore: the primary use cases, the supported repository types, and the step-by-step process for creating, restoring, and managing snapshots. Understanding these concepts and processes helps you keep your Elasticsearch data secure, recoverable, and easily transferable between clusters.

What is a Snapshot?

A snapshot is a backup taken from a running Elasticsearch cluster. There are various reasons for taking backups; one of the main ones is to protect the primary data against unforeseen damage caused by hardware or software failure.

Snapshots have two main uses:

Recovering from failure
For example, if cluster health goes red, you might restore the red indices from a snapshot.
Migrating from one cluster to another
For example, if you’re moving from a proof-of-concept to a production cluster, you might take a snapshot of the former and restore it to the latter.

About snapshot and restore

A snapshot of a cluster contains the cluster state, all regular data streams, and all regular indices. Elasticsearch snapshots are incremental, meaning that they only store data that has changed since the last successful snapshot. The difference in disk usage between frequent and infrequent snapshots is often minimal.
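The incremental behaviour described above can be illustrated with a toy model. This is only a sketch and not how Elasticsearch is implemented: the repository is a plain dictionary, the segment file names are hypothetical, and files are compared by name alone.

```python
def take_snapshot(repository, snapshot_name, segment_files):
    """Copy only the segment files the repository does not already hold."""
    new_files = {
        name: data
        for name, data in segment_files.items()
        if name not in repository["files"]
    }
    repository["files"].update(new_files)
    # The snapshot itself is just a list of the files it refers to.
    repository["snapshots"][snapshot_name] = set(segment_files)
    return sorted(new_files)


repository = {"files": {}, "snapshots": {}}

# Hypothetical segment files of one index at two points in time.
monday = {"_0.seg": b"a", "_1.seg": b"b"}
tuesday = {"_0.seg": b"a", "_1.seg": b"b", "_2.seg": b"c"}

copied_first = take_snapshot(repository, "snap_monday", monday)
copied_second = take_snapshot(repository, "snap_tuesday", tuesday)
print(copied_first)   # ['_0.seg', '_1.seg']
print(copied_second)  # ['_2.seg']
```

The second snapshot refers to all three files but only had to copy one of them, which is why frequent snapshots tend to cost little extra disk space.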

To restore a snapshot of an index, the index must be compatible with the Elasticsearch version you are restoring to. Elasticsearch can read indices created in the current or previous major version. Compatibility is based on the version in which the index was created, not the version from which the snapshot was taken.

Snapshots can contain indices created in more than one version of Elasticsearch. If you attempt to restore a snapshot with incompatible indices, the restore will fail. When backing up your data before an upgrade, keep in mind that to restore the snapshot to the upgraded cluster, all indices in the snapshot must be compatible with the upgrade version.
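Before an upgrade it can help to check which indices would block a restore. The sketch below applies only the simplified rule stated above (an index is readable if it was created in the target's major version or the one before it). The index names and the mapping of index to creation version are hypothetical; on a real cluster you would read the creation version from each index's settings.

```python
def major(version):
    """Return the major version number of a version string such as '7.10.2'."""
    return int(version.split(".")[0])


def is_restorable(index_created_version, target_version):
    """True if the index was created in the target's major version or the previous one."""
    target_major = major(target_version)
    return target_major - 1 <= major(index_created_version) <= target_major


def incompatible_indices(snapshot_indices, target_version):
    """List the indices in a snapshot that the target version cannot read."""
    return sorted(
        name
        for name, created in snapshot_indices.items()
        if not is_restorable(created, target_version)
    )


# Hypothetical snapshot contents: index name -> version the index was created in.
snapshot_indices = {"logs-2019": "6.8.0", "logs-2021": "7.10.2", "legacy": "5.6.16"}

print(incompatible_indices(snapshot_indices, "7.17.0"))  # ['legacy']
print(incompatible_indices(snapshot_indices, "8.5.0"))   # ['legacy', 'logs-2019']
```

Any index reported here would need to be reindexed or left out of the restore for the restore to succeed on the target version.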

Repositories supported by Elasticsearch

Snapshots in Elasticsearch are stored in a container called a repository. A single repository can hold one or more snapshots, and you can register any number of repositories, choosing which one to use each time you take a snapshot.

Elasticsearch supports several repository types, including:

Azure Cloud
HDFS for Hadoop
AWS (Store backups on S3)
NFS on Linux
Directory on Single Node Cluster
Windows shares using Microsoft UNC path

How to create snapshots?

To create snapshots, we need to do the following:

Identify the directory where the snapshot files will be stored, for example "/tmp/backups".
Give the user that runs Elasticsearch access to this directory so that Elasticsearch can write the snapshot files.
Tell Elasticsearch that this is the snapshot directory by adding the "path.repo" setting to the elasticsearch.yml file. Elasticsearch must be restarted for this setting to take effect.
path.repo: ["/tmp/backups"]
Create the repository that will be used for taking and restoring snapshots. For a shared file system, this is a PUT request to the _snapshot/<repository_name> endpoint with the type set to "fs" and the location given under settings.
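As a minimal sketch of the last step, the registration request could be built in Python as follows. The cluster address http://localhost:9200 and the repository name my_backup are assumptions for illustration, and the request is only constructed here, not sent.

```python
import json
import urllib.request

ES_URL = "http://localhost:9200"  # assumption: a local cluster without security


def register_fs_repository(name, location, **settings):
    """Build a PUT /_snapshot/<name> request for a shared file system repository."""
    body = {"type": "fs", "settings": {"location": location, **settings}}
    return urllib.request.Request(
        url=f"{ES_URL}/_snapshot/{name}",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


# "my_backup" is a hypothetical repository name; the location must be listed in path.repo.
request = register_fs_repository("my_backup", "/tmp/backups", compress=True)
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))

# To send it to a running cluster: urllib.request.urlopen(request)
```

Extra keyword arguments are passed through as repository settings, so optional settings can be added without changing the function.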

In most cases only the location needs to be specified, but the following settings are also supported:

Setting	Description
location	The shared file system for snapshots. Required.
chunk_size	Breaks large files into chunks during snapshot operations (e.g. 64mb, 1gb), which is important for cloud storage providers and far less important for shared file systems. Default is null (unlimited). Optional.
compress	Whether to compress metadata files. This setting does not affect data files, which might already be compressed, depending on your index settings. Default is false. Optional.
max_restore_bytes_per_sec	The maximum rate at which snapshots restore. Default is 40 MB per second (40m). Optional.
max_snapshot_bytes_per_sec	The maximum rate at which snapshots are taken. Default is 40 MB per second (40m). Optional.
readonly	Whether the repository is read-only. Useful when migrating from one cluster ("readonly": false when registering) to another cluster ("readonly": true when registering). Optional.

After creating the repository, we can take a snapshot of all indices by sending a PUT request to _snapshot/<repository_name>/<snapshot_name>.
To snapshot only one or more specific indices, list the index names in comma-separated form in the "indices" field of the request body.
Once a snapshot has been created, information about it can be obtained with a GET request to the same _snapshot/<repository_name>/<snapshot_name> endpoint.
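The three requests above can be sketched in Python as follows. The cluster address, the repository name my_backup, the snapshot names, and the index names are all assumptions for illustration; the requests are only built, not sent.

```python
import json
import urllib.request

ES_URL = "http://localhost:9200"  # assumption: a local cluster without security


def create_snapshot_request(repository, snapshot, indices=None):
    """Build a PUT request that snapshots all indices, or only the listed ones."""
    body = {}
    if indices:
        body["indices"] = ",".join(indices)  # comma-separated index names
    return urllib.request.Request(
        # wait_for_completion makes the call return only once the snapshot has finished
        url=f"{ES_URL}/_snapshot/{repository}/{snapshot}?wait_for_completion=true",
        data=json.dumps(body).encode("utf-8") if body else None,
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


def get_snapshot_request(repository, snapshot):
    """Build a GET request that returns information about one snapshot."""
    return urllib.request.Request(
        url=f"{ES_URL}/_snapshot/{repository}/{snapshot}", method="GET"
    )


# Hypothetical repository, snapshot, and index names.
all_indices = create_snapshot_request("my_backup", "snapshot_1")
selected = create_snapshot_request("my_backup", "snapshot_2", ["index_1", "index_2"])
info = get_snapshot_request("my_backup", "snapshot_1")

for request in (all_indices, selected, info):
    print(request.get_method(), request.full_url, request.data)

# To send any of them to a running cluster: urllib.request.urlopen(request)
```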

The response contains basic information about the snapshot, including its start and end time, the version of Elasticsearch that created it, the list of included indices, the current state of the snapshot, and the list of failures that occurred while it was being taken. The snapshot state can be IN_PROGRESS, SUCCESS, FAILED, PARTIAL, or INCOMPATIBLE.
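A script that takes backups will usually want to check the state before relying on a snapshot. The sketch below reads the fields described above from a response body. The sample response is hypothetical and heavily trimmed, and treating only SUCCESS as complete is an assumption of this example rather than a rule from Elasticsearch.

```python
import json


def summarize(response):
    """Extract name, state, and failure count for each snapshot in a response."""
    rows = []
    for snap in response.get("snapshots", []):
        rows.append(
            {
                "name": snap["snapshot"],
                "state": snap["state"],
                "complete": snap["state"] == "SUCCESS",
                "failure_count": len(snap.get("failures", [])),
            }
        )
    return rows


# Hypothetical, trimmed response body for illustration only.
sample_response = json.loads(
    """
    {
      "snapshots": [
        {"snapshot": "snapshot_1", "state": "SUCCESS",
         "indices": ["index_1", "index_2"], "failures": []},
        {"snapshot": "snapshot_2", "state": "PARTIAL",
         "indices": ["index_1"],
         "failures": [{"index": "index_1", "shard_id": 0, "reason": "example"}]}
      ]
    }
    """
)

summary = summarize(sample_response)
for row in summary:
    print(row)
```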

To restore a snapshot, send a POST request with the _restore endpoint appended after the snapshot name. If you’re restoring data to a pre-existing cluster, either delete the conflicting indices first or rename the indices on restore to avoid conflicts with existing indices and data streams.
A snapshot can be deleted from the repository by sending a DELETE request to _snapshot/<repository_name>/<snapshot_name>.
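The restore and delete requests can be sketched in the same way. The cluster address and all repository, snapshot, and index names are assumptions for illustration. The rename example uses the rename_pattern and rename_replacement fields of the restore API to prefix every restored index, so that it cannot collide with an existing one.

```python
import json
import urllib.request

ES_URL = "http://localhost:9200"  # assumption: a local cluster without security


def restore_request(repository, snapshot, indices=None, rename_prefix=None):
    """Build a POST .../_restore request, optionally renaming indices on restore."""
    body = {}
    if indices:
        body["indices"] = ",".join(indices)
    if rename_prefix:
        body["rename_pattern"] = "(.+)"  # capture the whole index name
        body["rename_replacement"] = f"{rename_prefix}$1"
    return urllib.request.Request(
        url=f"{ES_URL}/_snapshot/{repository}/{snapshot}/_restore",
        data=json.dumps(body).encode("utf-8") if body else None,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def delete_request(repository, snapshot):
    """Build a DELETE request that removes one snapshot from the repository."""
    return urllib.request.Request(
        url=f"{ES_URL}/_snapshot/{repository}/{snapshot}", method="DELETE"
    )


# Hypothetical names: restore index_1 as restored_index_1, then delete the snapshot.
restore = restore_request("my_backup", "snapshot_1", ["index_1"], rename_prefix="restored_")
delete = delete_request("my_backup", "snapshot_1")

print(restore.get_method(), restore.full_url, restore.data.decode("utf-8"))
print(delete.get_method(), delete.full_url)

# To send either request to a running cluster: urllib.request.urlopen(request)
```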

When a snapshot is deleted from a repository, Elasticsearch deletes all files that are associated with the deleted snapshot and not used by any other snapshot. If the delete snapshot operation runs while the snapshot is still being created, the snapshotting process is aborted and all files created as part of it are cleaned up. The delete snapshot operation can therefore be used to cancel long-running snapshot operations that were started by mistake.
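The clean-up rule above (remove only files that no remaining snapshot refers to) can be shown with a toy model. This is a sketch of the idea, not Elasticsearch's implementation; the repository layout and file names are hypothetical.

```python
def delete_snapshot(repository, snapshot_name):
    """Remove a snapshot and any files that no other snapshot still uses."""
    removed_files = repository["snapshots"].pop(snapshot_name)
    still_used = set().union(*repository["snapshots"].values())
    orphaned = removed_files - still_used
    repository["files"] -= orphaned
    return sorted(orphaned)


# Hypothetical repository: two snapshots that share the file "_1.seg".
repository = {
    "files": {"_0.seg", "_1.seg", "_2.seg"},
    "snapshots": {
        "snap_monday": {"_0.seg", "_1.seg"},
        "snap_tuesday": {"_1.seg", "_2.seg"},
    },
}

deleted_files = delete_snapshot(repository, "snap_monday")
print(deleted_files)                # ['_0.seg']
print(sorted(repository["files"]))  # ['_1.seg', '_2.seg']
```

The shared file survives because the remaining snapshot still needs it, which is why deleting an old snapshot never damages a newer one.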

Points of consideration:

Each snapshot repository is separate and independent. Elasticsearch doesn’t share data between repositories.
Clusters should only register a particular snapshot repository bucket once. If you register the same snapshot repository with multiple clusters, only one cluster should have write access to the repository. On other clusters, register the repository as read-only.
While a snapshot is in progress, you can still index documents and make other requests to the cluster, but new documents (and updates and deletes to existing documents) are not included in that snapshot.
Snapshots include only primary shards.
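The point about sharing one repository between clusters can be sketched as two registration requests that differ only in the readonly setting. The cluster addresses, repository name, and mount path below are hypothetical, and the requests are only built, not sent.

```python
import json
import urllib.request


def register_shared_repository(cluster_url, name, location, readonly):
    """Build a PUT /_snapshot/<name> request with an explicit readonly setting."""
    body = {"type": "fs", "settings": {"location": location, "readonly": readonly}}
    return urllib.request.Request(
        url=f"{cluster_url}/_snapshot/{name}",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


# Hypothetical clusters: only the old cluster may write to the repository.
writer = register_shared_repository(
    "http://old-cluster:9200", "shared_backup", "/mnt/backups", readonly=False
)
reader = register_shared_repository(
    "http://new-cluster:9200", "shared_backup", "/mnt/backups", readonly=True
)

for request in (writer, reader):
    print(request.full_url, json.loads(request.data)["settings"]["readonly"])
```

The new cluster can then restore snapshots from the repository without any risk of writing to it.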

Conclusion

In this article, we looked at Elasticsearch snapshot and restore and its importance in a data management strategy. We covered the primary use cases for snapshots, such as recovering from failures and migrating data between clusters, and the various supported repositories, including Azure Cloud, Hadoop, AWS, NFS on Linux, and more.

I hope you find these step-by-step instructions for creating, restoring, and managing snapshots, along with the points to consider, useful.

For more information or assistance to implement Elasticsearch in your organization, contact us here. Our team of experts is ready to help you enhance your data management strategy and ensure the security and recoverability of your valuable data. Reach out for subscriptions, services, or solutions on Elastic today!
