Elasticsearch snapshot and restore

Written by Arun Kumar | May 08, 2023

Introduction

Elasticsearch is a powerful, distributed search and analytics engine designed for handling large amounts of data quickly and efficiently. One of its essential features is the ability to create and restore snapshots, which are incremental backups of the cluster data. In this article, we will explore the concept of Elasticsearch snapshot and restore, their primary use cases, the various repository types supported, and the step-by-step process for creating, restoring, and managing snapshots. By understanding these concepts and processes, you can ensure that your Elasticsearch data is secure, recoverable, and easily transferable between clusters, ultimately enhancing your data management strategy.

What is a Snapshot?

A snapshot is a backup taken from a running Elasticsearch cluster. There are various reasons for taking data backups; one of the main ones is to protect the primary data against unforeseen damage caused by hardware or software failure.

Snapshots have two main uses:

  1. Recovering from failure
    For example, if cluster health goes red, you might restore the red indices from a snapshot.
  2. Migrating from one cluster to another
    For example, if you’re moving from a proof-of-concept to a production cluster, you might take a snapshot of the former and restore it to the latter.

About snapshot and restore

A snapshot of a cluster contains the cluster state, all regular data streams, and all regular indices. Elasticsearch snapshots are incremental, meaning that they only store data that has changed since the last successful snapshot. The difference in disk usage between frequent and infrequent snapshots is often minimal.
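The incremental behaviour can be pictured with a deliberately simplified model. The sketch below assumes a snapshot is just a set of file names and that the repository already holds some of them; the file names and the `files_to_upload` helper are hypothetical and are not part of Elasticsearch.

```python
def files_to_upload(index_files, repository_files):
    """Return the files a new snapshot must copy: those not already in the repository."""
    return sorted(set(index_files) - set(repository_files))


# Hypothetical file names, for illustration only.
repository = {"seg_1", "seg_2"}               # copied by an earlier snapshot
current_index = {"seg_1", "seg_2", "seg_3"}   # index contents when the new snapshot starts

new_files = files_to_upload(current_index, repository)
print(new_files)  # ['seg_3']
```

Only the one new file is copied, which is why taking snapshots often tends to cost little extra disk space.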

To restore a snapshot of an index, the index must be compatible with the Elasticsearch version you are restoring to. Elasticsearch can read indices created in the current or previous major version. Compatibility is based on the version in which the index was created, not the version from which the snapshot was taken.

Snapshots can contain indices created in more than one version of Elasticsearch. If you attempt to restore a snapshot with incompatible indices, the restore will fail. When backing up your data before an upgrade, keep in mind that to restore the snapshot to the upgraded cluster, all indices in the snapshot must be compatible with the upgrade version.
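As a rough illustration of the compatibility rule described above (an index is readable if it was created in the current or the previous major version), the following sketch checks a set of indices before a restore. The helper names and the index names and versions are hypothetical; it implements only the rule as stated in this article, not every exception Elasticsearch may allow.

```python
def is_restorable(index_created_version, cluster_version):
    """Apply the 'current or previous major version' rule to a single index."""
    index_major = int(index_created_version.split(".")[0])
    cluster_major = int(cluster_version.split(".")[0])
    return cluster_major - 1 <= index_major <= cluster_major


def incompatible_indices(index_versions, cluster_version):
    """Return the names of indices that would make a restore fail."""
    return sorted(
        name
        for name, created in index_versions.items()
        if not is_restorable(created, cluster_version)
    )


# Hypothetical indices and the versions in which they were created.
snapshot_indices = {"logs-2019": "6.8.0", "logs-2022": "7.17.0"}

blocked = incompatible_indices(snapshot_indices, "8.7.0")
print(blocked)  # ['logs-2019']
```

Note that the check uses the version in which each index was created, not the version of the cluster that took the snapshot.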

Repositories supported by Elasticsearch

Elasticsearch organizes snapshots into containers called repositories. A single repository can store one or more snapshots. You can register any number of repositories in Elasticsearch and choose which one to use when you take a snapshot.

Elasticsearch supports several kinds of repository, including:

  • Azure Cloud
  • HDFS for Hadoop
  • AWS (Store backups on S3)
  • NFS on Linux
  • Directory on Single Node Cluster
  • Windows shares using Microsoft UNC path

How to create snapshots?

To create a snapshot, we need to do the following:

  • First, we identify the directory where we want to store the snapshot files, for example “/tmp/backups”.
  • We give the user that runs Elasticsearch write access to this directory so that Elasticsearch can write the snapshot files.
  • We tell Elasticsearch that this is our snapshot directory by adding the “path.repo” setting to the elasticsearch.yml file. Because path.repo is a static setting, each node must be restarted after the change.
    path.repo: ["/tmp/backups"]
  • Create the repository that will be used for taking and restoring snapshots. We can register the repository with the following request:
    [Image: request that registers the repository]
  • Only the location setting is required, but a shared file system repository also supports optional settings such as compress, chunk_size, max_snapshot_bytes_per_sec, max_restore_bytes_per_sec, and readonly.
  • After creating the repository, we can take a snapshot of all indices with the following request:
    [Image: request that takes a snapshot of all indices]
  • To take a snapshot of only one or more specific indices, we specify the index names as a comma-separated list:
    [Image: request that takes a snapshot of selected indices]
  • Once a snapshot has been created, information about it can be obtained with the following request:
    [Image: request that retrieves snapshot information]
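The steps above can be sketched in Python using only the standard library. This is a minimal sketch, not the requests from the original screenshots: the repository name `my_backup`, the snapshot name `snapshot_1`, the index names, and the unsecured cluster at `http://localhost:9200` are all assumptions made for illustration. The builder functions only describe each request; the optional `send` helper would actually submit it.

```python
import json
import urllib.request

BASE_URL = "http://localhost:9200"  # assumption: local cluster without authentication


def register_repository(repo, location):
    """Register a shared file system ("fs") repository at a path listed in path.repo."""
    body = {"type": "fs", "settings": {"location": location}}
    return "PUT", f"/_snapshot/{repo}", body


def create_snapshot(repo, snapshot, indices=None):
    """Snapshot all indices, or only the given ones (sent as a comma-separated string)."""
    body = {}
    if indices:
        body["indices"] = ",".join(indices)
    return "PUT", f"/_snapshot/{repo}/{snapshot}?wait_for_completion=true", body


def get_snapshot(repo, snapshot):
    """Ask for information about an existing snapshot."""
    return "GET", f"/_snapshot/{repo}/{snapshot}", None


def send(method, path, body=None):
    """Send a request to the cluster and return the decoded JSON response."""
    data = json.dumps(body).encode("utf-8") if body is not None else None
    request = urllib.request.Request(
        BASE_URL + path,
        data=data,
        method=method,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


# Hypothetical names; print the requests instead of sending them.
requests_to_run = [
    register_repository("my_backup", "/tmp/backups"),
    create_snapshot("my_backup", "snapshot_1"),
    create_snapshot("my_backup", "snapshot_2", indices=["index_1", "index_2"]),
    get_snapshot("my_backup", "snapshot_1"),
]
for method, path, body in requests_to_run:
    print(method, path, json.dumps(body) if body is not None else "")
```

To run a request against a real cluster, pass one of the tuples to `send`, for example `send(*register_repository("my_backup", "/tmp/backups"))`. The `wait_for_completion=true` parameter makes the create request block until the snapshot finishes, which is convenient for small experiments but may not suit large clusters.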

The get snapshot request returns basic information about the snapshot, including its start and end time, the version of Elasticsearch that created it, the list of included indices, the current state of the snapshot, and the list of failures that occurred while it was being taken. The snapshot state can be IN_PROGRESS, SUCCESS, FAILED, PARTIAL, or INCOMPATIBLE.
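The following sketch shows one way to turn such a response into a short report. The sample response is hand-written for illustration (the snapshot and index names are hypothetical, and a real response contains more fields); the `snapshot_report` helper is likewise an assumption, not an Elasticsearch API.

```python
STATE_MEANINGS = {
    "IN_PROGRESS": "the snapshot is still running",
    "SUCCESS": "all shards were stored successfully",
    "FAILED": "the snapshot finished with an error and stored no data",
    "PARTIAL": "cluster state was stored, but at least one shard was not",
    "INCOMPATIBLE": "the snapshot is incompatible with this cluster version",
}


def snapshot_report(response):
    """Build one summary line per snapshot in a get-snapshot response."""
    lines = []
    for snap in response.get("snapshots", []):
        state = snap["state"]
        indices = snap.get("indices", [])
        failures = snap.get("failures", [])
        meaning = STATE_MEANINGS.get(state, "unknown state")
        lines.append(
            f"{snap['snapshot']}: {state} "
            f"({len(indices)} indices, {len(failures)} failures): {meaning}"
        )
    return lines


# Hand-written sample response, trimmed to the fields used above.
sample_response = {
    "snapshots": [
        {
            "snapshot": "snapshot_1",
            "state": "SUCCESS",
            "indices": ["index_1", "index_2"],
            "failures": [],
        }
    ]
}

report = snapshot_report(sample_response)
for line in report:
    print(line)
```

A check like this is useful in backup scripts, where a PARTIAL or FAILED state should usually raise an alert rather than pass silently.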

  • Restore a snapshot by appending the _restore endpoint to the snapshot name. If you’re restoring data to a cluster that already holds data, use the delete-and-restore or rename-on-restore method to avoid conflicts with existing indices and data streams.
    [Image: request that restores the snapshot]
  • A snapshot can be deleted from the repository with the following request:
    [Image: request that deletes the snapshot]
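The restore and delete requests can be sketched in the same style. As before, this is an illustration rather than the content of the original screenshots: the names `my_backup`, `snapshot_1`, and `index_1` and the `restored_` prefix are hypothetical. The rename-on-restore variant uses the `rename_pattern` and `rename_replacement` options of the restore API.

```python
import json


def restore_snapshot(repo, snapshot, indices=None, rename_prefix=None):
    """Describe a restore request; optionally rename indices to avoid name conflicts."""
    body = {}
    if indices:
        body["indices"] = ",".join(indices)
    if rename_prefix:
        # Capture the whole index name and put the prefix in front of it.
        body["rename_pattern"] = "(.+)"
        body["rename_replacement"] = rename_prefix + "$1"
    return "POST", f"/_snapshot/{repo}/{snapshot}/_restore", body


def delete_snapshot(repo, snapshot):
    """Describe a request that deletes a snapshot from its repository."""
    return "DELETE", f"/_snapshot/{repo}/{snapshot}", None


# Hypothetical names, for illustration only.
restore = restore_snapshot(
    "my_backup", "snapshot_1", indices=["index_1"], rename_prefix="restored_"
)
delete = delete_snapshot("my_backup", "snapshot_1")

for method, path, body in (restore, delete):
    print(method, path, json.dumps(body) if body is not None else "")
```

With the rename options shown, `index_1` would be restored as `restored_index_1`, leaving an existing `index_1` in the cluster untouched.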

When a snapshot is deleted from a repository, Elasticsearch deletes all files that are associated with the deleted snapshot and not used by any other snapshot. If the delete snapshot operation is executed while the snapshot is still being created, the snapshotting process is aborted and all files created as part of it are cleaned up. The delete snapshot operation can therefore be used to cancel long-running snapshot operations that were started by mistake.
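The deletion rule can be illustrated with a small model in which each snapshot is a set of file names. This is a simplified sketch with hypothetical names, not how Elasticsearch tracks files internally.

```python
def files_removed_on_delete(snapshots, name):
    """Return the files that can be removed when snapshot `name` is deleted.

    `snapshots` maps each snapshot name to the set of files it references.
    A file is removed only if no remaining snapshot still references it.
    """
    still_needed = set()
    for other, files in snapshots.items():
        if other != name:
            still_needed |= files
    return sorted(snapshots[name] - still_needed)


# Hypothetical repository contents.
repository = {
    "snapshot_1": {"seg_1", "seg_2"},
    "snapshot_2": {"seg_2", "seg_3"},
}

removed = files_removed_on_delete(repository, "snapshot_1")
print(removed)  # ['seg_1']
```

Here `seg_2` survives because `snapshot_2` still needs it, so deleting an old snapshot never damages a newer one.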

Points of consideration:

  • Each snapshot repository is separate and independent. Elasticsearch doesn’t share data between repositories.
  • Clusters should only register a particular snapshot repository bucket once. If you register the same snapshot repository with multiple clusters, only one cluster should have write access to the repository. On other clusters, register the repository as read-only.
  • While a snapshot is in progress, you can still index documents and make other requests to the cluster, but documents added, updated, or deleted after the snapshot starts are not included in it.
  • Snapshots include only primary shards; replica shards are not copied.
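For the shared-repository case mentioned above, the sketch below builds two registration bodies for the same location: a writable one for the cluster that takes snapshots and a read-only one for every other cluster. It assumes a shared file system repository and uses its `readonly` setting; the helper name and the path are hypothetical.

```python
import json


def repository_body(location, readonly=False):
    """Build the registration body for an "fs" repository, optionally read-only."""
    settings = {"location": location}
    if readonly:
        settings["readonly"] = True
    return {"type": "fs", "settings": settings}


# Hypothetical shared location, registered on two clusters.
writer_body = repository_body("/tmp/backups")
reader_body = repository_body("/tmp/backups", readonly=True)

print(json.dumps(writer_body))
print(json.dumps(reader_body))
```

Each body would be sent with a PUT request to the `_snapshot/<repository>` endpoint of the corresponding cluster.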

Conclusion

In this article, we looked at Elasticsearch snapshots and restores and their importance in a data management strategy. We covered the primary use cases for snapshots, such as recovering from failures and migrating data between clusters, and the various supported repositories, including Azure Cloud, Hadoop, AWS, NFS on Linux, and more.

I hope you find these step-by-step instructions for creating, restoring, and managing snapshots, along with the points to consider, useful.

For more information or assistance with implementing Elasticsearch in your organization, contact us. Our team of experts is ready to help you enhance your data management strategy and ensure the security and recoverability of your valuable data. Reach out for subscriptions, services, or solutions on Elastic today!

