Managing Elasticsearch and OpenSearch Snapshots with S3 Repositories
June 23, 2025
Whether you’re using Elasticsearch or OpenSearch, managing snapshots and repositories is crucial for data backup and recovery. This guide walks you through setting up S3 repositories, creating snapshots, and restoring data.
Register an S3 Repository with Your Cluster
Your S3 repository (bucket) can be empty when you start, or it might already contain snapshots from another cluster that you want to restore.
Prerequisites: Setting Up IAM Policy and Role
Before you can connect your cluster to S3, you need the right permissions in place.
1. Create a new IAM Policy
First, create a policy with these permissions. The example below assumes the S3 bucket is named es-s3-repository:
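{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Action": [ "s3:ListBucket" ],
      "Effect": "Allow",
      "Resource": [ "arn:aws:s3:::es-s3-repository" ]
    },
    {
      "Action": [
        "s3:GetObject",
        "s3:PutObject",
        "s3:DeleteObject"
      ],
      "Effect": "Allow",
      "Resource": [
        "arn:aws:s3:::es-s3-repository/*"
      ]
    }
  ]
}
This policy allows your cluster to list the bucket contents and read, write, or delete objects within it. Note that iam:PassRole does not belong in this policy: it applies to IAM roles rather than S3 objects, so grant it separately to the IAM identity that registers the repository, scoped to the role created in the next step.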
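The policy can also be created from the command line. The following is a minimal sketch, assuming the AWS CLI is installed and configured. The helper names render_policy and create_policy, the file name policy.json, and the policy name es-s3-repository-policy are illustrative choices, not part of the original setup. The sketch contains the S3 actions only.

```shell
# Print the snapshot policy for a given bucket name (hypothetical helper).
render_policy() {
  bucket="$1"
  cat <<EOF
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Action": ["s3:ListBucket"],
      "Effect": "Allow",
      "Resource": ["arn:aws:s3:::${bucket}"]
    },
    {
      "Action": ["s3:GetObject", "s3:PutObject", "s3:DeleteObject"],
      "Effect": "Allow",
      "Resource": ["arn:aws:s3:::${bucket}/*"]
    }
  ]
}
EOF
}

# Write the policy to policy.json and create it in IAM.
# The policy name is an assumption; pick any name you like.
create_policy() {
  render_policy "$1" > policy.json
  aws iam create-policy \
    --policy-name es-s3-repository-policy \
    --policy-document file://policy.json
}

# Usage (requires AWS credentials):
#   create_policy es-s3-repository
```

Generating the document from the bucket name keeps the two Resource entries in sync if you later switch buckets.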
2. Create an IAM Role
Run this command to create a new IAM role that your cluster can assume:
aws iam create-role \
  --role-name es-s3-repository \
  --assume-role-policy-document '{
    "Version": "2012-10-17",
    "Statement": [{
      "Sid": "",
      "Effect": "Allow",
      "Principal": {"Service": "es.amazonaws.com"},
      "Action": "sts:AssumeRole"
    }]
  }'
Note: For OpenSearch, you might need to use opensearch.amazonaws.com as the service principal instead of es.amazonaws.com, depending on your setup.
3. Attach the Policy to the Role
Use the AWS Console to attach the policy you created in step 1 to the role from step 2.
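If you prefer the command line to the console, the attachment can be sketched as follows. This assumes the AWS CLI is configured and that the policy is a customer-managed policy in the same account. The helper names and the example policy name are hypothetical.

```shell
# Build the ARN of a customer-managed policy from an account ID and a policy name.
policy_arn() {
  printf 'arn:aws:iam::%s:policy/%s\n' "$1" "$2"
}

# Attach the named policy to the es-s3-repository role from step 2.
attach_policy() {
  # Look up the account ID of the currently configured credentials.
  account_id="$(aws sts get-caller-identity --query Account --output text)"
  aws iam attach-role-policy \
    --role-name es-s3-repository \
    --policy-arn "$(policy_arn "$account_id" "$1")"
}

# Usage (requires AWS credentials; the policy name is an example):
#   attach_policy es-s3-repository-policy
```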
4. Create Your S3 Bucket
Create the S3 bucket that will serve as your snapshot repository, using the same name you referenced in the policy from step 1 (es-s3-repository in this example). You’ll need the name again when you register the repository.
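The bucket can be created with the AWS CLI as well. One detail worth sketching: us-east-1 must be created without a LocationConstraint, while every other Region requires one. The helper names below are hypothetical, and the Region values are examples.

```shell
# Print the arguments for "aws s3api create-bucket".
# us-east-1 rejects a LocationConstraint; all other Regions need one.
create_bucket_args() {
  bucket="$1"
  region="$2"
  if [ "$region" = "us-east-1" ]; then
    printf '%s\n' "--bucket $bucket --region $region"
  else
    printf '%s\n' "--bucket $bucket --region $region --create-bucket-configuration LocationConstraint=$region"
  fi
}

# Create the bucket. Word splitting of the arguments is intentional:
# bucket names and Region codes never contain spaces.
create_bucket() {
  aws s3api create-bucket $(create_bucket_args "$1" "$2")
}

# Usage (requires AWS credentials):
#   create_bucket es-s3-repository eu-west-1
```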
Register the Repository with Your Cluster
Once your IAM setup is complete, you can register the repository with your cluster.
# If your cluster is in a VPC or allows unsigned requests (not recommended for production)
curl -X PUT "https://{your-cluster-endpoint}/_snapshot/{your-repository-name}" \
  -H "Content-Type: application/json" \
  -d '{
    "type": "s3",
    "settings": {
      "bucket": "{your-bucket-name}",
      "endpoint": "s3.amazonaws.com",
      "role_arn": "arn:aws:iam::{your-aws-account}:role/es-s3-repository"
    }
  }'
Replace the placeholders in braces with your own values. Keep s3.amazonaws.com as the endpoint unless your bucket needs a Region-specific S3 endpoint.
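When the cluster requires signed requests, curl 7.75 or later can sign the call with AWS Signature Version 4. The sketch below assumes an Amazon OpenSearch Service domain (signing service name es) and long-lived credentials in the AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY environment variables. The helper names are hypothetical.

```shell
# Build the repository registration body for a bucket and role ARN.
repo_body() {
  printf '{"type":"s3","settings":{"bucket":"%s","endpoint":"s3.amazonaws.com","role_arn":"%s"}}\n' "$1" "$2"
}

# Register the repository with a SigV4-signed request.
# Arguments: cluster endpoint, repository name, bucket, signing Region, role ARN.
register_repo() {
  endpoint="$1"; repo="$2"; bucket="$3"; region="$4"; role_arn="$5"
  curl -sS -X PUT "https://${endpoint}/_snapshot/${repo}" \
    --aws-sigv4 "aws:amz:${region}:es" \
    --user "${AWS_ACCESS_KEY_ID}:${AWS_SECRET_ACCESS_KEY}" \
    -H "Content-Type: application/json" \
    -d "$(repo_body "$bucket" "$role_arn")"
}

# Usage (values are placeholders):
#   register_repo "{your-cluster-endpoint}" "{your-repository-name}" \
#     "{your-bucket-name}" us-east-1 "arn:aws:iam::{your-aws-account}:role/es-s3-repository"
```

With temporary credentials, the session token also has to be sent, typically as an x-amz-security-token header.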
Creating Manual Snapshots
Taking snapshots is straightforward. By default, a snapshot includes all indexes in your cluster.
Create a snapshot:
curl -s -XPUT 'https://{your-cluster-endpoint}/_snapshot/{repository-name}/{snapshot-name}'
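To snapshot only some indexes instead of the whole cluster, the same request accepts a JSON body. This is a minimal sketch with hypothetical helper names. It assumes you want to skip missing indexes and leave the cluster state out of the snapshot.

```shell
# Build a snapshot body for a comma-separated list of index names.
snapshot_body() {
  printf '{"indices":"%s","ignore_unavailable":true,"include_global_state":false}\n' "$1"
}

# Create a snapshot of the listed indexes.
# Arguments: cluster endpoint, repository name, snapshot name, index list.
create_snapshot() {
  curl -s -X PUT "https://$1/_snapshot/$2/$3" \
    -H "Content-Type: application/json" \
    -d "$(snapshot_body "$4")"
}

# Usage (values are placeholders):
#   create_snapshot "{your-cluster-endpoint}" "{repository-name}" "{snapshot-name}" "index-a,index-b"
```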
Check snapshot status:
curl -XGET 'https://{your-cluster-endpoint}/_snapshot/{repository-name}/_status'
The snapshot process runs in the background. Depending on your data size, it might take some time to complete.
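Since the snapshot runs in the background, a script usually has to wait for it. The sketch below polls the snapshot until its state is no longer IN_PROGRESS. The helper names and the 10-second interval are arbitrary choices, and the state is extracted with grep and sed rather than a JSON parser to keep the example dependency-free.

```shell
# Extract the first "state" value from a GET _snapshot/{repo}/{snapshot} response on stdin.
snapshot_state() {
  grep -o '"state" *: *"[A-Z_]*"' | head -n 1 | sed 's/.*"\([A-Z_]*\)"$/\1/'
}

# Poll the snapshot URL until the state leaves IN_PROGRESS, then print the final state.
# An empty result means the response could not be read or parsed.
wait_for_snapshot() {
  url="$1"
  while :; do
    state="$(curl -s "$url" | snapshot_state)"
    [ "$state" != "IN_PROGRESS" ] && break
    sleep 10
  done
  printf '%s\n' "$state"
}

# Usage (values are placeholders):
#   wait_for_snapshot "https://{your-cluster-endpoint}/_snapshot/{repository-name}/{snapshot-name}"
```

A final state other than SUCCESS (for example PARTIAL or FAILED) is worth inspecting before you rely on the snapshot.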
Restoring Data from Snapshots
Before restoring data, make sure the index you want to restore doesn’t already exist in your cluster. If it does, you’ll need to delete it first or restore with a different name.
Find Your Snapshots
List all repositories:
GET /_snapshot/_all?pretty
List snapshots in a repository:
GET /_snapshot/{repository-name}/_all?pretty
Restore a Specific Index
Once you know which snapshot contains the index you need, restore it:
curl -s -XPOST 'https://{your-cluster-endpoint}/_snapshot/{repository-name}/{snapshot-name}/_restore' \
  -H "Content-Type: application/json" \
  -d '{
    "indices": "{index-name}",
    "ignore_unavailable": false,
    "include_global_state": false
  }'
Setting include_global_state to false prevents restoring cluster-wide settings, which is usually what you want when restoring individual indexes.
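If the index already exists and you would rather not delete it, the restore API can bring the data back under a different name through rename_pattern and rename_replacement. The sketch below prepends a prefix to the restored index name. The helper names and the restored_ prefix are hypothetical.

```shell
# Build a restore body that restores one index under a prefixed name.
# Arguments: index name, prefix for the restored copy.
restore_body() {
  printf '{"indices":"%s","include_global_state":false,"rename_pattern":"(.+)","rename_replacement":"%s$1"}\n' "$1" "$2"
}

# Restore the index from a snapshot under the new name.
# Arguments: cluster endpoint, repository name, snapshot name, index name, prefix.
restore_renamed() {
  curl -s -X POST "https://$1/_snapshot/$2/$3/_restore" \
    -H "Content-Type: application/json" \
    -d "$(restore_body "$4" "$5")"
}

# Usage (values are placeholders):
#   restore_renamed "{your-cluster-endpoint}" "{repository-name}" "{snapshot-name}" "{index-name}" restored_
```

The $1 in rename_replacement refers to the capture group in rename_pattern, so an index named logs would come back as restored_logs.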
Merging Indexes Manually
Sometimes you need to combine data from two indexes, adding only the missing documents to avoid duplicates.
Reindex Operation
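The reindex API lets you copy documents from one index to another. Setting op_type to create adds only documents whose IDs are not already in the destination, and conflicts: proceed lets the operation skip the existing ones instead of aborting: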
curl -s -XPOST 'https://{your-cluster-endpoint}/_reindex?wait_for_completion=false' \
  -H "Content-Type: application/json" \
  -d '{
    "conflicts": "proceed",
    "source": {
      "index": "{source-index-name}"
    },
    "dest": {
      "index": "{destination-index-name}",
      "op_type": "create"
    }
  }'
Because of wait_for_completion=false, the request returns a task ID immediately while the reindex continues in the background. Large reindex operations can take hours to complete.
Monitor the Reindex Process
Check if your reindex is still running:
GET /_tasks?actions=*reindex
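To follow one specific reindex rather than every running one, you can query the task ID that the reindex request returned. The sketch below extracts that ID from a response of the form {"task":"node:number"} and looks the task up. The helper names are hypothetical.

```shell
# Extract the task ID from a reindex response on stdin.
task_id() {
  sed -n 's/.*"task" *: *"\([^"]*\)".*/\1/p'
}

# Fetch the current status of a task.
# Arguments: cluster endpoint, task ID.
check_task() {
  curl -s "https://$1/_tasks/$2"
}

# Usage (values are placeholders; reindex.json holds the saved reindex response):
#   id="$(task_id < reindex.json)"
#   check_task "{your-cluster-endpoint}" "$id"
```

The task response includes a completed field, which tells you whether the reindex has finished.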
Verify the Results
Before cleaning up, make sure the merge worked correctly:
- Compare document counts between the source and destination indexes
- Check index sizes to ensure they make sense
- Query specific documents to verify they copied correctly:
GET /{index-name}/_doc/{document-id}
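The document count comparison can be scripted with the _count API. This is a minimal sketch with hypothetical helper names. It assumes that after a merge the destination should hold at least as many documents as the source, since the destination may also contain documents the source never had.

```shell
# Extract the "count" value from a _count API response on stdin.
doc_count() {
  sed -n 's/.*"count" *: *\([0-9][0-9]*\).*/\1/p'
}

# Print the number of documents in an index.
# Arguments: cluster endpoint, index name.
index_count() {
  curl -s "https://$1/$2/_count" | doc_count
}

# Succeed if the destination holds at least as many documents as the source.
# Arguments: cluster endpoint, source index, destination index.
counts_ok() {
  src="$(index_count "$1" "$2")"
  dst="$(index_count "$1" "$3")"
  [ -n "$src" ] && [ -n "$dst" ] && [ "$dst" -ge "$src" ]
}

# Usage (values are placeholders):
#   counts_ok "{your-cluster-endpoint}" "{source-index-name}" "{destination-index-name}" && echo "counts look fine"
```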
Clean Up
Once you’re confident the merge worked, you can delete the index you no longer need:
DELETE /{index-name}
Important Notes
- Always test these operations in a non-production environment first
- Large operations (snapshots, restores, reindexing) can impact cluster performance
- Make sure you have enough disk space before starting restore operations
- Keep track of your snapshot names and dates for easier management
- Both Elasticsearch and OpenSearch support these operations, but always check the documentation for your specific version as some details might vary
The overall workflow is the same for Elasticsearch and OpenSearch clusters, which makes it practical to move data between deployments or to build a reliable backup strategy.