Merging OpenSearch or Elasticsearch Indices: Adding Missing Documents
September 15, 2025 • 2 min read
When working with OpenSearch or Elasticsearch, you may need to merge indices while preserving existing documents in the destination index. This guide walks through the process of using the reindex API to copy only missing documents from a source index to a destination index, without overwriting existing data.
Note that while you can run most of these OpenSearch commands with curl, this article assumes you have access to the Dev Tools console in OpenSearch Dashboards. That keeps the examples cleaner, since they don't need to include authentication and headers.
Steps for Merging Two Indices (Adding Missing Documents Only)
1. Start the Reindex Operation
Run the reindex with `op_type: create` (together with `conflicts: proceed`) so that only missing documents are added:
```
# This returns immediately with a task ID (does not wait for completion)
# Replace: {source-index-name}, {destination-index-name}
POST /_reindex?wait_for_completion=false
{
  "conflicts": "proceed",
  "source": {
    "index": "{source-index-name}"
  },
  "dest": {
    "index": "{destination-index-name}",
    "op_type": "create"
  }
}
```
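Note: With `op_type: create`, a document whose ID already exists in the destination index is rejected with a version conflict instead of being overwritten. `conflicts: proceed` tells the reindex to count those conflicts and keep going rather than abort. The net result is that existing documents are skipped and left untouched.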
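The single-document `_create` endpoint shows the same behavior as `op_type: create`. It indexes a document only if its ID is not already taken, and it returns a `409` version conflict for an existing ID. The sketch below writes data, so run it against a throwaway index. `{test-index-name}` and the `title` field are hypothetical placeholders, not part of the procedure above.

```
# Run against a throwaway index; {test-index-name} and "title" are placeholders
PUT /{test-index-name}/_create/1
{
  "title": "first version"
}

# Same ID again: rejected with a 409 version conflict, original document kept
PUT /{test-index-name}/_create/1
{
  "title": "second version"
}

# Still returns "first version"
GET /{test-index-name}/_doc/1
```

During a reindex, each of these rejections is counted under `version_conflicts` in the task result instead of stopping the operation.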
2. Monitor the Reindex Progress
Check if the reindex operation is still running:
```
# List all active reindex tasks
GET /_tasks?actions=*reindex

# Or check a specific task using the task ID returned from step 1
GET /_tasks/{task_id}
```
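When the task finishes, `GET /_tasks/{task_id}` reports `"completed": true`. It also includes a `response` object with counters such as `created` (documents copied), `version_conflicts` (documents skipped because their ID already existed) and `failures`. The following task calls may also help while the reindex is running. `{task_id}` is the value returned in step 1.

```
# More detail on running reindex tasks, including their progress status
GET /_tasks?actions=*reindex&detailed=true

# Cancel the reindex if something looks wrong
# (documents already copied stay in the destination; nothing is rolled back)
POST /_tasks/{task_id}/_cancel
```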
3. Verify the Results
Once the reindex is complete, compare the indices to ensure the operation was successful:
```
# Compare document counts
GET /_cat/indices/{source-index-name},{destination-index-name}?v&h=index,docs.count,store.size

# Check specific documents to verify they were copied correctly
GET /{destination-index-name}/_doc/{document-id}

# Compare index statistics
GET /{source-index-name}/_stats
GET /{destination-index-name}/_stats
```
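`docs.count` in `_cat/indices` is a Lucene-level count that also includes nested documents. For a top-level comparison, the `_count` API may be more precise. After the merge, the destination should hold at least as many documents as the source, because every source ID was either copied or already present. For spot checks, `_mget` fetches several IDs in one request. The document IDs below are hypothetical placeholders.

```
# Make recently copied documents visible to search-based counts
POST /{destination-index-name}/_refresh

# Top-level document counts; the destination should be >= the source
GET /{source-index-name}/_count
GET /{destination-index-name}/_count

# Spot-check several source IDs at once; each should come back with "found": true
GET /{destination-index-name}/_mget
{
  "ids": ["{document-id-1}", "{document-id-2}"]
}
```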
4. Clean Up (Optional)
If everything looks correct and you no longer need the source index:
```
# Delete the source index
DELETE /{source-index-name}
```
⚠️ Warning: Always verify your data thoroughly before deleting any index. Consider creating a snapshot backup before deletion.
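The following sketch takes a snapshot of just the source index before deletion. It assumes a snapshot repository is already registered on the cluster. `{repository-name}` and `{snapshot-name}` are placeholders.

```
# Snapshot only the source index and wait for it to finish
PUT /_snapshot/{repository-name}/{snapshot-name}?wait_for_completion=true
{
  "indices": "{source-index-name}"
}

# Confirm the snapshot state is SUCCESS before running the DELETE
GET /_snapshot/{repository-name}/{snapshot-name}
```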
Additional Considerations
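- Performance Impact: Reindexing can be resource-intensive. You can control the batch size with the `size` field inside `source` in the request body (1,000 documents per batch by default). The `scroll` query parameter controls how long each scroll context stays open.
- Index Settings: Reindex copies documents only, not mappings or settings. The destination index should therefore have appropriate mappings and settings configured before you start.
- Error Handling: `conflicts: proceed` keeps the operation going when it hits version conflicts, which here means documents whose IDs already exist in the destination. Other failures, such as mapping errors, still stop the reindex and are reported in the task's `failures` array.
- Pause Writes: Reindex reads the source through a scroll, which reflects the index as it was when the operation started. Documents written to the source after that point are not copied. To avoid missing them, you can temporarily block writes to the source index while the reindex runs:

```
# Block writes to the source index
PUT /{source-index-name}/_settings
{
  "index.blocks.write": true
}

# Re-enable writes once the reindex is complete
PUT /{source-index-name}/_settings
{
  "index.blocks.write": null
}
```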
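The considerations above can be combined into a more cautious version of the step 1 request. `source.size` sets the batch size and `scroll` sets the scroll keep-alive. `slices=auto` lets the cluster parallelize the work across shards, and `requests_per_second` throttles the rate. The specific values below are arbitrary starting points to tune for your cluster, not recommendations.

```
# Compare mappings first (reindex does not copy them)
GET /{source-index-name}/_mapping
GET /{destination-index-name}/_mapping

# Batched, sliced and throttled variant of the step 1 request
# (values are illustrative only)
POST /_reindex?wait_for_completion=false&slices=auto&scroll=10m&requests_per_second=500
{
  "conflicts": "proceed",
  "source": {
    "index": "{source-index-name}",
    "size": 500
  },
  "dest": {
    "index": "{destination-index-name}",
    "op_type": "create"
  }
}

# Change the throttle of a running reindex (-1 disables throttling)
POST /_reindex/{task_id}/_rethrottle?requests_per_second=-1
```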