Merging OpenSearch or Elasticsearch Indices: Adding Missing Documents

September 15, 2025 • 2 min read

When working with OpenSearch or Elasticsearch, you may need to merge indices while preserving existing documents in the destination index. This guide walks through the process of using the reindex API to copy only missing documents from a source index to a destination index, without overwriting existing data.

You can run all of these commands with curl, but for this article I'm going to assume you have access to Dev Tools within your OpenSearch cluster. That keeps the examples cleaner, since they don't have to include things like auth and headers.

Steps for Merging Two Indices (Adding Missing Documents Only)

1. Start the Reindex Operation

Execute the reindex operation with op_type: create to ensure only missing documents are added:

# This returns immediately with a task ID (does not wait for completion)
# Replace: {source-index-name}, {destination-index-name}
POST /_reindex?wait_for_completion=false
{
  "conflicts": "proceed",
  "source": {
    "index": "{source-index-name}"
  },
  "dest": {
    "index": "{destination-index-name}",
    "op_type": "create"
  }
}

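Note: With op_type: create, a document whose ID already exists in the destination index triggers a version conflict instead of overwriting the existing document. Because conflicts is set to proceed, the reindex counts these conflicts and moves on rather than aborting. Documents are matched by ID only, so a destination document with the same ID but different content is left unchanged.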
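If you would rather script this step than paste it into Dev Tools, the request can be assembled in Python. This is a minimal sketch: build_reindex_request is a hypothetical helper name, the index names are placeholders, and sending the request (with your cluster's URL and auth) is left to whichever HTTP client you already use.

```python
import json


def build_reindex_request(source_index, dest_index):
    """Return (method, path, body) for a create-only reindex.

    Hypothetical helper: it only assembles the request shown above,
    it does not contact a cluster.
    """
    body = {
        # Count version conflicts and continue instead of aborting
        "conflicts": "proceed",
        "source": {"index": source_index},
        # op_type=create refuses to overwrite an existing document ID
        "dest": {"index": dest_index, "op_type": "create"},
    }
    return "POST", "/_reindex?wait_for_completion=false", body


# Placeholder index names, for illustration only
method, path, body = build_reindex_request("source-index", "destination-index")
print(method, path)
print(json.dumps(body, indent=2))
```

Keeping the builder separate from the HTTP call makes it easy to log or review the exact body before anything is sent.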

2. Monitor the Reindex Progress

Check if the reindex operation is still running:

# List all active reindex tasks
GET /_tasks?actions=*reindex

# Or check a specific task using the task ID returned from step 1
GET /_tasks/{task_id}
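The task response is fairly verbose, so it can help to reduce it to the few numbers that matter for a merge. The sketch below assumes the usual shape of a GET /_tasks/{task_id} response for a reindex (a completed flag, counters under task.status, and a failures list under response). summarize_reindex_task is a hypothetical helper and the sample response is hand-written for illustration, not captured from a real cluster.

```python
def summarize_reindex_task(task_response):
    """Reduce a GET /_tasks/{task_id} response to the fields relevant here.

    Missing keys fall back to zero or empty values, so the function also
    works on a task that is still running.
    """
    status = task_response.get("task", {}).get("status", {})
    failures = task_response.get("response", {}).get("failures", [])
    return {
        "completed": bool(task_response.get("completed", False)),
        "total": status.get("total", 0),
        # Documents that were missing from the destination and got added
        "created": status.get("created", 0),
        # With op_type=create, version conflicts are IDs that already existed
        "skipped_existing": status.get("version_conflicts", 0),
        "failures": len(failures),
    }


# Illustrative response only (values are made up)
sample = {
    "completed": True,
    "task": {"status": {"total": 100, "created": 40, "version_conflicts": 60}},
    "response": {"failures": []},
}
print(summarize_reindex_task(sample))
```

For a clean merge you would expect created plus skipped_existing to add up to total, with no failures.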

3. Verify the Results

Once the reindex is complete, compare the indices to confirm the operation was successful:

# Compare document counts (the destination should now hold at least as many documents as the source)
GET /_cat/indices/{source-index-name},{destination-index-name}?v&h=index,docs.count,store.size

# Check specific documents to verify they were copied correctly
GET /{destination-index-name}/_doc/{document-id}

# Compare index statistics
GET /{source-index-name}/_stats
GET /{destination-index-name}/_stats
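Document counts only tell part of the story, so a spot check on specific IDs can be useful. One way to do that is the multi-get API (_mget), which reports a found flag per requested ID. The sketch below assumes you already have a sample of source document IDs. build_mget_request and missing_ids are hypothetical helper names, and the sample response is hand-written for illustration.

```python
def build_mget_request(dest_index, doc_ids):
    """Return (method, path, body) for an _mget existence check.

    _source=false skips the document bodies, since only the found
    flag is needed.
    """
    return "GET", f"/{dest_index}/_mget?_source=false", {"ids": list(doc_ids)}


def missing_ids(mget_response):
    """Return the IDs that the destination index reported as not found."""
    return [
        doc["_id"]
        for doc in mget_response.get("docs", [])
        if not doc.get("found", False)
    ]


# Illustrative response only (values are made up)
sample = {
    "docs": [
        {"_index": "destination-index", "_id": "1", "found": True},
        {"_index": "destination-index", "_id": "2", "found": False},
    ]
}
print(build_mget_request("destination-index", ["1", "2"]))
print(missing_ids(sample))
```

An empty list from missing_ids means every sampled source ID exists in the destination.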

4. Clean Up (Optional)

If everything looks correct and you no longer need the source index:

# Delete the source index
DELETE /{source-index-name}

⚠️ Warning: Always verify your data thoroughly before deleting any index. Consider creating a snapshot backup before deletion.

Additional Considerations

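  • Performance Impact: Reindexing can be resource-intensive. Set source.size in the request body to control the batch size (the default is 1000 documents per batch), and use the requests_per_second query parameter to throttle the operation. The scroll query parameter only controls how long the search context is kept alive between batches
  • Index Settings: The reindex API does not copy settings or mappings from the source index, so the destination index should have appropriate mappings and settings configured before reindexing
  • Error Handling: The conflicts: proceed setting makes the operation count version conflicts (here, documents whose ID already exists in the destination) and continue instead of aborting. Other indexing failures still stop the reindex and are reported in the failures array of the response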
  • Pause Writes: To keep the source index from changing while the reindex runs, you can temporarily block writes to it:
    PUT /{source-index-name}/_settings
    {
      "index.blocks.write": true
    }

    # Re-enable writes once the reindex has finished
    PUT /{source-index-name}/_settings
    {
      "index.blocks.write": null
    }
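As a sketch of the batch-size and throttling controls mentioned under Performance Impact, the request from step 1 can be extended with source.size in the body and requests_per_second as a query parameter. build_throttled_reindex is a hypothetical helper, and the default values below are arbitrary examples rather than recommendations.

```python
from urllib.parse import urlencode


def build_throttled_reindex(source_index, dest_index,
                            batch_size=500, requests_per_second=1000):
    """Return (method, path, body) for a throttled, create-only reindex.

    batch_size maps to source.size (documents per scroll batch);
    requests_per_second is passed as a query parameter.
    """
    query = urlencode({
        "wait_for_completion": "false",
        "requests_per_second": requests_per_second,
    })
    body = {
        "conflicts": "proceed",
        "source": {"index": source_index, "size": batch_size},
        "dest": {"index": dest_index, "op_type": "create"},
    }
    return "POST", f"/_reindex?{query}", body


# Placeholder index names, for illustration only
print(build_throttled_reindex("source-index", "destination-index"))
```

Smaller batches and a lower requests_per_second reduce pressure on the cluster at the cost of a longer-running task.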
    

Resources: