Summary: This article explains how to use the Reindex API to migrate data from Elasticsearch to OpenSearch, which is particularly useful when snapshot compatibility is an issue. It outlines the pros and cons of this method, including its versatility in migrating from older Elasticsearch versions and ability to transform data during migration, but notes that it’s slower than other approaches and doesn’t automatically copy index settings and mappings. The article provides examples of how to use the Reindex API for basic migration, selecting specific documents to reindex, and manipulating data during the reindexing process.
In the last article, we discussed how you can move your data from Elasticsearch to OpenSearch by using the Snapshot/Restore APIs. This is a fast, secure, and efficient upgrade path. However, it requires that the snapshot you took with Elasticsearch is compatible with the OpenSearch version you are migrating to, and this is not always the case. As a fallback, you can always use the Elasticsearch/OpenSearch Reindex API to move your data.
Pros and Cons
On the plus side, you can use the Reindex API to move data to OpenSearch from almost any Elasticsearch version. The Reindex API, as opposed to Snapshot/Restore, does not rely on a specific binary format.
Instead, reindexing will read all your data from cluster A, and use the REST APIs to re-index it on cluster B.
Pro:
You can migrate data from older Elasticsearch versions (5.x, 6.x) directly to OpenSearch
You can provide a query to filter out only the documents you want to migrate
You can transform documents during the migration
Con:
Reindex is slow compared to other approaches. The Reindex operation reads all documents from your source index, transforms them into JSON, and sends them to the target index.
Settings and mappings for the target index are not copied automatically; you have to set them up manually
Reindex API
The Reindex API
“[c]opies documents from a source to a destination. The source can be any existing index, alias, or data stream. The destination must differ from the source. […] Reindex does not copy the settings from the source or its associated template.”
While this might not seem very useful for data migration between two clusters at first, the trick here is that the source for the reindexing operation can also be a remote cluster. This means you can leverage the full power of the Reindex API to transfer data between two clusters.
The reindex will not copy any mappings or settings. So before starting the migration you should set up an empty destination index on OpenSearch with the same mappings and settings as your source index.
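As a rough illustration of that preparation step, the Python sketch below derives a create-index body from the information a `GET /mysourceindex` request returns on the source cluster. The helper name `destination_index_body`, the sample response, and the list of settings to strip are assumptions made for this example. Mappings from older Elasticsearch versions may also contain a type level that has to be removed by hand before OpenSearch accepts them.

```python
import copy
import json

# Settings the cluster generates itself; they are typically rejected
# when sent in a create-index request (assumed list for this sketch).
READ_ONLY_INDEX_SETTINGS = ("uuid", "creation_date", "provided_name", "version")


def destination_index_body(source_index_info: dict) -> dict:
    """Build a create-index body from one entry of a `GET /<index>` response."""
    settings = copy.deepcopy(source_index_info.get("settings", {}))
    index_settings = settings.get("index", {})
    for key in READ_ONLY_INDEX_SETTINGS:
        index_settings.pop(key, None)
    return {
        "settings": settings,
        "mappings": copy.deepcopy(source_index_info.get("mappings", {})),
    }


# Hypothetical, shortened response entry for "mysourceindex".
source_info = {
    "aliases": {},
    "mappings": {"properties": {"position": {"type": "keyword"}}},
    "settings": {
        "index": {
            "number_of_shards": "1",
            "number_of_replicas": "1",
            "uuid": "abc123",
            "creation_date": "1600000000000",
            "provided_name": "mysourceindex",
            "version": {"created": "6080099"},
        }
    },
}

body = destination_index_body(source_info)
print(json.dumps(body, indent=2))
```

The resulting body could then be sent as `PUT /mydestinationindex` to the OpenSearch cluster before the reindex is started.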
In case the remote cluster is protected by a security solution like X-Pack Security, Search Guard or ReadOnlyRest, you can also specify HTTP Basic Authentication credentials.
Assume your Elasticsearch cluster is running on elasticsearch.example.com and you want to move the data in mysourceindex to an index called mydestinationindex on opensearch.example.com. Run the following request on the OpenSearch cluster. Note that the remote host must include the scheme (http or https, depending on your setup) and the port:
POST _reindex
{
  "source": {
    "remote": {
      "host": "http://elasticsearch.example.com:9200",
      "username": "...",
      "password": "..."
    },
    "index": "mysourceindex"
  },
  "dest": {
    "index": "mydestinationindex"
  }
}
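If you prefer to script the migration rather than paste the request into a console, the same call can be made from Python. The following is a minimal sketch using only the standard library. The function names `remote_reindex_body` and `post_reindex` are invented for this example, the `http://` scheme is an assumption about the setup, and the credentials are left as placeholders.

```python
import base64
import json
import urllib.request


def remote_reindex_body(remote_host, source_index, dest_index,
                        username=None, password=None):
    """Build the body of a reindex-from-remote request."""
    remote = {"host": remote_host}
    if username is not None:
        remote["username"] = username
        remote["password"] = password
    return {
        "source": {"remote": remote, "index": source_index},
        "dest": {"index": dest_index},
    }


def post_reindex(opensearch_url, body, auth=None):
    """POST the body to <opensearch_url>/_reindex and return the parsed response."""
    request = urllib.request.Request(
        opensearch_url.rstrip("/") + "/_reindex",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    if auth is not None:
        # HTTP Basic Authentication against the destination cluster.
        token = base64.b64encode(f"{auth[0]}:{auth[1]}".encode("utf-8"))
        request.add_header("Authorization", "Basic " + token.decode("ascii"))
    with urllib.request.urlopen(request) as response:
        return json.load(response)


reindex_body = remote_reindex_body(
    "http://elasticsearch.example.com:9200",  # assumed scheme and port
    "mysourceindex",
    "mydestinationindex",
    username="...",  # placeholder, fill in your own credentials
    password="...",
)
print(json.dumps(reindex_body, indent=2))

# Not executed here, because it needs a reachable cluster:
# result = post_reindex("http://opensearch.example.com:9200", reindex_body)
```

Building the body separately from sending it keeps the request easy to inspect or log before anything is written to the destination cluster.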
You can also specify more than one index. In this case, all documents from all indices will be reindexed to the destination index.
"source": {
  "index": ["mysourceindex1", "mysourceindex2"]
}
Data Transformation
Selecting Documents to Reindex
On the source index, you can also specify a query to select the documents you want to reindex:
POST _reindex
{
  "source": {
    "query": { "match": { "position": "developer" } },
    "index": "mysourceindex"
  },
  "dest": {
    "index": "mydestinationindex"
  }
}
In the example above, only documents whose position field matches developer will be reindexed/migrated.
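Before migrating a subset, it can be useful to check how many documents the query actually selects. The sketch below builds two request bodies from one query definition: one for the `_count` API on the source index and one for the reindex itself. The helper name `count_and_reindex_bodies` is made up for this example.

```python
import json


def count_and_reindex_bodies(source_index, dest_index, query):
    """Return (count_body, reindex_body) that share one query definition."""
    count_body = {"query": query}
    reindex_body = {
        "source": {"index": source_index, "query": query},
        "dest": {"index": dest_index},
    }
    return count_body, reindex_body


developer_query = {"match": {"position": "developer"}}
count_body, filtered_reindex_body = count_and_reindex_bodies(
    "mysourceindex", "mydestinationindex", developer_query
)

# Body for GET /mysourceindex/_count, to see how many documents match.
print(json.dumps(count_body))
# Body for POST /_reindex, to copy only those documents.
print(json.dumps(filtered_reindex_body, indent=2))
```

Comparing the count on the source with the number of documents reported by the reindex response is a simple sanity check that the filter behaved as intended.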
Manipulating Data During Reindexing
If you do not want to migrate your data 1:1, there are ways to change it during the reindexing process.
You can add a script to the reindex command which will then be executed on all documents. The default scripting language is Painless.
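POST _reindex
{
  "source": {
    "index": "mysourceindex"
  },
  "dest": {
    "index": "mydestinationindex"
  },
  "script": {
    "lang": "painless",
    "source": "ctx._source.version++"
  }
}
This script increments the version field in each document's source by 1. Document fields are accessed through ctx._source, so the example assumes every document has a numeric field called version.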
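Scripts can also take parameters, which avoids hard-coding values in the script source. The Python sketch below assembles such a request body. The helper name `scripted_reindex_body` and the `step` parameter are assumptions for illustration, and the script skips documents that have no `version` field instead of failing on them.

```python
import json

# Painless source: add params.step to the version field when it exists.
INCREMENT_VERSION = (
    "if (ctx._source.version != null) "
    "{ ctx._source.version += params.step }"
)


def scripted_reindex_body(source_index, dest_index, script_source, params=None):
    """Build a reindex body that runs a Painless script on every document."""
    script = {"lang": "painless", "source": script_source}
    if params:
        script["params"] = params
    return {
        "source": {"index": source_index},
        "dest": {"index": dest_index},
        "script": script,
    }


scripted_body = scripted_reindex_body(
    "mysourceindex", "mydestinationindex", INCREMENT_VERSION, params={"step": 1}
)
print(json.dumps(scripted_body, indent=2))
```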
You can also set up an ingest pipeline on the OpenSearch (destination) cluster to change documents. You can then specify which pipeline to use in the dest section:
POST _reindex
{
  "source": {
    "index": "mysourceindex"
  },
  "dest": {
    "index": "mydestinationindex",
    "pipeline": "mypipeline"
  }
}
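The pipeline itself has to exist on the destination cluster before the reindex runs. As a sketch of what that could look like, the Python snippet below builds a pipeline definition with a single `set` processor, together with the matching reindex body. The field name `migrated_from`, its value, and both helper names are invented for this example.

```python
import json


def set_field_pipeline(description, field, value):
    """Build an ingest pipeline definition with a single `set` processor."""
    return {
        "description": description,
        "processors": [{"set": {"field": field, "value": value}}],
    }


def reindex_through_pipeline(source_index, dest_index, pipeline_name):
    """Build a reindex body that routes documents through an ingest pipeline."""
    return {
        "source": {"index": source_index},
        "dest": {"index": dest_index, "pipeline": pipeline_name},
    }


# Body for PUT /_ingest/pipeline/mypipeline on the OpenSearch cluster.
pipeline_body = set_field_pipeline(
    "Tag migrated documents", "migrated_from", "elasticsearch"
)
# Body for POST /_reindex, sent after the pipeline has been created.
pipeline_reindex_body = reindex_through_pipeline(
    "mysourceindex", "mydestinationindex", "mypipeline"
)

print(json.dumps(pipeline_body, indent=2))
print(json.dumps(pipeline_reindex_body, indent=2))
```

Compared with a reindex script, a pipeline keeps the transformation on the destination cluster, so it can be reused for documents indexed there later on.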
That’s it for this little tutorial on how to use the Reindex API to migrate data from Elasticsearch to OpenSearch. Make sure you also check out the other articles in this series: