OpenSearch

2022-06-15

Elasticsearch to OpenSearch Migration: Reindex API

This article describes how to use the Reindex API to migrate data from an Elasticsearch cluster to an OpenSearch cluster.

By eliatra
In the last article, we discussed how you can move your data from Elasticsearch to OpenSearch by using the Snapshot/Restore APIs. This is a fast, secure and efficient upgrade path. However, it requires that the snapshot you took with Elasticsearch is compatible with the OpenSearch version you are migrating to, which is not always the case. As a fallback, you can always use the Elasticsearch/OpenSearch Reindex API to move your data.

Pros and Cons

On the plus side, you can use the Reindex API to move data to OpenSearch from almost any Elasticsearch version. The Reindex API, as opposed to Snapshot/Restore, does not rely on a specific binary format.
Instead, reindexing reads all your data from cluster A and uses the REST APIs to re-index it on cluster B.
Pro:
    You can migrate data from older Elasticsearch versions (5.x, 6.x) directly to OpenSearch
    You can provide a query to filter out only the documents you want to migrate
    You can transform documents during the migration
Con:
    Reindex is slow compared to other approaches. The Reindex operation reads all documents from your source index, transforms them into JSON, and sends them to the target index.
    Settings and mappings for the target index are not copied automatically; you have to set them up manually

Reindex API

The Reindex API
“[c]opies documents from a source to a destination. The source can be any existing index, alias, or data stream. The destination must differ from the source. […] Reindex does not copy the settings from the source or its associated template.”
(Elasticsearch documentation)
While this might not seem very useful for data migration between two clusters at first, the trick is that the source for the reindexing operation can also be a remote cluster. This means you can leverage the full power of the Reindex API to transfer data between two clusters.
Reindexing does not copy any mappings or settings, so before starting the migration you should set up an empty destination index on OpenSearch with the same mappings and settings as your source index.
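As a minimal sketch of that preparation step, the snippet below builds a create-index body from the responses of GET mysourceindex/_settings and GET mysourceindex/_mapping. The function name build_create_index_body and the sample responses are hypothetical and only illustrate the shape of the data. The sketch assumes typeless mappings; mappings exported from older Elasticsearch versions may contain a mapping type level that has to be removed by hand, and some settings may not be supported by your OpenSearch version.

```python
import json

# Settings generated by the cluster itself; they cannot be supplied
# when creating a new index, so they are dropped from the copy.
READ_ONLY_SETTINGS = {"creation_date", "provided_name", "uuid", "version"}


def build_create_index_body(settings_response, mapping_response, index):
    """Build a create-index body from GET <index>/_settings and GET <index>/_mapping responses."""
    index_settings = settings_response[index]["settings"]["index"]
    copyable = {k: v for k, v in index_settings.items() if k not in READ_ONLY_SETTINGS}
    return {
        "settings": {"index": copyable},
        "mappings": mapping_response[index]["mappings"],
    }


# Illustrative responses only, not output from a real cluster.
settings_response = {
    "mysourceindex": {
        "settings": {
            "index": {
                "creation_date": "1655280000000",
                "number_of_shards": "3",
                "number_of_replicas": "1",
                "uuid": "hypothetical-uuid",
                "version": {"created": "7100299"},
                "provided_name": "mysourceindex",
            }
        }
    }
}
mapping_response = {
    "mysourceindex": {"mappings": {"properties": {"position": {"type": "keyword"}}}}
}

body = build_create_index_body(settings_response, mapping_response, "mysourceindex")
# This body would be sent as PUT mydestinationindex to the OpenSearch cluster.
print(json.dumps(body, indent=2))
```

Review the resulting body before sending it; this is the right moment to adjust shard counts or field types for the new cluster.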
If the remote cluster is protected by a security solution like X-Pack Security, Search Guard or ReadonlyREST, you can also specify HTTP Basic Authentication credentials.
Assume you have your Elasticsearch cluster running on elasticsearch.example.com and want to move data in mysourceindex to an index called mydestinationindex, running on opensearch.example.com. The following command will achieve just that:
POST _reindex
{
   "source":{
      "remote":{
         "host":"https://elasticsearch.example.com:9200",
         "username":"...",
         "password":"..."
      },
      "index": "mysourceindex"
   },
   "dest":{
      "index":"mydestinationindex"
   }
}
You can also specify more than one source index. In this case, the documents from all listed indices will be reindexed to the destination index.
   "source":{
      "index":["mysourceindex1", "mysourceindex2"]
   }
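If you prefer to trigger the migration from a script rather than from a console, the remote reindex request can be assembled with Python's standard library. This is a sketch under stated assumptions: the helper names are hypothetical, the OpenSearch URL with port 9200 and HTTPS is assumed, the credentials are placeholders, and authentication against the OpenSearch cluster itself is not handled.

```python
import json
import urllib.request


def build_remote_reindex_body(remote_host, source_indices, dest_index,
                              username=None, password=None):
    """Build the body of a reindex-from-remote request."""
    remote = {"host": remote_host}
    if username is not None:
        # HTTP Basic Authentication credentials for the remote (source) cluster.
        remote["username"] = username
        remote["password"] = password
    return {
        "source": {"remote": remote, "index": source_indices},
        "dest": {"index": dest_index},
    }


def build_reindex_request(opensearch_url, body):
    """Prepare (but do not send) POST _reindex for the destination cluster."""
    return urllib.request.Request(
        url=f"{opensearch_url}/_reindex",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request):
    """Send the request and return the parsed JSON response."""
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())


body = build_remote_reindex_body(
    "https://elasticsearch.example.com:9200",   # scheme, host and port of the source
    ["mysourceindex1", "mysourceindex2"],
    "mydestinationindex",
    username="<username>",                      # placeholder
    password="<password>",                      # placeholder
)
reindex_request = build_reindex_request("https://opensearch.example.com:9200", body)
print(reindex_request.get_method(), reindex_request.full_url)
# result = send(reindex_request)  # uncomment to run against real clusters
```

The request is sent to the OpenSearch cluster, which then pulls the documents from the remote Elasticsearch host named in the body.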

Data Transformation

Selecting Documents to Reindex

On the source index, you can also specify a query to select the documents you want to reindex:
POST _reindex
{
   "source":{
      "query":{ "match": { "position": "developer"} },
      "index": "mysourceindex"
   },
   "dest":{
      "index":"mydestinationindex"
   }
}
In the example above, only documents whose position field matches developer will be reindexed/migrated.
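One way to check that a filtered migration picked up what you expected is to run the same query against the _count API on the source index and compare the result with the total field of the reindex response. The sketch below only shows the comparison logic; the helper names and the sample responses are hypothetical, and the numbers can legitimately differ if the source index receives writes between the two calls.

```python
def build_count_body(query):
    """Body for POST <index>/_count, using the same query as the reindex request."""
    return {"query": query}


def reindex_matches_count(count_response, reindex_response):
    """True if the reindex reported no failures and processed as many documents as _count found."""
    no_failures = not reindex_response.get("failures")
    return no_failures and reindex_response["total"] == count_response["count"]


query = {"match": {"position": "developer"}}
count_body = build_count_body(query)  # send as POST mysourceindex/_count

# Illustrative responses only, not output from a real cluster.
count_response = {"count": 42}
reindex_response = {"total": 42, "created": 42, "failures": []}

print(reindex_matches_count(count_response, reindex_response))
```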

Manipulating Data During Reindexing

If you do not want to migrate your data 1:1, there are ways to change it during the reindexing process.
You can add a script to the reindex command which will then be executed on all documents. The default scripting language is Painless.
POST _reindex
{
   "source":{
      "index":"mysourceindex"
   },
   "dest":{
      "index":"mydestinationindex"
   },
   "script":{
      "lang":"painless",
      "source":"ctx._source.version++"
   }
}
This script increments the version field of each document by 1. Document fields are accessed through ctx._source.
You can also set up an ingest pipeline on the OpenSearch (destination) cluster to change documents. You then reference the pipeline by name in the dest section of the reindex request:
POST _reindex
{
  "source": {
    "index": "mysourceindex",
    "index": "mysourceindex"
  "dest": {
    "index": "mydestinationindex",
    "pipeline": "mypipeline"
  }
}
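The pipeline has to exist on the OpenSearch cluster before the reindex request references it. As a rough sketch, the snippet below builds a pipeline definition with a single set processor; the helper name, the description and the field migrated_from are hypothetical and only serve as an illustration.

```python
import json


def build_pipeline_body(description, field, value):
    """Build an ingest pipeline definition with one `set` processor."""
    return {
        "description": description,
        "processors": [{"set": {"field": field, "value": value}}],
    }


pipeline = build_pipeline_body(
    "Tag migrated documents",   # hypothetical description
    "migrated_from",            # hypothetical field added to every document
    "elasticsearch",
)

# The body would be sent to the destination cluster as:
print("PUT _ingest/pipeline/mypipeline")
print(json.dumps(pipeline, indent=2))
```

Every document passing through mypipeline during the reindex would then carry the additional field.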
That’s it for this little tutorial on how to use the Reindex API to migrate data from Elasticsearch to OpenSearch. Make sure you also check out the other articles in this series.