The easiest and fastest way to migrate data from Elasticsearch to OpenSearch is to use the snapshot/restore APIs. A snapshot is a backup of your indices and can optionally also include the global cluster state, such as persistent cluster settings, index templates, and other metadata. While snapshots are commonly used for backup and recovery, you can also use them to migrate data from one cluster to another.
Migrating your data using snapshots is faster than using the Reindex API, and much safer than using the shared data directory approach, which we will cover in the following articles.
It’s fast because you move your data in a compressed, binary data format. The Reindex API, on the other hand, reads each document from your Elasticsearch cluster and sends it to OpenSearch, which then has to index the document again. This can take a substantial amount of time for large volumes of data.
It’s safe because you take a backup from your Elasticsearch cluster and apply it to an independent, running OpenSearch cluster. The data on your Elasticsearch cluster is not altered in any way. If something goes wrong, you can simply repeat the whole process.
Compatibility
OpenSearch can read snapshots from Elasticsearch 6.0.0 up to Elasticsearch 7.11.2 directly.
If you use a version earlier than 6.0.0, we recommend first upgrading to Elasticsearch 6.0.0 or later. Ideally, this would be Elasticsearch 7.10.2, since this is the version OpenSearch was forked from.
Unfortunately, snapshots taken with Elasticsearch 7.12.0 and later are not compatible with OpenSearch. If you use version 7.12.0 or later, we recommend using the Reindex API or the shared data directory approach, which we will cover in the following articles.
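The version rules above boil down to a simple range check. The following Python sketch encodes them; the function names and the returned labels are hypothetical, and it assumes plain "MAJOR.MINOR.PATCH" version strings without suffixes.

```python
from typing import Tuple


def parse_version(version: str) -> Tuple[int, int, int]:
    # Assumption: plain "MAJOR.MINOR.PATCH" strings such as "7.10.2".
    major, minor, patch = (int(part) for part in version.split("."))
    return (major, minor, patch)


def snapshot_strategy(es_version: str) -> str:
    """Return a hypothetical label for the migration path that fits an Elasticsearch version."""
    v = parse_version(es_version)
    if v < (6, 0, 0):
        # Too old to restore directly: upgrade first (ideally to 7.10.2).
        return "upgrade-first"
    if v <= (7, 11, 2):
        # 6.0.0 up to 7.11.2: OpenSearch can read these snapshots directly.
        return "snapshot-restore"
    # 7.12.0 and later: snapshots are not compatible with OpenSearch.
    return "reindex-or-shared-directory"


for version in ["5.6.16", "7.10.2", "7.13.4"]:
    print(version, "->", snapshot_strategy(version))
```

Comparing tuples keeps the check readable: Python compares them element by element, so (7, 9, 3) correctly sorts before (7, 10, 0), which a plain string comparison would get wrong.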
Migrating your Data with Snapshots
In this article, we will use a snapshot repository location that is shared between Elasticsearch and OpenSearch via a mounted directory. We assume this directory is accessible at /mnt/repository on both clusters.
Of course, you can use any repository location supported by both Elasticsearch and OpenSearch. This includes AWS S3, Google Cloud Storage, Azure, and many more.
Create the Snapshot Repositories
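First, let’s create a snapshot repository on our Elasticsearch cluster:
```
PUT localhost:9200/_snapshot/my_repository
{
  "type": "fs",
  "settings": {
    "location": "/mnt/repository",
    "compress": true
  }
}
```
This creates a repository that stores snapshots in the /mnt/repository directory on the local machine. With "compress": true, the snapshot's metadata files (such as index mappings and settings) are compressed; the data files themselves are not. Note that for an fs repository, the location must also be listed in the path.repo setting on every node, otherwise the cluster rejects the request.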
Given that the OpenSearch cluster has access to the same mounted directory, we can run the same command against it. Et voilà: both clusters now have access to the same snapshot repository.
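Both registrations can also be scripted. The sketch below builds the two PUT requests with Python's standard urllib module without sending them. The endpoints (localhost:9200 for Elasticsearch, localhost:9201 for OpenSearch) and the helper name repository_request are hypothetical. One deviation from the command above is deliberate: the OpenSearch registration sets "readonly": true, following the general advice that only one cluster should write to a shared repository. Drop it if you want the exact command from this article.

```python
import json
import urllib.request

# Hypothetical endpoints; adjust them to your own clusters.
ES_URL = "http://localhost:9200"
OS_URL = "http://localhost:9201"


def repository_request(base_url: str, name: str, location: str,
                       readonly: bool = False) -> urllib.request.Request:
    """Build (but do not send) a PUT request that registers an fs repository."""
    settings = {"location": location, "compress": True}
    if readonly:
        # Only one cluster should write to a shared repository.
        settings["readonly"] = True
    body = json.dumps({"type": "fs", "settings": settings}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/_snapshot/{name}",
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


repo_requests = [
    repository_request(ES_URL, "my_repository", "/mnt/repository"),
    repository_request(OS_URL, "my_repository", "/mnt/repository", readonly=True),
]

# To actually send them (both clusters must be reachable):
# for req in repo_requests:
#     with urllib.request.urlopen(req) as resp:
#         print(resp.status, resp.read().decode("utf-8"))
```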
If you cannot set up a shared mounted repository on your infrastructure, you can create local repositories and copy the snapshot data manually.
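Copying the snapshot data manually means copying the whole repository directory, including all of its subdirectories, not just individual snapshot files. A minimal Python sketch, assuming no snapshot is being written while you copy and that the destination is an empty directory (or an earlier copy) that you register as the repository location on OpenSearch. The helper name copy_repository and the paths are hypothetical.

```python
import shutil


def copy_repository(src: str, dst: str) -> None:
    """Copy a local snapshot repository directory, including all subdirectories.

    Hypothetical helper. Assumes no snapshot is being written to `src` while
    copying, and that `dst` is the directory registered as the repository
    location on the OpenSearch cluster.
    """
    # dirs_exist_ok=True (Python 3.8+) allows copying into an existing directory.
    shutil.copytree(src, dst, dirs_exist_ok=True)


# Example with hypothetical paths:
# copy_repository("/var/backups/es_repository", "/var/backups/os_repository")
```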
Take an Elasticsearch Snapshot and Restore it to OpenSearch
We can start the migration now that we have set up the repositories for both clusters.
First, we create a snapshot on the Elasticsearch cluster:
```
PUT _snapshot/my_repository/my_snapshot
{
  "indices": "*",
  "ignore_unavailable": true
}
```
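By default, the snapshot request returns before the snapshot is finished, so it is worth confirming that the snapshot completed before restoring it (alternatively, append ?wait_for_completion=true to the request). The Python sketch below polls GET _snapshot/my_repository/my_snapshot and reads the state field of the matching snapshot. The endpoint, the polling interval, and the helper names are assumptions.

```python
import json
import time
import urllib.request

ES_URL = "http://localhost:9200"  # hypothetical Elasticsearch endpoint


def snapshot_state(response_body: str, name: str) -> str:
    """Extract the state of one snapshot from a GET _snapshot/<repo>/<snapshot> response."""
    for snapshot in json.loads(response_body)["snapshots"]:
        if snapshot["snapshot"] == name:
            return snapshot["state"]
    raise KeyError(f"snapshot {name!r} not found in response")


def wait_for_snapshot(repo: str, name: str, poll_seconds: float = 5.0) -> str:
    """Poll Elasticsearch until the snapshot is no longer IN_PROGRESS."""
    url = f"{ES_URL}/_snapshot/{repo}/{name}"
    while True:
        with urllib.request.urlopen(url) as resp:
            state = snapshot_state(resp.read().decode("utf-8"), name)
        if state != "IN_PROGRESS":
            return state  # e.g. SUCCESS, PARTIAL or FAILED
        time.sleep(poll_seconds)


# Usage (requires a reachable cluster); only restore if the state is SUCCESS:
# print(wait_for_snapshot("my_repository", "my_snapshot"))
```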
Since Elasticsearch and OpenSearch share the same snapshot repository, we can restore the data on our OpenSearch cluster:
```
POST _snapshot/my_repository/my_snapshot/_restore
```
OpenSearch acknowledges the request with:
```
{
  "accepted": true
}
```
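If you script the migration, the restore call and the check of its acknowledgement can look like the following Python sketch. It builds the request without sending it; the OpenSearch endpoint and helper names are hypothetical, and the optional indices argument (joined into the comma-separated indices field of the request body) is an illustration, not part of the article's command.

```python
import json
import urllib.request
from typing import List, Optional

OS_URL = "http://localhost:9201"  # hypothetical OpenSearch endpoint


def restore_request(repo: str, snapshot: str,
                    indices: Optional[List[str]] = None) -> urllib.request.Request:
    """Build (but do not send) a restore request against the OpenSearch cluster."""
    body = {}
    if indices:
        # Optionally restrict the restore; an empty body keeps the defaults
        # of the plain request shown above.
        body["indices"] = ",".join(indices)
    return urllib.request.Request(
        url=f"{OS_URL}/_snapshot/{repo}/{snapshot}/_restore",
        data=json.dumps(body).encode("utf-8"),
        method="POST",
        headers={"Content-Type": "application/json"},
    )


def restore_accepted(response_body: str) -> bool:
    """True if the cluster answered with {"accepted": true}."""
    return json.loads(response_body).get("accepted") is True


# Usage (requires a reachable cluster):
# with urllib.request.urlopen(restore_request("my_repository", "my_snapshot")) as resp:
#     print(restore_accepted(resp.read().decode("utf-8")))
```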
You can monitor the progress of the restore by calling the recovery endpoint:
```
GET _cat/recovery?active_only=true
```
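For scripted monitoring, the cat API output is easier to process as JSON. The sketch below assumes you call GET _cat/recovery?active_only=true&format=json, which returns one object per recovering shard, and turns that into short progress lines. The helper names are hypothetical, and the "finished" check is only meaningful after the restore has been accepted, because the list is also empty before any recovery starts.

```python
import json
from typing import List


def summarize_recovery(response_body: str) -> List[str]:
    """Turn each active shard recovery into a short 'index[shard] stage bytes%' line."""
    rows = json.loads(response_body)
    return [f'{row["index"]}[{row["shard"]}] {row["stage"]} {row["bytes_percent"]}'
            for row in rows]


def restore_finished(response_body: str) -> bool:
    # With active_only=true, an empty list means no shard is still recovering.
    return json.loads(response_body) == []


sample = '[{"index": "logs", "shard": "0", "stage": "index", "bytes_percent": "42.0%"}]'
for line in summarize_recovery(sample):
    print(line)
```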
Other Methods
In the upcoming articles, we will cover how to migrate your data via:
Reindex API (upcoming)
Shared data directory with Rolling Restart or Full Cluster Restart (upcoming)