2024-11-06

Implementing Vector and Hybrid Search with OpenSearch and the Neural Plugin - Part 3

In this article, we will ingest test data to create Vector Embeddings with OpenSearch and then use lexical, semantic and hybrid searches to query our data.

Reading time: 3 minutes

                        
By Lucas Jeanniot, Machine Learning Engineer

Implementing Search Functionality

Keyword Search

To demonstrate the versatility of our setup, let’s begin with a traditional keyword search. This type of search relies on text matching and serves as a baseline for comparison with more advanced search methods:

GET /my-nlp-index/_search
{
  "_source": {
    "excludes": [
      "passage_embedding"
    ]
  },
  "query": {
    "match": {
      "text": {
        "query": "wild west"
      }
    }
  }
}

This query performs several operations:

It searches for documents containing the term “wild” or “west” (or both) in the “text” field, because a match query combines its terms with OR by default.
It excludes the “passage_embedding” field from the results to improve readability and reduce response size.
It utilizes OpenSearch’s standard text analysis and matching capabilities.

While this method is efficient when documents share terms with the query, it may not capture semantic similarities or contextual nuances in the way that vector search can.
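To make the term-matching behaviour concrete, here is a minimal Python sketch of match-style semantics. It is only an illustration: the `tokenize` and `match` helpers and the sample documents are hypothetical, the tokenizer is a rough stand-in for OpenSearch's standard analyzer, and no relevance scoring is modelled. The `operator` argument mirrors the match query's `operator` option, which defaults to OR.

```python
import re

def tokenize(text):
    """Lowercase and split on non-alphanumeric characters.
    A rough stand-in for the standard analyzer, not a reimplementation of it."""
    return [t for t in re.split(r"[^a-z0-9]+", text.lower()) if t]

def match(query, document, operator="or"):
    """Return True if the document satisfies a match-style query."""
    query_terms = set(tokenize(query))
    doc_terms = set(tokenize(document))
    if operator == "and":
        # Every query term must be present in the document.
        return query_terms <= doc_terms
    # Default: at least one query term must be present.
    return bool(query_terms & doc_terms)

# Hypothetical sample documents, not the data set used in this series.
docs = [
    "A cowboy rides across the wild west",
    "Wild horses graze near the river",
    "The city skyline at night",
]

or_hits = [d for d in docs if match("wild west", d)]
and_hits = [d for d in docs if match("wild west", d, operator="and")]

print(or_hits)   # the first two documents
print(and_hits)  # only the first document
```

With the default OR behaviour, the second document matches on “wild” alone. Requiring every term narrows the result to the first document.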

Neural Search

Building upon our keyword search, we now implement a neural search that leverages our vector embeddings. This method allows for semantic similarity matching, potentially surfacing relevant results that keyword search might miss:

GET /my-nlp-index/_search
{
  "_source": {
    "excludes": [
      "passage_embedding"
    ]
  },
  "query": {
    "neural": {
      "passage_embedding": {
        "query_text": "wild west",
        "model_id": "aVeif4oB5Vm0Tdw8zYO2",
        "k": 5
      }
    }
  }
}

This neural search query operates as follows:

It uses our registered model to generate an embedding for the query text “wild west”.
It then searches for the k-nearest neighbors (in this case, 5) to this query embedding in our vector space.
The results are ordered by their similarity to the query embedding (cosine similarity, assuming that is the space type configured for the “passage_embedding” field).
As with the keyword search, we exclude the actual embeddings from the response for clarity.

This method excels at finding semantically similar content, even when the exact keywords may not match.
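The ranking step described above can be sketched in a few lines of Python. This is an illustration under stated assumptions: the three-dimensional vectors are made up (real embeddings come from the registered model and have many more dimensions), the helper names are hypothetical, and the search is exhaustive, whereas OpenSearch's k-NN index typically uses approximate nearest-neighbor structures.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def top_k(query_vec, doc_vecs, k):
    """Score every document against the query and keep the k best."""
    scored = [
        (doc_id, cosine_similarity(query_vec, vec))
        for doc_id, vec in doc_vecs.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]

# Made-up toy embeddings; real ones are produced by the embedding model.
doc_vecs = {
    "doc_a": [0.9, 0.1, 0.0],
    "doc_b": [0.7, 0.7, 0.0],
    "doc_c": [0.0, 0.2, 0.9],
}
query_vec = [1.0, 0.0, 0.0]

results = top_k(query_vec, doc_vecs, k=2)
for doc_id, score in results:
    print(doc_id, round(score, 4))
```

Here `doc_a` ranks first because its vector points in almost the same direction as the query vector, and `doc_c` is dropped because only the two nearest neighbors are kept.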

Hybrid Search

For our final and most sophisticated search implementation, we combine the strengths of both keyword and neural search in a hybrid approach. This method requires two steps: setting up a search pipeline for result combination, and then executing the hybrid search query.

First, let’s create the search pipeline:

PUT /_search/pipeline/nlp-search-pipeline
{
  "description": "Post processor for hybrid search",
  "phase_results_processors": [
    {
      "normalization-processor": {
        "normalization": {
          "technique": "min_max"
        },
        "combination": {
          "technique": "arithmetic_mean",
          "parameters": {
            "weights": [
              0.3,
              0.7
            ]
          }
        }
      }
    }
  ]
}

This pipeline configuration:

Normalizes the scores from both keyword and neural searches to a common scale.
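Combines these normalized scores using a weighted arithmetic mean. The weights apply to the sub-queries in the order they appear in the hybrid query, so keyword search (listed first) gets 0.3 and neural search (listed second) gets the higher weight of 0.7.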
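The arithmetic behind these two steps can be sketched as follows. This is a simplified illustration, not the plugin's implementation: the helper names and all scores are made up, and it assumes a document missing from one sub-query contributes 0 for that sub-query. The processor's handling of such edge cases may differ.

```python
def min_max_normalize(scores):
    """Rescale a {doc_id: score} mapping to the [0, 1] range."""
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        # All scores are equal, so there is nothing to spread out.
        return {doc_id: 1.0 for doc_id in scores}
    return {doc_id: (s - lo) / (hi - lo) for doc_id, s in scores.items()}

def weighted_mean_combine(score_sets, weights):
    """Combine normalized score mappings with a weighted arithmetic mean.
    Assumption: a document absent from a sub-query scores 0 there."""
    doc_ids = set().union(*score_sets)
    total_weight = sum(weights)
    combined = {}
    for doc_id in doc_ids:
        weighted_sum = sum(
            w * s.get(doc_id, 0.0) for w, s in zip(weights, score_sets)
        )
        combined[doc_id] = weighted_sum / total_weight
    return combined

# Made-up raw scores on different scales.
keyword_scores = {"doc_a": 4.2, "doc_b": 2.1, "doc_c": 0.7}
neural_scores = {"doc_b": 0.91, "doc_c": 0.55, "doc_d": 0.40}

normalized = [
    min_max_normalize(keyword_scores),
    min_max_normalize(neural_scores),
]
# Weights follow sub-query order: keyword first (0.3), neural second (0.7).
combined = weighted_mean_combine(normalized, weights=[0.3, 0.7])
ranking = sorted(combined, key=combined.get, reverse=True)
print(ranking)
```

In this toy data `doc_b` comes out on top: it is only mid-ranked for the keyword query but is the best neural match, and the neural score carries the larger weight.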

Now, let’s execute our hybrid search query:

GET /my-nlp-index/_search?search_pipeline=nlp-search-pipeline
{
  "_source": {
    "excludes": [
      "passage_embedding"
    ]
  },
  "query": {
    "hybrid": {
      "queries": [
        {
          "match": {
            "text": {
              "query": "cowboy rodeo bronco"
            }
          }
        },
        {
          "neural": {
            "passage_embedding": {
              "query_text": "wild west",
              "model_id": "aVeif4oB5Vm0Tdw8zYO2",
              "k": 5
            }
          }
        }
      ]
    }
  }
}

This hybrid query combines multiple search paradigms:

It performs a keyword match for “cowboy rodeo bronco”.
Simultaneously, it conducts a neural search for “wild west”.
The results from both searches are then processed by our custom pipeline, which normalizes and combines the scores.
The final result set represents a balance between exact keyword matches and semantic similarity.

By implementing this hybrid approach, we leverage the strengths of both traditional and neural search methods, potentially yielding more comprehensive and relevant search results.
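If you issue these requests from application code rather than the console, it can help to assemble the request body programmatically. The sketch below uses a hypothetical `build_hybrid_query` helper (not part of any OpenSearch client) that reproduces the body shown above. It only builds and prints the JSON; sending it, together with the `search_pipeline` query parameter, is left to whichever HTTP or OpenSearch client you already use.

```python
import json

def build_hybrid_query(keyword_text, semantic_text, model_id, k=5,
                       text_field="text", vector_field="passage_embedding"):
    """Build a hybrid search body with one match and one neural sub-query.
    Hypothetical helper for illustration; field names default to those
    used in this article."""
    return {
        # Keep the large embedding vectors out of the response.
        "_source": {"excludes": [vector_field]},
        "query": {
            "hybrid": {
                "queries": [
                    # Sub-query order matters: pipeline weights map to it.
                    {"match": {text_field: {"query": keyword_text}}},
                    {
                        "neural": {
                            vector_field: {
                                "query_text": semantic_text,
                                "model_id": model_id,
                                "k": k,
                            }
                        }
                    },
                ]
            }
        },
    }

body = build_hybrid_query(
    keyword_text="cowboy rodeo bronco",
    semantic_text="wild west",
    model_id="aVeif4oB5Vm0Tdw8zYO2",  # model ID from the article's example
)
print(json.dumps(body, indent=2))
```

Keeping the sub-queries in a fixed order inside the helper avoids accidentally swapping which weight from the pipeline applies to which search method.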

In conclusion, this setup supports both precise keyword matching and semantic similarity matching, and the hybrid query lets us tune the balance between the two through the pipeline weights.

Articles in this Series

Implementing Vector and Hybrid Search with OpenSearch and the Neural Plugin - Part 1
Implementing Vector and Hybrid Search with OpenSearch and the Neural Plugin - Part 2
Implementing Vector and Hybrid Search with OpenSearch and the Neural Plugin - Part 3 (This article)
