2024-11-06

Implementing Vector and Hybrid Search with OpenSearch and the Neural Plugin - Part 3

In this article, we will ingest test data to create Vector Embeddings with OpenSearch and then use lexical, semantic and hybrid searches to query our data.

Machine Learning Engineer

Implementing Search Functionality

Keyword Search

To demonstrate the versatility of our setup, let’s begin with a traditional keyword search. This type of search relies on text matching and serves as a baseline for comparison with more advanced search methods:
GET /my-nlp-index/_search
{
  "_source": {
    "excludes": [
      "passage_embedding"
    ]
  },
  "query": {
    "match": {
      "text": {
        "query": "wild west"
      }
    }
  }
}
This query performs several operations:
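  1. It searches the “text” field for documents containing “wild” or “west”. The match query combines terms with OR by default, so a document needs only one of the terms to match, although documents containing both typically score higher.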
  2. It excludes the “passage_embedding” field from the results to improve readability and reduce response size.
  3. It utilizes OpenSearch’s standard text analysis and matching capabilities.
While this method works well when documents share terms with the query, it does not capture semantic similarity or contextual nuance in the way that vector search can.
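To make the OR-versus-AND distinction concrete, here is a minimal Python sketch of term matching. It is only an illustration: the `tokenize` and `matches` helpers and the sample documents are hypothetical, the tokenizer is a crude stand-in for OpenSearch's text analysis, and relevance scoring is ignored entirely.

```python
def tokenize(text: str) -> set[str]:
    """Crude stand-in for an analyzer: lowercase and split on whitespace."""
    return set(text.lower().split())


def matches(document: str, query: str, operator: str = "or") -> bool:
    """Return True if the document satisfies the query under OR or AND semantics."""
    doc_terms = tokenize(document)
    query_terms = tokenize(query)
    if operator == "and":
        # Every query term must appear in the document.
        return query_terms <= doc_terms
    # Default: at least one query term must appear in the document.
    return bool(query_terms & doc_terms)


# Hypothetical sample documents, not the data ingested earlier in the series.
docs = [
    "A cowboy rides through the wild west",
    "Wild horses graze on the plain",
    "A quiet evening in the city",
]

or_hits = [d for d in docs if matches(d, "wild west")]
and_hits = [d for d in docs if matches(d, "wild west", operator="and")]
print(len(or_hits), len(and_hits))
```

With OR semantics the second document matches on “wild” alone; with AND semantics only the first document qualifies. In OpenSearch itself, the match query accepts an `operator` option to switch between the two behaviors.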

Neural Search

Building upon our keyword search, we now implement a neural search that leverages our vector embeddings. This method allows for semantic similarity matching, potentially surfacing relevant results that keyword search might miss:
GET /my-nlp-index/_search
{
  "_source": {
    "excludes": [
      "passage_embedding"
    ]
  },
  "query": {
    "neural": {
      "passage_embedding": {
        "query_text": "wild west",
        "model_id": "aVeif4oB5Vm0Tdw8zYO2",
        "k": 5
      }
    }
  }
}
This neural search query operates as follows:
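  1. It uses our registered model, referenced by “model_id”, to generate an embedding for the query text “wild west”. Replace the ID shown here with the ID of the model registered in your own cluster.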
  2. It then searches for the k-nearest neighbors (in this case, 5) to this query embedding in our vector space.
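  3. The results are ordered by their similarity to the query embedding, measured with the distance metric configured for the “passage_embedding” field (for example, cosine similarity).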
  4. As with the keyword search, we exclude the actual embeddings from the response for clarity.
This method excels at finding semantically similar content, even when the exact keywords may not match.
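The nearest-neighbor step can be sketched in a few lines of Python. This is an exhaustive toy version under stated assumptions: the three-dimensional vectors and document IDs are made up, cosine similarity is assumed as the metric, and OpenSearch itself uses approximate k-NN index structures rather than scanning every vector.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query_vec: list[float], doc_vecs: dict[str, list[float]], k: int) -> list[tuple[str, float]]:
    """Score every document against the query and keep the k most similar."""
    scored = [(doc_id, cosine_similarity(query_vec, vec)) for doc_id, vec in doc_vecs.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Made-up embeddings for illustration; real embeddings have hundreds of dimensions.
toy_embeddings = {
    "doc_rodeo": [0.9, 0.1, 0.0],
    "doc_desert": [0.7, 0.6, 0.1],
    "doc_city": [0.0, 0.2, 0.9],
}
query_embedding = [1.0, 0.2, 0.0]

nearest = top_k(query_embedding, toy_embeddings, k=2)
for doc_id, score in nearest:
    print(doc_id, round(score, 3))
```

The documents whose vectors point in roughly the same direction as the query vector rank first, regardless of whether they share any literal terms with the query text.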

Hybrid Search

For our final and most sophisticated search implementation, we combine the strengths of both keyword and neural search in a hybrid approach. This method requires two steps: setting up a search pipeline for result combination, and then executing the hybrid search query.
First, let’s create the search pipeline:
PUT /_search/pipeline/nlp-search-pipeline
{
  "description": "Post processor for hybrid search",
  "phase_results_processors": [
    {
      "normalization-processor": {
        "normalization": {
          "technique": "min_max"
        },
        "combination": {
          "technique": "arithmetic_mean",
          "parameters": {
            "weights": [
              0.3,
              0.7
            ]
          }
        }
      }
    }
  ]
}
This pipeline configuration:
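  1. Normalizes the scores from the keyword and neural searches with min-max scaling, so that both sets of scores fall on a common 0 to 1 scale.
  2. Combines these normalized scores using a weighted arithmetic mean. The weights are applied in the order in which the sub-queries appear in the hybrid query, so here the keyword search receives 0.3 and the neural search receives the higher weight of 0.7.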
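The arithmetic behind these two steps can be worked through in Python. This is a simplified sketch rather than the plugin's implementation: the raw scores are invented for illustration, a document missing from one result list is assumed to contribute a score of 0 for that sub-query, and the plugin's handling of edge cases may differ.

```python
def min_max(scores: dict[str, float]) -> dict[str, float]:
    """Rescale scores to the 0..1 range: (score - min) / (max - min)."""
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        # All scores are equal; treat them all as the top score (an assumption).
        return {doc_id: 1.0 for doc_id in scores}
    return {doc_id: (s - lo) / (hi - lo) for doc_id, s in scores.items()}


def weighted_mean(score_sets: list[dict[str, float]], weights: list[float]) -> dict[str, float]:
    """Weighted arithmetic mean per document; missing scores count as 0 (an assumption)."""
    doc_ids = set().union(*score_sets)
    total_weight = sum(weights)
    return {
        doc_id: sum(w * s.get(doc_id, 0.0) for w, s in zip(weights, score_sets)) / total_weight
        for doc_id in doc_ids
    }


# Invented raw scores: keyword scores are unbounded, neural scores sit in a narrow band.
keyword_scores = {"a": 4.0, "b": 2.0, "c": 1.0}
neural_scores = {"a": 0.5, "b": 0.9, "d": 0.7}

combined = weighted_mean(
    [min_max(keyword_scores), min_max(neural_scores)],
    weights=[0.3, 0.7],  # same order as the sub-queries: keyword first, neural second
)
ranking = sorted(combined, key=combined.get, reverse=True)
print(ranking)
```

Document “a” has the best keyword score but the worst neural score, so with the neural side weighted at 0.7 it ends up behind “b” and “d”. Without normalization, the larger raw keyword scores would dominate the combination regardless of the weights.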
Now, let’s execute our hybrid search query:
GET /my-nlp-index/_search?search_pipeline=nlp-search-pipeline
{
  "_source": {
    "excludes": [
      "passage_embedding"
    ]
  },
  "query": {
    "hybrid": {
      "queries": [
        {
          "match": {
            "text": {
              "query": "cowboy rodeo bronco"
            }
          }
        },
        {
          "neural": {
            "passage_embedding": {
              "query_text": "wild west",
              "model_id": "aVeif4oB5Vm0Tdw8zYO2",
              "k": 5
            }
          }
        }
      ]
    }
  }
}
This hybrid query combines multiple search paradigms:
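  1. It performs a keyword match on the “text” field for any of the terms “cowboy”, “rodeo” or “bronco”.
  2. In the same request, it conducts a neural search for “wild west”. The two sub-queries do not have to use the same query text.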
  3. The results from both searches are then processed by our custom pipeline, which normalizes and combines the scores.
  4. The final result set represents a balance between exact keyword matches and semantic similarity.
By implementing this hybrid approach, we leverage the strengths of both traditional and neural search methods, potentially yielding more comprehensive and relevant search results.
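When the query texts change from request to request, it can help to assemble the hybrid request body programmatically. The following Python sketch only builds the JSON body; `build_hybrid_query` and `check_weights` are hypothetical helper names, and sending the body to the cluster together with the `search_pipeline` parameter is left to whichever HTTP or OpenSearch client you use.

```python
import json


def build_hybrid_query(keyword_text: str, neural_text: str, model_id: str, k: int = 5) -> dict:
    """Build a hybrid request body with one match and one neural sub-query."""
    return {
        "_source": {"excludes": ["passage_embedding"]},
        "query": {
            "hybrid": {
                "queries": [
                    {"match": {"text": {"query": keyword_text}}},
                    {
                        "neural": {
                            "passage_embedding": {
                                "query_text": neural_text,
                                "model_id": model_id,
                                "k": k,
                            }
                        }
                    },
                ]
            }
        },
    }


def check_weights(body: dict, weights: list[float]) -> bool:
    """Check that the pipeline has exactly one weight per sub-query."""
    return len(weights) == len(body["query"]["hybrid"]["queries"])


# The model ID is the placeholder used in this article; use your own model's ID.
body = build_hybrid_query("cowboy rodeo bronco", "wild west", "aVeif4oB5Vm0Tdw8zYO2")
pipeline_weights = [0.3, 0.7]  # keyword first, neural second

print(check_weights(body, pipeline_weights))
print(json.dumps(body, indent=2))
```

Keeping the sub-queries in a fixed order matters because the weights in the search pipeline are matched to them by position.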
In conclusion, this setup supports both precise keyword matching and semantic similarity matching, and the hybrid query lets us weight the two against each other to suit a given use case.
