OpenSearch is a community-driven, open-source search and analytics suite built on Apache Lucene, offering powerful features and tools for search and data analysis. As a fork of Elasticsearch, OpenSearch enables users to search, analyze, and visualize large volumes of data quickly and efficiently. This guide will help you better understand OpenSearch and its functionality by providing a comprehensive list of key terms and definitions.
A cluster is a group of OpenSearch nodes that work together to store, index, and search data. Clusters enable horizontal scaling and improve fault tolerance.
A node is a single running instance of OpenSearch. Nodes can be classified into different types, such as cluster-manager-eligible (historically called master-eligible), data, and ingest nodes, based on their roles and responsibilities within the cluster.
An index is a data structure used to store, organize, and search documents. It consists of a collection of documents that share similar characteristics, analogous to a table in a relational database.
A shard is a smaller, more manageable part of an index. Each shard is a self-contained index, and OpenSearch distributes shards across nodes in the cluster for load balancing and redundancy.
A replica is a copy of a primary shard, providing failover and load balancing capabilities. OpenSearch automatically distributes replica shards to different nodes in the cluster.
A document is a single unit of searchable data in OpenSearch, similar to a row in a database table. Each document is uniquely identified by an ID and contains a collection of fields.
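As a sketch of what a document looks like, here is a hypothetical product document expressed as a Python dictionary; the field names and values are illustrative, not taken from any real index.

```python
import json

# A hypothetical product document. The unique _id is supplied in the URL
# when indexing (e.g. PUT /products/_doc/1) or auto-generated by OpenSearch.
doc = {
    "name": "Wireless mouse",
    "sku": "WM-1042",
    "price": 24.99,
    "in_stock": True,
    "added_on": "2023-05-01",
}

# Serialized to JSON, this is the request body sent to the indexing API.
body = json.dumps(doc)
```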
A field is a named, typed value within a document, similar to a column in a database table. Fields can be of different data types, such as string, number, date, or boolean.
Mapping defines the structure of documents in an index, including field names, data types, and other metadata. It acts as a schema for the documents and helps OpenSearch understand the data and optimize search performance.
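A minimal sketch of a mapping body, of the kind sent when creating an index; the index layout and field names here are invented for illustration.

```python
# Illustrative mapping for a hypothetical "products" index. Each property
# entry names a field and assigns it a data type, acting as the schema
# described above.
mapping = {
    "mappings": {
        "properties": {
            "name":     {"type": "text"},     # analyzed, full-text searchable
            "sku":      {"type": "keyword"},  # exact-match only, not analyzed
            "price":    {"type": "float"},
            "in_stock": {"type": "boolean"},
            "added_on": {"type": "date"},
        }
    }
}
# This body would accompany an index-creation request such as PUT /products.
```

The text/keyword split is the key design choice: text fields go through analysis for full-text search, while keyword fields are stored verbatim for filtering, sorting, and aggregations.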
An analyzer is a set of components that preprocess text during indexing and searching. Analyzers consist of a tokenizer and zero or more token filters and character filters. They break text into tokens, then filter and normalize those tokens so that queries match documents more accurately.
A tokenizer is a component of an analyzer that breaks text into individual tokens or terms, typically by splitting on whitespace or punctuation.
A token filter is a component of an analyzer that processes and modifies tokens after they are generated by the tokenizer. Examples include lowercasing, stemming, and removing stop words.
A character filter is a component of an analyzer that processes text before tokenization, such as replacing or removing specific characters.
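The three components above can be sketched as a toy analysis chain in Python, mirroring the order OpenSearch applies them: character filters first, then the tokenizer, then token filters. The specific replacement and stop-word list are illustrative, not OpenSearch defaults.

```python
import re

def char_filter(text):
    # Character filter: rewrite text before tokenization,
    # e.g. expand '&' to 'and'.
    return text.replace("&", " and ")

def tokenize(text):
    # Tokenizer: split on whitespace and punctuation.
    return [t for t in re.split(r"[^\w]+", text) if t]

def token_filters(tokens, stop_words=("the", "a", "and")):
    # Token filters: lowercase each token, then drop stop words.
    return [t.lower() for t in tokens if t.lower() not in stop_words]

def analyze(text):
    return token_filters(tokenize(char_filter(text)))

tokens = analyze("The Quick & Lazy Foxes")  # -> ['quick', 'lazy', 'foxes']
```

Because the same chain runs at both index time and query time, a search for "quick" matches a document containing "Quick".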
A query is a request to search, filter, or aggregate data from an OpenSearch index. OpenSearch supports various query types, such as term, match, range, and boolean queries.
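As an illustration, here is a query body that combines several of the query types just mentioned inside a boolean query; the field names are hypothetical.

```python
# An illustrative bool query: the must clause scores documents by full-text
# relevance, while the filter clauses narrow results without affecting score.
query = {
    "query": {
        "bool": {
            "must": [
                {"match": {"name": "wireless mouse"}}   # full-text match
            ],
            "filter": [
                {"term": {"in_stock": True}},           # exact value
                {"range": {"price": {"lte": 50}}},      # numeric range
            ],
        }
    }
}
# Sent as the body of a search request, e.g. POST /<index>/_search.
```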
Relevance score, or _score, is a numeric value that represents the relevance of a document to a given query. Higher scores indicate more relevant documents. OpenSearch uses the Okapi BM25 ranking function to compute relevance scores.
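To make the scoring concrete, here is a toy single-term BM25 calculation using Lucene's default parameters (k1 = 1.2, b = 0.75). Real scores also involve boosts and per-segment statistics; this only illustrates the shape of the formula.

```python
import math

def bm25(tf, doc_len, avg_doc_len, n_docs, docs_with_term, k1=1.2, b=0.75):
    # Inverse document frequency: rarer terms contribute more.
    idf = math.log(1 + (n_docs - docs_with_term + 0.5) / (docs_with_term + 0.5))
    # Length normalization: long documents are penalized relative to average.
    norm = k1 * (1 - b + b * doc_len / avg_doc_len)
    # Term-frequency saturation: repeated occurrences add diminishing value.
    return idf * tf * (k1 + 1) / (tf + norm)

# A term appearing 3 times in an average-length document,
# found in 10 of 1000 documents:
score = bm25(tf=3, doc_len=100, avg_doc_len=100, n_docs=1000, docs_with_term=10)
```

Note the two effects the formula encodes: rarer terms score higher (the IDF factor), and additional occurrences of a term within one document yield diminishing returns (saturation via k1).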
Aggregation is a process of grouping and summarizing data based on specified criteria. OpenSearch supports various aggregation types, such as bucket aggregations, metric aggregations, and pipeline aggregations.
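As a sketch, here is a search body that nests a metric aggregation inside a bucket aggregation; the field names are invented for illustration.

```python
# Illustrative aggregation request: a terms bucket aggregation groups
# documents by category, and a nested avg metric aggregation computes the
# average price within each bucket.
agg_request = {
    "size": 0,  # return only aggregation results, no matching documents
    "aggs": {
        "by_category": {
            "terms": {"field": "category.keyword"},
            "aggs": {
                "avg_price": {"avg": {"field": "price"}}
            },
        }
    },
}
```

Setting "size" to 0 is a common idiom when only the summarized buckets, not the individual hits, are of interest.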
An ingest node is a specialized OpenSearch node responsible for processing and transforming incoming documents before they are indexed. Ingest nodes use pipelines and processors to perform these tasks.
A pipeline is a series of processors configured to process and transform data as it enters the cluster.
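A minimal sketch of such a pipeline definition, using two common processors; the pipeline name, field names, and description are hypothetical.

```python
# An illustrative ingest pipeline: the lowercase processor normalizes the
# sku field, and the set processor stamps each document with the ingest
# timestamp. This body would accompany a request such as
# PUT /_ingest/pipeline/my-pipeline.
pipeline = {
    "description": "Lowercase the SKU and stamp an ingest timestamp",
    "processors": [
        {"lowercase": {"field": "sku"}},
        {"set": {"field": "ingested_at", "value": "{{_ingest.timestamp}}"}},
    ],
}
```

Documents can then opt into the pipeline at index time (for example via a pipeline query parameter), and each processor runs in the order listed.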