Overview
In the past 12 months, there has been quite a lot of activity around improving the performance of OpenSearch. So far, these efforts have concentrated on the core of OpenSearch, especially on indexing.
However, other layers are involved in the request processing of OpenSearch, and they also have a significant impact on the overall performance. A very important component in this regard is the OpenSearch security plugin, which is responsible for enforcing authentication and access controls on the APIs of OpenSearch. Eliatra has extensive expertise in this area, so Eliatra took the initiative and worked in cooperation with AWS on a performance improvement project for the security plugin.
One result of this project was recently
merged into the code base of the security plugin.
It brings huge improvements; this will be especially noticeable for clusters with many indices, and for clusters that use document level security or field level security (DLS/FLS).
If everything goes well, these improvements will be released in
one of the next 2.x releases of OpenSearch.
In this article, we will discuss the current situation and the improvements we have achieved. In
part 2, we will dive a bit deeper into the tech details: How have the improvements been achieved? What special data structures are being used?
Performance Improvement Benchmarks
We performed intensive performance testing both on OpenSearch without the improvements and on a pre-release snapshot of OpenSearch with the improvements. In this section, we will present the gathered benchmarks and discuss them. Even more details on the methodology and configurations used can be found in the
pull request at GitHub; the raw results are available as a
spreadsheet.
Still, we have to start with the usual benchmarking disclaimer: Of course, the following figures can only represent very specific scenarios. Other scenarios can look quite different, as there are many variables at play. Yet, the figures give quite a good insight into the overall performance behavior of OpenSearch.
Bulk Ingestion Throughput
Let’s have a look at the first chart - it shows the benchmark results for bulk indexing operations with 10 bulk items per request. Dashed lines represent OpenSearch without the improvements, solid lines represent the improved OpenSearch. The horizontal axis represents the number of indices present on the cluster - this is a significant factor for the performance. The vertical axis represents the measured throughput in documents per second.
The green lines represent a user with a very simple
user role configuration giving them full privileges. You can see that the old and new implementations are more or less on par on clusters with up to 100 indices. However, on clusters with more than 100 indices, the dashed green line starts to go down. On clusters with 300 indices, the user achieves only 82% of the throughput they achieved on clusters with 100 indices. It is important to keep in mind: This is only about the number of indices present on the cluster - it is independent of the number of indices the bulk index request actually refers to. Clusters with 1000 indices are down to 57% of the original throughput. With a growing number of indices, a nice quadratic decline in throughput can be observed.
Of course, one needs to ask the question: Are clusters with 300 or even 3000 indices a realistic thing? Clusters with 300 indices are easy to achieve - it is common practice to configure ingestion to start a new index every day. Thus, without further index life-cycle management, you’ll have 300 indices from a single application after less than a year. Clusters with 3000 indices are rarer, but they can still be observed “in the wild”. Clusters with even more indices are really rare - likely because of the performance issues they will face.
The correlation between performance and the number of indices on the cluster is kind of a nasty thing: Initially, everyone starts with only a few indices. The performance you are observing will be fine. Only as time progresses will more and more indices accumulate on the cluster. Then, very slowly, the performance will deteriorate. At some point, one will wonder: Hasn’t this been faster once?
So far, we have only looked at an easy case: a user with full privileges on all indices, represented by the green line. For the access control code, it should be extremely easy to judge whether such a user is authorized to execute a particular action. However, users with restricted privileges are more common. This means restrictions on the allowed indices and/or the allowed actions. These are represented by the yellow lines.
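To make this a bit more concrete: in the security plugin, such restrictions are expressed in the role definition via index patterns and allowed actions. The following is a rough sketch of how a restricted role could be created via the security REST API using Python (role name, credentials, host, and the concrete patterns are placeholders for illustration; they are not the role configurations used in the benchmarks):

```python
import requests

# Sketch only: a role restricted to read/write actions on indices
# matching "logs-*". All names and credentials are placeholders.
restricted_role = {
    "cluster_permissions": ["cluster_composite_ops"],
    "index_permissions": [
        {
            "index_patterns": ["logs-*"],
            "allowed_actions": ["read", "write"],
        }
    ],
}

requests.put(
    "https://localhost:9200/_plugins/_security/api/roles/logs_writer",
    json=restricted_role,
    auth=("admin", "admin"),
    verify=False,  # acceptable only for a local test cluster
)
```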
One interesting question when considering the performance of role configurations is: does the number of roles assigned to a user affect performance? The answer is yes, and it can be seen quite nicely: The chart shows lines in three different shades of yellow; the rule is: the darker the yellow, the more roles are assigned to the user. Bright yellow represents 1 role, medium yellow means 20 roles, dark yellow - or brown - represents 40 roles.
Let’s look at the actual numbers: When comparing a restricted user with one role to the unrestricted user, one can see that the restricted user achieves about 90% of the throughput of the unrestricted user. The user with 20 roles achieves about 88% of the throughput of the single role user.
But, after looking only at the dashed lines for a while, we really should now look at the elephant in the room - the solid lines - which signify the benchmark results for our optimized code. These lines have two properties which stand out:
- The lines are mostly flat - which means they do not show a significant change correlated with the number of indices on the cluster. This is expected; it corresponds to the runtime characteristics of the data structures that are being used in the new implementation. More on that in part 2 (spoiler, for the complexity theory fans: O(1)); a simplified sketch of the idea follows after this list. There’s still a slight decline of performance starting from 3000 indices. We additionally performed micro benchmarks on the privilege evaluation code with up to 100,000 indices. In these micro benchmarks, this performance behavior was not visible. Thus, the decline must come from another source - further research is necessary to find out more about this behavior.
- In any case, the performance shows a significant improvement. For clusters with up to 100 indices, this will only be noticeable for users with non-trivial role configurations. For clusters with more indices, this will be noticeable for any user. In the case of the cluster with 300 indices, the full privileges user (i.e., with the most trivial role configuration) will achieve a throughput improvement of 27%. On the 1000 indices cluster, we get an improvement of 79%, which is pretty amazing!
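To give a rough intuition for that O(1) behavior, here is a deliberately simplified sketch (this is not the plugin’s actual code; part 2 covers the real data structures): the expensive pattern matching is done once up front whenever roles or indices change, and the per-request check then consists of plain hash lookups, independent of how many indices exist on the cluster.

```python
from fnmatch import fnmatch

class PrecomputedPrivileges:
    """Simplified illustration of precomputed privilege evaluation."""

    def __init__(self, role_index_patterns, cluster_indices):
        # Done once, whenever roles or the set of indices change:
        # resolve each role's index patterns against the concrete indices.
        self.allowed = {
            role: {index for index in cluster_indices
                   if any(fnmatch(index, pattern) for pattern in patterns)}
            for role, patterns in role_index_patterns.items()
        }

    def is_allowed(self, user_roles, index):
        # Per-request check: hash lookups only, no scan over all indices.
        return any(index in self.allowed.get(role, ()) for role in user_roles)

# Example usage with made-up roles and indices
privileges = PrecomputedPrivileges(
    {"logs_reader": ["logs-*"]},
    ["logs-2024-01", "logs-2024-02", "metrics-2024-01"],
)
print(privileges.is_allowed(["logs_reader"], "logs-2024-01"))     # True
print(privileges.is_allowed(["logs_reader"], "metrics-2024-01"))  # False
```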
Still, bulk index requests with 10 items are kind of small. In order to minimize overhead, it is recommended to use bulk index requests with more items. So, we also tested bulk indexing with 1000 items. The results can be seen in the following chart:
You can see that the recommendation to use bulk index requests with many items is very justified. The overall document throughput grows significantly. You can see that the gaps between the dashed lines and the solid lines get smaller - this is expected, as the relative overhead is reduced when increasing the number of items per request. Still, we see clear improvements both on clusters with only a few indices and on clusters with many indices. The full privileges user sees a throughput improvement of 3% on the 100 indices cluster, while the user with the most complicated role configuration sees a 9% improvement. On a cluster with 1000 indices, the full privileges user sees a 10% improvement, and the user with the most complicated role configuration gets a 40% improvement.
Side note: When looking at the chart, you will have noticed that the green lines show a weird dip on the left side. This shouldn’t be there in theory - it is likely an artifact of the benchmark process. After all, benchmarking is a messy business.
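For reference, this is roughly what such a large bulk request looks like when sent via the Python client (host, credentials, and index name are placeholders; the client’s bulk helper is used here for brevity):

```python
from opensearchpy import OpenSearch, helpers

# Placeholders: adjust host, credentials and TLS settings to your cluster.
client = OpenSearch(
    "https://localhost:9200",
    http_auth=("admin", "admin"),
    verify_certs=False,
)

# 1000 documents sent as a single bulk request (chunk_size=1000).
actions = (
    {"_index": "logs-2024-05", "_source": {"message": f"event {i}"}}
    for i in range(1000)
)
helpers.bulk(client, actions, chunk_size=1000)
```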
One thing which we did not discuss so far is the blue line in the charts: This line signifies the performance of OpenSearch while using a super user certificate. Using such a certificate will bypass most of the access control code within the OpenSearch security plugin. Thus, the throughput observed in this case can be seen as a kind of upper bound to strive for. You can see that there is still a visible gap between the blue line and the lines with the improved performance. It’s about 14% in the 10 item bulk request case and about 4% in the 1000 item case. This represents the remaining room for improvement. We already have a couple of ideas to address this; we will talk about this further down in this article.
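For context, such a super user certificate is simply a TLS client certificate whose distinguished name is configured as an admin DN on the cluster. A minimal sketch of using it with the Python requests library (certificate and CA file names are placeholders):

```python
import requests

# Placeholders: paths to the admin client certificate, its key, and the CA.
response = requests.get(
    "https://localhost:9200/_cluster/health",
    cert=("admin-cert.pem", "admin-key.pem"),  # presented as TLS client certificate
    verify="root-ca.pem",
)
print(response.json()["status"])
```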
Search Throughput
After looking at the bulk ingestion, let’s now have a look at the performance of search operations. We performed similar benchmarks; we checked:
- Searches on a single index
- Searches on larger sets of indices, specified by using an index pattern like index_a* in the API request (see the sketch after this list)
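Such a pattern-based search looks roughly like this with the Python client (connection details and the query are placeholders):

```python
from opensearchpy import OpenSearch

# Placeholders: adjust host, credentials and TLS settings to your cluster.
client = OpenSearch(
    "https://localhost:9200",
    http_auth=("some_user", "secret"),
    verify_certs=False,
)

# The index pattern is resolved on the cluster; the security plugin has to
# check the user's privileges for every matching index.
response = client.search(
    index="index_a*",
    body={"query": {"match_all": {}}},
    size=10,
)
print(response["hits"]["total"]["value"])
```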
When accessing documents, an additional feature of OpenSearch now becomes relevant: document level security (DLS). We tested that as well.
Let’s start with search requests which operate on a single index:
The picture is similar to the bulk ingestion benchmarks. The old implementation shows throughput that declines quadratically with a growing number of indices. The new implementation shows improved performance, which is independent of the number of indices on the cluster.
We have a new pair of lines in the chart: The purple lines represent the throughput seen by a user with roles which impose DLS restrictions on the documents the user is allowed to see. You can see that this user experiences the worst performance so far with the old implementation. The optimized implementation, however, shows no noticeable difference between users with DLS and users without DLS. Already on a cluster with just 100 indices, the new implementation improves the throughput of a user with DLS by 58%. On the 1000 index cluster, we see a 542% (five hundred and forty-two percent) improvement.
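For context, a DLS restriction is defined as part of the role: the role carries a query, and the user only sees documents matching it. A rough sketch via the security REST API (role name, index pattern, field, and value are made up for illustration; not the roles used in the benchmarks):

```python
import requests

# Sketch only: a role whose DLS query limits visible documents to one department.
dls_role = {
    "index_permissions": [
        {
            "index_patterns": ["logs-*"],
            "dls": '{"term": {"department": "engineering"}}',
            "allowed_actions": ["read"],
        }
    ],
}

requests.put(
    "https://localhost:9200/_plugins/_security/api/roles/logs_reader_dls",
    json=dls_role,
    auth=("admin", "admin"),
    verify=False,  # acceptable only for a local test cluster
)
```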
Finally, let’s have a look at a search operation which uses an index pattern like index_a*:
You can see that the blue line - the optimal performance - now also drops with a growing number of indices on the cluster. This is expected, as the number of indices to be searched naturally grows. The new implementation (solid lines) still shows clear improvements in performance over the old implementation. It is quite clear that the performance is now much less affected by the number of roles assigned to a user. However, it can also be seen that there’s still quite a gap between the achieved throughput and the optimal throughput indicated by the blue line. This is likely due to the index pattern resolution code, which still leaves some room for improvement.
From the Numbers to the Tech
We have seen enormous improvements in performance. To learn how these have been achieved, please head right over to the second part of this article:
A look inside: The new algorithms and data structures for action privilege evaluation!