
2023-12-07

ChatGPT and LLMs to Enhance OpenSearch/Elasticsearch Usability

The surge in Large Language Models, like ChatGPT, has inspired diverse applications—generating lists of the best restaurants or crafting JavaScript for complex tasks. Their versatility is evident in both casual queries and intricate programming needs.

Amidst the recent interest in Large Language Models (LLMs) like ChatGPT, many people have been inspired to apply them to a wide variety of tasks. Whether it’s generating a bullet-point list of the best restaurants in Dublin, nice places to visit on holiday, or something more in-depth like a JavaScript to-do list application, we’ve seen many variations on the capabilities of LLMs.

Inspired to Improve Querying of Data Sets

This got us thinking at Eliatra and spurred us into action: how can we use LLMs to make users’ lives easier when working with our products? At Eliatra, we develop feature-enhancing software for OpenSearch and Elasticsearch, and our cogs started turning. The company founder inspired me to investigate how well LLMs like ChatGPT, Bard and Llama can fill the gaps for those who want to use Elasticsearch and OpenSearch to query their data.

Querying Elasticsearch or OpenSearch Requires Specialised Knowledge

Note: In this article we use Elasticsearch as our platform of choice. This is mainly because LLMs have a so-called cutoff date: the model has been trained only with data available up to that date. Our assumption was that more training data was available for Elasticsearch than for OpenSearch, since OpenSearch is newer. However, all principles and approaches outlined in this article also apply to OpenSearch.
There are some facts to consider for anyone who would like to work with Elasticsearch or OpenSearch. Elasticsearch doesn’t natively support querying with SQL; it has its own domain-specific language (DSL). It is through this DSL that users construct the queries that let them interact with its search and analytics capabilities.
Therefore, querying an Elasticsearch cluster requires specialised skills and familiarity with the DSL. This creates a barrier, limiting data exploration and analysis to individuals with technical expertise.
There have been efforts by the community to bridge the gap between SQL and Elasticsearch. Some plugins convert SQL-like queries into Elasticsearch’s DSL, providing a translation layer that facilitates SQL-like interactions, but these tools can have limitations in features and performance compared to the native DSL.
The complexity associated with the Elasticsearch DSL can therefore discourage non-technical users from engaging with Elasticsearch and OpenSearch. At Eliatra, we want to address these challenges by leveraging large language models like ChatGPT to assist users in writing queries to explore their data.

How Can Eliatra Improve Data Querying for the Average User Using Large Language Models?

Our aim was to explore whether LLMs can generate Elasticsearch queries from natural language inputs, enabling users, even non-techies, to interact with Elasticsearch in a more user-friendly manner.
Our general approach for testing was as follows:
  • Pull the mapping data from Elasticsearch for the selected index.
  • Construct the prompt:
    • Provide the question in natural language
    • Provide the index mapping
    • Ask the LLM to generate a valid query based on the question and mapping
  • Send it to the LLM
  • Either present the query or
  • Execute the query and present the results
We started by providing just the index mapping to the LLM and asked it to generate queries based on that. In Phase 2, we provided both the index mapping and some sample data to the LLM, then asked it to generate the queries again.
Since the overall goal was to implement an automated tool or plugin, we only used data and information that we could fetch programmatically from Elasticsearch and provided it as-is.
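To make this concrete, here is a minimal sketch of such a pipeline in Python. The index name, the example question and the use of the OpenAI chat completions endpoint are our assumptions for illustration; any LLM API could be swapped in.
import json
import os

import requests

ES_URL = "https://localhost:9200"
AUTH = ("admin", "admin")  # default demo credentials; change for real clusters

def get_mapping(index):
    # Fetch the index mapping, just like the curl command shown later.
    # verify=False because the demo setup uses a self-signed certificate.
    resp = requests.get(f"{ES_URL}/{index}/_mapping", auth=AUTH, verify=False)
    resp.raise_for_status()
    return resp.json()

def build_prompt(index, mapping, question):
    # Assemble the prompt from the mapping and the natural-language question.
    return (
        "You are a data scientist that works with Elasticsearch.\n"
        f"Your data is located in an index {index} and the data mapping looks like:\n\n"
        f"{json.dumps(mapping, indent=2)}\n\n"
        "Generate an Elasticsearch query to answer the following question:\n\n"
        f"{question}"
    )

def call_llm(prompt):
    # Send the prompt to the OpenAI chat completions API; any other
    # LLM endpoint would work the same way.
    resp = requests.post(
        "https://api.openai.com/v1/chat/completions",
        headers={"Authorization": f"Bearer {os.environ['OPENAI_API_KEY']}"},
        json={"model": "gpt-4", "messages": [{"role": "user", "content": prompt}]},
        timeout=60,
    )
    resp.raise_for_status()
    return resp.json()["choices"][0]["message"]["content"]

index = "kibana_sample_data_logs"
question = "How many 404 responses were logged last week?"
print(call_llm(build_prompt(index, get_mapping(index), question)))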

Environment Set Up

To do this, we set up a local instance of Elasticsearch with Search Guard and Kibana and started generating and running queries!

LLMs Used

To test our idea, we initially used four LLMs:
  • ChatGPT 3.5 (the free version)
  • ChatGPT 4 (the paid version)
  • Bard (Google’s LLM)
  • OpsGPT

Datasets Used in the Study

Once you have Elasticsearch, Search Guard and Kibana set up on your machine, there are predefined sample data sets you can install for testing.
The sample data sets provided are:
  • Sample eCommerce Orders (a dataset representing an e-commerce store selling clothes, shoes etc.)
  • Sample Flight Data (a dataset representing flight logs)
  • Sample Web Logs (a dataset representing a website and its traffic)
[Screenshot: the sample data sets in Kibana]

Initial Information Provided to the LLM

With these datasets installed, we provided some starting information to the LLMs to give them enough information to generate our database queries for us in DSL.
We started by providing the LLM with the mapping of the index. The index mapping can be obtained by running the following curl command:
curl -u admin:admin -k -X GET "https://localhost:9200/kibana_sample_data_logs/_mapping?pretty"
If you have a locally running instance, you can also just paste the URL into your browser:
https://localhost:9200/kibana_sample_data_logs/_mapping?pretty
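The response is a JSON document describing every field in the index. For the sample web logs it looks roughly like this (a heavily shortened excerpt):
{
  "kibana_sample_data_logs": {
    "mappings": {
      "properties": {
        "bytes": { "type": "long" },
        "clientip": { "type": "ip" },
        "timestamp": { "type": "date" }
      }
    }
  }
}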
The actual sample data can be obtained with the following curl command:
curl -u admin:admin -k -X GET "https://localhost:9200/kibana_sample_data_logs/_search"
Or by visiting the URL:
https://localhost:9200/kibana_sample_data_logs/_search
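For Phase 2 we needed only a handful of documents, not the full search response. Reusing ES_URL and AUTH from the sketch above, the sample documents can be extracted like this (the count of five is an arbitrary choice):
def get_sample_docs(index, count=5):
    # Fetch a few documents and keep only their _source bodies,
    # which is what we appended to the prompt in Phase 2.
    resp = requests.get(
        f"{ES_URL}/{index}/_search",
        params={"size": count},
        auth=AUTH,
        verify=False,
    )
    resp.raise_for_status()
    return [hit["_source"] for hit in resp.json()["hits"]["hits"]]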

Prompt

After fetching the mappings, our prompt looked like:
You are a data scientist that works with Elasticsearch. 
Your data is located in an index <indexname> and the 
data mapping looks like:

<add mapping here>

Generate an Elasticsearch query to answer the following question:

<question in natural language>
In phase 2, we also added some sample data from the index to the prompt.

LLM Response

A typical response by an LLM looked like:
[Screenshot: a typical LLM response]
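For illustration, here is the kind of query a model might return for the question “What is the average number of bytes per request?” (a hypothetical example in Kibana DevTools syntax; actual responses varied by model and run):
GET kibana_sample_data_logs/_search
{
  "size": 0,
  "aggs": {
    "avg_bytes": {
      "avg": { "field": "bytes" }
    }
  }
}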
We then tried to run the generated queries unmodified to check whether they were correct. In our case, we used the Kibana DevTools:
[Screenshot: executing the generated queries in Kibana DevTools]

Results of the Study

Spoiler Alert: The results we achieved were not really satisfying.
Phase 1: We started the investigation by providing only the mapping to the LLMs. Some generated queries did not work, while others worked and provided the correct answer. Overall, though, the results were poor, so we took it up a notch.
Phase 2: We tried to improve the quality of the results by providing sample data from the index in addition to the mapping. This improved the results, but we were not able to achieve the level of accuracy we had hoped for.
[Screenshots: ChatGPT-generated Elasticsearch queries]

Where does that leave us?

The results were not as good as expected. They also showed that the outcome is highly dependent on the model used: some models are better suited to generating code than others. Our initial approach is therefore far from production-ready. But fear not, there is a lot of room for improvement.
The LLMs used in this study are basically general-purpose models. To improve the quality of the output for one specific domain, two approaches seem promising.
Fine-tuning
The first approach is to fine-tune the model with questions and answers from one specific domain. In our case, Elasticsearch and OpenSearch queries. There are already a couple of articles and approaches out there trying to translate natural language to SQL. The same approaches can be modified and applied to Elasticsearch/OpenSearch queries.
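As a sketch, a single training record for such a fine-tune could look like the following (a hypothetical example in a generic prompt/completion JSONL layout; the exact format depends on the fine-tuning API used, and the field names assume the sample web logs mapping):
{"prompt": "Index: kibana_sample_data_logs. Question: What is the average number of bytes per request?", "completion": "{\"size\": 0, \"aggs\": {\"avg_bytes\": {\"avg\": {\"field\": \"bytes\"}}}}"}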
Retrieval Augmented Generation (RAG)
Fine-tuning a model is a long and compute-intensive process. As an alternative, one can use Retrieval Augmented Generation (RAG) to provide additional information at prompt level so that the LLM output becomes more accurate. This is the approach we took by adding the mapping and sample data to the prompt. The data we used was fetched directly from the cluster without any additional data sources. If available, we could also augment the prompt with sample questions and queries, as sketched below.
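A minimal sketch of that augmentation step, assuming a small curated list of question/query example pairs is available (the helper and the pair format are our assumptions):
def augment_prompt(prompt, examples):
    # Prepend curated question/query pairs so the model can imitate
    # the expected output format (few-shot prompting).
    shots = "\n\n".join(f"Question: {q}\nQuery: {query}" for q, query in examples)
    return (
        "Here are example questions with matching Elasticsearch queries:\n\n"
        f"{shots}\n\n{prompt}"
    )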
As with LLMs in general, the limiting factor for both approaches is the availability of good sample data.