Eliatra Suite

2023-07-27

Log Analytics, Pt.2: Setting up a Watch in Blocks Mode

In this article we will configure a Watch that periodically scans our log index for errors.

In the previous article of this series, we imported sample web log data into OpenSearch and set up Email and Slack notification channels. Today we will configure a Watch that scans the log data periodically for errors.

Setup and Goals

The sample data we imported resides in an index called opensearch_dashboards_sample_data_logs. Each log entry is represented by a corresponding document that roughly looks like this:

{
	"agent": "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.1.4322)",
	"bytes": 123,
	"clientip": "166.168.152.39",
	"host": "www.opensearch.org",
	"index": "opensearch_dashboards_sample_data_logs",
	"ip": "166.168.152.39",
	"machine": {
		"ram": 3221225472,
		"os": "osx"
	},
	"referer": "http://nytimes.com/success/philip-k-chapman",
	"response": 503,
	"tags": [
		"success",
		"security"
	],
	"timestamp": "2023-07-25T16:11:47.086Z",
	"url": "https://www.opensearch.org/downloads/dataprepper",
	"utc_time": "2023-07-25T16:11:47.086Z",
	...	
}

To check for faulty requests we will use the response field of our documents, which captures the HTTP response code of the server and thus indicates if a request was successful or not. We will also use the timestamp field to count the number of errors in a specific timeframe.

We aim to set up a simple Watch that checks the index every 10 minutes. If the number of errors in the last 10 minutes exceeds 20, we want to send out a notification.
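
To make the goal concrete, the rule can be written down in a few lines of plain Python. This is only an illustrative sketch that runs against in-memory dictionaries, not against OpenSearch or Alerting Plus. The function name should_notify and its parameters are assumptions made for this example, and "error" is taken to mean a response code between 500 and 599.

```python
from datetime import datetime, timedelta, timezone


def should_notify(docs, now, window_minutes=10, threshold=20):
    """Return True if more than `threshold` 5xx responses fall inside the window."""
    window_start = now - timedelta(minutes=window_minutes)
    errors = 0
    for doc in docs:
        # Timestamps in the sample data are ISO 8601 strings ending in "Z" (UTC).
        ts = datetime.strptime(doc["timestamp"], "%Y-%m-%dT%H:%M:%S.%fZ")
        ts = ts.replace(tzinfo=timezone.utc)
        in_window = window_start <= ts <= now
        is_error = 500 <= int(doc["response"]) <= 599
        if in_window and is_error:
            errors += 1
    # "Exceeds 20" means strictly greater than the threshold.
    return errors > threshold
```

Note that exactly 20 errors would not trigger a notification under this reading; only 21 or more would.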

Setting up a Watch - Alerting Plus Blocks Mode

We can use the Alerting Plus REST API to set up everything from scratch using the command line and a Watch definition in JSON format. However, life is difficult enough, so we are going to use the Alerting Plus Dashboards UI with its super simple Blocks Mode.

The Alerting Plus UI Blocks Mode allows you to configure a complete Watch one step at a time.

What is Blocks Mode? A Watch consists of the following three steps:

Triggers define when a watch will be executed. Each watch has at least one trigger.

Checks, as the name implies, gather data and check it for certain conditions. Each watch can have several checks, which are executed as a chain. Alerting Plus offers:

Inputs that pull in data from a source such as an OpenSearch index or an HTTP service.

Transformations and calculations that convert the gathered data into a format that subsequent operations may require.

Conditions that analyze the gathered data using scripts and decide whether to proceed with execution or to abort.

Actions that are executed if all preceding conditions are met. In our case, we want to send notifications via Email and Slack.
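
The way checks form a chain can be pictured with a small Python model. This is a conceptual sketch only, not how Alerting Plus is implemented: the function run_watch, the dictionary keys (type, target, fetch, apply, test) and the key search_result are all hypothetical names chosen for this illustration.

```python
def run_watch(checks, actions):
    """Run checks in order; run actions only if no condition aborts the chain."""
    data = {}  # runtime data shared by all checks in the chain
    for check in checks:
        if check["type"] == "input":
            # An input stores whatever it fetched under its target key.
            data[check["target"]] = check["fetch"]()
        elif check["type"] == "transform":
            # A transformation derives a new value from the data gathered so far.
            data[check["target"]] = check["apply"](data)
        elif check["type"] == "condition":
            if not check["test"](data):
                return []  # condition not met: abort, no actions are executed
    return [action(data) for action in actions]


# Made-up example: one input, one condition, one action.
checks = [
    {"type": "input", "target": "search_result", "fetch": lambda: {"count": 25}},
    {"type": "condition", "test": lambda data: data["search_result"]["count"] > 20},
]
actions = [lambda data: "notify: %d errors" % data["search_result"]["count"]]

print(run_watch(checks, actions))  # ['notify: 25 errors']
```

The point of the model is the ordering: each check sees the data produced by the checks before it, and a failing condition ends the run before any action fires.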

The Alerting Plus Dashboards UI follows these three steps and allows you to configure the checks of your watch step-by-step, by breaking them down into blocks. Let’s explore this concept and set up a new Watch.

On the Alerting Plus Watches page, select Add -> Watch:

Adding a Schedule

First, we give our Watch a name and define the schedule. In our case, we want the Watch to execute every 10 minutes, so for the Mode section, we select By Interval and enter 10 Minutes:

You can select other schedules, such as daily, hourly, or weekly, or even define the trigger with Cron syntax for more complex scenarios.

Defining our Data Input

Since we want to work with the Blocks Mode instead of plain JSON, we select it via Definition -> Type -> Blocks.

We can now define our first Block in the execution chain by clicking on Add. This opens a menu where we can choose from pre-defined examples that simplify our Watch setup. In almost all cases, the first Block will be of type Input. This defines where the data is coming from.

Since our data is located in an OpenSearch index, we want to pull it in with a regular OpenSearch query. Thus, we select Input -> Full Text. This will add our first block to the execution chain.

Let’s now tweak it to fit our use case.

We first give our Block a name and select opensearch_dashboards_sample_data_logs as the index. In the Target field, we can choose a unique name that we can use in subsequent blocks to refer to the result of this block’s execution - hang on, this will get clearer soon. For now, just use http_errors.

Next, we must define an OpenSearch query to fetch the data we want to examine. We want to get all documents from the last 10 minutes that have an HTTP response code between 500 and 599. Our OpenSearch query thus looks like this:

{
  "query": {
    "bool": {
      "must": [
        {
          "range": {
            "timestamp": {
              "gte": "now-10m"
            }
          }
        },
        {
          "range": {
            "response": {
              "gte": 500,
              "lte": 599
            }
          }
        }
      ]
    }
  }
}
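
If you want to experiment with different time windows or status ranges, the same query body can be generated programmatically. The following Python sketch builds the structure shown above; the function name build_error_query and its parameters are assumptions for this example and are not part of Alerting Plus.

```python
import json


def build_error_query(window="10m", min_status=500, max_status=599):
    """Build the search body shown above for a configurable window and status range."""
    return {
        "query": {
            "bool": {
                "must": [
                    # Date math: "now-10m" means ten minutes before the query runs.
                    {"range": {"timestamp": {"gte": f"now-{window}"}}},
                    # Both bounds are inclusive, so 500 and 599 themselves match.
                    {"range": {"response": {"gte": min_status, "lte": max_status}}},
                ]
            }
        }
    }


# Print the default query as JSON, ready to paste into the Input Block.
print(json.dumps(build_error_query(), indent=2))
```

With the default arguments, the printed JSON is equivalent to the query above. The same body could also be sent to the index's _search endpoint to try it outside the UI.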

After filling out all required fields, your Input Block should look like this:

Testing it Out - Executing a Check

To test whether our query is correct and to see what the output of this Block looks like, Alerting Plus offers two convenient options: execute and inspect a single Block, or execute all Blocks in the chain. Either way, the JSON output is displayed. At the moment we only have a single Block, so we choose “Execute Only This Block” from the menu:

On the right-hand side, we can now see the JSON result of our Input Block, that is, of our OpenSearch query. Note that the output of our query is available under the http_errors key, as specified in the Target field of our Block.

We will use this key when defining a condition in the next step.

Permission to Proceed: Adding a Condition

We now need to decide under which condition our Watch should continue to execute. In our case, the condition is quite simple. We want to proceed and send out notifications if the number of errors in the last 10 minutes exceeds 20.

First, let’s add a Condition Block. We select Add again but choose Condition this time. We now see two Blocks in our execution chain:

Our OpenSearch query only returns documents from the last 10 minutes that have a response code between 500 and 599. We now only need to count these documents to decide whether to proceed. For this, we use the standard hits.total.value field of the OpenSearch query result. Remember that we used http_errors in the Target field of the Input Block; we can now use this key to access the query result. With these names, the condition reads along the lines of data.http_errors.hits.total.value > 20.

The data key indicates that we want to access the runtime data of our Watch, as opposed to the Watch metadata.
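
To see how such a condition evaluates, here is a small Python stand-in that applies the same comparison to a trimmed-down query result. It is a sketch for illustration: the function condition_met is a hypothetical name, and the hit count of 23 is made up. Only the shape of the result (hits.total.value under the http_errors key) follows the description above.

```python
def condition_met(data, threshold=20):
    """Mirror the Watch condition: proceed only if the hit count exceeds the threshold."""
    return data["http_errors"]["hits"]["total"]["value"] > threshold


# Trimmed-down example of what the runtime data might contain after the Input Block ran.
runtime_data = {
    "http_errors": {
        "hits": {
            "total": {"value": 23, "relation": "eq"},
            "hits": [],  # matching documents omitted here
        }
    }
}

print(condition_met(runtime_data))  # True, because 23 > 20
```

If the Target field of the Input Block were named differently, the first key in the lookup would change accordingly.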

That’s it for today.

To recap: In this article, we set up a basic Watch using the Alerting Plus UI in Blocks Mode. After defining the execution schedule, we added an Input that issues an OpenSearch query every time the Watch is executed. The query collects data from the opensearch_dashboards_sample_data_logs index and makes it available under the key http_errors for further processing steps. Finally, we added a Condition that stops the execution unless the Input query found more than 20 documents.

In the following article, we will add our already configured channels to the Watch, so we can send out notifications.

Articles in This Series

Log Analytics, Pt.1: Setting up Notification Channels

Log Analytics, Pt.2: Setting up a Watch in Blocks Mode (this article)

Log Analytics, Pt.3: Sending Notifications

Log Analytics, Pt.4: Implementing Escalation Levels