2022-11-10

JSON Processing for the Command Line with jq

This is a tutorial for jq, the command-line tool for processing JSON in Bash.

At Eliatra, we put lots of effort into quality assurance: For a software company, there’s nothing worse than bugs that only surface on customers’ production systems.
For our QA processes, we take a multi-step approach. First, there are the typical unit and integration tests. Unit tests check our code in isolation, whereas integration tests start an OpenSearch cluster in-process, send requests, and validate the results. This already gives a good sense of the quality of the software. However, we take QA a step further and try to replicate real-world environments and situations as closely as possible.
We use Docker and Docker Compose to create an environment close to what our customers run in production. It includes OpenSearch clusters of various sizes and configurations, OpenLDAP, Kerberos, Keycloak (for OIDC and SAML), HTTP proxies, etc. We then run an extensive suite of tests against this Docker environment. These tests are shell-based and rely heavily on two open-source projects:
    jq, “a lightweight and flexible command-line JSON processor”
    Bats, “a TAP-compliant testing framework for Bash”
A typical test sends curl requests to the Docker environment and then uses jq to validate the JSON result from OpenSearch.
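As a rough illustration of this pattern (not an excerpt from our actual test suite), the sketch below validates a JSON document with jq's --exit-status (-e) flag, which sets the exit status to 1 when the last output is false or null. The response body is a hard-coded stand-in for what curl would return, and its values are made up for the example.

```shell
# Hard-coded sample body; in a real test this would be the output of curl.
response='{"cluster_name":"test-cluster","status":"green","number_of_nodes":3}'

# jq -e exits with status 0 only if the last output is neither false nor null,
# so the filter result can drive an ordinary shell conditional.
if printf '%s' "$response" | jq -e '.status == "green" and .number_of_nodes >= 3' > /dev/null; then
  check="passed"
else
  check="failed"
fi

echo "health check $check"
```

Because the check is just an exit status, the same jq call can be dropped into a Bats test body, where any failing command fails the test.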

jq - A Swiss Army Knife for JSON Processing

OpenSearch uses JSON as the response format for its APIs, so we needed a tool for processing JSON in Bash scripts. We chose jq, which is probably the most powerful tool for this purpose.
jq is a versatile, easy-to-use, free and open-source JSON processor. It lets you extract specific fields or attribute-value pairs from a JSON document, and it offers an extensive range of filters and operations for examining, modifying, and transforming JSON data. It is often called the “sed for JSON”.

Basic Usage - Input and Output

Let’s start with a simple JSON document stored in a file named staff.json and see what jq can do for us.
{
	"FirstName": "LEE",
	"LastName": "AARON",
	"Designation": "CEO",
	"Department": "Management",
	"Address": { "Street": "7 Euclid Dr.", "City": "Yorktown, VA 23693" },
	"Interests": ["Piano", "Literature", "Hiking", "Sports"],
	"EmployedSince": 2020
}
Since jq is a command-line program like any other, we can simply cat the contents of our staff.json file, pipe it to jq, and apply some filters. The simplest filter jq provides is the identity filter (“.”). It takes the JSON input and pretty-prints it unchanged.
$ cat staff.json | jq '.'

{
  "FirstName": "LEE",
  "LastName": "AARON",
  "Designation": "CEO",
  "Department": "Management",
  "Address": {
    "Street": "7 Euclid Dr.",
    "City": "Yorktown, VA 23693"
  },
  "Interests": [
    "Piano",
    "Literature",
    "Hiking",
    "Sports"
  ],
  "EmployedSince": 2020
}
You can combine jq with any other command-line tool. For example, if the JSON data is stored on a server, pipe the output of curl into jq:
$ curl 'https://example.com/staff.json' | jq '.'
If your JSON is stored in a variable, use:
$ jq --raw-output '.' <<< "$STAFF"
And, of course, you can capture the output of jq in a variable for further processing as well:
$ result=$(jq --raw-output '.' <<< "$STAFF")
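The --raw-output (-r) flag only makes a visible difference when the result is a string: jq then prints the bare text instead of a quoted JSON string, which is usually what you want in a shell variable. A small sketch, using a shortened copy of the sample record and printf instead of a here-string so that it also runs in plain POSIX shells:

```shell
STAFF='{"FirstName":"LEE","LastName":"AARON"}'

# Without --raw-output, string results keep their JSON quotes.
quoted=$(printf '%s' "$STAFF" | jq '.FirstName')

# With --raw-output (-r), jq prints the bare string.
raw=$(printf '%s' "$STAFF" | jq --raw-output '.FirstName')

echo "$quoted"   # "LEE"
echo "$raw"      # LEE
```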

Simple Filters: Accessing Properties

The simplest use case is to extract some values from a JSON object. For this we can use the “Object Identifier-Index” syntax, where we provide the path to the key we are interested in:
$ cat staff.json | jq '.FirstName'
It selects the value of the key FirstName and thus yields
"LEE"
You can also select multiple keys at once:
$ cat staff.json | jq '.FirstName, .LastName, .Designation'   

"LEE"
"AARON"
"CEO"
And, of course, you can reference nested objects as well:
$ cat staff.json | jq '.Address.Street'

"7 Euclid Dr."
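One detail worth knowing when validating responses: a path to a key that does not exist is not an error; jq simply returns null. The sketch below shows this along with the alternative operator (//), which supplies a fallback value. The ZipCode key is hypothetical and is not part of the sample data.

```shell
staff='{"FirstName":"LEE","Address":{"Street":"7 Euclid Dr."}}'

# A missing key yields null rather than an error.
missing=$(printf '%s' "$staff" | jq '.Address.ZipCode')

# The alternative operator (//) substitutes a default for null or false.
fallback=$(printf '%s' "$staff" | jq -r '.Address.ZipCode // "unknown"')

echo "$missing"    # null
echo "$fallback"   # unknown
```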

Working with Arrays

You can also use the same syntax to reference arrays:
$ cat staff.json | jq '.Interests'

[
  "Piano",
  "Literature",
  "Hiking",
  "Sports"
]
If you want to pick a specific array element, you can refer to it via its zero-based index:
$ cat staff.json | jq '.Interests[2]'

"Hiking"
Or you can slice the array to get a subarray. The slice includes the start index and excludes the end index:
$ cat staff.json | jq '.Interests[0:2]'

[
  "Piano",
  "Literature"
]
You can also omit one of the indices, in which case the slice runs from the start or to the end of the array, respectively. Negative indices are allowed as well and count backward from the end of the array. The following command returns everything from the second-to-last element to the end of the array:
$ cat staff.json | jq '.Interests[-2:]'

[
  "Hiking",
  "Sports"
]
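For reference, here are a few more index and slice variants on the same list of interests, sketched with the --compact-output (-c) flag so that each result fits on one line:

```shell
interests='["Piano","Literature","Hiking","Sports"]'

# Omitted end index: from index 1 to the end of the array.
from_one=$(printf '%s' "$interests" | jq -c '.[1:]')

# Omitted start index: from the beginning up to, but excluding, index 2.
first_two=$(printf '%s' "$interests" | jq -c '.[:2]')

# A negative index addresses a single element counted from the end.
last=$(printf '%s' "$interests" | jq -r '.[-1]')

# length returns the number of elements in the array.
count=$(printf '%s' "$interests" | jq 'length')

echo "$from_one"    # ["Literature","Hiking","Sports"]
echo "$first_two"   # ["Piano","Literature"]
echo "$last"        # Sports
echo "$count"       # 4
```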

Using Functions

Let’s expand our example JSON a bit. Let’s assume the top-level JSON element is now an array of employee records:
[
	{
		"FirstName": "LEE",
		"LastName": "AARON",
		"Designation": "CEO",
		"Department": "Management",
		"Address": { "Street": "7 Euclid Dr.", "City": "Yorktown, VA 23693" },
		"Interests": ["Piano", "Literature", "Hiking", "Sports"],
		"EmployedSince": 2020
	},
	{
		"FirstName": "ANGELA",
		"LastName": "GOSSOW",
		"Designation": "Manager",
		"Department": "IT",
		"Address": { "Street": "81 West Lake St.", "City": "Midland, MI 48640" },
		"Interests": ["Singing", "Performing"],
		"EmployedSince": 2021,
		"ReportsTo": "LEE AARON"
	},
	{
		"FirstName": "SCOTT",
		"LastName": "IAN",
		"Designation": "Developer",
		"Department": "IT",
		"Address": { "Street": "730 North Lake Ave.", "City": "Chesapeake, VA 23320" },
		"Interests": ["Guitar", "Performing"],
		"EmployedSince": 2019,
		"ReportsTo": "ANGELA GOSSOW"
	}
]
This is where jq shines. Let’s start easy and try to select all the first names of our staff. We first need to tell jq to iterate over the elements of the array, which is done by appending square brackets (“.[]”). Then we specify which field we are interested in:
$ cat staff.json | jq '.[].FirstName'

"LEE"
"ANGELA"
"SCOTT"
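Note that the result above is a stream of three separate JSON values, not an array. If you need a single JSON array instead, one option is to wrap the expression in square brackets or to use map. A short sketch on a trimmed-down copy of the sample data:

```shell
staff='[{"FirstName":"LEE"},{"FirstName":"ANGELA"},{"FirstName":"SCOTT"}]'

# map applies the filter to every element and returns a new array.
as_array=$(printf '%s' "$staff" | jq -c 'map(.FirstName)')

# Wrapping a stream in [ ] collects its values into an array as well.
collected=$(printf '%s' "$staff" | jq -c '[.[].FirstName]')

echo "$as_array"    # ["LEE","ANGELA","SCOTT"]
echo "$collected"   # ["LEE","ANGELA","SCOTT"]
```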
Now let’s use some of the powerful built-in functions of jq. Say you only want to select employees that report to a specific person:
$ cat staff.json | jq '.[] | select(.ReportsTo == "LEE AARON")'

{
  "FirstName": "ANGELA",
  "LastName": "GOSSOW",
  "Designation": "Manager",
  "Department": "IT",
  "Address": {
    "Street": "81 West Lake St.",
    "City": "Midland, MI 48640"
  },
  "Interests": [
    "Singing",
    "Performing"
  ],
  "EmployedSince": 2021,
  "ReportsTo": "LEE AARON"
}
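In a test script, the value to compare against often lives in a shell variable. Rather than splicing it into the filter string, you can pass it in with --arg, which makes it available inside the filter as a jq string variable. A minimal sketch on a reduced copy of the sample data; the variable name manager is our own choice:

```shell
staff='[{"FirstName":"LEE"},
        {"FirstName":"ANGELA","ReportsTo":"LEE AARON"},
        {"FirstName":"SCOTT","ReportsTo":"ANGELA GOSSOW"}]'

manager="LEE AARON"

# --arg binds the shell value to $manager inside the jq program.
reports=$(printf '%s' "$staff" |
  jq -r --arg manager "$manager" '.[] | select(.ReportsTo == $manager) | .FirstName')

echo "$reports"   # ANGELA
```

This avoids quoting problems when the value itself contains quotes or other special characters.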
Instead of comparing against a specific value, we can also check whether an attribute contains a particular substring:
$ cat staff.json | jq '.[] | select(.Address.City | contains("idl"))'

{
  "FirstName": "ANGELA",
  "LastName": "GOSSOW",
  "Designation": "Manager",
  "Department": "IT",
  "Address": {
    "Street": "81 West Lake St.",
    "City": "Midland, MI 48640"
  },
  "Interests": [
    "Singing",
    "Performing"
  ],
  "EmployedSince": 2021,
  "ReportsTo": "LEE AARON"
}
You can also use the “contains” filter to select objects based on the values in an array. Applied to strings, “contains” is a substring match, so the filter below tests each interest individually. To select all employees that have “Performing” as one of their interests, we can use:
$ cat staff.json | jq '.[] | select(.Interests[] | contains("Performing"))'

{
  "FirstName": "ANGELA",
  "LastName": "GOSSOW",
  "Designation": "Manager",
  "Department": "IT",
  "Address": {
    "Street": "81 West Lake St.",
    "City": "Midland, MI 48640"
  },
  "Interests": [
    "Singing",
    "Performing"
  ],
  "EmployedSince": 2021,
  "ReportsTo": "LEE AARON"
}
{
  "FirstName": "SCOTT",
  "LastName": "IAN",
  "Designation": "Developer",
  "Department": "IT",
  "Address": {
    "Street": "730 North Lake Ave.",
    "City": "Chesapeake, VA 23320"
  },
  "Interests": [
    "Guitar",
    "Performing"
  ],
  "EmployedSince": 2019,
  "ReportsTo": "ANGELA GOSSOW"
}
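One caveat with this approach: because the test runs once per array element and matches substrings, an employee with several matching interests is emitted several times. If you want an exact match and at most one result per employee, one possible alternative is the any function (this sketch assumes jq 1.5 or later). The third record, DOE, is a hypothetical addition used only to show the difference.

```shell
staff='[{"LastName":"GOSSOW","Interests":["Singing","Performing"]},
        {"LastName":"IAN","Interests":["Guitar","Performing"]},
        {"LastName":"DOE","Interests":["Performing","Performing Arts"]}]'

# Substring match per element: DOE matches twice and is emitted twice.
substring=$(printf '%s' "$staff" |
  jq -c '[.[] | select(.Interests[] | contains("Performing")) | .LastName]')

# Exact match, evaluated once per employee.
exact=$(printf '%s' "$staff" |
  jq -c '[.[] | select(any(.Interests[]; . == "Performing")) | .LastName]')

echo "$substring"   # ["GOSSOW","IAN","DOE","DOE"]
echo "$exact"       # ["GOSSOW","IAN","DOE"]
```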
Finally, filters can also be chained. Based on the example above, let’s look for the LastName of employees who are interested in Performing and have been employed since 2021:
$ cat staff.json | jq '.[] | select(.Interests[] | contains("Performing")) | select(.EmployedSince == 2021) | .LastName'

"GOSSOW"
There are many other built-in filters and functions, for example, to get the minimum or maximum of a numeric field. The Unix-style way in which jq combines filters and functions through pipes is what makes it so powerful.
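As a brief sketch of the minimum and maximum case, the following uses min, max, and min_by on a reduced copy of the sample data:

```shell
staff='[{"FirstName":"LEE","EmployedSince":2020},
        {"FirstName":"ANGELA","EmployedSince":2021},
        {"FirstName":"SCOTT","EmployedSince":2019}]'

# Collect the years into an array, then take the smallest and largest value.
earliest=$(printf '%s' "$staff" | jq 'map(.EmployedSince) | min')
latest=$(printf '%s' "$staff" | jq 'map(.EmployedSince) | max')

# min_by returns the whole element with the smallest value for the given path.
longest_serving=$(printf '%s' "$staff" | jq -r 'min_by(.EmployedSince) | .FirstName')

echo "$earliest"          # 2019
echo "$latest"            # 2021
echo "$longest_serving"   # SCOTT
```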

Conclusion

jq is a very powerful tool for working with JSON on the command line. This article briefly described how to select attributes and objects from a JSON file and apply filters and functions.
jq can, of course, also be used to mutate and transform JSON. That subject will be covered in a follow-up article.
