OpenSearch

2022-11-23

Transform JSON Data on Bash Using jq

We will explore how you can use jq to transform and modify your JSON data on the command line.

In the last article, we looked at jq and how you can use it to extract attributes and apply filters and functions on JSON data on Bash. In this post, we will explore how you can use jq to transform and modify your JSON data.

Recap - Our Sample JSON Data

To recap, this is the JSON data we will be working with. It’s stored in a file called staff.json:

[
	{
		"FirstName": "LEE",
		"LastName": "AARON",
		"Designation": "CEO",
		"Department": "Management",
		"Address": { "Street": "7 Euclid Dr.", "City": "Yorktown, VA 23693" },
		"Interests": ["Piano", "Literature", "Hiking", "Sports"],
		"EmployedSince": 2020
	},
	{
		"FirstName": "ANGELA",
		"LastName": "GOSSOW",
		"Designation": "Manager",
		"Department": "IT",
		"Address": { "Street": "81 West Lake St.", "City": "Midland, MI 48640" },
		"Interests": ["Singing", "Performing"],
		"EmployedSince": 2021,
		"ReportsTo": "LEE AARON"
	},
	{
		"FirstName": "SCOTT",
		"LastName": "IAN",
		"Designation": "Developer",
		"Department": "IT",
		"Address": { "Street": "730 North Lake Ave.", "City": "Chesapeake, VA 23320" },
		"Interests": ["Guitar", "Performing"],
		"EmployedSince": 2019,
		"ReportsTo": "ANGELA GOSSOW"
	}
]

Adding and Deleting Keys

Let’s start with an easy example: Adding and deleting key/value pairs from our original JSON:

$ cat staff.json | jq '.[1] + {"Floor": 2}' 

{
  "FirstName": "ANGELA",
  "LastName": "GOSSOW",
  "Designation": "Manager",
  "Department": "IT",
  "Address": {
    "Street": "81 West Lake St.",
    "City": "Midland, MI 48640"
  },
  "Interests": [
    "Singing",
    "Performing"
  ],
  "EmployedSince": 2021,
  "ReportsTo": "LEE AARON",
  "Floor": 2
}

For the sake of brevity, we are selecting the first element from our staff JSON (“.[1]”) and add a new key called “Floor”. We can remove key/values basically the same way:

$ cat staff.json | jq 'map(del(.Address)) | .[1]'

{
  "FirstName": "ANGELA",
  "LastName": "GOSSOW",
  "Designation": "Manager",
  "Department": "IT",
  "Interests": [
    "Singing",
    "Performing"
  ],
  "EmployedSince": 2021,
  "ReportsTo": "LEE AARON"
}

Sorting

Now we want to sort our fake staff data by LastName. We can use the built-in sort_by function for this. As you already know if you read our first article in this series, we can simply use the pipe operator for chaining filters and functions. Here, we pipe the content of the staff.json file to jq, apply the sort_by function, and pipe the output to a filter that extracts the LastName attribute:

$ cat staff.json | jq 'sort_by(.LastName) | .[] | .LastName' 

"AARON"
"GOSSOW"
"IAN"

Tip: Formatting the Output as tsv, csv and More

Here’s a small tip on how you can control the output of jq. Say we want to sort our objects by LastName as before, but we want to select the FirstName and the LastName. As described in the last article, you can select multiple attributes at once by comma separating them:

cat staff.json | jq 'sort_by(.LastName) | .[] | .FirstName, .LastName'           

"LEE"
"AARON"
"ANGELA"
"GOSSOW"
"SCOTT"
"IAN"

However, the output format is different from what we expected. Instead of each attribute printed on a newline, we would rather have FirstName and LastName on the same line. So let’s use one of the built-in formatters to produce a tab-separated output:

cat staff.json | jq 'sort_by(.LastName) | .[] | [.FirstName, .LastName] | @tsv' 

"LEE\tAARON"
"ANGELA\tGOSSOW"
"SCOTT\tIAN"

Note the @tsv formatter command at the end. This is much better, and the result is usable for further processing. If you want to avoid escaping the tab, use the -r (–raw-output) command line switch:

cat staff.json | jq -r 'sort_by(.LastName) | .[] | [.FirstName, .LastName] | @tsv'

LEE	AARON
ANGELA	GOSSOW
SCOTT	IAN

If we use @csv instead of @tsv, we get:

cat staff.json | jq -r 'sort_by(.LastName) | .[] | [.FirstName, .LastName] | @csv' 

"LEE","AARON"
"ANGELA","GOSSOW"
"SCOTT","IAN"

Sort by Multiple Keys

The sort_by function also takes multiple arguments, so you can sort on LastName and then on FirstName. Which, in our case, does not change the output since the LastName in our sample data is unique. But you get the point…

cat staff.json | jq -r 'sort_by(.LastName, .FirstName) | .[] | [.FirstName, .LastName] | @csv'

"LEE","AARON"
"ANGELA","GOSSOW"
"SCOTT","IAN"

Math Functions

jq has built-in math functions as well. We can use it in filters, for example, to select all personnel that was hired in 2020 or later:

$cat staff.json | jq -r '.[] | select(.EmployedSince >= 2020) | [.FirstName, .LastName, .EmployedSince] | @csv'

"LEE","AARON",2020
"ANGELA","GOSSOW",2021

However, these functions can also be used to transform our original JSON. Let’s have a look at this command which combines some of the techniques we learned before and adds “2” to the EmployedSince field (for now particular reason other than this tutorial 😉):

$ cat staff.json | jq '.[].EmployedSince += 2' | jq -r '.[] | [.LastName, .FirstName, .EmployedSince] | @csv'

"AARON","LEE",2022
"GOSSOW","ANGELA",2023
"IAN","SCOTT",2021

First, we tell jq that we want to process an array of objects and that we want to add “2” to the “EmployedSince” field:

$ cat staff.json | jq -r '.[].EmployedSince += 2' ...

The output of this command is our original JSON array of objects but with the incremented “EmployedSince” field. Next, we pipe this output of jq back to jq, and tell it (again) that we are working with an arrays and want to extract the fields LastName, FirstName, and EmployedSince.

We want to have the jq output as an array. Without the formatting option, we would get:

$ cat staff.json | jq '.[].EmployedSince += 2' | jq -r '.[] | [.LastName, .FirstName, .EmployedSince]'       

[
  "AARON",
  "LEE",
  2022
]
[
  "GOSSOW",
  "ANGELA",
  2023
]
[
  "IAN",
  "SCOTT",
  2021
]

Finally, we are adding the @csv formatter as in the example before to get our final output:

cat staff.json | jq '.[].EmployedSince += 2' | jq -r '.[] | [.LastName, .FirstName, .EmployedSince] | @csv'   

"AARON","LEE",2022
"GOSSOW","ANGELA",2023
"IAN","SCOTT",2021

Grouping and Creating JSON Output

Last, let’s use the group_by function to count the number of employess in each department.

$ cat staff.json | jq 'group_by (.Department)[]'

[
  {
    "FirstName": "ANGELA",
    "LastName": "GOSSOW",
    ...
    "ReportsTo": "LEE AARON"
  },
  {
    "FirstName": "SCOTT",
    "LastName": "IAN",
    ...
  }
]
[
  {
    "FirstName": "LEE",
    "LastName": "AARON",
    ...
]

(output abbreviated for clarity)

This yields an array of objects, where each index contains all staff that work in the same department. We can now apply the length function on each element, and grab the Department value. Since we grouped by Department, it does not matter from which object we take it for each array index. We just grab it from the first. For getting the number of employees for each Department, we just apply the length function.

This example also shows how we can create an entire new JSON structure as output, and reference values from the step before:

$ cat staff.json | jq 'group_by (.Department)[] | {Department: .[0].Department, length: length}'

{
  "Department": "IT",
  "length": 2
}
{
  "Department": "Management",
  "length": 1
}