In the last article, we looked at jq and how you can use it to extract attributes and apply filters and functions on JSON data on Bash. In this post, we will explore how you can use jq to transform and modify your JSON data.
Recap - Our Sample JSON Data
To recap, this is the JSON data we will be working with. It’s stored in a file called staff.json:
copy[
{
"FirstName": "LEE",
"LastName": "AARON",
"Designation": "CEO",
"Department": "Management",
"Address": { "Street": "7 Euclid Dr.", "City": "Yorktown, VA 23693" },
"Interests": ["Piano", "Literature", "Hiking", "Sports"],
"EmployedSince": 2020
},
{
"FirstName": "ANGELA",
"LastName": "GOSSOW",
"Designation": "Manager",
"Department": "IT",
"Address": { "Street": "81 West Lake St.", "City": "Midland, MI 48640" },
"Interests": ["Singing", "Performing"],
"EmployedSince": 2021,
"ReportsTo": "LEE AARON"
},
{
"FirstName": "SCOTT",
"LastName": "IAN",
"Designation": "Developer",
"Department": "IT",
"Address": { "Street": "730 North Lake Ave.", "City": "Chesapeake, VA 23320" },
"Interests": ["Guitar", "Performing"],
"EmployedSince": 2019,
"ReportsTo": "ANGELA GOSSOW"
}
]
Adding and Deleting Keys
Let’s start with an easy example: Adding and deleting key/value pairs from our original JSON:
copy$ cat staff.json | jq '.[1] + {"Floor": 2}'
{
"FirstName": "ANGELA",
"LastName": "GOSSOW",
"Designation": "Manager",
"Department": "IT",
"Address": {
"Street": "81 West Lake St.",
"City": "Midland, MI 48640"
},
"Interests": [
"Singing",
"Performing"
],
"EmployedSince": 2021,
"ReportsTo": "LEE AARON",
"Floor": 2
}
For the sake of brevity, we are selecting the first element from our staff JSON (“.[1]”) and add a new key called “Floor”. We can remove key/values basically the same way:
copy$ cat staff.json | jq 'map(del(.Address)) | .[1]'
{
"FirstName": "ANGELA",
"LastName": "GOSSOW",
"Designation": "Manager",
"Department": "IT",
"Interests": [
"Singing",
"Performing"
],
"EmployedSince": 2021,
"ReportsTo": "LEE AARON"
}
Sorting
Now we want to sort our fake staff data by LastName. We can use the built-in sort_by function for this. As you already know if you read
our first article in this series, we can simply use the pipe operator for chaining filters and functions. Here, we pipe the content of the staff.json file to jq, apply the sort_by function, and pipe the output to a filter that extracts the LastName attribute:
copy$ cat staff.json | jq 'sort_by(.LastName) | .[] | .LastName'
"AARON"
"GOSSOW"
"IAN"
Tip: Formatting the Output as tsv, csv and More
Here’s a small tip on how you can control the output of jq. Say we want to sort our objects by LastName as before, but we want to select the FirstName and the LastName. As described in the last article, you can select multiple attributes at once by comma separating them:
copycat staff.json | jq 'sort_by(.LastName) | .[] | .FirstName, .LastName'
"LEE"
"AARON"
"ANGELA"
"GOSSOW"
"SCOTT"
"IAN"
However, the output format is different from what we expected. Instead of each attribute printed on a newline, we would rather have FirstName and LastName on the same line. So let’s use one of the
built-in formatters to produce a tab-separated output:
copycat staff.json | jq 'sort_by(.LastName) | .[] | [.FirstName, .LastName] | @tsv'
"LEE\tAARON"
"ANGELA\tGOSSOW"
"SCOTT\tIAN"
Note the @tsv formatter command at the end. This is much better, and the result is usable for further processing. If you want to avoid escaping the tab, use the -r (–raw-output) command line switch:
copycat staff.json | jq -r 'sort_by(.LastName) | .[] | [.FirstName, .LastName] | @tsv'
LEE AARON
ANGELA GOSSOW
SCOTT IAN
If we use @csv instead of @tsv, we get:
copycat staff.json | jq -r 'sort_by(.LastName) | .[] | [.FirstName, .LastName] | @csv'
"LEE","AARON"
"ANGELA","GOSSOW"
"SCOTT","IAN"
Sort by Multiple Keys
The sort_by function also takes multiple arguments, so you can sort on LastName and then on FirstName. Which, in our case, does not change the output since the LastName in our sample data is unique. But you get the point…
copycat staff.json | jq -r 'sort_by(.LastName, .FirstName) | .[] | [.FirstName, .LastName] | @csv'
"LEE","AARON"
"ANGELA","GOSSOW"
"SCOTT","IAN"
Math Functions
jq has
built-in math functions as well. We can use it in filters, for example, to select all personnel that was hired in 2020 or later:
copy$cat staff.json | jq -r '.[] | select(.EmployedSince >= 2020) | [.FirstName, .LastName, .EmployedSince] | @csv'
"LEE","AARON",2020
"ANGELA","GOSSOW",2021
However, these functions can also be used to transform our original JSON. Let’s have a look at this command which combines some of the techniques we learned before and adds “2” to the EmployedSince field (for now particular reason other than this tutorial 😉):
copy$ cat staff.json | jq '.[].EmployedSince += 2' | jq -r '.[] | [.LastName, .FirstName, .EmployedSince] | @csv'
"AARON","LEE",2022
"GOSSOW","ANGELA",2023
"IAN","SCOTT",2021
First, we tell jq that we want to process an array of objects and that we want to add “2” to the “EmployedSince” field:
copy$ cat staff.json | jq -r '.[].EmployedSince += 2' ...
The output of this command is our original JSON array of objects but with the incremented “EmployedSince” field. Next, we pipe this output of jq back to jq, and tell it (again) that we are working with an arrays and want to extract the fields LastName, FirstName, and EmployedSince.
We want to have the jq output as an array. Without the formatting option, we would get:
copy$ cat staff.json | jq '.[].EmployedSince += 2' | jq -r '.[] | [.LastName, .FirstName, .EmployedSince]'
[
"AARON",
"LEE",
2022
]
[
"GOSSOW",
"ANGELA",
2023
]
[
"IAN",
"SCOTT",
2021
]
Finally, we are adding the @csv formatter as in the example before to get our final output:
copycat staff.json | jq '.[].EmployedSince += 2' | jq -r '.[] | [.LastName, .FirstName, .EmployedSince] | @csv'
"AARON","LEE",2022
"GOSSOW","ANGELA",2023
"IAN","SCOTT",2021
Grouping and Creating JSON Output
Last, let’s use the group_by function to count the number of employess in each department.
copy$ cat staff.json | jq 'group_by (.Department)[]'
[
{
"FirstName": "ANGELA",
"LastName": "GOSSOW",
...
"ReportsTo": "LEE AARON"
},
{
"FirstName": "SCOTT",
"LastName": "IAN",
...
}
]
[
{
"FirstName": "LEE",
"LastName": "AARON",
...
]
(output abbreviated for clarity)
This yields an array of objects, where each index contains all staff that work in the same department. We can now apply the length function on each element, and grab the Department value. Since we grouped by Department, it does not matter from which object we take it for each array index. We just grab it from the first. For getting the number of employees for each Department, we just apply the length function.
This example also shows how we can create an entire new JSON structure as output, and reference values from the step before:
copy$ cat staff.json | jq 'group_by (.Department)[] | {Department: .[0].Department, length: length}'
{
"Department": "IT",
"length": 2
}
{
"Department": "Management",
"length": 1
}
Further Readings
Apart from the links we provided in the previous article, we recommend to checkout the following links as well: