Flatten Hierarchical JSON with SQL
JSON is a widely used data format that every developer should know how to read, parse, and manipulate. Maybe I will write a JSON 101 post at a later date, but today we are going to learn how to flatten hierarchical JSON with SQL. I will specifically be using T-SQL for MS SQL Server.
First things first. We need to come up with a scenario. Let's pretend we have a list of flight itineraries in JSON. Each itinerary can have multiple passengers and multiple legs to the trip. The final goal is to completely flatten the data where we have a single record for each itinerary, flight, passenger combination.
Now we need some sample JSON that we are going to flatten.
Here is what we want our data to look like.
Itinerary | Name | Source | Destination | Departure |
---|---|---|---|---|
1 | Molly Brown | LAX | DEN | May 5th 10:00AM |
1 | Sam Brown | LAX | DEN | May 5th 10:00AM |
1 | Molly Brown | DEN | ORD | May 5th 5:00PM |
1 | Sam Brown | DEN | ORD | May 5th 5:00PM |
1 | Molly Brown | ORD | LAX | May 12th 9:00AM |
1 | Sam Brown | ORD | LAX | May 12th 9:00AM |
2 | Sarah White | DEN | ORD | May 5th 5:00PM |
2 | Sarah White | ORD | DEN | May 10th 11:00AM |
3 | John Smith | ORD | DEN | May 10th 11:00AM |
3 | Ellie Smith | ORD | DEN | May 10th 11:00AM |
3 | Madison Smith | ORD | DEN | May 10th 11:00AM |
Now let us walk through how we can get there.
We will start by querying the first layer of data, the itineraries. We can use the OPENJSON function to do this. Here is the SQL and results from the first level query.
Itinerary | flights | passengers |
---|---|---|
1 | [{"source":"LAX","destination":"DEN","departure":"May 5th 10:00AM"}, {"source":"DEN","destination":"ORD","departure":"May 5th 5:00PM"},{"source":"ORD","destination":"LAX","departure":"May 12th 9:00AM"}] | [ {"name": "Molly Brown"}, {"name": "Sam Brown"} ] |
2 | [{"source":"DEN","destination":"ORD","departure":"May 5th 5:00PM"},{"source":"ORD","destination":"DEN","departure":"May 10th 11:00AM"}] | [{"name": "Sarah White"}] |
3 | [{"source":"ORD","destination":"DEN","departure":"May 10th 11:00AM"}] | [{"name": "John Smith"},{"name": "Ellie Smith"},{"name": "Madison Smith"}] |
Now lets parse the JSON that is now labled as flights for each itenerary. We will use the OPENJSON function again in conjuction with an OUTER APPLY. The OUTER APPLY will run the OPENJSON function across each record and create a recordset of data from both sides of the OUTER APPLY.
Itinerary | Source | Destination | Departure | passengers |
---|---|---|---|---|
1 | LAX | DEN | May 5th 10:00AM | [{"name": "Molly Brown"},{"name": "Sam Brown"}] |
1 | DEN | ORD | May 5th 5:00PM | [{"name": "Molly Brown"},{"name": "Sam Brown"}] |
1 | ORD | LAX | May 12th 9:00AM | [{"name": "Molly Brown"},{"name": "Sam Brown"}] |
2 | DEN | ORD | May 5th 5:00PM | [{"name": "Sarah White"}] |
2 | ORD | DEN | May 10th 11:00AM | [{"name": "Sarah White"}] |
3 | ORD | DEN | May 10th 11:00AM | [{"name": "John Smith"},{"name": "Ellie Smith"},{"name": "Madison Smith"}] |
Now we have flattened the JSON to itinerary, flight level. All that is left is parsing passenger data in the same way that we used an OUTER APPLY for the flight data. Here we have the final SQL query.
We can use this same method for deeper hierarchies. Imagine we added another layer of data to each passenger for their luggage. Here is some json with the deeper hierarchy, the SQL and the output.
Itinerary | Name | Source | Destination | Departure | LuggageId | LuggageType |
---|---|---|---|---|---|---|
1 | Molly Brown | LAX | DEN | May 5th 10:00AM | 1 | Carry On |
1 | Molly Brown | LAX | DEN | May 5th 10:00AM | 2 | Checked |
1 | Sam Brown | LAX | DEN | May 5th 10:00AM | 3 | Carry On |
1 | Sam Brown | LAX | DEN | May 5th 10:00AM | 4 | Checked |
1 | Sam Brown | LAX | DEN | May 5th 10:00AM | 5 | Checked |