Day 7: Catmandu JSON paths

07_librecatrojectYesterday we learned the command catmandu and how it can be used to parse structured information. Today we will go deeper into catmandu and describe how to pluck data out of structured information. As always, you need to startup your Virtual Catmandu (hint: see our day 1 tutorial) and start up the UNIX prompt (hint: see our day 2 tutorial).

Today will we fetch a new weather report and store it in a new file weather2.json. Lets try to download Tokyo:

$ curl http://api.openweathermap.org/data/2.5/weather?q=Tokyo,jp > weather2.json

From the previous tutorials we know many commands how to examine this data set. For instance, to get a quick overview of the content of weather2.json we can use the cat command:

$ cat weather2.json

Or, we could use the less command:

$ less weather2.json

Remember to type the ‘q’ key to exit less.

We could also use nano to inspect the data, but we skip that for now. Nano is a text editor and is not particularly suited for data.

To count the number of lines, words and characters in weather2.json we can use the wc command:

$ wc weather2.json
1 3 463

This output shows that weather2.json contains 1 line , 3 words and 463 characters. The 1 line is indeed correct: the file contains one big line of JSON. The 463 characters is also correct: when you count every character including spaces you get to 463. But 3 words is obviously wrong. Generic UNIX programs like wc have trouble with counting words in structured information. The command doesn’t know this file is in the JSON format which contains fields and values. You need to use specialized tools like catmandu to make sense of this output.

We also saw in the previous post how you can use catmandu to transform the JSON format into the YAML format which is easier to read and contains the same information:

$ catmandu convert JSON to YAML < weather2.json

Screenshot_28_11_14_14_06-2

 

We also learned some fixes to retrieve information out of the JSON file like retain_field(main.temp).

In this post we delve a bit deeper into ways how to point to fields in a JSON file.

This main.temp is called a JSON Path and points to a part of the JSON data you are interested in. The data, as shown above, is structured like a tree. There are top level simple fields like: base,cod,dt,id which contain only text values or numbers. There are also fields like coord that contain a deeper structure like lat and lon.

Using a JSON path you can point to every part of the JSON file using a dot-notation. For simple top level fields the path is just the name of the field:

  • base
  • cod
  • dt
  • id
  • name

For the fields with deeper structure you add a dot ‘.’ to point to the leaves:

  • clouds.all
  • coord.lat
  • coord.lon
  • main.temp
  • etc…

So for example. If you would have a deeply nested structure like:

Screenshot_28_11_14_14_34

Then you would point to the c field with the JSON Path x.y.z.a.b.c.

There is one extra path structure I would like to explain and that is the when a field can have more than one value. This is called an array and looks like this in YAML:

Screenshot_28_11_14_14_39

In the example above you see a field my which contains a deeper field colors which has 3 values. To point to one of the colors you need to use an index. The first index in a array has value 0, the second the value 1, the third the value 2. So, the JSON path of the color red would be:

  • my.color.2

In almost all programming languages things get counted starting with 0. An old programming joke is:

There are 10 types of people in the world:
Those who understand binary,
Those who don’t,
And those who count from zero.

(hint: this is a double joke, 10 in binary == 2 if you count from 0, or 3 when you count from 1).

There is one array type in our JSON report and that is the weather field. To point to the description of the weather you need the JSON Path weather.0.description.

In this post we learned the JSON Path syntax and how it can be used to point to parts of a JSON data set want to manipulate. We explained the JSON path using a YAML transformation as example, because this is easier to read. YAML and JSON are two formats that contain the same informational content (and thus both can work with JSON Path) but look different when written into a file.

Continue to Day 8: Processing JSON data from webservices >>

Advertisements

6 comments

  1. Joh

    Kudos for this tutorial! I went straight through all the sections from day one, but get stuck on this one.

    **The dot notation seems not to work correctly for me.**

    (1) I always get the whole file in YAML.
    `$ catmandu convert JSON –fix ‘retain_field(weather.0.description)’ to YAML < weather.json`

    (2) I can get the values from the top level fields correctly:
    `$ catmandu convert JSON to YAML –fix 'retain_field(name)' Gent

    (3) However, it partly works with a .fix file (cf. day 6):
    `$ catmandu convert JSON –fix weather.fix to YAML
    weather:
    – description: few clouds

    (4) When I try to access two Paths at the same time, like this:
    retain_field(main)
    retain_field(main.temp)
    retain_field(sys)
    retain_field(sys.country)

    Then I get an empty set:
    =>
    — {}

    BTW: I don’t use the virtual machine, but installed perl following the instructions on http://learn.perl.org/installing/osx.html then I installed catmandu with `cpanm Catmandu`. So I have perl 5, version 16, subversion 0 (v5.16.0) built for darwin-2level on a mac if this matters.

    Like

    • Joh

      WP filtered some parts of my comment, so I hope you can follow me…

      Regarding (3): I have to specifiy the superordinated field in my fix file, like this:

      retain_field(main)
      retain_field(main.temp)

      otherwise, I also get the whole YAML and not only the specified field

      Hope anyone could help me out…

      Like

  2. hochstenbach

    Hi! Thanks for the feedback. Yes, your conclusions are correct! At the time of writing we used the retain_field to make a deeply nested JSON path smaller (which is what you most of the time need in large fixes). Because of writing this tutorial we saw the problem you showed above and created a new version of Catmandu with a new fix called ‘retain’ which does the things you need.

    The command would then be:

    $ catmandu convert JSON –fix ‘retain(weather.0.description)’ to YAML < weather.json

    To get a new fresh Catmandu on your virtual box you need to use the command:

    $ sudo /usr/local/bin/cpanm Catmandu

    Like

    • Joh

      Hey, thanks for the quick reply!
      $ catmandu convert JSON –fix ‘retain(weather.0.description)’ to YAML < weather.json
      works fine with the recent catmandu v.(0.9209)

      Liked by 1 person

  3. Pingback: Day 6: Introduction into Catmandu | LibreCat
  4. Jens

    Shouldn’t the hint to the programmer’s joke be the other way round:
    “10 in binary == 2 if you count from **1**, or 3 when you count from **0** ” ?!?

    Like

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s