Catmandu 1.04

Catmandu 1.04 has been released with some nice new features. It includes several new Fix routines that were requested by our community:

error

The “error” fix immediately stops the execution of the Fix script and throws an error. Use this to abort the processing of a data stream:

$ cat myfix.fix
unless exists(id)
    error("no id found?!")
end
$ catmandu convert JSON --fix myfix.fix < data.json
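
To illustrate the idea outside of Catmandu, here is a minimal Python sketch of a stream that aborts as soon as a record without an “id” shows up. The names FixError and require_id are hypothetical and only mimic the behaviour described above; this is not how Catmandu implements the fix.

```python
class FixError(Exception):
    """Hypothetical stand-in for the error thrown by the "error" fix."""


def require_id(records):
    """Yield records one by one; abort the whole stream when "id" is missing."""
    for record in records:
        if "id" not in record:
            raise FixError("no id found?!")
        yield record


if __name__ == "__main__":
    stream = [{"id": 1}, {"title": "no identifier"}, {"id": 3}]
    try:
        for record in require_id(stream):
            print(record)
    except FixError as err:
        # Processing stops here; the third record is never reached.
        print(f"aborted: {err}")
```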

valid

The “valid” fix condition can be used to validate a record (or part of a record) against a JSONSchema. For instance, we can select only the valid records from a stream (note the double quotes around the fix, so that the shell preserves the empty path ''):

$ catmandu convert JSON --fix "select valid('', JSONSchema, schema:myschema.json)" < data.json

Or, create some logging:

$ cat myfix.fix
unless valid(author, JSONSchema, schema:authors.json)
    log("errors in the author field")
end
$ catmandu convert JSON --fix myfix.fix < data.json

rename

The “rename” fix can be used to recursively change the names of fields in your documents. For example, when you have this JSON input:

{
    "foo.bar": "123",
    "my.name": "Patrick"
}

you can transform all periods (.) in the key names to underscores with this fix:

rename('','\.','_')

The first parameter is the field that “rename” should work on (in our case an empty string, meaning the complete record). The second and third parameters are the regex search and replace parameters. The result of this fix is:

{
    "foo_bar": "123",
    "my_name": "Patrick"
}
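
For readers who think in code, the recursive key renaming can be sketched in a few lines of Python. The helper rename_keys is a hypothetical name, and walking into lists as well as hashes is an assumption of this sketch, not a statement about Catmandu's internals.

```python
import re


def rename_keys(data, search, replace):
    """Recursively apply a regex substitution to every key of nested dicts."""
    if isinstance(data, dict):
        return {
            re.sub(search, replace, key): rename_keys(value, search, replace)
            for key, value in data.items()
        }
    if isinstance(data, list):
        # Assumption: lists are walked so that hashes inside them are renamed too.
        return [rename_keys(item, search, replace) for item in data]
    return data


record = {"foo.bar": "123", "my.name": "Patrick"}
print(rename_keys(record, r"\.", "_"))
# → {'foo_bar': '123', 'my_name': 'Patrick'}
```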

The “rename” fix will only work on the keys of JSON paths. For example, given the following path:

my.deep.path.x.y.z

The keys are:

  • my
  • deep
  • path
  • x
  • y
  • z

The second and third arguments search and replace within each of these separate keys. When you want to change the paths as a whole, take a look at the “collapse()” and “expand()” fixes in combination with the “rename” fix:

collapse()
rename('',"my\.deep","my.very.very.deep")
expand()

Now the generated path will be:

my.very.very.deep.path.x.y.z

Of course, the example above could be written more simply as “move_field(my.deep,my.very.very.deep)”, but it serves as an example that powerful renaming is possible.
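
The collapse, rename, expand sequence can also be sketched in Python. The functions collapse and expand below are simplified, hypothetical stand-ins that only handle nested hashes (no arrays); they are meant to show why the regex can match across path segments once the record is flattened.

```python
import re


def collapse(data, prefix=""):
    """Flatten nested dicts into one dict keyed by dotted paths."""
    flat = {}
    for key, value in data.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict) and value:
            flat.update(collapse(value, path))
        else:
            flat[path] = value
    return flat


def expand(flat):
    """Rebuild nested dicts from a dict keyed by dotted paths."""
    nested = {}
    for path, value in flat.items():
        *parents, leaf = path.split(".")
        node = nested
        for key in parents:
            node = node.setdefault(key, {})
        node[leaf] = value
    return nested


record = {"my": {"deep": {"path": {"x": {"y": {"z": 1}}}}}}
flat = collapse(record)  # {'my.deep.path.x.y.z': 1}
renamed = {re.sub(r"my\.deep", "my.very.very.deep", k): v for k, v in flat.items()}
print(list(renamed))  # → ['my.very.very.deep.path.x.y.z']
print(expand(renamed))
```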

import_from_string

This Fix is a generalisation of the “from_json” Fix. It can transform a serialised string field in your data into an array of data. For instance, take the following YAML record:


---
foo: '{"name":"patrick"}'
...

The field ‘foo’ contains a JSON fragment. You can transform this JSON into real data using the following fix:


import_from_string(foo,JSON)

Which creates a ‘foo’ array containing the deserialised JSON:


---
foo:
- name: patrick

The “import_from_string” fix looks very much like the “from_json” fix, but it can use any Catmandu::Importer. It always creates an array of hashes. For instance, given the following YAML record:


---
foo: "name;hobby\nnicolas;drawing\npatrick;music"

You can transform the CSV fragment in the ‘foo’ field into data by using this fix:


import_from_string(foo,CSV,sep_char:";")

This gives the following result:


---
foo:
- hobby: drawing
  name: nicolas
- hobby: music
  name: patrick
...

In the same way it can process MARC, XML, RDF, YAML or any other format supported by Catmandu.
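
As a rough analogy for the CSV case, the same transformation can be written with Python's standard csv module. The function import_csv_string is a hypothetical name; the sketch only mirrors the input and output shown above and says nothing about how Catmandu::Importer::CSV works internally.

```python
import csv
import io


def import_csv_string(text, sep_char=","):
    """Parse a CSV fragment held in a string into a list of dicts (one per row)."""
    reader = csv.DictReader(io.StringIO(text), delimiter=sep_char)
    return [dict(row) for row in reader]


record = {"foo": "name;hobby\nnicolas;drawing\npatrick;music"}
record["foo"] = import_csv_string(record["foo"], sep_char=";")
print(record["foo"])
# → [{'name': 'nicolas', 'hobby': 'drawing'}, {'name': 'patrick', 'hobby': 'music'}]
```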

export_to_string

The fix “export_to_string” is the opposite of “import_from_string” and a generalisation of the “to_json” fix. Given the YAML from the previous example:


---
foo:
- hobby: drawing
  name: nicolas
- hobby: music
  name: patrick
...

You can create a CSV fragment in the ‘foo’ field with the following fix:


export_to_string(foo,CSV,sep_char:";")

This gives the following result:


---
foo: "name;hobby\nnicolas;drawing\npatrick;music"
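
The reverse direction can be sketched with Python's csv module as well. The function export_csv_string and its explicit fields argument are assumptions of this sketch; the trailing newline is stripped only to match the example output above.

```python
import csv
import io


def export_csv_string(rows, fields, sep_char=","):
    """Serialise a list of dicts into a CSV fragment with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=fields, delimiter=sep_char, lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)
    # Assumption: drop the final newline so the string matches the example above.
    return buffer.getvalue().rstrip("\n")


record = {
    "foo": [
        {"hobby": "drawing", "name": "nicolas"},
        {"hobby": "music", "name": "patrick"},
    ]
}
record["foo"] = export_csv_string(record["foo"], ["name", "hobby"], sep_char=";")
print(repr(record["foo"]))
# → 'name;hobby\nnicolas;drawing\npatrick;music'
```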

search_in_store

The fix “search_in_store” is a generalisation of the “lookup_in_store” fix. The latter is used to query the “_id” field in a Catmandu::Store and return the first hit. The former, “search_in_store”, can query any field in a store and return all (or a subset of) the results. For instance, given the YAML record:


---
foo: "(title:ABC OR author:dave) AND NOT year:2013"
...

then the following fix will replace the ‘foo’ field with the result of the query in a Solr index:


search_in_store('foo', store:Solr, url: 'http://localhost:8983/solr/catalog')

As a result, the document will be updated like:


---
foo:
    start: 0,
    limit: 0,
    hits: [...],
    total: 1000
...

where

  • start: the starting index of the search result
  • limit: the number of results per page
  • hits: an array containing the data from the result page
  • total: the total number of search results

Every Catmandu::Store can have a different layout of the result page. Look at the documentation of the Catmandu::Store implementations for the specific details.
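
To make the start, limit, hits and total fields listed above more concrete, here is a small Python sketch that pages through an in-memory list instead of a real store. The search function, the catalog data and the lambda standing in for the query string are all hypothetical; a real store such as Solr parses the query itself and may return a different layout.

```python
def search(docs, predicate, start=0, limit=10):
    """Return one result page in a start/limit/hits/total layout."""
    matches = [doc for doc in docs if predicate(doc)]
    return {
        "start": start,                          # index of the first hit on this page
        "limit": limit,                          # maximum number of hits per page
        "hits": matches[start:start + limit],    # the records on this page
        "total": len(matches),                   # number of matches over all pages
    }


catalog = [
    {"title": "ABC", "author": "alice", "year": 2012},
    {"title": "XYZ", "author": "dave", "year": 2013},
    {"title": "DEF", "author": "dave", "year": 2014},
]

# Stand-in for "(title:ABC OR author:dave) AND NOT year:2013"
def query(doc):
    return (doc["title"] == "ABC" or doc["author"] == "dave") and doc["year"] != 2013


record = {"foo": "(title:ABC OR author:dave) AND NOT year:2013"}
record["foo"] = search(catalog, query, start=0, limit=1)
print(record["foo"]["total"], record["foo"]["hits"])
# → 2 [{'title': 'ABC', 'author': 'alice', 'year': 2012}]
```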

Thanks for all your support for Catmandu and keep on data converting 🙂
