Over the last few days you have learned how to store data with Catmandu. Storing data is a cool thing, but sharing data is awesome. Interoperability is important, as other people may use your data (and you will profit from other people’s interoperable data).
In the day 13 tutorial we learned the basic principles of metadata harvesting via OAI-PMH. Today we will turn this around and publish our own data through an OAI-PMH endpoint. First, install the required modules:
$ cpanm Dancer
$ cpanm Dancer::Plugin::Catmandu::OAI
You might also need the Template Toolkit module:
$ cpanm Template
Let’s start by indexing some data with Elasticsearch, as learned in the previous post:
$ catmandu import OAI --url http://pub.uni-bielefeld.de/oai --metadataPrefix oai_dc --from 2014-12-01 --handler oai_dc to Elasticsearch --index_name oai --bag publication
$ catmandu import OAI --url http://ds.ub.uni-bielefeld.de/viewer/oai --metadataPrefix oai_dc --from 2014-12-01T00:00:00Z --handler oai_dc to Elasticsearch --index_name oai --bag publication
After this, you should have some data in your Elasticsearch index. Run the following command to check this:
$ catmandu export Elasticsearch --index_name oai --bag publication
Everything is fine, so let’s create a simple web service which exposes the collected data via OAI-PMH. The following code can be downloaded from this gist.
What’s going on here? Well, the script oai-app.pl defines a route /oai via the plugin Dancer::Plugin::Catmandu::OAI.
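If you don’t want to fetch the gist right away, a minimal version of the script could look like the sketch below. It assumes the plugin exports an oai_provider keyword that registers the route; check the gist and the plugin documentation for the exact setup.

```perl
#!/usr/bin/env perl

use Dancer;
use Catmandu;
use Dancer::Plugin::Catmandu::OAI;

# load the catmandu.yml configuration that sits next to this script
Catmandu->load;

# register the OAI-PMH route; all verb handling (Identify, ListRecords,
# GetRecord, ...) is done by the plugin based on the configuration
oai_provider '/oai';

dance;
```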
The template oai_dc.tt defines the XML output of the records. And finally, the configuration file catmandu.yml holds the settings for the Dancer plugin as well as for Elasticsearch indexing and querying.
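For reference, the catmandu.yml in the gist combines a store definition with the plugin settings. A trimmed-down sketch might look like this (the field names follow the Dancer::Plugin::Catmandu::OAI documentation, and the repository name, host, and e-mail address are placeholders you should replace with your own values; verify everything against the gist before use):

```yaml
store:
  oai:
    package: Elasticsearch
    options:
      index_name: oai

plugins:
  'Catmandu::OAI':
    store: oai
    bag: publication
    datestamp_field: datestamp
    repositoryName: "My OAI Data Provider"
    uri_base: "http://localhost:3000/oai"
    adminEmail: admin@example.org
    earliestDatestamp: "1970-01-01T00:00:01Z"
    deletedRecord: "no"
    metadata_formats:
      - metadataPrefix: oai_dc
        schema: "http://www.openarchives.org/OAI/2.0/oai_dc.xsd"
        metadataNamespace: "http://www.openarchives.org/OAI/2.0/oai_dc/"
        template: oai_dc.tt
```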
Run the following command to start a local webserver
$ perl oai-app.pl
and point your browser to
http://localhost:3000/oai?verb=Identify. To get some records, go to
http://localhost:3000/oai?verb=ListRecords&metadataPrefix=oai_dc.
Yes, it’s that easy. You can extend this simple example by adding fixes to transform the data as you need it.
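For example, a small fix file (a hypothetical sketch; the field names depend on your data, and the added source value is made up for illustration) could clean up the harvested records before indexing:

```
# trim leading/trailing whitespace from every title (oai_dc fields repeat)
trim(title.*)
# tag each record with its data source
add_field(source, "pub.uni-bielefeld.de")
# drop an internal field you do not want to expose
remove_field(_metadata)
```

You can apply such a file during the import step with the --fix option, e.g. `catmandu import OAI ... --fix fixes.txt to Elasticsearch ...`.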
Continue to Day 15: MARC to Dublin Core >>