Day 14: Set up your own OAI data service

In the last days you have learned how to store data with Catmandu. Storing data is a cool thing, but sharing data is awesome. Interoperability is important, as other people may use your data (and you will profit from other people’s interoperable data).

In the Day 13 tutorial we learned the basic principles of metadata harvesting via OAI-PMH.

We will set up our OAI service with the Perl Dancer framework and an easy-to-use plugin called Dancer::Plugin::Catmandu::OAI. To install the required modules run:

$ cpanm Dancer

$ cpanm Dancer::Plugin::Catmandu::OAI

and you also might need

$ cpanm Template

Let’s start by indexing some data with Elasticsearch, as learned in the previous post:

$ catmandu import OAI --url http://pub.uni-bielefeld.de/oai --metadataPrefix oai_dc --from 2014-12-01 --handler oai_dc to Elasticsearch --index_name oai --bag publication


$ catmandu import OAI --url http://ds.ub.uni-bielefeld.de/viewer/oai --metadataPrefix oai_dc --from 2014-12-01T00:00:00Z --handler oai_dc to Elasticsearch --index_name oai --bag publication

After this, you should have some data in your Elasticsearch index. Run the following command to check this:

$ catmandu export Elasticsearch --index_name oai --bag publication

Everything is fine, so let’s create a simple web service which exposes the collected data via OAI-PMH. The following code can be downloaded from this gist.

What’s going on here? Well, the script oai-app.pl defines a route /oai via the plugin Dancer::Plugin::Catmandu::OAI.
The template oai_dc.tt defines the XML output of the records. Finally, the configuration file catmandu.yml holds the settings for the Dancer plugin as well as for Elasticsearch indexing and querying.
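
As a rough sketch only (not the original gist), a script along the lines of oai-app.pl could be created as shown here. It follows the usage documented for Dancer::Plugin::Catmandu::OAI, where the `oai_provider` keyword registers the route. It assumes the plugin settings (store, bag, metadata formats and so on) are supplied through configuration, which this sketch does not show; the real gist may also pass extra options.

```shell
# Write a minimal Dancer app to oai-app.pl (a sketch, not the original gist).
cat > oai-app.pl <<'EOF'
#!/usr/bin/env perl
use Dancer;
use Catmandu;
use Dancer::Plugin::Catmandu::OAI;

# Load the Catmandu configuration (catmandu.yml).
Catmandu->load;

# Register the OAI-PMH endpoint under /oai.
oai_provider '/oai';

# Start the Dancer standalone server.
dance;
EOF
```

The quoted 'EOF' keeps the shell from expanding anything inside the Perl source, so the file is written exactly as shown.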

Run the following command to start a local webserver:

$ perl oai-app.pl

and point your browser to http://localhost:3000/oai?verb=Identify. To get some records go to http://localhost:3000/oai?verb=ListRecords&metadataPrefix=oai_dc.
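
If you prefer the command line over the browser, the same requests can be sent with curl. The helper functions here are a small sketch: the names `OAI_BASE`, `oai_url` and `oai_get` are made up for this example, and the parameters are appended as-is without URL-encoding, which is fine for simple values such as oai_dc.

```shell
# Base URL of the local service (the address used in this tutorial).
OAI_BASE="http://localhost:3000/oai"

# Build an OAI-PMH request URL: the first argument is the verb,
# any further arguments are key=value pairs appended as query parameters.
oai_url() {
  verb=$1
  shift
  url="${OAI_BASE}?verb=${verb}"
  for pair in "$@"; do
    url="${url}&${pair}"
  done
  printf '%s\n' "$url"
}

# Fetch a response with curl; -s hides the progress meter and
# -f makes HTTP errors return a non-zero exit status.
oai_get() {
  curl -sf "$(oai_url "$@")"
}
```

With the webserver running, `oai_get Identify` requests the repository description and `oai_get ListRecords metadataPrefix=oai_dc` requests the records.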

Yes, it’s that easy. You can extend this simple example by adding fixes to transform the data as you need it.

Continue to Day 15: MARC to Dublin Core >>
