Day 14: Set up your own OAI data service

In the last few days you have learned how to store data with Catmandu. Storing data is a cool thing, but sharing data is awesome. Interoperability is important, as other people may use your data (and you will profit from other people’s interoperable data).

In the Day 13 tutorial we learned the basic principle of metadata harvesting via OAI-PMH.

We will set up our OAI service with the Perl Dancer framework and an easy-to-use plugin called Dancer::Plugin::Catmandu::OAI. To install the required modules run:

$ cpanm Dancer

$ cpanm Dancer::Plugin::Catmandu::OAI

and you also might need

$ cpanm Template

Let’s start and index some data with Elasticsearch as learned in the previous post:

$ catmandu import OAI --url https://lib.ugent.be/oai --metadataPrefix oai_dc --set flandrica --handler oai_dc to Elasticsearch --index_name oai --bag publication

After this, you should have some data in your Elasticsearch index. Run the following command to check this:

$ catmandu export Elasticsearch --index_name oai --bag publication

If the export prints your records, everything is fine, so let’s create a simple web service which exposes the collected data via OAI-PMH. The following code can be downloaded from this gist.

Download this gist and create a symbolic link

$ ln -s catmandu.yml config.yml

This is necessary for the Dancer app, which reads its settings from config.yml. With the link in place, Catmandu and Dancer use the same configuration file.

catmandu.yml:

store:
  oai:
    package: Elasticsearch
    options:
      index_name: oai
      bags:
        publication:
          cql_mapping:
            default_index: basic
            indexes:
              _id:
                op:
                  'any': true
                  'all': true
                  '=': true
                  'exact': true
                field: '_id'
              basic:
                op:
                  'any': true
                  'all': true
                  '=': true
                  '<>': true
                field: '_all'
                description: "index with common fields..."
              datestamp:
                op:
                  '=': true
                  '<': true
                  '<=': true
                  '>=': true
                  '>': true
                  'exact': true
                field: '_datestamp'
      index_mappings:
        publication:
          properties:
            _datestamp: {type: date, format: date_time_no_millis}
plugins:
  'Catmandu::OAI':
    store: oai
    bag: publication
    datestamp_field: datestamp
    repositoryName: "My OAI DataProvider"
    uri_base: "http://oai.service.com/oai"
    adminEmail: me@example.com
    earliestDatestamp: "1970-01-01T00:00:01Z"
    deletedRecord: persistent
    repositoryIdentifier: oai.service.com
    cql_filter: "datestamp>2014-12-01T00:00:00Z"
    limit: 200
    delimiter: ":"
    sampleIdentifier: "oai:oai.service.com:1585315"
    metadata_formats:
      -
        metadataPrefix: oai_dc
        schema: "http://www.openarchives.org/OAI/2.0/oai_dc.xsd"
        metadataNamespace: "http://www.openarchives.org/OAI/2.0/oai_dc/"
        template: oai_dc.tt
        fix:
          - nothing()
    sets:
      -
        setSpec: openaccess
        setName: Open Access
        cql: 'oa=1'

oai-app.pl:

#!/usr/bin/env perl
use Dancer;
use Catmandu;
use Dancer::Plugin::Catmandu::OAI;

# Read the Catmandu configuration (catmandu.yml)
Catmandu->load;
Catmandu->config;

# Register the OAI-PMH endpoint at /oai and start the web server
oai_provider '/oai';
dance;

oai_dc.tt:

<oai_dc:dc xmlns="http://www.openarchives.org/OAI/2.0/oai_dc/"
           xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
           xmlns:dc="http://purl.org/dc/elements/1.1/"
           xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
           xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/oai_dc/ http://www.openarchives.org/OAI/2.0/oai_dc.xsd">
[%- FOREACH var IN ['title' 'creator' 'subject' 'description' 'publisher' 'contributor' 'date' 'type' 'format' 'identifier' 'source' 'language' 'relation' 'coverage' 'rights'] %]
  [%- FOREACH val IN $var %]
    <dc:[% var %]>[% val | html %]</dc:[% var %]>
  [%- END %]
[%- END %]
</oai_dc:dc>

What’s going on here? Well, the script oai-app.pl defines a route /oai via the plugin Dancer::Plugin::Catmandu::OAI.
The template oai_dc.tt defines the XML output of the records. And finally, the configuration file catmandu.yml holds the settings for the Dancer plugin as well as for the Elasticsearch indexing and querying.
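
To make the identifier settings a bit more concrete, here is a minimal shell sketch of how a record identifier like the sampleIdentifier in catmandu.yml appears to be put together: the oai scheme, the repositoryIdentifier and the local record id, joined by the delimiter. The function name oai_identifier is made up for this illustration, and the composition rule is an assumption read off the sampleIdentifier value, not taken from the plugin's source.

```shell
#!/bin/sh
# Hypothetical helper: compose an OAI identifier from its parts.
# Assumption: identifiers follow the pattern of sampleIdentifier,
# i.e. "oai" + delimiter + repositoryIdentifier + delimiter + record id.
#   $1 = repositoryIdentifier (e.g. oai.service.com)
#   $2 = local record id      (e.g. 1585315)
#   $3 = delimiter            (optional, defaults to ":")
oai_identifier() {
    delim="${3:-:}"
    printf 'oai%s%s%s%s\n' "$delim" "$1" "$delim" "$2"
}
```

Calling oai_identifier oai.service.com 1585315 prints oai:oai.service.com:1585315, the same value as the sampleIdentifier above. An identifier of this shape is what a harvester would pass in the identifier argument of a GetRecord request.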

Run the following command to start a local webserver

$ perl oai-app.pl

and point your browser to http://localhost:3000/oai?verb=Identify. To get some records go to http://localhost:3000/oai?verb=ListRecords&metadataPrefix=oai_dc.
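
If you prefer the command line to the browser, the same requests can be sent with curl. The following is only a rough sketch, assuming the app runs on localhost:3000 as above and that curl is installed. The function names (oai_url, oai_get, oai_token, oai_harvest) and the OAI_BASE variable are invented for this example. Since the configuration sets limit: 200, larger result lists come back in pages, so the sketch also follows the resumptionToken element that OAI-PMH uses for paging.

```shell
#!/bin/sh
# Base URL of the local OAI endpoint (assumption: the app from oai-app.pl
# is running on port 3000, as in the URLs above).
OAI_BASE="${OAI_BASE:-http://localhost:3000/oai}"

# Build a request URL from a verb and optional key=value arguments.
# Note: the arguments are appended as-is, without URL-encoding.
oai_url() {
    verb="$1"
    shift
    url="${OAI_BASE}?verb=${verb}"
    for arg in "$@"; do
        url="${url}&${arg}"
    done
    printf '%s\n' "$url"
}

# Send a single request and print the XML response.
oai_get() {
    curl --silent --show-error "$(oai_url "$@")"
}

# Read an OAI-PMH response on stdin and print its resumptionToken.
# Prints nothing (or an empty line) when there is no further page.
oai_token() {
    tr -d '\n' |
        sed -n 's|.*<resumptionToken[^>]*>\([^<]*\)</resumptionToken>.*|\1|p'
}

# Fetch all pages of ListRecords by following resumption tokens.
# Each page is printed as a separate XML document.
oai_harvest() {
    page=$(oai_get ListRecords metadataPrefix=oai_dc) || return 1
    while :; do
        printf '%s\n' "$page"
        token=$(printf '%s' "$page" | oai_token)
        [ -n "$token" ] || break
        # A resumptionToken is sent on its own, without metadataPrefix.
        page=$(curl --silent --show-error --get \
            --data "verb=ListRecords" \
            --data-urlencode "resumptionToken=${token}" \
            "$OAI_BASE") || return 1
    done
}
```

After loading these functions into your shell, oai_get Identify should return the repository description, and oai_get ListRecords metadataPrefix=oai_dc set=openaccess would restrict the list to the openaccess set defined in catmandu.yml. The token extraction with sed is deliberately simple; a real harvester such as the Catmandu OAI importer from Day 13 parses the XML properly.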

Yes, it’s that easy. You can extend this simple example by adding fixes to transform the data as you need it.

Continue to Day 15: MARC to Dublin Core >>
