One Day of a Catmandu Developer

By Patrick Hochstenbach

At Ghent University Library we are currently using Catmandu in a project to build a new discovery interface for our Aleph catalog. Every day we export MARC sequential files from several Aleph catalogs and load them into a MongoDB store. We also add records from our SFX server and our institutional repository, Biblio, to this store.

We use the MongoDB store to clean our datasets and to perform a FRBR-ized merge of records. In our setup this merge is logical: one collection contains the MARC records, and a second collection holds the relations between these records. Once the data is cleaned and merged, we export it to a Solr indexer, which is used by the Blacklight frontend.
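To make the idea of a logical merge concrete, here is a minimal Python sketch. It is not the in-house method used in Ghent and it does not use the Catmandu or MongoDB APIs: plain dicts stand in for the two collections, the records are simplified far below real MARC, and the grouping key (`work_key`, built from normalized author and title) is a hypothetical choice made purely for illustration.

```python
from collections import defaultdict

# Stand-in for the collection of records (real MARC is much richer).
records = {
    "aleph:001": {"title": "Moby Dick", "author": "Melville, Herman", "year": 1851},
    "aleph:002": {"title": "Moby-Dick ", "author": "melville, herman", "year": 1992},
    "biblio:17": {"title": "Walden", "author": "Thoreau, Henry David", "year": 1854},
}


def work_key(record):
    """Hypothetical grouping key: lowercased alphanumerics of author and title."""
    def norm(text):
        return "".join(ch for ch in text.lower() if ch.isalnum())
    return norm(record["author"]) + "|" + norm(record["title"])


def build_relations(records):
    """Build the second collection: one relation document per group of records.

    The records themselves are left untouched; only their ids are linked.
    """
    groups = defaultdict(list)
    for record_id, record in records.items():
        groups[work_key(record)].append(record_id)
    return [{"_id": key, "members": sorted(ids)} for key, ids in sorted(groups.items())]


relations = build_relations(records)
for relation in relations:
    print(relation["_id"], relation["members"])
```

The point of the design is that the merge can be recomputed or changed at any time by rebuilding the relations, without rewriting the stored records.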

The image below shows the architecture. The Catmandu trail is clearly visible. To import MARC records into MongoDB we use Catmandu importers. Once all the data is in the store, we run a series of Catmandu fixers to clean up the data. At the end of the day we use Catmandu exporters to send the data as JSON files to Solr, where we index the data and make it available in Blacklight.

[Image: architecture diagram (20130618_discovery)]
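The importer, fixer, exporter flow described above can be sketched in a few lines of Python. This is only an illustration of the pattern, not Catmandu itself (Catmandu is a Perl toolkit with its own importers, fix language and exporters). All function names, the `source` field and the sample records are assumptions, and an in-memory buffer stands in for the JSON files sent to Solr.

```python
import io
import json


def import_records(raw_records):
    """Importer stand-in: yield one record (a dict) at a time."""
    for raw in raw_records:
        yield dict(raw)


def trim_title(record):
    """Example fix: strip surrounding whitespace from the title."""
    record["title"] = record["title"].strip()
    return record


def add_source(record):
    """Example fix: default the (hypothetical) source field to 'aleph'."""
    record.setdefault("source", "aleph")
    return record


def apply_fixes(records, fixes):
    """Run every fix, in order, over every record in the stream."""
    for record in records:
        for fix in fixes:
            record = fix(record)
        yield record


def export_json_lines(records, out):
    """Exporter stand-in: write one JSON document per line, return the count."""
    count = 0
    for record in records:
        out.write(json.dumps(record, sort_keys=True) + "\n")
        count += 1
    return count


raw = [
    {"_id": "001", "title": "  Moby Dick "},
    {"_id": "002", "title": "Walden", "source": "biblio"},
]
buffer = io.StringIO()
exported = export_json_lines(
    apply_fixes(import_records(raw), [trim_title, add_source]), buffer
)
print(exported, "records exported")
```

Because each stage is a generator, records stream through one at a time, so the same structure would work for exports far larger than memory.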

5 comments

  1. Emmanuel Di Pretoro

    Can I ask how you “FRBRize” your records? Are you using USBC or a variant of it? Are you using another in-house method? Is this FRBRization done via a Catmandu plugin or a fix?

    Thank you in advance for your answer!

    • hochstenbach

      In Ghent we use in-house methods targeted at our local needs and local infrastructure. Catmandu doesn’t provide FRBRization out of the box, but it provides (Perl) libraries that ease the storage, indexing and transformation of MARC records, which makes it possible to create such an environment.

      A FRBRization toolkit would be a whole project in itself. Currently there is no manpower to support this worldwide.

      • edipretoro

        Thank you for your answer.

        Is there any chance of getting more details about the methods used? I’m a professor at IESSID (http://www.iessid.be), where we train future librarians. I’m currently researching various methods to dedupe and/or FRBRize bibliographic records in order to write a study. I hope to write some code to test these methods, e.g. USBC, for which I wrote a Perl module (https://github.com/edipretoro/Biblio-Record-USBC). I haven’t yet written a plugin (or a fix) for Catmandu, but this is definitely something I have in mind. I don’t yet know the best way to do it the “Catmandu way”, but I’ll try with a fix first.

        Anyway, I hope you’ll be able to point me to some of the methods you have used to FRBRize your collections!

        Have a nice day!
