Day 11: Store your data in MongoDB

MongoDB is a cross-platform document-oriented database. As a NoSQL database, MongoDB stores JSON-like documents (BSON) with dynamic schemas, which makes integrating data into applications easier and faster. Installation guides for various platforms are available in the MongoDB manual. To install the corresponding Catmandu module, run:

$ cpanm Catmandu::Store::MongoDB

Now get some JSON data to work with:

$ wget -O banned_books.json http://www.berlin.de/rubrik/hauptstadt/verbannte_buecher/verbannte-buecher.json

First import the data into MongoDB. You have to specify the database (--database_name) and the collection (--bag) in which you want to store the data:

$ catmandu import -v JSON --multiline 1 to MongoDB --database_name books --bag banned < banned_books.json

Now you can export all items from a collection to different formats, like XLSX, YAML and XML:

$ catmandu export MongoDB --database_name books --bag banned to YAML
$ catmandu export MongoDB --database_name books --bag banned to XML
$ catmandu export -v MongoDB --database_name books --bag banned to XLSX --file banned_books.xlsx

You can count all items in a collection or those which match a query:

$ catmandu count MongoDB --database_name books --bag banned
$ catmandu count MongoDB --database_name books --bag banned --query '{"firstEditionPublicationYear": "1937"}'
$ catmandu count MongoDB --database_name books --bag banned --query '{"firstEditionPublicationPlace": "Berlin"}'
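When the search value comes from a script rather than being typed by hand, the query string can be assembled with printf. This is only a minimal sketch: the variable names year and query are illustrative and not part of Catmandu, and the catmandu call is skipped when the command is not installed.

```shell
# Hypothetical variable holding the value to search for.
year="1937"

# printf keeps the JSON double quotes literal and fills in the value.
query=$(printf '{"firstEditionPublicationYear": "%s"}' "$year")
echo "$query"

# Run the count only if catmandu is available on this machine.
if command -v catmandu >/dev/null 2>&1; then
    catmandu count MongoDB --database_name books --bag banned --query "$query"
fi
```

Values that themselves contain double quotes or backslashes would need JSON escaping first, which this sketch does not attempt.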

MongoDB uses a JSON-like query language that supports a variety of operators.

You can query a collection for a specific value and export all matching items:

$ catmandu export MongoDB --database_name books --bag banned --query '{"firstEditionPublicationYear": "1937"}' to JSON
$ catmandu export MongoDB --database_name books --bag banned --query '{"firstEditionPublicationPlace": "Berlin"}' to CSV --fields '_id,authorFirstname,authorLastname,title,firstEditionPublicationPlace'

You can use regular expressions for queries, e.g. to get all items which were published at a place starting with “B”:

$ catmandu export MongoDB --database_name books --bag banned --query '{"firstEditionPublicationPlace": {"$regex":"^B.*"}}' to CSV --fields '_id,firstEditionPublicationPlace'
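The single quotes around the query matter here, because operator names such as $regex start with a dollar sign. The following sketch shows what the shell does to the same query in single and in double quotes; the variable names single and double are illustrative, and regex is set to an empty value to imitate a session in which no such shell variable is defined.

```shell
# In a typical session no shell variable named "regex" exists;
# an empty value expands the same way.
regex=""

# Single quotes: the shell passes $regex through unchanged.
single='{"firstEditionPublicationPlace": {"$regex":"^B.*"}}'

# Double quotes: the shell expands $regex (here to nothing) before
# catmandu ever sees the query, so the operator name is lost.
double="{\"firstEditionPublicationPlace\": {\"$regex\":\"^B.*\"}}"

echo "$single"   # prints {"firstEditionPublicationPlace": {"$regex":"^B.*"}}
echo "$double"   # prints {"firstEditionPublicationPlace": {"":"^B.*"}}
```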

MongoDB supports several comparison operators, e.g. you can query items which were published before/after a specific date or at specific places:

$ catmandu export MongoDB --database_name books --bag banned --query '{"firstEditionPublicationYear": {"$lt":"1940"}}' to CSV --fields '_id,firstEditionPublicationYear'
$ catmandu export MongoDB --database_name books --bag banned --query '{"firstEditionPublicationYear": {"$gt":"1940"}}' to CSV --fields '_id,firstEditionPublicationYear'
$ catmandu export MongoDB --database_name books --bag banned --query '{"firstEditionPublicationPlace":{"$in":["Berlin","Bern"]}}' to CSV --fields '_id,firstEditionPublicationPlace'
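Note that the year queries above pass the year as a quoted string, so $lt and $gt compare strings rather than numbers (assuming the field is stored as a string, as the earlier equality queries suggest). For four-digit years both orders agree, but they diverge as soon as values differ in length. A small sketch with made-up values, using sort to imitate both orders:

```shell
# Two hypothetical year values, one per line.
years='1940
999'

# Byte-wise string order: "1940" sorts first because "1" comes before "9".
smallest_as_string=$(printf '%s\n' "$years" | LC_ALL=C sort | head -n 1)

# Numeric order: 999 sorts first.
smallest_as_number=$(printf '%s\n' "$years" | sort -n | head -n 1)

echo "string order:  $smallest_as_string"
echo "numeric order: $smallest_as_number"
```

In string order a value like "999" would therefore not count as less than "1940", which is worth keeping in mind if a collection contains years with fewer or more than four digits.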

Logical operators are also supported, so you can combine query clauses:

$ catmandu export MongoDB --database_name books --bag banned --query '{"$and":[{"firstEditionPublicationYear": "1937"},{"firstEditionPublicationPlace": "Berlin"}]}' to JSON
$ catmandu export MongoDB --database_name books --bag banned --query '{"$or":[{"firstEditionPublicationPlace": "Berlin"},{"secondEditionPublicationPlace": "Berlin"}]}' to JSON
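Writing long $or queries by hand gets error-prone once several fields are involved. As a rough sketch, a small shell function can build the query from a value and a list of field names; or_query is a hypothetical helper, not a Catmandu command, and it assumes that neither the value nor the field names need JSON escaping.

```shell
# Hypothetical helper: or_query VALUE FIELD [FIELD...]
# Prints an $or query that matches VALUE in any of the given fields.
or_query() {
    value=$1
    shift
    clauses=""
    for field in "$@"; do
        # Append one {"field": "value"} clause, comma-separated.
        clauses="${clauses:+$clauses,}$(printf '{"%s": "%s"}' "$field" "$value")"
    done
    printf '{"$or":[%s]}\n' "$clauses"
}

query=$(or_query Berlin firstEditionPublicationPlace secondEditionPublicationPlace)
echo "$query"

# Run the export only if catmandu is available on this machine.
if command -v catmandu >/dev/null 2>&1; then
    catmandu export MongoDB --database_name books --bag banned --query "$query" to JSON
fi
```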

With the element query operators you can match items that contain a specified field:

$ catmandu export MongoDB --database_name books --bag banned --query '{"field_xyz":{"$exists":"true"}}'

Collections and items can be moved within MongoDB or even to other stores or search engines:

$ catmandu move MongoDB --database_name books --bag banned --query '{"firstEditionPublicationPlace": "Berlin"}' to MongoDB --database_name books --bag berlin
$ catmandu move MongoDB --database_name books --bag banned to Elasticsearch --index_name books --bag banned

You can delete whole collections from a database or just items which match a query:

$ catmandu delete MongoDB --database_name books --bag banned --query '{"firstEditionPublicationPlace": "Berlin"}'
$ catmandu delete MongoDB --database_name books --bag banned
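Since a delete cannot be undone, it can help to count the matching items first, using the same query for both commands. The following is a cautious sketch rather than a recommended workflow; CONFIRM is a hypothetical environment variable introduced for this example only.

```shell
# The same query is used for counting and for deleting.
query='{"firstEditionPublicationPlace": "Berlin"}'

# Hypothetical switch: run with CONFIRM=yes to actually delete.
CONFIRM=${CONFIRM:-no}

if ! command -v catmandu >/dev/null 2>&1; then
    echo "catmandu not found, nothing to do"
elif [ "$CONFIRM" = "yes" ]; then
    catmandu delete MongoDB --database_name books --bag banned --query "$query"
else
    # Default: only report how many items the query matches.
    catmandu count MongoDB --database_name books --bag banned --query "$query"
fi
```

Run it once without CONFIRM to see the count, then again with CONFIRM=yes if the number looks right.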

MongoDB supports several more methods. These methods are not available via the Catmandu command-line interface, but can be used in Catmandu modules and scripts.

See Catmandu::Store::MongoDB for further documentation.

Continue to Day 12: Index your data with ElasticSearch >>

7 comments

  1. Marcin

    I have a problem storing data in MongoDB. When I try to import data:
    catmandu import -v JSON --multiline 1 to MongoDB --database_name books --bag banned < banned_books.json
    I got a message: "couldn't connect to server localhost:27017 at /usr/local/lib/perl/5.14.2/MongoDB/MongoClient.pm line 356"
    Any ideas?

    I'm using Ubuntu 12.04.3 LTS, 32bit, VirtualBox on Windows 7 with manual installation of Catmandu.

  2. RH

    I’m getting an “unrecognized command: move” response when I try to run “catmandu move MongoDB --database_name books --bag banned --query '{"firstEditionPublicationPlace": "Berlin"}' to MongoDB --database_name books --bag berlin”. Any thoughts?
