Day 11: Store your data in MongoDB
MongoDB is a cross-platform, document-oriented database. As a NoSQL database, MongoDB stores data in JSON-like documents (BSON) with dynamic schemas, which makes it easier and faster to integrate data into applications. Install guides for various platforms are available in the MongoDB manual. To install the corresponding Catmandu module, run:
$ cpanm Catmandu::Store::MongoDB
Now get some JSON data to work with:
$ wget -O banned_books.json https://lib.ugent.be/download/librecat/data/verbannte-buecher.json
First, import the data into MongoDB. You have to specify the database (--database_name) and the collection (--bag) in which you want to store the data:
$ catmandu import -v JSON --multiline 1 to MongoDB --database_name books --bag banned < banned_books.json
Now you can export all items from a collection to different formats, like XLSX, YAML and XML:
$ catmandu export MongoDB --database_name books --bag banned to YAML
$ catmandu export MongoDB --database_name books --bag banned to XML
$ catmandu export -v MongoDB --database_name books --bag banned to XLSX --file banned_books.xlsx
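Every command in this post repeats the store options `MongoDB --database_name books --bag banned`. When you are working interactively, a small shell function can keep them in one place. The name banned is hypothetical and chosen for this sketch. The function simply passes the subcommand and any remaining arguments through to catmandu:

```shell
# Hypothetical shorthand (not part of Catmandu): run a catmandu subcommand
# against the "banned" collection in the "books" database.
# Usage: banned SUBCOMMAND [MORE CATMANDU ARGUMENTS...]
banned() {
  subcommand=$1
  shift
  # Everything after the subcommand is appended after the store options.
  catmandu "$subcommand" MongoDB --database_name books --bag banned "$@"
}
```

With the function defined, the commands above shorten to, for example:
$ banned export to YAML
$ banned count --query '{"firstEditionPublicationPlace": "Berlin"}'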
You can count all items in a collection or those which match a query:
$ catmandu count MongoDB --database_name books --bag banned
$ catmandu count MongoDB --database_name books --bag banned --query '{"firstEditionPublicationYear": "1937"}'
$ catmandu count MongoDB --database_name books --bag banned --query '{"firstEditionPublicationPlace": "Berlin"}'
MongoDB uses a JSON-like query language that supports a variety of operators.
You can query a collection for a specific value and export all matching items:
$ catmandu export MongoDB --database_name books --bag banned --query '{"firstEditionPublicationYear": "1937"}' to JSON
$ catmandu export MongoDB --database_name books --bag banned --query '{"firstEditionPublicationPlace": "Berlin"}' to CSV --fields '_id,authorFirstname,authorLastname,title,firstEditionPublicationPlace'
You can use regular expressions in queries, e.g. to get all items that were published in a place starting with “B”:
$ catmandu export MongoDB --database_name books --bag banned --query '{"firstEditionPublicationPlace": {"$regex":"^B.*"}}' to CSV --fields '_id,firstEditionPublicationPlace'
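The queries are wrapped in single quotes for a reason. Inside double quotes the shell treats $regex (and likewise $lt, $gt, $in, $and, ...) as shell variables and substitutes them, usually with an empty string, before catmandu ever sees the query. This sketch shows the difference, assuming no shell variable named regex is set. The variable names are illustrative:

```shell
unset regex   # make sure no shell variable named "regex" exists

# Double quotes: the shell expands $regex to an empty string.
double_quoted="{\"firstEditionPublicationPlace\": {\"$regex\": \"^B\"}}"
# Single quotes: the text is passed on literally.
single_quoted='{"firstEditionPublicationPlace": {"$regex": "^B"}}'

printf '%s\n' "$double_quoted"   # prints {"firstEditionPublicationPlace": {"": "^B"}}
printf '%s\n' "$single_quoted"   # prints {"firstEditionPublicationPlace": {"$regex": "^B"}}
```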
MongoDB supports several comparison operators, e.g. you can query items that were published before or after a specific year, or in one of several places:
$ catmandu export MongoDB --database_name books --bag banned --query '{"firstEditionPublicationYear": {"$lt":"1940"}}' to CSV --fields '_id,firstEditionPublicationYear'
$ catmandu export MongoDB --database_name books --bag banned --query '{"firstEditionPublicationYear": {"$gt":"1940"}}' to CSV --fields '_id,firstEditionPublicationYear'
$ catmandu export MongoDB --database_name books --bag banned --query '{"firstEditionPublicationPlace":{"$in":["Berlin","Bern"]}}' to CSV --fields '_id,firstEditionPublicationPlace'
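In this dataset the years are stored as strings ("1937"), so $lt and $gt compare them as strings, not as numbers. Without a collation, MongoDB compares strings with a simple binary comparison. Its comparison operators also only match values of the same type, so a numeric query such as {"$lt": 1940} would not match these string years at all. For four-digit years, string order happens to equal numeric order. Note that $lt and $gt are strict, so use $lte or $gte to include the boundary year. The sketch below imitates byte-wise string ordering with sort. str_min is a hypothetical helper:

```shell
# Hypothetical helper: print whichever of two strings sorts first in byte
# order (plain string comparison, not numeric comparison).
str_min() {
  printf '%s\n%s\n' "$1" "$2" | LC_ALL=C sort | head -n 1
}

str_min 1937 1940   # prints 1937: same length, so string order matches numeric order
str_min 999 1940    # prints 1940: "1" sorts before "9", although 999 < 1940 numerically
```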
Logical operators are also supported, so you can combine query clauses:
$ catmandu export MongoDB --database_name books --bag banned --query '{"$and":[{"firstEditionPublicationYear": "1937"},{"firstEditionPublicationPlace": "Berlin"}]}' to JSON
$ catmandu export MongoDB --database_name books --bag banned --query '{"$or":[{"firstEditionPublicationPlace": "Berlin"},{"secondEditionPublicationPlace": "Berlin"}]}' to JSON
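Listing several conditions in one query document also combines them with a logical AND, so the first query above can be written without $and. An explicit $and is still needed when the same field or operator has to appear more than once in a query. This sketch keeps both versions in variables with illustrative names. It only counts the matches for each version when catmandu is on the PATH, and the two numbers should be identical:

```shell
# The explicit $and query from above and its implicit equivalent.
explicit='{"$and":[{"firstEditionPublicationYear": "1937"},{"firstEditionPublicationPlace": "Berlin"}]}'
implicit='{"firstEditionPublicationYear": "1937", "firstEditionPublicationPlace": "Berlin"}'

# Compare the counts only if catmandu is installed (MongoDB must be running too).
if command -v catmandu >/dev/null 2>&1; then
  for query in "$explicit" "$implicit"; do
    catmandu count MongoDB --database_name books --bag banned --query "$query"
  done
fi
```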
With element query operators such as $exists you can match items that contain a specified field (replace field_xyz with the name of a field in your data):
$ catmandu export MongoDB --database_name books --bag banned --query '{"field_xyz":{"$exists":true}}'
Collections and items can be moved within MongoDB, or even to other stores or search engines:
$ catmandu move MongoDB --database_name books --bag banned --query '{"firstEditionPublicationPlace": "Berlin"}' to MongoDB --database_name books --bag berlin
$ catmandu move MongoDB --database_name books --bag banned to Elasticsearch --index_name books --bag banned
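Recent versions of Catmandu use copy instead of move. Because copy leaves the source items in place, you can follow it with a delete using the same query if the items should really leave the source collection. In this sketch, move_items is a hypothetical function name. The && ensures that nothing is deleted if the copy fails:

```shell
# Hypothetical helper: copy the items matching QUERY from books/banned into
# books/TARGET_BAG, then delete them from books/banned.
# Usage: move_items QUERY TARGET_BAG
move_items() {
  query=$1
  target_bag=$2
  # The delete only runs if the copy exited successfully.
  catmandu copy MongoDB --database_name books --bag banned --query "$query" \
    to MongoDB --database_name books --bag "$target_bag" &&
  catmandu delete MongoDB --database_name books --bag banned --query "$query"
}
```

For the Berlin example above:
$ move_items '{"firstEditionPublicationPlace": "Berlin"}' berlin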
You can delete a whole collection from a database, or just the items that match a query:
$ catmandu delete MongoDB --database_name books --bag banned --query '{"firstEditionPublicationPlace": "Berlin"}'
$ catmandu delete MongoDB --database_name books --bag banned
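Deleting cannot be undone, so it can help to see how many items a query matches before removing them. In this sketch, delete_matching is a hypothetical helper. It assumes that `catmandu count` prints just the number of matching items, and it only deletes after an explicit "y":

```shell
# Hypothetical helper: count the items a query matches in books/banned and
# delete them only after the user answers "y".
# Assumes that "catmandu count" prints just the number of matching items.
delete_matching() {
  query=$1
  n=$(catmandu count MongoDB --database_name books --bag banned --query "$query") || return 1
  printf 'Delete %s item(s) matching %s? [y/N] ' "$n" "$query"
  read -r answer
  if [ "$answer" = y ]; then
    catmandu delete MongoDB --database_name books --bag banned --query "$query"
  fi
}
```

For example:
$ delete_matching '{"firstEditionPublicationPlace": "Berlin"}'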
MongoDB supports several more methods. These methods are not available via the Catmandu command-line interface, but they can be used in Catmandu modules and scripts.
See Catmandu::Store::MongoDB for further documentation.
Continue to Day 12: Index your data with ElasticSearch >>
I have a problem storing data in MongoDB. When I try to import the data:
catmandu import -v JSON --multiline 1 to MongoDB --database_name books --bag banned < banned_books.json
I got a message: "couldn't connect to server localhost:27017 at /usr/local/lib/perl/5.14.2/MongoDB/MongoClient.pm line 356"
Any ideas?
I'm using Ubuntu 12.04.3 LTS, 32bit, VirtualBox on Windows 7 with manual installation of Catmandu.
After installation of MongoDB (see http://docs.mongodb.org/manual/tutorial/install-mongodb-on-ubuntu/) you have to start the server:
$ sudo service mongod start
Then you can test the connection to the MongoDB server with the “mongo” client:
$ mongo
If that works, try the catmandu command again.
Thanks! It works:)
I’m getting an “unrecognized command: move” response when I try to run “catmandu move MongoDB --database_name books --bag banned --query '{"firstEditionPublicationPlace": "Berlin"}' to MongoDB --database_name books --bag berlin”. Any thoughts?
If you have a recent version of Catmandu use ‘copy’ instead of ‘move’.
Thank you!