Importing files from a hotfolder directory
The Catmandu data processing toolkit facilitates many import, export, and conversion tasks by supporting common APIs (e.g. SRU, OAI-PMH) and databases (e.g. MongoDB, CouchDB, SQL…). But sometimes the best API and database is the file system. In this brief article I’ll show how to use a “hotfolder” to automatically import files into a Catmandu store.
A hotfolder is a directory in which files can be placed to have them processed automatically. To facilitate the creation of such directories I created the CPAN module File::Hotfolder. Let’s first define a sample importer and store in a catmandu.yml configuration file:
    ---
    importer:
      json:
        package: JSON
        options:
          multiline: 1
    store:
      couchdb:
        package: CouchDB
        options:
          default_bag: import
    ...
We can now manually import JSON files into the import database of a local CouchDB like this:

    catmandu import json to couchdb < filename.json
Manually calling such a command for each file can be slow and requires access to the command line. How about defining a hotfolder to automatically import all JSON files into CouchDB? Here is an implementation:
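    use Catmandu -all;
    use File::Hotfolder;
    use File::Basename;

    my $hotfolder = "import";       # directory to watch
    my $importer  = "json";         # importer name from catmandu.yml
    my $store     = "couchdb";      # store name from catmandu.yml
    my $suffix    = qr{\.json$};    # only process files ending in .json

    # Separate variable for the store object, so $store is not declared twice
    my $target = store($store);

    watch( $hotfolder,
        filter   => $suffix,
        scan     => 1,              # first process files that already exist
        delete   => 1,              # delete a file if the callback returns true
        print    => WATCH_DIR | FOUND_FILE | CATCH_ERROR,
        callback => sub {
            $target->add_many( importer($importer, file => shift) );
        },
        catch    => 1,              # keep running if the callback dies
    )->loop;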
The directory import is first scanned for existing files with the extension .json and then watched for new or modified files. As soon as a file is found, it is imported. The catch option, together with the CATCH_ERROR print flag, ensures that a failed import (for instance because of invalid JSON) is reported instead of killing the program.
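To make the scan, filter, and error-catching behaviour described above more concrete, here is a minimal sketch in core Perl. It is not how File::Hotfolder is implemented: the function scan_once is a hypothetical helper invented for this illustration, its option names merely mirror the ones used in the script, and it only covers the initial scan of a directory, not the watching for new files. The demonstration uses a temporary directory and a callback that dies for one file as a stand-in for a failing import.

```perl
use strict;
use warnings;
use File::Spec;
use File::Temp qw(tempdir);

# Hypothetical helper (not part of File::Hotfolder): process all files that
# already exist in $dir. Returns the number of files whose callback
# returned a true value.
sub scan_once {
    my ($dir, %opt) = @_;
    my $processed = 0;

    opendir(my $dh, $dir) or die "cannot open directory $dir: $!";
    my @names = sort grep { !/^\.\.?$/ } readdir $dh;
    closedir $dh;

    for my $name (@names) {
        my $path = File::Spec->catfile($dir, $name);
        next unless -f $path;                             # plain files only
        next if $opt{filter} && $path !~ $opt{filter};    # skip non-matching paths

        # Run the callback inside eval so that a dying callback
        # does not end the whole loop
        my $ok;
        unless (eval { $ok = $opt{callback}->($path); 1 }) {
            my $error = $@ || 'unknown error';
            $opt{catch}->($path, $error) if $opt{catch};
            next;                                         # failed files are kept
        }

        if ($ok) {
            $processed++;
            unlink $path if $opt{delete};                 # remove processed files
        }
    }
    return $processed;
}

# Demonstration in a temporary directory with three sample files
my $dir = tempdir(CLEANUP => 1);
for my $name (qw(good.json bad.json notes.txt)) {
    open(my $fh, '>', File::Spec->catfile($dir, $name)) or die $!;
    print {$fh} $name eq 'bad.json' ? "{ not json" : "{}";
    close $fh;
}

my @failed;
my $count = scan_once(
    $dir,
    filter   => qr{\.json$},
    delete   => 1,
    callback => sub {
        my $path = shift;
        die "invalid JSON\n" if $path =~ /bad\.json$/;    # simulated import failure
        return 1;
    },
    catch    => sub { my ($path, $error) = @_; push @failed, $path },
);
print "processed $count file(s), ", scalar(@failed), " failed\n";
```

In this sketch good.json is processed and deleted, bad.json triggers the catch callback and stays in place, and notes.txt is ignored by the filter. The loop continues after the failure, which is the property that matters for a long-running hotfolder.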
The current version of File::Hotfolder only works on Unix, but it may be extended to other operating systems as well.