DPLA launches open-source Spark OAI Harvester
(16 August 2017) The Digital Public Library of America is launching an open-source tool for fast, large-scale data harvests from OAI repositories. The tool uses the Apache Spark distributed processing engine to speed up and scale up harvesting and to perform complex analysis of the harvested data. It is helping DPLA to improve internal workflows…