(16 August 2017) The Digital Public Library of America (DPLA) is launching an open-source tool for fast, large-scale data harvests from OAI repositories. The tool uses the Apache Spark distributed processing engine to speed up and scale up harvesting, and to perform complex analysis of the harvested data. It is helping DPLA improve its internal workflows and provide better service to its hubs. The Spark OAI Harvester is freely available.