(19 May 2021) Digitizing Chinese classics is challenging because ancient Chinese characters are complex. Throughout history, a single Chinese character may have had several variants and written forms. Digitizing ancient Chinese books through optical character recognition (OCR) not only facilitates machine reading but also gives new life to numerous ancient books by opening them to public perusal.
Alibaba DAMO Academy (DAMO), the global research institute of Alibaba, has started a new project to digitize Chinese classics together with the Alibaba Foundation, the Library of the University of California, Berkeley, Sichuan University, the National Library of China, and Zhejiang Library. The program aims to digitize and aggregate ancient Chinese books and to convert the scanned images into text for open access. This way, libraries in China and abroad can work together to make their ancient Chinese books freely available to the world.