Paper title

From Books to Knowledge Graphs

Authors

Natallia Kokash, Matteo Romanello, Ernest Suyver, Giovanni Colavizza

Abstract

The digital transformation of the scientific publishing industry has led to dramatic improvements in content discoverability and information analytics. Unfortunately, these improvements have not been uniform across research areas. The scientific literature in the arts, humanities and social sciences (AHSS) still lags behind, in part due to the scale of analog backlogs, the persisting importance of national languages, and a publisher ecosystem made up of many small and medium-sized enterprises. We propose a bottom-up approach to support publishers in creating and maintaining their own publication knowledge graphs in the open domain. We do so by releasing a pipeline that extracts structured information from the bibliographies and indexes of AHSS publications, then disambiguates, normalizes, and exports it as linked data. We test the proposed pipeline on Brill's Classics collection and release an open-source implementation for further use and improvement.
