Paper Title

Data Lakes for Digital Humanities

Authors

Jérôme Darmont, Cécile Favre, Sabine Loudcher, Camille Noûs

Abstract

Traditional data in Digital Humanities projects come in various formats (structured, semi-structured, textual) and need substantial transformations (encoding and tagging, stemming, lemmatization, etc.) before they can be managed and analyzed. To fully master this process, we propose the use of data lakes as a solution to data siloing and big data variety problems. We describe data lake projects we currently run in close collaboration with researchers in humanities and social sciences, and discuss the lessons learned from running these projects.
