Paper title
Semantic and Relational Spaces in Science of Science: Deep Learning Models for Article Vectorisation
Paper authors
Abstract
Over the last century, we have observed steady, exponential growth in scientific publications worldwide. The overwhelming amount of available literature makes it impossible to analyse research within and between fields holistically by manual inspection. Automatic techniques that support the literature review process are required to find the epistemic and social patterns embedded in scientific publications. In computer science, new tools have been developed to deal with large volumes of data. In particular, deep learning techniques open the possibility of automated end-to-end models that project observations into a new, low-dimensional space where the most relevant information of each observation is highlighted. Using deep learning to build new representations of scientific publications is a growing but still emerging field of research. The aim of this paper is to discuss the potential and limits of deep learning for gathering insights about scientific research articles. We focus on document-level embeddings based on the semantic and relational aspects of articles, using Natural Language Processing (NLP) and Graph Neural Networks (GNNs). We explore the different outcomes generated by these techniques. Our results show that with NLP we can encode a semantic space of articles, while with GNNs we can build a relational space in which the social practices of a research community are also encoded.