Paper Title

Topic Analysis of Superconductivity Literature by Semantic Non-negative Matrix Factorization

Authors

Valentin Stanev, Erik Skau, Ichiro Takeuchi, Boian S. Alexandrov

Abstract

We utilize a recently developed topic modeling method called SeNMFk, which extends the standard Non-negative Matrix Factorization (NMF) methods by incorporating the semantic structure of the text and adding a robust system for determining the number of topics. With SeNMFk, we were able to extract coherent topics validated by human experts. Of these topics, a few are relatively general and cover broad concepts, while the majority can be precisely mapped to specific scientific effects or measurement techniques. The topics also differ in ubiquity: only three topics are prevalent in almost 40 percent of the abstracts, while each specific topic tends to dominate a small subset of the abstracts. These results demonstrate the ability of SeNMFk to produce a layered and nuanced analysis of large scientific corpora.
