Paper Title
A Two-Stage Masked LM Method for Term Set Expansion
Paper Authors
Paper Abstract
We tackle the task of Term Set Expansion (TSE): given a small seed set of example terms from a semantic class, find more members of that class. The task is of great practical utility, and also of theoretical interest, as it requires generalization from few examples. Previous approaches to the TSE task can be characterized as either distributional or pattern-based. We harness the power of neural masked language models (MLM) and propose a novel TSE algorithm that combines the pattern-based and distributional approaches. Because the seed set is small, fine-tuning methods are not effective, which calls for a more creative use of the MLM. The gist of the idea is to use the MLM first to mine for patterns that are informative with respect to the seed set, and then to obtain more members of the seed class by generalizing these patterns. Our method outperforms state-of-the-art TSE algorithms. Implementation is available at: https://github.com/guykush/TermSetExpansion-MPB/
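The two-stage idea in the abstract can be illustrated with a toy sketch. This is not the authors' implementation. `toy_mlm` is a hypothetical lookup table standing in for a real masked language model, the candidate patterns are invented, and both the pattern score (probability mass placed on seed terms) and the pooling rule (summing probabilities across selected patterns) are simplifying assumptions made only for illustration.

```python
from collections import Counter
from typing import Callable, Dict, List, Set

# A masked LM is modeled here as: masked sentence -> {token: probability}.
MaskedLM = Callable[[str], Dict[str, float]]

# Hypothetical candidate patterns, each containing one masked slot.
PATTERNS: List[str] = [
    "I live in [MASK] .",
    "The capital of [MASK] is large .",
    "I like [MASK] .",
]


def toy_mlm(pattern: str) -> Dict[str, float]:
    """Hypothetical stand-in for a real masked LM (a fixed lookup table)."""
    table = {
        "I live in [MASK] .": {
            "france": 0.30, "germany": 0.25, "spain": 0.30, "paris": 0.15,
        },
        "The capital of [MASK] is large .": {
            "france": 0.30, "germany": 0.30, "italy": 0.40,
        },
        "I like [MASK] .": {"pizza": 0.50, "music": 0.40, "france": 0.10},
    }
    return table.get(pattern, {})


def score_pattern(mlm: MaskedLM, pattern: str, seeds: Set[str]) -> float:
    """Stage 1 score (assumed): probability mass the MLM puts on seed terms."""
    predictions = mlm(pattern)
    return sum(predictions.get(seed, 0.0) for seed in seeds)


def expand(mlm: MaskedLM, patterns: List[str], seeds: Set[str],
           n_patterns: int = 2, top_k: int = 2) -> List[str]:
    """Select seed-indicative patterns, then pool their non-seed predictions."""
    # Stage 1: keep the patterns that best predict the seed terms.
    ranked = sorted(patterns, key=lambda p: score_pattern(mlm, p, seeds),
                    reverse=True)
    selected = ranked[:n_patterns]

    # Stage 2: generalize the patterns by collecting what else fills the slot.
    pooled: Counter = Counter()
    for pattern in selected:
        for token, prob in mlm(pattern).items():
            if token not in seeds:
                pooled[token] += prob
    return [token for token, _ in pooled.most_common(top_k)]


if __name__ == "__main__":
    print(expand(toy_mlm, PATTERNS, {"france", "germany"}))
```

With the seed set {"france", "germany"}, the pattern "I like [MASK] ." scores lowest and is discarded, so unrelated fillers such as "pizza" never reach the candidate pool. A real system would replace `toy_mlm` with an actual masked language model and would draw its patterns from corpus sentences.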