Paper title
A Context-based Disambiguation Model for Sentiment Concepts Using a Bag-of-concepts Approach
Paper authors
Paper abstract
With the widespread dissemination of user-generated content on social networks and online consumer systems such as Amazon, the quantity of opinionated information available on the Internet has increased. One of the main tasks of sentiment analysis is to detect polarity within a text. Existing polarity detection methods mainly focus on keywords and their naive frequency counts; they give little regard to the meanings and implicit dimensions of natural concepts. Although background knowledge plays a critical role in determining the polarity of concepts, it has been disregarded in polarity detection methods. This study presents a context-based model that resolves concepts with ambiguous polarity using commonsense knowledge. First, a model is presented that generates a source of ambiguous sentiment concepts based on SenticNet by computing the probability distribution. The model then uses a bag-of-concepts approach to remove ambiguities, with semantic augmentation from ConceptNet to compensate for missing knowledge. ConceptNet is a large-scale semantic network containing a large number of commonsense concepts. In this paper, the pointwise mutual information (PMI) measure is used to select the contextual concepts that have strong relationships with ambiguous concepts. The polarity of the ambiguous concepts is precisely detected using positive/negative contextual concepts and the relationships between concepts in the semantic knowledge base. The text representation scheme is semantically enriched using Numberbatch, a word embedding model based on the concepts from the ConceptNet semantic network. The proposed model is evaluated on a corpus of product reviews called SemEval. The experimental results show an accuracy of 82.07%, indicating the effectiveness of the proposed model.
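The PMI-based selection of contextual concepts described in the abstract can be illustrated with a small sketch. The abstract does not specify how probabilities are estimated or how contextual polarities are combined, so everything below is an assumption made for illustration: probabilities are estimated from document-level co-occurrence counts, PMI is taken as log2(p(x, y) / (p(x) p(y))), and the polarity of an ambiguous concept is decided by summing the polarities of strongly related context concepts. The toy corpus, the `lexicon` dictionary (a stand-in for a resource such as SenticNet), and the function names `pmi_table` and `context_polarity` are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations


def pmi_table(docs):
    """Document-level PMI for every concept pair that co-occurs at least once."""
    n = len(docs)
    single = Counter()  # number of documents containing each concept
    pair = Counter()    # number of documents containing both concepts
    for doc in docs:
        concepts = set(doc)
        single.update(concepts)
        pair.update(frozenset(p) for p in combinations(sorted(concepts), 2))
    table = {}
    for key, joint in pair.items():
        a, b = tuple(key)
        table[key] = math.log2((joint / n) / ((single[a] / n) * (single[b] / n)))
    return table


def context_polarity(ambiguous, doc, table, lexicon, min_pmi=0.0):
    """Sum lexicon polarities of context concepts whose PMI with
    `ambiguous` exceeds `min_pmi` (an assumed voting rule)."""
    score = 0.0
    for concept in set(doc) - {ambiguous}:
        pmi = table.get(frozenset((ambiguous, concept)))
        if pmi is not None and pmi > min_pmi and concept in lexicon:
            score += lexicon[concept]
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


# Toy bag-of-concepts corpus; "long" is the ambiguous concept.
docs = [
    ["long", "battery_life", "great"],
    ["long", "battery_life", "love"],
    ["long", "boot_time", "annoying"],
    ["long", "boot_time", "slow"],
    ["small", "screen", "fine"],
]
# Hypothetical polarity lexicon for unambiguous concepts.
lexicon = {"great": 1.0, "love": 1.0, "annoying": -1.0, "slow": -1.0}

table = pmi_table(docs)
print(context_polarity("long", docs[0], table, lexicon))  # positive
print(context_polarity("long", docs[2], table, lexicon))  # negative
```

In this toy corpus "long" receives a positive label next to "great" and a negative one next to "annoying", which is the kind of context-dependent polarity the model targets. The ConceptNet augmentation and Numberbatch embeddings mentioned in the abstract are not covered by this sketch.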