Paper Title

Investigating Classification Techniques with Feature Selection for Intention Mining from Twitter Feed

Authors

Mishael, Qadri, Ayesh, Aladdin

Abstract

Over the last decade, social networks have become the most popular medium for communication and interaction. For example, the micro-blogging service Twitter has more than 200 million registered users who exchange more than 65 million posts per day. Users express their thoughts, ideas, and even their intentions through these tweets. Most tweets are written informally, often in slang that contains misspelt and abbreviated words. This paper investigates the problem of selecting features that affect the extraction of users' intentions from Twitter feeds using text mining techniques. It starts by presenting the method we used to construct our own dataset from extracted Twitter feeds. Following that, we present two techniques of feature selection followed by classification. In the first technique, we use Information Gain as a one-phase feature selection, followed by supervised classification algorithms. In the second technique, we use a hybrid approach based on a forward feature selection algorithm, in which two feature selection techniques are employed, followed by classification algorithms. We examine these two techniques with four classification algorithms. We evaluate them using our own dataset, and we critically review the results.
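As a rough illustration of the first technique mentioned in the abstract, the sketch below ranks terms by Information Gain over a bag-of-words representation and keeps the top k. It is only a minimal sketch under stated assumptions: the tweets, the labels, the binary "term present / absent" feature, and the helper names (`entropy`, `information_gain`, `top_k_terms`) are all hypothetical and are not taken from the paper, whose actual preprocessing and classifiers are not described here.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    counts = Counter(labels)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def information_gain(docs, labels, term):
    """Information Gain of the binary feature 'term occurs in the document'."""
    with_term = [y for d, y in zip(docs, labels) if term in d]
    without_term = [y for d, y in zip(docs, labels) if term not in d]
    total = len(labels)
    conditional = 0.0
    for subset in (with_term, without_term):
        if subset:  # skip empty partitions to avoid dividing by zero
            conditional += len(subset) / total * entropy(subset)
    return entropy(labels) - conditional


def top_k_terms(docs, labels, k):
    """Return the k terms with the highest Information Gain (ties broken alphabetically)."""
    vocab = sorted(set().union(*docs))
    scored = [(information_gain(docs, labels, t), t) for t in vocab]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [term for _, term in scored[:k]]


# Made-up toy tweets and labels, for illustration only (not the paper's dataset).
tweets = [
    "i want to buy a new phone",
    "planning to buy tickets tomorrow",
    "the weather is nice today",
    "that movie was great",
]
labels = ["intent", "intent", "no_intent", "no_intent"]

# Each document is reduced to the set of whitespace-separated tokens it contains.
docs = [set(t.split()) for t in tweets]
selected = top_k_terms(docs, labels, 2)
print(selected)
```

The selected terms would then be used as the feature set for a supervised classifier. The hybrid second technique would add a further selection stage on top of a ranking like this one, but its details are not given in the abstract, so it is not sketched here.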
