Sentiment Classification of Code-Switched Text using Pre-trained Multilingual Embeddings and Segmentation

Paper Title

Sentiment Classification of Code-Switched Text using Pre-trained Multilingual Embeddings and Segmentation

Authors

Aryal, Saurav K.; Prioleau, Howard; Washington, Gloria

Abstract

With increasing globalization and immigration, various studies have estimated that about half of the world's population is bilingual. Consequently, individuals concurrently use two or more languages or dialects in casual conversational settings. However, most research in natural language processing is focused on monolingual text. To further the work in code-switched sentiment analysis, we propose a multi-step natural language processing algorithm that utilizes points of code-switching in mixed text and conducts sentiment analysis around those identified points. The proposed sentiment analysis algorithm uses semantic similarity derived from large pre-trained multilingual models with a handcrafted set of positive and negative words to determine the polarity of code-switched text. The proposed approach outperforms a comparable baseline model by 11.2% for accuracy and 11.64% for F1-score on a Spanish-English dataset. Theoretically, the proposed algorithm can be expanded for sentiment analysis of multiple languages with limited human expertise.
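
The idea described in the abstract (split mixed text at code-switching points, then score each segment by semantic similarity to handcrafted positive and negative words) can be illustrated with a minimal sketch. This is not the authors' implementation. The embedding table `TOY_EMBEDDINGS`, the seed lists, the per-token language tags, and the scoring rule (cosine similarity to the positive centroid minus cosine similarity to the negative centroid, summed over segments) are all assumptions made for illustration. A real system would obtain vectors from a pre-trained multilingual model and language tags from a language-identification step.

```python
import math
from itertools import groupby

# Hypothetical 2-D "multilingual embeddings" written by hand for this sketch.
# A real pipeline would query a pre-trained multilingual model instead.
TOY_EMBEDDINGS = {
    "love": (0.9, 0.1), "good": (0.8, 0.2), "bueno": (0.85, 0.15),
    "hate": (0.1, 0.9), "bad": (0.2, 0.8), "malo": (0.15, 0.85),
    "movie": (0.5, 0.5), "pero": (0.5, 0.5),
}

# Assumed handcrafted seed words; the paper's actual lists are not given here.
POSITIVE_SEEDS = ("good", "bueno")
NEGATIVE_SEEDS = ("bad", "malo")


def mean_vector(words):
    """Average the vectors of the known words; None if no word is known."""
    vectors = [TOY_EMBEDDINGS[w] for w in words if w in TOY_EMBEDDINGS]
    if not vectors:
        return None
    return tuple(sum(axis) / len(vectors) for axis in zip(*vectors))


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def split_at_switch_points(tagged_tokens):
    """Group consecutive (token, language) pairs that share a language tag."""
    return [
        [token for token, _ in group]
        for _, group in groupby(tagged_tokens, key=lambda pair: pair[1])
    ]


def classify(tagged_tokens):
    """Sum per-segment (positive - negative) similarity and threshold at 0."""
    pos_center = mean_vector(POSITIVE_SEEDS)
    neg_center = mean_vector(NEGATIVE_SEEDS)
    score = 0.0
    for segment in split_at_switch_points(tagged_tokens):
        vec = mean_vector(segment)
        if vec is None:  # no known word in this segment
            continue
        score += cosine(vec, pos_center) - cosine(vec, neg_center)
    # Simplification: a score of exactly 0 falls through to "negative".
    return "positive" if score > 0 else "negative"
```

For example, `split_at_switch_points([("I", "en"), ("love", "en"), ("esta", "es"), ("movie", "en")])` yields three segments, one per run of same-language tokens, and each segment is then scored independently before the scores are combined.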
