5q032e@SMM4H'22: Transformer-based classification of premise in tweets related to COVID-19

Paper title

5q032e@SMM4H'22: Transformer-based classification of premise in tweets related to COVID-19

Authors

Vadim Porvatov, Natalia Semenova

Abstract

Automating the assessment of social network data is one of the classic challenges of natural language processing. During the COVID-19 pandemic, mining people's stances from public messages has become crucial for understanding attitudes towards health orders. In this paper, we propose a predictive model based on the transformer architecture to classify the presence of a premise in Twitter texts. This work was completed as part of the Social Media Mining for Health (SMM4H) Workshop 2022. We explored modern transformer-based classifiers in order to construct a pipeline that efficiently captures tweet semantics. Our experiments on a Twitter dataset showed that RoBERTa is superior to the other transformer models on the premise prediction task. The model achieved competitive performance, with a ROC AUC of 0.807 and an F1 score of 0.7648.
