Paper Title
Social Media as an Instant Source of Feedback on Water Quality
Paper Authors
Abstract
This paper addresses an important environmental challenge, water quality, by analyzing the potential of social media as an immediate source of feedback. The main goal of the work is to automatically analyze and retrieve social media posts relevant to water quality, with particular attention to posts describing different aspects of water quality, such as water color, smell, taste, and related illnesses. To this end, we propose a novel framework incorporating different preprocessing, data augmentation, and classification techniques. In total, three different Neural Network (NN) architectures, namely (i) Bidirectional Encoder Representations from Transformers (BERT), (ii) Robustly Optimized BERT Pre-training Approach (XLM-RoBERTa), and (iii) a custom Long Short-Term Memory (LSTM) model, are employed in a merit-based fusion scheme. For merit-based weight assignment to the models, several optimization and search techniques are compared, including Particle Swarm Optimization (PSO), a Genetic Algorithm (GA), Brute Force (BF) search, and the Nelder-Mead and Powell optimization methods. We also provide an evaluation of the individual models, where the highest F1-score of 0.81 is obtained with the BERT model. In merit-based fusion, overall better results are obtained with BF, which achieves an F1-score of 0.852. We also provide a comparison against existing methods, where our proposed solutions obtain a significant improvement. We believe such rigorous analysis of this relatively new topic will provide a baseline for future research.
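The merit-based fusion with brute-force weight search mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not specify the fusion rule, so this example assumes a weighted average of per-model positive-class probabilities followed by a 0.5 threshold. The function names (`fuse`, `brute_force_weights`), the grid step, and the synthetic scores standing in for BERT, XLM-RoBERTa, and LSTM outputs are all hypothetical and chosen only for illustration.

```python
import itertools
import numpy as np

def f1_score(y_true, y_pred):
    """Binary F1 computed from true/false positives and false negatives."""
    tp = np.sum((y_true == 1) & (y_pred == 1))
    fp = np.sum((y_true == 0) & (y_pred == 1))
    fn = np.sum((y_true == 1) & (y_pred == 0))
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0

def fuse(probs, weights):
    """Weighted average of model probabilities (assumed fusion rule).

    probs has shape (n_models, n_samples); weights has shape (n_models,).
    """
    w = np.asarray(weights, dtype=float)
    return w @ probs / w.sum()

def brute_force_weights(probs, y_true, step=0.1, threshold=0.5):
    """Exhaustively try every weight combination on a regular grid."""
    grid = np.arange(0.0, 1.0 + step / 2, step)
    best_w, best_f1 = None, -1.0
    for w in itertools.product(grid, repeat=probs.shape[0]):
        if sum(w) == 0:  # the all-zero combination cannot be normalized
            continue
        y_pred = (fuse(probs, w) >= threshold).astype(int)
        score = f1_score(y_true, y_pred)
        if score > best_f1:
            best_w, best_f1 = w, score
    return best_w, best_f1

# Synthetic validation data: three noisy scorers standing in for the models.
rng = np.random.default_rng(0)
y_true = rng.integers(0, 2, size=200)
probs = np.clip(y_true + rng.normal(0.0, 0.6, size=(3, 200)), 0.0, 1.0)

individual = [f1_score(y_true, (p >= 0.5).astype(int)) for p in probs]
best_w, best_f1 = brute_force_weights(probs, y_true)
print("individual F1:", individual)
print("fused F1:", best_f1, "weights:", best_w)
```

Because the grid contains the combinations that put all weight on a single model, the fused F1 on the data used for the search can never fall below the best individual model. In practice the weights would be selected on a validation split and reported on a held-out test set.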