Paper title
An End-to-End Set Transformer for User-Level Classification of Depression and Gambling Disorder
Paper authors
Paper abstract
This work proposes a transformer architecture, trainable end-to-end, for user-level classification of gambling addiction and depression. Unlike methods that operate at the post level, we process a set of social media posts from a given individual, which lets the model exploit interactions between posts and avoids label noise at the post level. Because multi-head attention is permutation invariant when no positional encodings are injected, we can process randomly sampled sets of a user's texts after encoding them with a modern pretrained sentence encoder (RoBERTa / MiniLM). Moreover, our architecture is interpretable with modern feature attribution methods and allows for automatic dataset creation by identifying discriminating posts in a user's text set. We perform ablation studies on hyper-parameters and evaluate our method in the eRisk 2022 Lab on two tasks: early detection of signs of pathological gambling and early risk detection of depression. The method proposed by our team BLUE obtained the best ERDE5 score of 0.015 and the second-best ERDE50 score of 0.009 for pathological gambling detection. For early detection of depression, we obtained the second-best ERDE50 of 0.027.
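The order-independence argument in the abstract can be illustrated with a small sketch. This is not the authors' implementation: it assumes a single attention head with identity query/key/value projections, random vectors standing in for sentence-encoder outputs, and mean pooling to obtain one user-level vector (the pooling choice and the names `self_attention` and `user_representation` are hypothetical). Without positional encodings, shuffling the posts only shuffles the attended vectors, so the pooled user representation is unchanged.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(embeddings):
    """Single-head scaled dot-product self-attention, no positional encoding.

    Queries, keys and values are the embeddings themselves (identity
    projections), a simplification made only for this illustration.
    """
    dim = len(embeddings[0])
    outputs = []
    for query in embeddings:
        scores = [
            sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
            for key in embeddings
        ]
        weights = softmax(scores)
        outputs.append(
            [sum(w * value[i] for w, value in zip(weights, embeddings))
             for i in range(dim)]
        )
    return outputs


def user_representation(embeddings):
    """Mean-pool the attended post vectors into one user-level vector (assumed pooling)."""
    attended = self_attention(embeddings)
    count = len(attended)
    return [sum(vec[i] for vec in attended) / count
            for i in range(len(attended[0]))]


# Random vectors stand in for the encoded posts of one user.
rng = random.Random(0)
posts = [[rng.gauss(0.0, 1.0) for _ in range(8)] for _ in range(5)]
shuffled = posts[:]
rng.shuffle(shuffled)

original = user_representation(posts)
permuted = user_representation(shuffled)

# The two vectors agree up to floating-point rounding.
max_diff = max(abs(a - b) for a, b in zip(original, permuted))
print(max_diff < 1e-9)
```

Strictly speaking, the attention layer alone is permutation equivariant (reordering inputs reorders outputs in the same way); the invariance of the final vector in this sketch comes from the order-independent pooling step applied afterwards.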