Paper title
Text-based classification of interviews for mental health -- juxtaposing the state of the art
Paper authors
Paper abstract
The current state of the art in psychiatric illness classification is based on audio. This thesis aims to design and evaluate a state-of-the-art text classification network for this task. The hypothesis is that a well-designed text-based approach can compete strongly with state-of-the-art audio-based approaches. Dutch natural language models are limited by the scarcity of pre-trained monolingual NLP models; as a result, they capture long-range semantic dependencies across sentences poorly. To address this issue, this thesis presents belabBERT, a new Dutch language model extending the RoBERTa [15] architecture. belabBERT is trained on a large Dutch corpus (more than 32 GB) of web-crawled text. After evaluating the strength of text-based classification, the thesis briefly explores extending the framework to a hybrid text- and audio-based classification. The goal of this hybrid framework is to demonstrate the principle of hybridisation using a very basic audio classification network. The overall goal is to lay the foundations for hybrid psychiatric illness classification by showing that the new text-based classification is already a strong stand-alone solution.