Paper Title
MIntRec: A New Dataset for Multimodal Intent Recognition
Paper Authors
Abstract
Multimodal intent recognition is a significant task for understanding human language in real-world multimodal scenes. Most existing intent recognition methods are limited in leveraging multimodal information because the benchmark datasets contain only text. This paper introduces a novel dataset for multimodal intent recognition (MIntRec) to address this issue. It formulates coarse-grained and fine-grained intent taxonomies based on data collected from the TV series Superstore. The dataset consists of 2,224 high-quality samples with text, video, and audio modalities, annotated across twenty intent categories. Furthermore, we provide annotated bounding boxes of speakers in each video segment and implement an automatic process for speaker annotation. MIntRec helps researchers mine relationships between different modalities to enhance the capability of intent recognition. We extract features from each modality and model cross-modal interactions by adapting three powerful multimodal fusion methods to build baselines. Extensive experiments show that employing the non-verbal modalities yields substantial improvements over the text-only modality, demonstrating the effectiveness of using multimodal information for intent recognition. The gap between the best-performing methods and humans indicates the challenge and importance of this task for the community. The full dataset and code are available at https://github.com/thuiar/MIntRec.