Paper Title
Auxiliary Cross-Modal Representation Learning with Triplet Loss Functions for Online Handwriting Recognition
Paper Authors
Paper Abstract
Cross-modal representation learning learns a shared embedding between two or more modalities to improve performance in a given task compared to using only one of the modalities. Cross-modal representation learning from different data types -- such as images and time-series data (e.g., audio or text data) -- requires a deep metric learning loss that minimizes the distance between the modality embeddings. In this paper, we propose to use the contrastive or triplet loss, which uses positive and negative identities to create sample pairs with different labels, for cross-modal representation learning between image and time-series modalities (CMR-IS). By adapting the triplet loss for cross-modal representation learning, higher accuracy in the main (time-series classification) task can be achieved by exploiting additional information from the auxiliary (image classification) task. We present a triplet loss with a dynamic margin for single-label and sequence-to-sequence classification tasks. We perform extensive evaluations on synthetic image and time-series data, on data for offline handwriting recognition (HWR), and on online HWR from sensor-enhanced pens for classifying written words. Our experiments show improved classification accuracy, faster convergence, and better generalizability due to an improved cross-modal representation. Furthermore, the improved generalizability leads to better adaptability between writers for online HWR.
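To make the loss concrete, below is a minimal sketch of how such a cross-modal triplet objective with a dynamic margin could look in PyTorch. The function name `cross_modal_triplet_loss`, the shapes of the encoder outputs, and the use of a normalized label distance (e.g., edit distance between word labels) to scale the margin are illustrative assumptions, not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def cross_modal_triplet_loss(ts_anchor, img_positive, img_negative,
                             base_margin=1.0, label_dist=None):
    """Triplet loss between a time-series anchor and image positives/negatives.

    ts_anchor:    (B, D) embeddings from the time-series encoder (main task)
    img_positive: (B, D) image embeddings sharing the anchor's label
    img_negative: (B, D) image embeddings with a different label
    label_dist:   optional (B,) normalized distance between the anchor and
                  negative labels (e.g., edit distance between word labels);
                  scaling the margin this way is an illustrative assumption.
    """
    d_pos = F.pairwise_distance(ts_anchor, img_positive)  # pull same-label pairs together
    d_neg = F.pairwise_distance(ts_anchor, img_negative)  # push different-label pairs apart
    margin = base_margin if label_dist is None else base_margin * label_dist
    return F.relu(d_pos - d_neg + margin).mean()

# Hypothetical usage with random embeddings (batch of 8, embedding size 128):
a, p, n = (torch.randn(8, 128) for _ in range(3))
loss = cross_modal_triplet_loss(a, p, n, label_dist=torch.rand(8))
```

In a full training loop, this auxiliary loss would typically be combined with the main time-series classification loss, so the image branch only needs to be available at training time.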