Paper Title


IMU2CLIP: Multimodal Contrastive Learning for IMU Motion Sensors from Egocentric Videos and Text

Authors

Seungwhan Moon, Andrea Madotto, Zhaojiang Lin, Alireza Dirafzoon, Aparajita Saraf, Amy Bearman, Babak Damavandi

Abstract


We present IMU2CLIP, a novel pre-training approach to align Inertial Measurement Unit (IMU) motion sensor recordings with video and text, by projecting them into the joint representation space of Contrastive Language-Image Pre-training (CLIP). The proposed approach allows IMU2CLIP to translate human motions (as measured by IMU sensors) into their corresponding textual descriptions and videos -- while preserving the transitivity across these modalities. We explore several new IMU-based applications that IMU2CLIP enables, such as motion-based media retrieval and natural language reasoning tasks with motion data. In addition, we show that IMU2CLIP can significantly improve the downstream performance when fine-tuned for each application (e.g. activity recognition), demonstrating the universal usage of IMU2CLIP as a new pre-trained resource. Our code will be made publicly available.
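The core of the approach described above is a contrastive objective that pulls each IMU-window embedding toward the CLIP embedding of its paired video/text clip while pushing it away from the other clips in the batch. As a minimal sketch (not the authors' implementation), the symmetric InfoNCE loss commonly used for this kind of cross-modal alignment can be written as follows; the function name, temperature value, and the use of NumPy rather than a deep-learning framework are illustrative assumptions:

```python
import numpy as np

def info_nce_loss(imu_emb, clip_emb, temperature=0.1):
    """Symmetric InfoNCE loss aligning IMU embeddings with CLIP embeddings.

    imu_emb, clip_emb: (batch, dim) arrays; row i of each is a positive pair
    (an IMU window and the CLIP embedding of its co-occurring video/text clip).
    """
    # L2-normalize so the dot product becomes cosine similarity.
    imu = imu_emb / np.linalg.norm(imu_emb, axis=1, keepdims=True)
    clip = clip_emb / np.linalg.norm(clip_emb, axis=1, keepdims=True)

    logits = imu @ clip.T / temperature      # (batch, batch) similarity matrix
    labels = np.arange(len(logits))          # positives lie on the diagonal

    def cross_entropy(l, y):
        l = l - l.max(axis=1, keepdims=True)  # numerical stability
        log_probs = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return -log_probs[np.arange(len(y)), y].mean()

    # Average the IMU->CLIP and CLIP->IMU directions (symmetric loss).
    return 0.5 * (cross_entropy(logits, labels) + cross_entropy(logits.T, labels))
```

Minimizing this loss with the CLIP encoders frozen trains the IMU encoder to land in CLIP's joint space, which is what enables the zero-shot retrieval and reasoning applications the abstract describes: a motion query embedded by the IMU encoder can be matched directly against CLIP-embedded videos or text.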
