Fast Learning of Dynamic Hand Gesture Recognition with Few-Shot Learning Models

Paper Title

Fast Learning of Dynamic Hand Gesture Recognition with Few-Shot Learning Models

Authors

Niels Schlüsener, Michael Bücker

Abstract

We develop Few-Shot Learning models trained to recognize five or ten different dynamic hand gestures, respectively, which are arbitrarily interchangeable by providing the model with one, two, or five examples per hand gesture. All models were built in the Few-Shot Learning architecture of the Relation Network (RN), in which Long Short-Term Memory (LSTM) cells form the backbone. The models use hand reference points extracted from RGB video sequences of the Jester dataset, which was modified to contain 190 different types of hand gestures. Results show accuracies of up to 88.8% for the recognition of five and up to 81.2% for ten dynamic hand gestures. The research also sheds light on the potential effort savings of using a Few-Shot Learning approach instead of a traditional Deep Learning approach to detect dynamic hand gestures. Savings were defined as the number of additional observations required when a Deep Learning model, rather than a Few-Shot Learning model, is trained on new hand gestures. The difference in the total number of observations required to achieve approximately the same accuracy indicates potential savings of up to 630 observations for five and up to 1260 observations for ten hand gestures to be recognized. Since labeling video recordings of hand gestures implies significant effort, these savings can be considered substantial.
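
The setting in the abstract (five or ten gestures, each introduced with one, two, or five examples) matches what the few-shot literature commonly calls N-way K-shot episodes. The following is a minimal sketch of how such an episode could be sampled from a labeled collection of clips. It is not the authors' code: the function name `sample_episode`, the query-set size, and the toy label list (10 clips per class) are assumptions made purely for illustration; only the figure of 190 gesture classes comes from the abstract.

```python
import random
from collections import defaultdict


def sample_episode(labels, n_way, k_shot, n_query, rng):
    """Sample one N-way K-shot episode (hypothetical helper).

    labels  : list where labels[i] is the gesture class of clip i
    returns : (support, query), each a list of (clip_index, episode_label)
              pairs, with episode_label in 0..n_way-1
    """
    # Group clip indices by their gesture class.
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    # Keep only classes that have enough clips for support plus query.
    eligible = [c for c, idxs in by_class.items()
                if len(idxs) >= k_shot + n_query]
    if len(eligible) < n_way:
        raise ValueError("not enough classes with sufficient examples")

    # Pick the N gesture classes for this episode.
    classes = rng.sample(eligible, n_way)

    support, query = [], []
    for episode_label, cls in enumerate(classes):
        # Draw disjoint support and query clips for this class.
        picked = rng.sample(by_class[cls], k_shot + n_query)
        support += [(i, episode_label) for i in picked[:k_shot]]
        query += [(i, episode_label) for i in picked[k_shot:]]
    return support, query


# Toy data: 190 gesture classes with 10 clips each (clip count is made up).
labels = [cls for cls in range(190) for _ in range(10)]

support, query = sample_episode(
    labels, n_way=5, k_shot=2, n_query=3, rng=random.Random(0)
)
print(len(support), len(query))  # 10 15
```

In a 5-way 2-shot episode the model sees only 5 × 2 = 10 labeled clips for the new gestures, which is the kind of small labeling budget the abstract contrasts with retraining a conventional Deep Learning model.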
