Paper Title

SegCodeNet: Color-Coded Segmentation Masks for Activity Detection from Wearable Cameras

Paper Authors

Asif Shahriyar Sushmit, Partho Ghosh, Md. Abrar Istiak, Nayeeb Rashid, Ahsan Habib Akash, Taufiq Hasan

Paper Abstract

Activity detection from first-person videos (FPV) captured using a wearable camera is an active research field with potential applications in many sectors, including healthcare, law enforcement, and rehabilitation. State-of-the-art methods use optical flow-based hybrid techniques that rely on features derived from the motion of objects from consecutive frames. In this work, we developed a two-stream network, the \emph{SegCodeNet}, that uses a network branch containing video-streams with color-coded semantic segmentation masks of relevant objects in addition to the original RGB video-stream. We also include a stream-wise attention gating that prioritizes between the two streams and a frame-wise attention module that prioritizes the video frames that contain relevant features. Experiments are conducted on an FPV dataset containing $18$ activity classes in office environments. In comparison to a single-stream network, the proposed two-stream method achieves an absolute improvement of $14.366\%$ and $10.324\%$ for averaged F1 score and accuracy, respectively, when average results are compared for three different frame sizes $224\times224$, $112\times112$, and $64\times64$. The proposed method provides significant performance gains for lower-resolution images with absolute improvements of $17\%$ and $26\%$ in F1 score for input dimensions of $112\times112$ and $64\times64$, respectively. The best performance is achieved for a frame size of $224\times224$ yielding an F1 score and accuracy of $90.176\%$ and $90.799\%$ which outperforms the state-of-the-art Inflated 3D ConvNet (I3D) \cite{carreira2017quo} method by an absolute margin of $4.529\%$ and $2.419\%$, respectively.
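The abstract only sketches the architecture, but the two-stream idea can be illustrated in code. Below is a minimal, hypothetical PyTorch sketch of stream-wise attention gating over an RGB stream and a segmentation-mask stream, followed by frame-wise attention pooling; the module name `TwoStreamAttentionFusion`, the feature dimensions, and the exact gating formulation are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn

class TwoStreamAttentionFusion(nn.Module):
    """Hypothetical sketch of SegCodeNet-style fusion: a stream-wise
    attention gate weights the RGB and segmentation-mask streams, and a
    frame-wise attention module weights individual frames. Backbones,
    dimensions, and gating details are illustrative assumptions."""

    def __init__(self, in_dim=2048, feat_dim=512, num_classes=18):
        super().__init__()
        # Per-stream projections of pre-extracted per-frame features
        # (e.g., from a CNN backbone applied to each frame).
        self.rgb_proj = nn.Sequential(nn.Linear(in_dim, feat_dim), nn.ReLU())
        self.seg_proj = nn.Sequential(nn.Linear(in_dim, feat_dim), nn.ReLU())
        # Stream-wise attention gate: one weight per stream, computed from
        # the concatenated clip-level summaries of both streams.
        self.stream_gate = nn.Sequential(nn.Linear(2 * feat_dim, 2),
                                         nn.Softmax(dim=-1))
        # Frame-wise attention: a scalar relevance score per frame.
        self.frame_score = nn.Linear(feat_dim, 1)
        self.classifier = nn.Linear(feat_dim, num_classes)

    def forward(self, rgb_feats, seg_feats):
        # rgb_feats, seg_feats: (batch, frames, in_dim) per-frame features
        r = self.rgb_proj(rgb_feats)                        # (B, T, D)
        s = self.seg_proj(seg_feats)                        # (B, T, D)
        summary = torch.cat([r.mean(dim=1), s.mean(dim=1)], dim=-1)
        w = self.stream_gate(summary)                       # (B, 2)
        fused = w[:, 0, None, None] * r + w[:, 1, None, None] * s
        # Frame-wise attention pooling over the time axis.
        a = torch.softmax(self.frame_score(fused), dim=1)   # (B, T, 1)
        clip = (a * fused).sum(dim=1)                       # (B, D)
        return self.classifier(clip)                        # (B, num_classes)

# Quick shape check with random per-frame features (4 clips, 30 frames).
rgb = torch.randn(4, 30, 2048)
seg = torch.randn(4, 30, 2048)
print(TwoStreamAttentionFusion()(rgb, seg).shape)  # torch.Size([4, 18])
```

In the paper, the second stream carries color-coded segmentation masks of relevant objects rendered as video frames; here both streams are reduced to pre-extracted features purely to keep the sketch short.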
