Paper Title

SegCodeNet: Color-Coded Segmentation Masks for Activity Detection from Wearable Cameras

Paper Authors

Asif Shahriyar Sushmit, Partho Ghosh, Md. Abrar Istiak, Nayeeb Rashid, Ahsan Habib Akash, Taufiq Hasan

Paper Abstract

Activity detection from first-person videos (FPV) captured using a wearable camera is an active research field with potential applications in many sectors, including healthcare, law enforcement, and rehabilitation. State-of-the-art methods use optical flow-based hybrid techniques that rely on features derived from the motion of objects from consecutive frames. In this work, we developed a two-stream network, the \emph{SegCodeNet}, that uses a network branch containing video-streams with color-coded semantic segmentation masks of relevant objects in addition to the original RGB video-stream. We also include a stream-wise attention gating that prioritizes between the two streams and a frame-wise attention module that prioritizes the video frames that contain relevant features. Experiments are conducted on an FPV dataset containing $18$ activity classes in office environments. In comparison to a single-stream network, the proposed two-stream method achieves an absolute improvement of $14.366\%$ and $10.324\%$ for averaged F1 score and accuracy, respectively, when average results are compared for three different frame sizes $224\times224$, $112\times112$, and $64\times64$. The proposed method provides significant performance gains for lower-resolution images with absolute improvements of $17\%$ and $26\%$ in F1 score for input dimensions of $112\times112$ and $64\times64$, respectively. The best performance is achieved for a frame size of $224\times224$ yielding an F1 score and accuracy of $90.176\%$ and $90.799\%$ which outperforms the state-of-the-art Inflated 3D ConvNet (I3D) \cite{carreira2017quo} method by an absolute margin of $4.529\%$ and $2.419\%$, respectively.
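The abstract only sketches the architecture, but the two-stream idea can be illustrated in code. Below is a minimal, hypothetical PyTorch sketch of stream-wise attention gating over an RGB stream and a segmentation-mask stream, followed by frame-wise attention pooling; the module name `TwoStreamAttentionFusion`, the feature dimensions, and the exact gating formulation are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn

class TwoStreamAttentionFusion(nn.Module):
    """Hypothetical sketch of SegCodeNet-style fusion: a stream-wise
    attention gate weights the RGB and segmentation-mask streams, and a
    frame-wise attention module weights individual frames. Backbones,
    dimensions, and gating details are illustrative assumptions."""

    def __init__(self, in_dim=2048, feat_dim=512, num_classes=18):
        super().__init__()
        # Per-stream projections of pre-extracted per-frame features
        # (e.g., from a CNN backbone applied to each frame).
        self.rgb_proj = nn.Sequential(nn.Linear(in_dim, feat_dim), nn.ReLU())
        self.seg_proj = nn.Sequential(nn.Linear(in_dim, feat_dim), nn.ReLU())
        # Stream-wise attention gate: one weight per stream, computed from
        # the concatenated clip-level summaries of both streams.
        self.stream_gate = nn.Sequential(nn.Linear(2 * feat_dim, 2),
                                         nn.Softmax(dim=-1))
        # Frame-wise attention: a scalar relevance score per frame.
        self.frame_score = nn.Linear(feat_dim, 1)
        self.classifier = nn.Linear(feat_dim, num_classes)

    def forward(self, rgb_feats, seg_feats):
        # rgb_feats, seg_feats: (batch, frames, in_dim) per-frame features
        r = self.rgb_proj(rgb_feats)                        # (B, T, D)
        s = self.seg_proj(seg_feats)                        # (B, T, D)
        summary = torch.cat([r.mean(dim=1), s.mean(dim=1)], dim=-1)
        w = self.stream_gate(summary)                       # (B, 2)
        fused = w[:, 0, None, None] * r + w[:, 1, None, None] * s
        # Frame-wise attention pooling over the time axis.
        a = torch.softmax(self.frame_score(fused), dim=1)   # (B, T, 1)
        clip = (a * fused).sum(dim=1)                       # (B, D)
        return self.classifier(clip)                        # (B, num_classes)

# Quick shape check with random per-frame features (4 clips, 30 frames).
rgb = torch.randn(4, 30, 2048)
seg = torch.randn(4, 30, 2048)
print(TwoStreamAttentionFusion()(rgb, seg).shape)  # torch.Size([4, 18])
```

In the paper, the second stream carries color-coded segmentation masks of relevant objects rendered as video frames; here both streams are reduced to pre-extracted features purely to keep the sketch short.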
