Paper Title
ShadowTutor: Distributed Partial Distillation for Mobile Video DNN Inference
Paper Authors
Paper Abstract
Following the recent success of deep neural networks (DNNs) on video computer vision tasks, performing DNN inference on videos that originate from mobile devices has gained practical significance. Accordingly, previous approaches developed methods to offload DNN inference computations for images to cloud servers, managing the resource constraints of mobile devices. However, when it comes to video data, communicating information for every frame consumes excessive network bandwidth and renders the entire system susceptible to adverse network conditions such as congestion. Thus, in this work, we seek to exploit the temporal coherence between nearby frames of a video stream to mitigate network pressure. That is, we propose ShadowTutor, a distributed video DNN inference framework that reduces the number of network transmissions through intermittent knowledge distillation to a student model. Moreover, we update only a subset of the student's parameters, which we call partial distillation, to reduce the data size of each network transmission. Specifically, the server runs a large and general teacher model, while the mobile device runs only an extremely small but specialized student model. On sparsely selected key frames, the server partially trains the student model by targeting the teacher's response and sends the updated part to the mobile device. We investigate the effectiveness of ShadowTutor on HD video semantic segmentation. Evaluations show that network data transfer is reduced by 95% on average. Moreover, the throughput of the system improves by more than three times and is robust to changes in network bandwidth.
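To make the partial-distillation idea concrete, the following is a minimal, self-contained sketch (not the paper's implementation). The paper's student is a small DNN trained on the teacher's responses at key frames; here a 1-D linear model stands in for the student, the `teacher` function and all names are hypothetical, and only the parameters named in `update_keys` are trained, so only that small delta would be sent over the network.

```python
def teacher(x):
    # Stands in for the large, general server-side teacher model.
    return 3.0 * x + 1.0

def partial_distill(params, frames, update_keys, lr=0.1, steps=50):
    """Fit the student y = w*x + b to the teacher's outputs on the given
    key frames, updating ONLY the parameters named in update_keys.
    Returns just the updated subset -- the data that would be
    transmitted to the mobile device in ShadowTutor's scheme."""
    for _ in range(steps):
        # Mean-squared distillation loss against the teacher's response;
        # gradients are computed analytically for the linear student.
        gw = gb = 0.0
        for x in frames:
            err = (params["w"] * x + params["b"]) - teacher(x)
            gw += 2.0 * err * x / len(frames)
            gb += 2.0 * err / len(frames)
        if "w" in update_keys:
            params["w"] -= lr * gw
        if "b" in update_keys:
            params["b"] -= lr * gb
    return {k: params[k] for k in update_keys}

# Distill on a few "key frames", updating only the bias parameter.
student = {"w": 0.0, "b": 0.0}
delta = partial_distill(student, frames=[0.5, 1.0, 1.5], update_keys={"b"})
```

Here the frozen parameter `w` is untouched, so the update shipped to the device (`delta`) contains only `b`, illustrating how partial distillation shrinks each transmission relative to sending the full student model.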