Paper Title
Latency-aware Spatial-wise Dynamic Networks
Paper Authors
Paper Abstract
Spatial-wise dynamic convolution has become a promising approach to improving the inference efficiency of deep networks. By allocating more computation to the most informative pixels, such an adaptive inference paradigm reduces the spatial redundancy in image features and saves a considerable amount of unnecessary computation. However, the theoretical efficiency achieved by previous methods can hardly translate into a realistic speedup, especially on multi-core processors (e.g., GPUs). The key challenge is that the existing literature has focused only on designing algorithms with minimal computation, ignoring the fact that practical latency is also influenced by scheduling strategies and hardware properties. To bridge the gap between theoretical computation and practical efficiency, we propose a latency-aware spatial-wise dynamic network (LASNet), which performs coarse-grained spatially adaptive inference under the guidance of a novel latency prediction model. The latency prediction model efficiently estimates the inference latency of dynamic networks by simultaneously considering algorithms, scheduling strategies, and hardware properties. We use the latency predictor to guide both algorithm design and scheduling optimization on various hardware platforms. Experiments on image classification, object detection, and instance segmentation demonstrate that the proposed framework significantly improves the practical inference efficiency of deep networks. For example, the average latency of a ResNet-101 on the ImageNet validation set could be reduced by 36% and 46% on a server GPU (Nvidia Tesla-V100) and an edge device (Nvidia Jetson TX2 GPU), respectively, without sacrificing accuracy. Code is available at https://github.com/LeapLabTHU/LASNet.
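To make the mechanism described in the abstract concrete, below is a minimal PyTorch sketch of coarse-grained spatially adaptive convolution: a lightweight masker emits one keep/skip decision per spatial patch, and only the kept regions contribute to the output. This is an illustrative assumption of how such a layer could look, not the authors' actual implementation; the class name CoarseGrainedDynamicConv, the patch size, and the straight-through relaxation are all hypothetical choices made for the sketch.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class CoarseGrainedDynamicConv(nn.Module):
    """Toy coarse-grained spatially dynamic 3x3 convolution (a sketch).

    A lightweight masker emits one keep/skip decision per S x S patch.
    For clarity, this sketch computes the convolution densely and then
    zeroes the skipped patches; a latency-aware implementation would
    instead gather only the kept patches, convolve them, and scatter
    the results back, which is where a real speedup would come from.
    """

    def __init__(self, in_ch: int, out_ch: int, patch_size: int = 4):
        super().__init__()
        self.patch_size = patch_size
        self.conv = nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1)
        # 1x1 conv on pooled features -> one logit per spatial patch.
        self.masker = nn.Conv2d(in_ch, 1, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, _, h, w = x.shape
        s = self.patch_size
        # Coarse granularity: one keep/skip logit per s x s patch.
        logits = self.masker(F.adaptive_avg_pool2d(x, (h // s, w // s)))
        if self.training:
            # Straight-through estimator: hard 0/1 decisions in the
            # forward pass, sigmoid gradients in the backward pass.
            soft = torch.sigmoid(logits)
            mask = (soft > 0.5).float() + soft - soft.detach()
        else:
            mask = (logits > 0).float()
        # Broadcast each patch decision back to pixel resolution.
        mask = F.interpolate(mask, size=(h, w), mode="nearest")
        return self.conv(x) * mask


# Usage: a drop-in replacement for a 3x3 conv inside a residual block.
layer = CoarseGrainedDynamicConv(64, 64, patch_size=4)
y = layer(torch.randn(2, 64, 32, 32))  # shape: (2, 64, 32, 32)
```

Note that the dense-compute-then-mask form above is functionally equivalent to skipping the masked patches but yields no wall-clock savings, which is exactly the theory-versus-practice gap the paper targets: converting the saved FLOPs into actual latency reduction requires gather/scatter scheduling tuned to the hardware, and the proposed latency predictor is what guides those choices.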