Paper Title
SPINN: Synergistic Progressive Inference of Neural Networks over Device and Cloud
Paper Authors
Paper Abstract
Despite the soaring use of convolutional neural networks (CNNs) in mobile applications, uniformly sustaining high-performance inference on mobile has been elusive due to the excessive computational demands of modern CNNs and the increasing diversity of deployed devices. A popular alternative comprises offloading CNN processing to powerful cloud-based servers. Nevertheless, by relying on the cloud to produce outputs, emerging mission-critical and high-mobility applications, such as drone obstacle avoidance or interactive applications, can suffer from the dynamic connectivity conditions and the uncertain availability of the cloud. In this paper, we propose SPINN, a distributed inference system that employs synergistic device-cloud computation together with a progressive inference method to deliver fast and robust CNN inference across diverse settings. The proposed system introduces a novel scheduler that co-optimises the early-exit policy and the CNN splitting at run time, in order to adapt to dynamic conditions and meet user-defined service-level requirements. Quantitative evaluation illustrates that SPINN outperforms its state-of-the-art collaborative inference counterparts by up to 2x in achieved throughput under varying network conditions, reduces the server cost by up to 6.8x and improves accuracy by 20.7% under latency constraints, while providing robust operation under uncertain connectivity conditions and significant energy savings compared to cloud-centric execution.
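For illustration only, the sketch below shows the two mechanisms the abstract combines: an early-exit classifier attached at a candidate split point of a CNN, and offloading of the remaining layers only when that exit is not confident enough. It assumes a PyTorch-style model; the names EarlyExitCNN, progressive_infer and conf_threshold are hypothetical and do not reflect SPINN's actual code or API.

```python
import torch
import torch.nn as nn

# Illustrative sketch (not SPINN's implementation): a small CNN with an
# intermediate early-exit head. The device runs the blocks up to the split
# point; if the exit's confidence clears a threshold, inference stops locally,
# otherwise the intermediate features are offloaded to the cloud-side remainder.
class EarlyExitCNN(nn.Module):
    def __init__(self, num_classes: int = 10):
        super().__init__()
        # Device-side stem (up to the candidate split point).
        self.device_blocks = nn.Sequential(
            nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
        )
        # Early-exit head attached at the split point.
        self.early_exit = nn.Sequential(
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(64, num_classes)
        )
        # Cloud-side remainder, executed only when the exit is not confident.
        self.cloud_blocks = nn.Sequential(
            nn.Conv2d(64, 128, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(128, num_classes)
        )

def progressive_infer(model: EarlyExitCNN, x: torch.Tensor,
                      conf_threshold: float = 0.8):
    """Return (logits, exited_early). conf_threshold stands in for the
    early-exit policy that SPINN's scheduler would tune at run time."""
    features = model.device_blocks(x)              # runs on the device
    early_logits = model.early_exit(features)
    confidence = torch.softmax(early_logits, dim=1).max(dim=1).values
    if bool((confidence >= conf_threshold).all()):
        return early_logits, True                  # early exit: no offloading
    # Otherwise the features at the split point would be transferred over the
    # network and the remaining layers executed on the server.
    return model.cloud_blocks(features), False

if __name__ == "__main__":
    model = EarlyExitCNN().eval()
    with torch.no_grad():
        logits, exited = progressive_infer(model, torch.randn(1, 3, 32, 32))
    print("early exit:", exited, "prediction:", logits.argmax(dim=1).item())
```

In the system described by the paper, the scheduler would jointly pick the confidence threshold and the split point at run time, based on network conditions and user-defined service-level requirements; in this sketch both are fixed constants purely for clarity.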