Paper Title

Boosting DNN Cold Inference on Edge Devices

Authors

Rongjie Yi, Ting Cao, Ao Zhou, Xiao Ma, Shangguang Wang, Mengwei Xu

Abstract

DNNs are ubiquitous on edge devices nowadays. With their increasing importance and growing number of use cases, it is unrealistic to pack all DNNs into device memory and expect every inference to be warmed up. Therefore, cold inference, the process of reading, initializing, and executing a DNN model, is becoming commonplace, and optimizing its performance is urgently needed. To this end, we present NNV12, the first on-device inference engine that optimizes for cold inference. NNV12 is built atop three novel optimization knobs: selecting a proper kernel (implementation) for each DNN operator, bypassing the weights transformation process by caching the post-transformed weights on disk, and pipelined execution of many kernels on asymmetric processors. To tackle the huge search space, NNV12 employs a heuristic-based scheme to obtain a near-optimal kernel scheduling plan. We fully implement a prototype of NNV12 and evaluate its performance through extensive experiments. The results show that NNV12 achieves up to 15.2x and 401.5x speedups compared to state-of-the-art DNN engines on edge CPUs and GPUs, respectively.
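To make the second optimization knob more concrete, below is a minimal, illustrative sketch of the idea of caching post-transformed weights on disk so that later cold starts can skip the kernel-initialization transformation step. It is not NNV12's actual implementation or API; the function names (transform_weights, load_transformed_weights), the cache layout, and the use of Python here are assumptions made purely for illustration.

```python
# Illustrative sketch (not NNV12 code): cache post-transformed weights on disk
# so a subsequent cold inference can load them directly instead of re-running
# the kernel-specific weight transformation.
import hashlib
import os
import pickle


def transform_weights(raw_weights, kernel_name):
    """Placeholder for a kernel-specific weight transformation
    (e.g. repacking weights into the layout a chosen conv kernel expects)."""
    return {"kernel": kernel_name, "packed": list(reversed(raw_weights))}  # dummy work


def load_transformed_weights(raw_weights, kernel_name, cache_dir=".weight_cache"):
    """Return post-transformed weights, reusing an on-disk cache when possible."""
    os.makedirs(cache_dir, exist_ok=True)
    # Key the cache on both the chosen kernel and the raw weight contents,
    # so a changed model or kernel selection invalidates the cached entry.
    key = hashlib.sha1((kernel_name + repr(raw_weights)).encode()).hexdigest()
    path = os.path.join(cache_dir, key + ".pkl")
    if os.path.exists(path):
        with open(path, "rb") as f:
            return pickle.load(f)      # later cold starts: skip transformation
    transformed = transform_weights(raw_weights, kernel_name)
    with open(path, "wb") as f:
        pickle.dump(transformed, f)    # first cold start pays the cost once
    return transformed


if __name__ == "__main__":
    weights = list(range(8))
    print(load_transformed_weights(weights, "winograd_conv3x3"))
```

The trade-off this sketch tries to convey is disk space for startup latency: the transformation cost is paid once and amortized across cold starts, at the price of storing a second (transformed) copy of the weights.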
