Paper Title
Benchmarking Test-Time Unsupervised Deep Neural Network Adaptation on Edge Devices
Paper Authors
Paper Abstract
The prediction accuracy of deep neural networks (DNNs) deployed at the edge can degrade over time due to shifts in the distribution of incoming data. To remain robust, DNNs must be able to update themselves to improve their prediction accuracy. Such adaptation at the resource-constrained edge is challenging because: (i) new labeled data may not be available; (ii) adaptation must happen on-device, as connectivity to the cloud may be unavailable; and (iii) the process must not only be fast but also memory- and energy-efficient. Recently, lightweight prediction-time unsupervised DNN adaptation techniques have been introduced that improve the prediction accuracy of models on noisy data by re-tuning the batch normalization (BN) parameters. This paper, for the first time, performs a comprehensive measurement study of such techniques to quantify their performance and energy consumption on various edge devices, identify bottlenecks, and propose optimization opportunities. In particular, this study considers the CIFAR-10-C image classification dataset with corruptions, three robust DNNs (ResNeXt, Wide-ResNet, ResNet-18), two BN adaptation algorithms (one that updates only the normalization statistics and another that also optimizes the transformation parameters), and three edge devices (FPGA, Raspberry Pi, and Nvidia Xavier NX). We find the approach that only updates the normalization statistics, paired with Wide-ResNet and running on the Xavier GPU, to be the most effective overall at balancing multiple cost metrics. However, the adaptation overhead can still be significant (around 213 ms). These results strongly motivate algorithm-hardware co-design for efficient on-device DNN adaptation.
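To make the two BN adaptation variants concrete, below is a minimal PyTorch sketch; it is not the authors' implementation, and the names model and test_batch are illustrative placeholders. The first function re-estimates the BN normalization statistics from an unlabeled test batch; the second additionally optimizes the affine transformation parameters (gamma, beta) by entropy minimization, in the style of Tent-like methods.

import torch
import torch.nn as nn

def adapt_bn_statistics(model: nn.Module, test_batch: torch.Tensor) -> None:
    # Variant 1: replace the stale training-time BN statistics with
    # statistics estimated from the current (unlabeled) test batch.
    for m in model.modules():
        if isinstance(m, nn.BatchNorm2d):
            m.reset_running_stats()  # discard training-time mean/var
            m.momentum = None        # cumulative average: one batch fully sets the stats
            m.train()                # BN now updates running stats on forward
    with torch.no_grad():
        model(test_batch)            # forward pass refreshes running_mean/running_var
    model.eval()

def adapt_bn_parameters(model: nn.Module, test_batch: torch.Tensor,
                        lr: float = 1e-3) -> None:
    # Variant 2 (Tent-style): additionally optimize the BN affine
    # parameters (gamma, beta) by minimizing prediction entropy.
    for p in model.parameters():
        p.requires_grad_(False)      # freeze everything except BN affine params
    bn_params = []
    for m in model.modules():
        if isinstance(m, nn.BatchNorm2d):
            m.train()                # normalize with batch statistics
            m.weight.requires_grad_(True)
            m.bias.requires_grad_(True)
            bn_params += [m.weight, m.bias]
    optimizer = torch.optim.SGD(bn_params, lr=lr, momentum=0.9)
    logits = model(test_batch)
    probs = logits.softmax(dim=1)
    entropy = -(probs * probs.clamp_min(1e-8).log()).sum(dim=1).mean()
    optimizer.zero_grad()
    entropy.backward()
    optimizer.step()

Because only the BN statistics (and, in the second variant, the per-channel gamma and beta) change, the adaptation touches a tiny fraction of the model's parameters, which is what makes these techniques lightweight enough to be candidates for edge devices.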