Paper Title

Best of Both Worlds: AutoML Codesign of a CNN and its Hardware Accelerator

Authors

Abdelfattah, Mohamed S., Dudziak, Łukasz, Chau, Thomas, Lee, Royson, Kim, Hyeji, Lane, Nicholas D.

Abstract

Neural architecture search (NAS) has been very successful at outperforming human-designed convolutional neural networks (CNN) in accuracy, and when hardware information is present, latency as well. However, NAS-designed CNNs typically have a complicated topology, therefore, it may be difficult to design a custom hardware (HW) accelerator for such CNNs. We automate HW-CNN codesign using NAS by including parameters from both the CNN model and the HW accelerator, and we jointly search for the best model-accelerator pair that boosts accuracy and efficiency. We call this Codesign-NAS. In this paper we focus on defining the Codesign-NAS multiobjective optimization problem, demonstrating its effectiveness, and exploring different ways of navigating the codesign search space. For CIFAR-10 image classification, we enumerate close to 4 billion model-accelerator pairs, and find the Pareto frontier within that large search space. This allows us to evaluate three different reinforcement-learning-based search strategies. Finally, compared to ResNet on its most optimal HW accelerator from within our HW design space, we improve on CIFAR-100 classification accuracy by 1.3% while simultaneously increasing performance/area by 41% in just~1000 GPU-hours of running Codesign-NAS.
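The abstract describes finding the Pareto frontier over enumerated model-accelerator pairs scored on two objectives (accuracy and performance/area). A minimal sketch of that Pareto-filtering step is below; the candidate pairs and their scores are illustrative assumptions, not values from the paper.

```python
# Hypothetical sketch of the Pareto-frontier step: given model-accelerator
# pairs scored on accuracy ("acc") and performance/area ("ppa"), keep only
# pairs that no other pair beats (or ties) on both objectives.

def pareto_frontier(pairs):
    """Return the pairs not dominated on both (acc, ppa)."""
    frontier = []
    for p in pairs:
        dominated = any(
            q["acc"] >= p["acc"] and q["ppa"] >= p["ppa"] and q != p
            for q in pairs
        )
        if not dominated:
            frontier.append(p)
    return frontier

# Illustrative candidates (not from the paper's search space):
candidates = [
    {"name": "pair-a", "acc": 0.92, "ppa": 0.50},  # highest accuracy
    {"name": "pair-b", "acc": 0.90, "ppa": 0.70},  # highest perf/area
    {"name": "pair-c", "acc": 0.89, "ppa": 0.60},  # dominated by pair-b
]
print([p["name"] for p in pareto_frontier(candidates)])  # ['pair-a', 'pair-b']
```

In the paper's setting the same filter would run over the ~4 billion enumerated pairs (in practice with a more scalable sort-based sweep rather than this O(n^2) loop), and the reinforcement-learning search strategies are then judged by how close they get to this frontier.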
