Paper Title
HCM: Hardware-Aware Complexity Metric for Neural Network Architectures
Paper Authors
Paper Abstract
Convolutional Neural Networks (CNNs) have become common in many fields, including computer vision, speech recognition, and natural language processing. Although CNN hardware accelerators are already included as part of many SoC architectures, achieving high accuracy on resource-restricted devices is still considered challenging, mainly due to the vast number of design parameters that must be balanced to reach an efficient solution. Quantization techniques, when applied to the network parameters, reduce power and area and may also change the ratio between communication and computation. As a result, some algorithmic solutions may suffer from a lack of memory bandwidth or computational resources and fail to achieve the expected performance due to hardware constraints. Thus, the system designer and the micro-architect need to understand, at early development stages, the impact of their high-level decisions (e.g., the architecture of the CNN and the number of bits used to represent its parameters) on the final product (e.g., the expected power saving, area, and accuracy). Unfortunately, existing tools fall short of supporting such decisions. This paper introduces a hardware-aware complexity metric that aims to assist system designers of neural network architectures throughout the entire project lifetime (especially at its early stages) by predicting the impact of architectural and micro-architectural decisions on the final product. We demonstrate how the proposed metric can help evaluate different design alternatives for neural network models on resource-restricted devices such as real-time embedded systems, and avoid design mistakes at early stages.
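The trade-off the abstract mentions between communication and computation can be made concrete with a back-of-the-envelope calculation. The sketch below is a minimal illustration, not the HCM metric proposed in the paper: the layer dimensions, bit widths, and the `conv_layer_profile` helper are hypothetical, chosen only to show how quantization shrinks the data that must be moved while leaving the operation count unchanged.

```python
# Illustrative sketch (not the paper's HCM metric): estimate how the bit width
# chosen for weights and activations shifts a single conv layer's balance
# between computation (MACs) and communication (bytes moved).

def conv_layer_profile(h, w, c_in, c_out, k, weight_bits, act_bits):
    """Return (macs, bytes_moved, macs_per_byte) for one stride-1 'same' conv layer."""
    macs = h * w * c_out * c_in * k * k                 # multiply-accumulate operations
    weight_bytes = c_out * c_in * k * k * weight_bits / 8
    act_bytes = (h * w * c_in + h * w * c_out) * act_bits / 8  # read input + write output
    bytes_moved = weight_bytes + act_bytes
    return macs, bytes_moved, macs / bytes_moved

if __name__ == "__main__":
    # Hypothetical example layer: 56x56 feature map, 64 -> 128 channels, 3x3 kernel.
    for bits in (32, 8, 4):
        macs, traffic, intensity = conv_layer_profile(56, 56, 64, 128, 3, bits, bits)
        print(f"{bits:2d}-bit: {macs / 1e6:8.1f} M MACs, "
              f"{traffic / 1e6:6.2f} MB moved, {intensity:7.1f} MACs/byte")
```

Even this crude ratio makes the abstract's point visible: lowering the bit width leaves the MAC count untouched while shrinking the bytes moved, so a layer that is memory-bound at 32 bits may become compute-bound at 4 bits, and the right design choice depends on which resource the target hardware actually runs out of first.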