SHEARer: Highly-Efficient Hyperdimensional Computing by Software-Hardware Enabled Multifold Approximation

Paper Title

SHEARer: Highly-Efficient Hyperdimensional Computing by Software-Hardware Enabled Multifold Approximation

Authors

Behnam Khaleghi, Sahand Salamat, Anthony Thomas, Fatemeh Asgarinejad, Yeseong Kim, Tajana Rosing

Abstract

Hyperdimensional computing (HD) is an emerging paradigm for machine learning based on the evidence that the brain computes on high-dimensional, distributed representations of data. The main operation of HD is encoding, which transfers the input data to hyperspace by mapping each input feature to a hypervector, followed by a so-called bundling procedure that simply adds up the hypervectors to form the encoding hypervector. Although the operations of HD are highly parallelizable, the massive number of operations hampers the efficiency of HD in the embedded domain. In this paper, we propose SHEARer, an algorithm-hardware co-optimization to improve the performance and energy consumption of HD computing. We gain insight from a prudent scheme of approximating the hypervectors that, thanks to the inherent error resiliency of HD, has minimal impact on accuracy while offering strong prospects for hardware optimization. In contrast to previous works that generate the encoding hypervectors in full precision and then quantize them ex post, we compute the encoding hypervectors in an approximate manner that saves a significant amount of resources yet affords high accuracy. We also propose a novel FPGA implementation that achieves striking performance through massive parallelism with low power consumption. Moreover, we develop a software framework that enables training HD models by emulating the proposed approximate encodings. Using practical machine learning datasets, the FPGA implementation of SHEARer achieves an average throughput boost of 104,904x (15.7x) and energy savings of up to 56,044x (301x) compared to state-of-the-art encoding methods implemented on Raspberry Pi 3 (GeForce GTX 1080 Ti).
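To make the encoding and bundling steps concrete, here is a minimal sketch of the baseline flow the abstract contrasts against: bundle in full precision, then quantize afterward. It is not SHEARer's approximate encoding, whose details the abstract does not give. The per-feature lookup table (`make_item_memory`), the bipolar {-1, +1} hypervectors, the level count, the dimensionality, and the sign-based quantizer are all illustrative assumptions.

```python
import random

DIM = 1000    # hypervector dimensionality (illustrative assumption)
LEVELS = 4    # quantization levels per input feature (illustrative assumption)

def make_item_memory(num_features, levels, dim, seed=0):
    """Hypothetical lookup table: one random bipolar hypervector
    per (feature position, quantized level) pair."""
    rng = random.Random(seed)
    return [[[rng.choice((-1, 1)) for _ in range(dim)]
             for _ in range(levels)]
            for _ in range(num_features)]

def encode(features, item_memory):
    """Map each feature (assumed to lie in [0, 1]) to a hypervector,
    then bundle by element-wise addition."""
    levels = len(item_memory[0])
    dim = len(item_memory[0][0])
    encoded = [0] * dim
    for i, x in enumerate(features):
        level = min(int(x * levels), levels - 1)  # quantize the feature value
        hv = item_memory[i][level]
        for d in range(dim):
            encoded[d] += hv[d]                   # bundling: plain addition
    return encoded

def quantize_sign(encoded):
    """Ex-post quantization of a full-precision encoding to {-1, +1}."""
    return [1 if v >= 0 else -1 for v in encoded]

item_memory = make_item_memory(num_features=8, levels=LEVELS, dim=DIM)
sample = [0.1, 0.9, 0.5, 0.3, 0.7, 0.2, 0.8, 0.4]
full_precision = encode(sample, item_memory)
binary = quantize_sign(full_precision)
```

Each of the `DIM` accumulators is independent of the others, which is why the abstract calls the operations highly parallelizable. The cost is one addition per feature per dimension, so the operation count grows with the product of the two.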
