Paper Title

SIMDRAM: A Framework for Bit-Serial SIMD Processing Using DRAM

Authors

Nastaran Hajinazar, Geraldo F. Oliveira, Sven Gregorio, João Dinis Ferreira, Nika Mansouri Ghiasi, Minesh Patel, Mohammed Alser, Saugata Ghose, Juan Gómez-Luna, Onur Mutlu

Abstract

Processing-using-DRAM has been proposed for a limited set of basic operations (i.e., logic operations, addition). However, in order to enable the full adoption of processing-using-DRAM, it is necessary to provide support for more complex operations. In this paper, we propose SIMDRAM, a flexible general-purpose processing-using-DRAM framework that enables massively-parallel computation of a wide range of operations by using each DRAM column as an independent SIMD lane to perform bit-serial operations. SIMDRAM consists of three key steps to enable a desired operation in DRAM: (1) building an efficient majority-based representation of the desired operation, (2) mapping the operation input and output operands to DRAM rows and to the required DRAM commands that produce the desired operation, and (3) executing the operation. These three steps ensure efficient computation of any arbitrary and complex operation in DRAM. The first two steps give users the flexibility to efficiently implement and compute any desired operation in DRAM. The third step controls the execution flow of the in-DRAM computation, transparently from the user. We comprehensively evaluate SIMDRAM's reliability, area overhead, operation throughput, and energy efficiency using a wide range of operations and seven diverse real-world kernels to demonstrate its generality. Our results show that SIMDRAM provides up to 5.1x higher operation throughput and 2.5x higher energy efficiency than a state-of-the-art in-DRAM computing mechanism, and up to 2.5x speedup for real-world kernels while incurring less than 1% DRAM chip area overhead. Compared to a CPU and a high-end GPU, SIMDRAM is 257x and 31x more energy-efficient, while providing 93x and 6x higher operation throughput, respectively.
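To make the abstract's central idea concrete, the following is a minimal software sketch of bit-serial addition built only from majority (MAJ) and NOT, with each bit position of a Python integer standing in for one DRAM column (one SIMD lane). It is an illustration under stated assumptions, not SIMDRAM's actual implementation: the function names (`maj`, `to_rows`, `from_rows`, `bit_serial_add`) are hypothetical, the "rows" are plain integers rather than DRAM rows, and the full-adder identity used here is the standard MAJ/NOT formulation, which may differ from the optimized representation the framework derives in its first step.

```python
# Illustrative simulation only: a "row" is a Python int whose bit k models
# the cell in column (SIMD lane) k. All names here are hypothetical.

def maj(a, b, c):
    """Bitwise 3-input majority, applied to every lane at once."""
    return (a & b) | (a & c) | (b & c)

def to_rows(values, width):
    """Vertical layout: row i holds bit i of every element."""
    rows = []
    for i in range(width):
        row = 0
        for lane, v in enumerate(values):
            row |= ((v >> i) & 1) << lane
        rows.append(row)
    return rows

def from_rows(rows, lanes):
    """Inverse of to_rows: rebuild one integer per lane."""
    return [
        sum(((row >> lane) & 1) << i for i, row in enumerate(rows))
        for lane in range(lanes)
    ]

def bit_serial_add(a_rows, b_rows, lanes):
    """Add all lanes in parallel, one bit position per iteration (LSB first).

    Uses the standard MAJ/NOT full adder:
      carry_out = MAJ(a, b, carry_in)
      sum       = MAJ(NOT carry_out, carry_in, MAJ(a, b, NOT carry_in))
    The final carry is dropped, so the result wraps modulo 2**width.
    """
    mask = (1 << lanes) - 1  # keeps NOT confined to the simulated lanes
    carry = 0
    out_rows = []
    for a, b in zip(a_rows, b_rows):
        carry_out = maj(a, b, carry)
        s = maj(~carry_out & mask, carry, maj(a, b, ~carry & mask))
        out_rows.append(s)
        carry = carry_out
    return out_rows

xs = [3, 200, 17, 255]
ys = [5, 100, 4, 1]
sums = from_rows(bit_serial_add(to_rows(xs, 8), to_rows(ys, 8), len(xs)), len(xs))
```

In this sketch the loop runs once per bit of the operand width regardless of how many lanes there are, which is the property that lets a bit-serial design trade per-element latency for very wide parallelism across columns.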
