Paper Title

PiDRAM: An FPGA-based Framework for End-to-end Evaluation of Processing-in-DRAM Techniques

Authors

Ataberk Olgun, Juan Gomez Luna, Konstantinos Kanellopoulos, Behzad Salami, Hasan Hassan, Oguz Ergin, Onur Mutlu

Abstract

DRAM-based main memory is a major component of nearly all computing systems. One way of overcoming the main memory bottleneck is to move computation near memory, a paradigm known as processing-in-memory (PiM). Recent PiM techniques provide a promising way to improve the performance and energy efficiency of existing and future systems at no additional DRAM hardware cost. We develop the Processing-in-DRAM (PiDRAM) framework, the first flexible, end-to-end, and open-source framework that enables system integration studies and evaluation of real PiM techniques using real DRAM chips. We demonstrate a prototype of PiDRAM on an FPGA-based platform (Xilinx ZC706) that implements an open-source RISC-V system (Rocket Chip). To demonstrate the flexibility and ease of use of PiDRAM, we implement two PiM techniques: (1) RowClone, an in-DRAM copy and initialization mechanism (using command sequences proposed by ComputeDRAM), and (2) D-RaNGe, an in-DRAM true random number generator based on DRAM activation-latency failures. Our end-to-end evaluation of RowClone shows up to 14.6× speedup for copy and 12.6× speedup for initialization operations over CPU copy (i.e., conventional memcpy) and CPU initialization (i.e., conventional calloc). Our implementation of D-RaNGe provides high-throughput true random numbers, reaching 8.30 Mb/s. Building on the Verilog and C++ codebase that PiDRAM provides, implementing the required hardware and software components end-to-end takes 198 lines of Verilog and 565 lines of C++ code for RowClone, and 190 lines of Verilog and 78 lines of C++ code for D-RaNGe. PiDRAM is open-sourced on GitHub: https://github.com/CMU-SAFARI/PiDRAM.
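RowClone copies data at the granularity of whole DRAM rows, so software that offloads memcpy-style copies to it generally needs a CPU fallback for data that does not cover whole, suitably placed rows. The C sketch below illustrates only that dispatch logic under stated assumptions; it is not PiDRAM's API. The row size `ROW_BYTES`, `hypothetical_rowclone_row`, `copy_with_rowclone`, and `demo_copy` are hypothetical names, the in-DRAM copy is simulated with `memcpy`, and virtual-address alignment stands in for the physical row alignment and same-subarray placement that a real implementation would have to guarantee.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical DRAM row size in bytes; real row sizes depend on the module. */
#define ROW_BYTES 8192u

/* Hypothetical stand-in for an in-DRAM copy of one whole row. Real hardware
   would issue a RowClone command sequence from the memory controller; this
   sketch simulates it with memcpy so it runs on any machine. */
static void hypothetical_rowclone_row(void *dst, const void *src) {
    memcpy(dst, src, ROW_BYTES);
}

/* Copies n bytes, using the (simulated) in-DRAM path for every whole,
   row-aligned row and the CPU for everything else.
   Returns the number of bytes that took the in-DRAM path. */
static size_t copy_with_rowclone(void *dst, const void *src, size_t n) {
    char *d = dst;
    const char *s = src;
    /* Simplification: treats virtual-address alignment as row alignment and
       ignores the same-subarray requirement of fast in-DRAM copy. */
    if ((uintptr_t)d % ROW_BYTES != 0 || (uintptr_t)s % ROW_BYTES != 0) {
        memcpy(d, s, n); /* conventional CPU copy */
        return 0;
    }
    size_t rows = n / ROW_BYTES;
    for (size_t i = 0; i < rows; i++)
        hypothetical_rowclone_row(d + i * ROW_BYTES, s + i * ROW_BYTES);
    size_t in_dram = rows * ROW_BYTES;
    /* The trailing partial row, if any, is copied by the CPU. */
    memcpy(d + in_dram, s + in_dram, n - in_dram);
    return in_dram;
}

/* Usage: two full rows take the in-DRAM path, the last 100 bytes the CPU. */
static size_t demo_copy(void) {
    size_t n = 2 * ROW_BYTES + 100;
    char *src = aligned_alloc(ROW_BYTES, 3 * ROW_BYTES);
    char *dst = aligned_alloc(ROW_BYTES, 3 * ROW_BYTES);
    assert(src != NULL && dst != NULL);
    for (size_t i = 0; i < n; i++)
        src[i] = (char)(i % 251);
    size_t in_dram = copy_with_rowclone(dst, src, n);
    assert(memcmp(dst, src, n) == 0);
    free(src);
    free(dst);
    return in_dram;
}
```

Returning the number of bytes that took the in-DRAM path lets a caller measure how much of a copy was actually offloaded, which matters because any misaligned or partial-row data falls back to an ordinary CPU copy.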
