At the Locus of Performance: Quantifying the Effects of Copious 3D-Stacked Cache on HPC Workloads

Paper title

At the Locus of Performance: Quantifying the Effects of Copious 3D-Stacked Cache on HPC Workloads

Authors

Jens Domke, Emil Vatai, Balazs Gerofi, Yuetsu Kodama, Mohamed Wahib, Artur Podobas, Sparsh Mittal, Miquel Pericàs, Lingqi Zhang, Peng Chen, Aleksandr Drozd, Satoshi Matsuoka

Abstract

Over the last three decades, innovations in the memory subsystem were primarily targeted at overcoming the data movement bottleneck. In this paper, we focus on a specific market trend in memory technology: 3D-stacked memory and caches. We investigate the impact of extending the on-chip memory capabilities in future HPC-focused processors, particularly by 3D-stacked SRAM. First, we propose a method oblivious to the memory subsystem to gauge the upper-bound in performance improvements when data movement costs are eliminated. Then, using the gem5 simulator, we model two variants of a hypothetical LARge Cache processor (LARC), fabricated in 1.5 nm and enriched with high-capacity 3D-stacked cache. With a volume of experiments involving a broad set of proxy-applications and benchmarks, we aim to reveal how HPC CPU performance will evolve, and conclude an average boost of 9.56x for cache-sensitive HPC applications, on a per-chip basis. Additionally, we exhaustively document our methodological exploration to motivate HPC centers to drive their own technological agenda through enhanced co-design.
