Paper Title

Efficiency Near the Edge: Increasing the Energy Efficiency of FFTs on GPUs for Real-time Edge Computing

Authors

Karel Adámek, Jan Novotný, Jeyarajan Thiyagalingam, Wesley Armour

Abstract

The Square Kilometre Array (SKA) is an international initiative for developing the world's largest radio telescope, with a total collecting area of over a million square meters. The scale of the operation, combined with the remote location of the telescope, requires the use of energy-efficient computational algorithms. This, along with the extreme data rates that will be produced by the SKA and the requirement for a real-time observing capability, necessitates in-situ data processing in an edge-style computing solution. More generally, energy efficiency is becoming a paramount concern across the modern computing landscape, whether the constraint is the power budget that limits some of the world's largest supercomputers or the limited power available to the smallest Internet-of-Things devices. In this paper, we study the impact of hardware frequency scaling on the energy consumption and execution time of the Fast Fourier Transform (FFT) on NVIDIA GPUs using the cuFFT library. The FFT is used in many areas of science and is one of the key algorithms in radio astronomy data processing pipelines. Through the use of frequency scaling, we show that we can lower the power consumption of the NVIDIA V100 GPU when computing the FFT by up to 60% compared to the boost clock frequency, with less than a 10% increase in the execution time. Furthermore, using one common core clock frequency for all tested FFT lengths, we show on average a 50% reduction in power consumption compared to the boost core clock frequency, with the increase in execution time still below 10%. We demonstrate how these results can be used to lower the power consumption of existing data processing pipelines. Considered over years of operation, these savings can yield significant financial savings and can also lead to a significant reduction in greenhouse gas emissions.
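To make the trade-off in the abstract concrete, the short sketch below combines a power reduction and an execution-time increase into a relative energy figure. It is an illustration rather than the authors' method: it assumes the quoted percentages refer to average power over the FFT run, so that energy is simply average power multiplied by execution time, and it uses 10% as the upper bound on the time increase. The function name `relative_energy` is hypothetical.

```python
def relative_energy(power_reduction: float, time_increase: float) -> float:
    """Energy at the scaled clock relative to the boost clock (1.0 = unchanged).

    Assumes energy = average power * execution time, with both inputs given
    as fractions relative to the boost clock (e.g. 0.60 for a 60% reduction).
    """
    return (1.0 - power_reduction) * (1.0 + time_increase)


# Figures quoted in the abstract; the 10% time increase is an upper bound.
best_case = relative_energy(power_reduction=0.60, time_increase=0.10)
common_clock = relative_energy(power_reduction=0.50, time_increase=0.10)

print(f"Best case (per-length clock): {best_case:.2f} of boost-clock energy")
print(f"Common clock (average):       {common_clock:.2f} of boost-clock energy")
```

Under these assumptions, the scaled clocks would use roughly 0.44 and 0.55 of the boost-clock energy per FFT. Because the abstract bounds the time increase at less than 10%, the actual energy for the stated power reductions would be somewhat lower than these figures.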
