论文标题

Scrooge:用于CPU,GPU和ASIC的快速和记忆果实基因组序列对齐器

Scrooge: A Fast and Memory-Frugal Genomic Sequence Aligner for CPUs, GPUs, and ASICs

论文作者

Lindegger, Joël, Cali, Damla Senol, Alser, Mohammed, Gómez-Luna, Juan, Ghiasi, Nika Mansouri, Mutlu, Onur

论文摘要

成对序列比对是通用生物信息学管道中非常耗时的一步。加快此步骤需要启发式,有效的实现和/或硬件加速度。上述所有的有前途的候选人是最近提出的基因算法。我们在基因算法中识别并解决了三个效率低下的效率:它具有大量的数据运动,大量的记忆足迹,并且做了一些不必要的工作。我们提出了Scrooge,这是一种快速和记忆的果实基因组序列对准器。 Scrooge包括三种新型算法改进,可减少数据运动,记忆足迹和基因算法中的操作数量。我们为CPU和GPU提供了有效的Scrooge算法的开源实现,这证明了我们算法改进的重大好处。对于长期的读取,Scrooge的CPU版本分别在KSW2,Edlib和CPU实现基因时实现了20.1倍,1.7倍和2.1倍的速度。 Scrooge的GPU版本在CPU版本的Scrooge,KSW2,Edlib,Darwin-GPU和GPU实现的GPU版本上获得了4.0倍,6.8倍,12.6倍和5.9倍的速度。我们估计,Scrooge的ASIC实施是在维持相同的吞吐量的同时,使用芯片面积少3.6倍,功率少2.1倍。此外,我们系统地分析了各种构型下的吞吐量和准确性行为。由于Scrooge的最佳配置取决于计算平台,因此我们进行了几种观察,可以帮助指导Scrooge的未来实现。可用性:https://github.com/cmu-safari/scrooge

Pairwise sequence alignment is a very time-consuming step in common bioinformatics pipelines. Speeding up this step requires heuristics, efficient implementations, and/or hardware acceleration. A promising candidate for all of the above is the recently proposed GenASM algorithm. We identify and address three inefficiencies in the GenASM algorithm: it has a high amount of data movement, a large memory footprint, and does some unnecessary work. We propose Scrooge, a fast and memory-frugal genomic sequence aligner. Scrooge includes three novel algorithmic improvements which reduce the data movement, memory footprint, and the number of operations in the GenASM algorithm. We provide efficient open-source implementations of the Scrooge algorithm for CPUs and GPUs, which demonstrate the significant benefits of our algorithmic improvements. For long reads, the CPU version of Scrooge achieves a 20.1x, 1.7x, and 2.1x speedup over KSW2, Edlib, and a CPU implementation of GenASM, respectively. The GPU version of Scrooge achieves a 4.0x 80.4x, 6.8x, 12.6x and 5.9x speedup over the CPU version of Scrooge, KSW2, Edlib, Darwin-GPU, and a GPU implementation of GenASM, respectively. We estimate an ASIC implementation of Scrooge to use 3.6x less chip area and 2.1x less power than a GenASM ASIC while maintaining the same throughput. Further, we systematically analyze the throughput and accuracy behavior of GenASM and Scrooge under various configurations. As the best configuration of Scrooge depends on the computing platform, we make several observations that can help guide future implementations of Scrooge. Availability: https://github.com/CMU-SAFARI/Scrooge

扫码加入交流群

加入微信交流群

微信交流群二维码

扫码加入学术交流群,获取更多资源