Paper title
High-performance cloud computing for exhaustive protein-protein docking
Paper authors
Abstract
Public cloud computing environments, such as Amazon AWS, Microsoft Azure, and the Google Cloud Platform, have improved remarkably in computational performance in recent years and are now expected to support massively parallel computing. Because the cloud gives users ready access to thousands of CPU cores and GPU accelerators, and because a wide range of software can be deployed easily through cloud images, the cloud is beginning to be used in the field of bioinformatics. In this study, we ported our original protein-protein interaction prediction (protein-protein docking) software, MEGADOCK, to Microsoft Azure as an example of an HPC cloud environment. A cloud parallel computing environment with up to 1,600 CPU cores and 960 GPUs was constructed using four CPU instance types and two GPU instance types, and its parallel computing performance was evaluated. Our MEGADOCK on Azure system showed a strong scaling value of 0.93 for the CPU instance when 100 H16 instances were used compared to 50, and a strong scaling value of 0.89 for the GPU instance when 20 NC24 instances were used compared to 5. Moreover, the measured usage fees and total computation times showed that using a GPU instance reduced both the computation time of MEGADOCK and the cloud usage fee required for the computation. The developed environment deployed on the cloud is highly portable, making it suitable for applications in which an on-demand and large-scale HPC environment is desirable.
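The abstract does not spell out how the strong scaling value is computed. A minimal sketch, assuming the common definition (measured speedup divided by the ideal speedup for a fixed problem size), is given below. The function name `strong_scaling_efficiency` and the timings in the usage example are hypothetical and are not taken from the paper.

```python
def strong_scaling_efficiency(base_units, base_seconds, scaled_units, scaled_seconds):
    """Strong scaling efficiency for a fixed-size workload.

    Assumed definition (not confirmed by the abstract):
        efficiency = (base_seconds / scaled_seconds) / (scaled_units / base_units)
    A value of 1.0 means perfectly linear scaling.
    """
    if min(base_units, scaled_units) <= 0:
        raise ValueError("instance counts must be positive")
    if min(base_seconds, scaled_seconds) <= 0:
        raise ValueError("run times must be positive")

    speedup = base_seconds / scaled_seconds      # how much faster the larger run was
    ideal_speedup = scaled_units / base_units    # how much faster it could be at best
    return speedup / ideal_speedup


if __name__ == "__main__":
    # Made-up timings for illustration only: 50 instances take 1000 s,
    # 100 instances take 625 s for the same docking workload.
    value = strong_scaling_efficiency(50, 1000.0, 100, 625.0)
    print(f"strong scaling value: {value:.2f}")
```

Under this definition, a value of 0.93 when doubling from 50 to 100 instances would correspond to a speedup of about 1.86 rather than the ideal 2.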