$k$-ported vs. $k$-lane Broadcast, Scatter, and Alltoall Algorithms

Paper title

$k$-ported vs. $k$-lane Broadcast, Scatter, and Alltoall Algorithms

Author

Träff, Jesper Larsson

Abstract

In $k$-ported message-passing systems, a processor can simultaneously receive $k$ different messages from $k$ other processors, and send $k$ different messages to $k$ other processors that may or may not be different from the processors from which messages are received. Modern clustered systems may not have such capabilities. Instead, compute nodes consisting of $n$ processors can simultaneously send and receive $k$ messages from other nodes, by letting $k$ processors on the nodes concurrently send and receive at most one message. We pose the question of how to design good algorithms for this $k$-lane model, possibly by adapting algorithms devised for the traditional $k$-ported model. We discuss and compare a number of (non-optimal) $k$-lane algorithms for the broadcast, scatter and alltoall collective operations (as found in, e.g., MPI), and experimentally evaluate these on a small $36\times 32$-node cluster with a dual OmniPath network (corresponding to $k=2$). Results are preliminary.
