论文标题
自动化统一的可靠广播
Self-stabilizing Uniform Reliable Broadcast
论文作者
论文摘要
我们研究了一个著名的交流抽象,称为统一可靠广播(URB)。 URB在设计和实施耐故障分布式系统的核心是中心,因为许多非平凡的耐故障分布式应用程序都需要通信,并在消息传递中可证明保证。我们的研究着重于容易出现节点失败的无时间消息的系统的容忍度实现。此外,我们旨在设计更强大的沟通抽象。我们通过自动化的镜头来做到这一点 - - 容忍断层的非常强烈的概念。除了节点和通信故障外,自稳定算法在发生任意瞬态故障后还可以恢复;这些故障代表对系统设计为操作的任何假设的任何违反(只要算法代码保持完好无损)。 这项工作提出了第一个容易出现节点失败的无时消息的自动化的URB解决方案。所提出的算法具有来自任意瞬态故障的O(BufferunitSize)稳定时间(根据异步循环),其中BufferRunitSize是一个可以根据可用内存设置的预定义常数。此外,我们的算法的沟通成本与非自我稳定的最先进的沟通成本相似。主要区别在于,我们的提案考虑了o(1)位消息和处理有限空间的重复闲聊(这是自稳定的先决条件)。具体而言,每个节点都需要存储为bufferunitsize n记录,并且每个记录都是o(v + n log n)位的大小(v + n log n)位,其中n是系统中的节点的数量,v是编码单个urb实例所需的位数。
We study a well-known communication abstraction called Uniform Reliable Broadcast (URB). URB is central in the design and implementation of fault-tolerant distributed systems, as many non-trivial fault-tolerant distributed applications require communication with provable guarantees on message deliveries. Our study focuses on fault-tolerant implementations for time-free message-passing systems that are prone to node-failures. Moreover, we aim at the design of an even more robust communication abstraction. We do so through the lenses of self-stabilization---a very strong notion of fault-tolerance. In addition to node and communication failures, self-stabilizing algorithms can recover after the occurrence of arbitrary transient faults; these faults represent any violation of the assumptions according to which the system was designed to operate (as long as the algorithm code stays intact). This work proposes the first self-stabilizing URB solution for time-free message-passing systems that are prone to node-failures. The proposed algorithm has an O(bufferUnitSize) stabilization time (in terms of asynchronous cycles) from arbitrary transient faults, where bufferUnitSize is a predefined constant that can be set according to the available memory. Moreover, the communication costs of our algorithm are similar to the ones of the non-self-stabilizing state-of-the-art. The main differences are that our proposal considers repeated gossiping of O(1) bits messages and deals with bounded space (which is a prerequisite for self-stabilization). Specifically, each node needs to store up to bufferUnitSize n records and each record is of size O(v + n log n) bits, where n is the number of nodes in the system and v is the number of bits needed to encode a single URB instance.