Paper title
Microsecond Consensus for Microsecond Applications
Paper authors
Paper abstract
We consider the problem of making applications fault-tolerant through replication when they operate at the microsecond scale, as in finance, embedded computing, and microservices. These applications need a replication scheme that also operates at the microsecond scale; otherwise, replication becomes a burden. We propose Mu, a system that takes less than 1.3 microseconds to replicate a (small) request in memory and less than a millisecond to fail over the system. This cuts the replication and fail-over latencies of prior systems by at least 61% and 90%, respectively. Mu implements bona fide state machine replication/consensus (SMR) with strong consistency for a generic application, but it really shines on microsecond applications, where even the smallest overhead is significant. To provide this performance, Mu introduces a new SMR protocol that carefully leverages RDMA. Roughly, in Mu a leader replicates a request by simply writing it directly to the logs of the other replicas using RDMA, without any additional communication. Doing so, however, introduces the challenges of handling concurrent leaders, changing leaders, garbage collecting the logs, and more. We address these challenges in this paper through a judicious combination of RDMA permissions and distributed algorithmic design. We implemented Mu and used it to replicate several systems: a financial exchange application called Liquibook, Redis, Memcached, and HERD. Our evaluation shows that Mu incurs a small replication latency, in some cases being the only viable replication system that incurs an acceptable overhead.
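The abstract's core idea, a leader writing requests directly into the other replicas' logs while write permissions keep concurrent leaders apart, can be illustrated with a toy in-memory model. This is only a sketch under stated assumptions and not Mu's actual protocol: a plain Python method call stands in for a one-sided RDMA write, and the names `Replica`, `grant`, `remote_write`, `replicate`, and `WriteDenied` are hypothetical. Log reconciliation after a leader change and garbage collection are omitted.

```python
from dataclasses import dataclass, field
from typing import List, Optional


class WriteDenied(Exception):
    """Raised when a node without write permission targets a replica's log."""


@dataclass
class Replica:
    name: str
    log: List[bytes] = field(default_factory=list)
    writer: Optional[str] = None  # the single node currently allowed to write

    def grant(self, leader: str) -> None:
        # Granting permission to a new leader implicitly revokes the old one,
        # so at most one node can write to this log at any time.
        self.writer = leader

    def remote_write(self, src: str, slot: int, request: bytes) -> None:
        # Stand-in for a one-sided write: the replica runs no protocol logic,
        # only the permission check decides whether the write lands.
        if src != self.writer:
            raise WriteDenied(f"{src} may not write to {self.name}")
        while len(self.log) <= slot:
            self.log.append(b"")
        self.log[slot] = request


def replicate(leader: str, replicas: List[Replica], slot: int, request: bytes) -> bool:
    """Write `request` into `slot` of every log; succeed if a majority accepted."""
    acks = 0
    for replica in replicas:
        try:
            replica.remote_write(leader, slot, request)
            acks += 1
        except WriteDenied:
            pass  # this replica has granted its log to a different leader
    return acks > len(replicas) // 2


if __name__ == "__main__":
    replicas = [Replica("A"), Replica("B"), Replica("C")]
    for r in replicas:
        r.grant("A")
    print(replicate("A", replicas, 0, b"buy 10"))

    # B takes over: a majority (B and C) moves its write permission to B.
    replicas[1].grant("B")
    replicas[2].grant("B")
    print(replicate("A", replicas, 1, b"stale"))   # old leader loses its majority
    print(replicate("B", replicas, 1, b"sell 5"))  # new leader reaches a majority
```

In this model, a deposed leader cannot commit because a majority of replicas no longer accepts its writes. That is the intuition behind using permissions to handle concurrent leaders. The stale entry left in the old leader's own log shows why a real protocol also needs a recovery step, which this sketch does not attempt.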