论文标题
带有循环存储的分布式数据存储系统的编码数据重新平衡
Coded Data Rebalancing for Distributed Data Storage Systems with Cyclic Storage
论文作者
论文摘要
我们考虑基于复制的分布式存储系统,其中每个节点存储相同的量子数据,并且每个存储的每个数据位在整个节点上具有相同的复制因子。此类系统称为平衡分布式数据库。当现有节点休假或将新节点添加到该系统中时,数据库的平衡性质会丢失,要么是由于复制因子的降低,要么是在节点处存储的不均匀性。这触发了一种重新平衡的算法,该算法在节点之间交换数据,从而恢复了数据库的余额。然后,目标是设计以最小的通信负载来设计重平衡方案。在Krishnan等人最近的一项工作中,使用编码的传输来重新平衡从节点删除或添加中精心设计的分布式数据库。但是,这些编码的重新平衡方案具有最佳的通信负载,但是,文件大小至少在系统参数中为指数。在这项工作中,我们考虑了一个环状平衡数据库(在系统节点中循环放置数据),并在此数据库中呈现编码的重新平衡方案,以删除节点和添加。这些数据库(以及相关的重新平衡方案)要求文件大小仅在系统中的节点数量中是立方体。我们将节点去除重新平衡方案的优势限制在未编码方案中,并表明我们的方案具有较小的通信负载。在节点添加方案中,提出的重新平衡方案是一个简单的未编码方案,我们显示的具有最佳的负载。最后,我们为单个节点驱动的重新平衡得出了一个下限,用于通过我们可实现的重新平衡方案指定的数据放置的特定选择,并表明我们可实现的重新平衡载荷来自获得的下限的乘法间隙。
We consider replication-based distributed storage systems in which each node stores the same quantum of data and each data bit stored has the same replication factor across the nodes. Such systems are referred to as balanced distributed databases. When existing nodes leave or new nodes are added to this system, the balanced nature of the database is lost, either due to the reduction in the replication factor, or the non-uniformity of the storage at the nodes. This triggers a rebalancing algorithm, that exchanges data between the nodes so that the balance of the database is reinstated. The goal is then to design rebalancing schemes with minimal communication load. In a recent work by Krishnan et al., coded transmissions were used to rebalance a carefully designed distributed database from a node removal or addition. These coded rebalancing schemes have optimal communication load, however, require the file-size to be at least exponential in the system parameters. In this work, we consider a cyclic balanced database (where data is cyclically placed in the system nodes) and present coded rebalancing schemes for node removal and addition in such a database. These databases (and the associated rebalancing schemes) require the file-size to be only cubic in the number of nodes in the system. We bound the advantage of our node removal rebalancing scheme over the uncoded scheme, and show that our scheme has a smaller communication load. In the node addition scenario, the rebalancing scheme presented is a simple uncoded scheme, which we show has optimal load. Finally, we derive a lower bound for the single node-removal rebalancing for the specific choice of data placements specified by our achievable rebalancing schemes, and show that our achievable rebalancing loads are within a multiplicative gap from the lower bound obtained.