论文标题

测试数据binnings

Testing Data Binnings

论文作者

Canonne, Clément L., Wimmer, Karl

论文摘要

由数据量化问题和“ binning”的问题,我们重新审视了离散概率分布的身份测试问题。身份测试(又称一个样本测试),一个基本和现在的分配测试问题,问,给定参考分布(型号)$ \ mathbf {q} $,以及来自未知分布$ \ mathbf {p {p {p} $的样本,均超过$ [n]等于$ \ mathbf {q} $,或与之显着不同。 In this paper, we introduce the related question of 'identity up to binning,' where the reference distribution $\mathbf{q}$ is over $k \ll n$ elements: the question is then whether there exists a suitable binning of the domain $[n]$ into $k$ intervals such that, once "binned," $\mathbf{p}$ is equal to $\mathbf{q}$.我们在这个新问题的样本复杂性上提供了几乎紧密的上限和下限,这在Vanilla身份测试中既显示了定量和定性的差异,又回答了Canonne(2019)的开放问题。最后,我们讨论了几个扩展和相关的研究方向。

Motivated by the question of data quantization and "binning," we revisit the problem of identity testing of discrete probability distributions. Identity testing (a.k.a. one-sample testing), a fundamental and by now well-understood problem in distribution testing, asks, given a reference distribution (model) $\mathbf{q}$ and samples from an unknown distribution $\mathbf{p}$, both over $[n]=\{1,2,\dots,n\}$, whether $\mathbf{p}$ equals $\mathbf{q}$, or is significantly different from it. In this paper, we introduce the related question of 'identity up to binning,' where the reference distribution $\mathbf{q}$ is over $k \ll n$ elements: the question is then whether there exists a suitable binning of the domain $[n]$ into $k$ intervals such that, once "binned," $\mathbf{p}$ is equal to $\mathbf{q}$. We provide nearly tight upper and lower bounds on the sample complexity of this new question, showing both a quantitative and qualitative difference with the vanilla identity testing one, and answering an open question of Canonne (2019). Finally, we discuss several extensions and related research directions.

扫码加入交流群

加入微信交流群

微信交流群二维码

扫码加入学术交流群,获取更多资源