论文标题
部分可观测时空混沌系统的无模型预测
Near-Optimal Bounds for Testing Histogram Distributions
论文作者
论文摘要
我们研究了测试有序域上的离散概率分布是否是指定数量的箱子的直方图。 $ k $的简洁近似值的最常见工具之一是$ k $ [n] $的$ k $组织图,是概率分布,它们在一组$ k $间隔上是分段恒定的。直方图测试问题如下:从$ [n] $上的未知分布中给定样品$ \ mathbf {p} $,我们想区分$ \ mathbf {p} $是$ k $ - 历史图是$ \ varepsilon $ - 从任何$ k $ -k $ -far中的$ k $ -far,总差异。我们的主要结果是针对此测试问题的样本接近最佳和计算有效的算法,以及几乎匹配的(在对数因素内)样品复杂性下限。具体而言,我们表明直方图测试问题具有样本复杂性$ \widetildeθ(\ sqrt {nk} / \ varepsilon + k / \ varepsilon^2 + \ sqrt {n} / \ varepsilon^2)$。
We investigate the problem of testing whether a discrete probability distribution over an ordered domain is a histogram on a specified number of bins. One of the most common tools for the succinct approximation of data, $k$-histograms over $[n]$, are probability distributions that are piecewise constant over a set of $k$ intervals. The histogram testing problem is the following: Given samples from an unknown distribution $\mathbf{p}$ on $[n]$, we want to distinguish between the cases that $\mathbf{p}$ is a $k$-histogram versus $\varepsilon$-far from any $k$-histogram, in total variation distance. Our main result is a sample near-optimal and computationally efficient algorithm for this testing problem, and a nearly-matching (within logarithmic factors) sample complexity lower bound. Specifically, we show that the histogram testing problem has sample complexity $\widetilde Θ(\sqrt{nk} / \varepsilon + k / \varepsilon^2 + \sqrt{n} / \varepsilon^2)$.