Paper Title
Continuity of Generalized Entropy and Statistical Learning
Paper Authors
Paper Abstract
We study the continuity property of the generalized entropy as a function of the underlying probability distribution, defined with an action space and a loss function, and use this property to answer basic questions in statistical learning theory: the excess risk analyses of various learning methods. We first derive upper and lower bounds on the entropy difference of two distributions in terms of several commonly used f-divergences, the Wasserstein distance, a distance that depends on the action space and the loss function, and the Bregman divergence generated by the entropy, which also induces bounds in terms of the Euclidean distance between the two distributions. Examples are given along with the discussion of each general result, comparisons are made with existing entropy-difference bounds, and new mutual information upper bounds are derived from the new results. We then apply the entropy-difference bounds to the theory of statistical learning. It is shown that the excess risks in the two popular learning paradigms, frequentist learning and Bayesian learning, can both be studied through the continuity of different forms of the generalized entropy. The analysis is then extended to the continuity of the generalized conditional entropy. This extension provides performance bounds for Bayes decision making with mismatched distributions. It also leads to excess risk bounds for a third paradigm of learning, where the decision rule is optimally designed under the projection of the empirical distribution onto a predefined family of distributions. We thus establish a unified method of excess risk analysis for the three major paradigms of statistical learning, through the continuity of the generalized entropy.
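For context, the central quantities the abstract refers to can be written in the following standard form; the notation ($H_\ell$, $\mathcal{A}$, $a_Q$) is assumed here for illustration and may differ from the paper's own conventions.

% Generalized entropy of a distribution P, defined via an action space \mathcal{A}
% and a loss function \ell (assumed notation):
\[
  H_{\ell}(P) \;=\; \inf_{a \in \mathcal{A}} \ \mathbb{E}_{X \sim P}\bigl[\ell(X, a)\bigr].
\]
% Excess risk of an action a_Q that is optimal for a (possibly mismatched) distribution Q,
% evaluated under the true distribution P:
\[
  R(P, a_Q) \;=\; \mathbb{E}_{X \sim P}\bigl[\ell(X, a_Q)\bigr] \;-\; H_{\ell}(P),
  \qquad a_Q \in \operatorname*{arg\,min}_{a \in \mathcal{A}} \ \mathbb{E}_{X \sim Q}\bigl[\ell(X, a)\bigr].
\]

In this reading, continuity results bound how much $H_{\ell}$ can change when $P$ is replaced by a nearby distribution $Q$ (e.g., an empirical, posterior-predictive, or projected distribution), which in turn controls the excess risk of acting optimally for $Q$ under $P$.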