Paper Title

Field-wise Learning for Multi-field Categorical Data

Paper Authors

Zhibin Li, Jian Zhang, Yongshun Gong, Yazhou Yao, Qiang Wu

Paper Abstract

We propose a new method for learning with multi-field categorical data. Multi-field categorical data are usually collected over many heterogeneous groups, which can be reflected in the categories under a field. Existing methods try to learn a universal model that fits all data, which is challenging and inevitably results in a complex model. In contrast, we propose a field-wise learning method that leverages the natural structure of the data to learn simple yet efficient one-to-one field-focused models with appropriate constraints. In this way, the models can be fitted to each category and thus better capture the underlying differences in the data. We present a model that utilizes linear models with variance and low-rank constraints to help it generalize better and reduce the number of parameters. The model is also interpretable in a field-wise manner. As the dimensionality of multi-field categorical data can be very high, models applied to such data are mostly over-parameterized. Our theoretical analysis can potentially explain the effect of over-parameterization on the generalization of our model; it also supports the variance constraints in the learning objective. Experimental results on two large-scale datasets show the superior performance of our model, the trend of the generalization error bound, and the interpretability of the learning outcomes. Our code is available at https://github.com/lzb5600/Field-wise-Learning.
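To make the abstract's ingredients concrete, the following is a minimal, hypothetical sketch (not the authors' implementation; all sizes and names are assumptions) of per-category linear models whose per-field weight tables are low-rank factorized, together with the kind of variance penalty the abstract mentions: each category's weights are pulled toward the mean weights of its field.

```python
# Hypothetical sketch of field-wise learning; NOT the paper's code.
# Setting: F categorical fields; each category c in field f gets its own
# linear model w_{f,c}. A low-rank factorization W_f = U_f @ V_f reduces
# the parameter count, and a variance penalty keeps per-category weights
# close to their field-wise mean.
import numpy as np

rng = np.random.default_rng(0)
n_fields, n_cats, d, rank = 3, 5, 8, 2   # toy sizes (assumed)

# Low-rank per-field weight tables: one weight vector per category.
U = [rng.normal(size=(n_cats, rank)) for _ in range(n_fields)]
V = [rng.normal(size=(rank, d)) for _ in range(n_fields)]

def predict(x, cats):
    """Score a dense feature vector x given the active category per field."""
    score = 0.0
    for f, c in enumerate(cats):
        w_fc = U[f][c] @ V[f]          # category-specific weights, rank-limited
        score += w_fc @ x
    return score

def variance_penalty():
    """Sum over fields of the variance of category weights around the field mean."""
    total = 0.0
    for f in range(n_fields):
        W_f = U[f] @ V[f]              # (n_cats, d) full weight table
        total += np.mean((W_f - W_f.mean(axis=0)) ** 2)
    return total

x = rng.normal(size=d)
score = predict(x, cats=[0, 2, 4])
```

Each field contributes `n_cats * rank + rank * d` parameters instead of `n_cats * d`, which is where the parameter reduction claimed in the abstract comes from when `rank` is small.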
