Paper Title


Supervised Learning for Non-Sequential Data: A Canonical Polyadic Decomposition Approach

Authors

Alexandros Haliassos, Kriton Konstantinidis, Danilo P. Mandic

Abstract


Efficient modelling of feature interactions underpins supervised learning for non-sequential tasks, characterized by a lack of inherent ordering of features (variables). The brute force approach of learning a parameter for each interaction of every order comes at an exponential computational and memory cost (Curse of Dimensionality). To alleviate this issue, it has been proposed to implicitly represent the model parameters as a tensor, the order of which is equal to the number of features; for efficiency, it can be further factorized into a compact Tensor Train (TT) format. However, both TT and other Tensor Networks (TNs), such as Tensor Ring and Hierarchical Tucker, are sensitive to the ordering of their indices (and hence to the features). To establish the desired invariance to feature ordering, we propose to represent the weight tensor through the Canonical Polyadic (CP) Decomposition (CPD), and introduce the associated inference and learning algorithms, including suitable regularization and initialization schemes. It is demonstrated that the proposed CP-based predictor significantly outperforms other TN-based predictors on sparse data while exhibiting comparable performance on dense non-sequential tasks. Furthermore, for enhanced expressiveness, we generalize the framework to allow feature mapping to arbitrarily high-dimensional feature vectors. In conjunction with feature vector normalization, this is shown to yield dramatic improvements in performance for dense non-sequential tasks, matching models such as fully-connected neural networks.
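The key efficiency gain described in the abstract is that, with the weight tensor in CP format, the prediction never requires materializing the exponentially large tensor: the inner product between W and the rank-1 tensor of mapped features factorizes into a product of per-feature inner products for each rank-1 component. The sketch below is not the authors' code; it is a minimal NumPy illustration of that inference rule, assuming a simple feature map phi(x) = [1, x] (a common choice in tensor-network predictors, not specified here).

```python
import numpy as np

def cp_predict(factors, feature_vectors):
    """Inference with a CP-factorized weight tensor (illustrative sketch).

    factors: list of N arrays, each of shape (R, d_n) -- the CP factor
        matrices of W = sum_r a_1^(r) x ... x a_N^(r) (outer products).
    feature_vectors: list of N arrays, each of shape (d_n,) -- the mapped
        features phi(x_n).
    Returns the scalar output <W, phi(x_1) x ... x phi(x_N)>, computed
    without ever forming the order-N tensor W explicitly.
    """
    R = factors[0].shape[0]
    terms = np.ones(R)
    for A, phi in zip(factors, feature_vectors):
        terms *= A @ phi  # per-mode inner products, one per rank-1 term
    return terms.sum()

# Toy usage with the assumed feature map phi(x) = [1, x]:
rng = np.random.default_rng(0)
N, R = 4, 3                          # 4 features, CP rank 3 (arbitrary)
factors = [rng.normal(size=(R, 2)) / np.sqrt(R) for _ in range(N)]
x = rng.normal(size=N)
phis = [np.array([1.0, xi]) for xi in x]
print(cp_predict(factors, phis))
```

The cost is O(N R d) per prediction, linear in the number of features N, versus the exponential cost of the brute-force interaction model; this same factorized contraction is also what makes the output invariant to any permutation applied jointly to the features and the factor matrices.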
