Learning Representation for Mixed Data Types with a Nonlinear Deep Encoder-Decoder Framework

Paper Title

Learning Representation for Mixed Data Types with a Nonlinear Deep Encoder-Decoder Framework

Authors

Saswata Sahoo, Souradip Chakraborty

Abstract

Representing data that mix numerical and categorical variables so as to obtain a suitable feature map is a challenging task, because the important information lies on a complex nonlinear manifold. The feature transformation should simultaneously incorporate the marginal information of the individual variables and the complex cross-dependence structure among the mixed-type variables. In this work, we propose a novel nonlinear deep encoder-decoder framework to capture cross-domain information for mixed data types. The hidden layers of the network connect the two types of variables through various nonlinear transformations to give latent feature maps. We encode the information in the numerical variables in a number of hidden nonlinear units, and we use these units to recreate the categorical variables through further nonlinear transformations. A separate, similar network is developed with the roles of the numerical and categorical variables switched. The hidden representational units are stacked side by side and transformed into a common space using a locality preserving projection. The derived feature maps are used to explore clusters in the data. Various standard datasets are investigated, showing near state-of-the-art clustering performance when the feature maps are used with simple K-means clustering.
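
The pipeline described in the abstract can be illustrated with a small NumPy sketch. This is not the authors' implementation: the single tanh hidden layer, the plain gradient-descent training loop, the toy data, the single categorical variable, and the names `hidden_features`, `lpp`, and `kmeans` are all assumptions made for illustration. It only shows the overall shape of the idea: two cross-predicting networks, stacked hidden units, a locality preserving projection, then K-means.

```python
import numpy as np

def hidden_features(X_in, X_out, categorical_out, n_hidden=4, lr=0.1, epochs=300, seed=0):
    """Train a one-hidden-layer net to predict X_out from X_in; return hidden activations."""
    rng = np.random.default_rng(seed)
    n = X_in.shape[0]
    W1 = rng.normal(0.0, 0.1, (X_in.shape[1], n_hidden)); b1 = np.zeros(n_hidden)
    W2 = rng.normal(0.0, 0.1, (n_hidden, X_out.shape[1])); b2 = np.zeros(X_out.shape[1])
    for _ in range(epochs):
        H = np.tanh(X_in @ W1 + b1)                  # nonlinear encoding of the inputs
        Z = H @ W2 + b2                              # decoder output for the other type
        if categorical_out:                          # softmax + cross-entropy (one variable assumed)
            P = np.exp(Z - Z.max(axis=1, keepdims=True))
            P /= P.sum(axis=1, keepdims=True)
            G = (P - X_out) / n
        else:                                        # squared-error loss for numerical targets
            G = (Z - X_out) / n
        dH = (G @ W2.T) * (1.0 - H ** 2)             # backpropagate through tanh
        W2 -= lr * (H.T @ G); b2 -= lr * G.sum(axis=0)
        W1 -= lr * (X_in.T @ dH); b1 -= lr * dH.sum(axis=0)
    return np.tanh(X_in @ W1 + b1)

def lpp(H, n_components=2, k=5):
    """Locality preserving projection with a kNN heat-kernel graph (PCA step first)."""
    Hc = H - H.mean(axis=0)
    U, s, _ = np.linalg.svd(Hc, full_matrices=False)
    keep = s > 1e-4 * s[0]                           # drop near-null directions
    Y = U[:, keep] * s[keep]
    n = Y.shape[0]
    D2 = ((Y[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    idx = np.argsort(D2, axis=1)[:, 1:k + 1]         # k nearest neighbours (self skipped)
    rows = np.repeat(np.arange(n), k)
    cols = idx.ravel()
    W = np.zeros((n, n))
    W[rows, cols] = np.exp(-D2[rows, cols] / (D2.mean() + 1e-12))
    W = np.maximum(W, W.T)                           # symmetrise the graph
    D = np.diag(W.sum(axis=1))
    A = Y.T @ (D - W) @ Y                            # Y^T L Y with L the graph Laplacian
    B = Y.T @ D @ Y
    B += 1e-10 * np.trace(B) / B.shape[0] * np.eye(B.shape[0])
    C_inv = np.linalg.inv(np.linalg.cholesky(B))     # whiten so that A a = lam B a becomes symmetric
    M = C_inv @ A @ C_inv.T
    _, vecs = np.linalg.eigh((M + M.T) / 2.0)        # eigenvalues in ascending order
    return Y @ (C_inv.T @ vecs[:, :n_components])    # directions with the smallest eigenvalues

def kmeans(X, k, n_iter=50, seed=0):
    """Plain Lloyd iterations; returns one cluster label per row."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), k, replace=False)]
    for _ in range(n_iter):
        labels = ((X[:, None] - centers[None]) ** 2).sum(-1).argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    return labels

# Toy mixed data: one binary categorical variable and three numerical variables.
rng = np.random.default_rng(0)
cat = rng.integers(0, 2, 60)
X_num = rng.normal(0.0, 0.3, (60, 3)) + 2.0 * cat[:, None]
X_cat = np.eye(2)[cat]                               # one-hot encoding

H_num = hidden_features(X_num, X_cat, categorical_out=True)    # numerical -> categorical
H_cat = hidden_features(X_cat, X_num, categorical_out=False)   # categorical -> numerical
H = np.hstack([H_num, H_cat])                        # stack hidden units side by side
Z = lpp(H, n_components=2)                           # common low-dimensional space
labels = kmeans(Z, k=2)
```

The PCA step inside `lpp` is a common preprocessing choice for locality preserving projection and is included here only to keep the generalized eigenproblem well conditioned; the abstract does not say whether the authors use it.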
