Effects of Parametric and Non-Parametric Methods on High Dimensional Sparse Matrix Representations

Paper Title

Effects of Parametric and Non-Parametric Methods on High Dimensional Sparse Matrix Representations

Authors

Sayali Tambe, Raunak Joshi, Abhishek Gupta, Nandan Kanvinde, Vidya Chitre

Abstract

Semantics derived from textual data provide the representations that machine learning algorithms consume. These representations are an interpretable form of high-dimensional sparse matrix that is given as input to the algorithms. Since learning methods are broadly classified as parametric and non-parametric, this paper examines the effects of both types of algorithm on high-dimensional sparse matrix representations. To derive the representations from text, we use TF-IDF and justify that choice in the paper. We form representations of 50, 100, 500, 1000 and 5000 dimensions and perform classification on each, using Linear Discriminant Analysis and Naive Bayes as parametric learning methods and Decision Tree and Support Vector Machines as non-parametric learning methods. We then report metrics for each dimensionality of the representation and detail the effect of each algorithm.
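To make the idea of a dimension-capped TF-IDF representation concrete, the following is a minimal sketch in plain Python, not the authors' pipeline. It assumes whitespace tokenization, an unsmoothed IDF of the form log(N / df) + 1, L2 row normalization, and a cap that keeps the most frequent terms; the abstract specifies none of these. The names `tfidf_sparse`, `sparsity`, and the toy `docs` corpus are hypothetical, and the caps used here are far smaller than the 50 to 5000 dimensions in the paper.

```python
import math
from collections import Counter


def tfidf_sparse(docs, max_features):
    """Return (vocab, rows); each row is a sparse {column_index: weight} dict."""
    tokenized = [doc.lower().split() for doc in docs]
    # Corpus-wide term counts decide which terms survive the dimension cap.
    total = Counter(tok for toks in tokenized for tok in toks)
    # Keep the most frequent terms; ties are broken alphabetically.
    kept = sorted(total, key=lambda t: (-total[t], t))[:max_features]
    vocab = {term: i for i, term in enumerate(sorted(kept))}
    n_docs = len(docs)
    # Document frequency: number of documents containing each kept term.
    df = Counter(tok for toks in tokenized for tok in set(toks) if tok in vocab)
    rows = []
    for toks in tokenized:
        tf = Counter(t for t in toks if t in vocab)
        # Assumed weighting: raw term count times (log(N / df) + 1).
        row = {vocab[t]: c * (math.log(n_docs / df[t]) + 1.0) for t, c in tf.items()}
        norm = math.sqrt(sum(w * w for w in row.values()))
        # L2-normalize non-empty rows; documents with no kept terms stay empty.
        rows.append({j: w / norm for j, w in row.items()} if norm else {})
    return vocab, rows


def sparsity(rows, n_cols):
    """Fraction of zero entries in the implied dense documents-by-terms matrix."""
    nonzero = sum(len(r) for r in rows)
    return 1.0 - nonzero / (len(rows) * n_cols)


# Toy corpus for illustration only.
docs = [
    "the cat sat on the mat",
    "the dog sat on the log",
    "cats and dogs are pets",
    "the stock market fell today",
]

for cap in (5, 10, 50):
    vocab, rows = tfidf_sparse(docs, cap)
    print(cap, len(vocab), round(sparsity(rows, len(vocab)), 4))
```

Storing each row as a dictionary of non-zero entries mirrors how sparse matrix formats avoid materializing zeros. In practice a library vectorizer and a compressed sparse format would likely replace this sketch before passing the matrix to a classifier.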
