K-ARMA Models for Clustering Time Series Data

Paper Title

K-ARMA Models for Clustering Time Series Data

Authors

Hoare, Derek O., Matteson, David S., Wells, Martin T.

Abstract

We present an approach to clustering time series data using a model-based generalization of the K-Means algorithm, which we call K-Models. We prove the convergence of this general algorithm and relate it to the hard-EM algorithm for mixture modeling. We first apply our method to an AR($p$) clustering example and show how the clustering algorithm can be made robust to outliers using a least absolute deviations criterion. We then build up our clustering algorithm for ARMA($p,q$) models and extend it to ARIMA($p,d,q$) models. We develop a goodness-of-fit statistic, based on the Ljung-Box statistic, for the models fitted to clusters. We perform experiments with simulated data to show how the algorithm can be used for outlier detection and for detecting distributional drift, and we discuss the impact of the initialization method on empty clusters. We also perform experiments on real data, which show that our method is competitive with other existing methods on similar time series clustering tasks.
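The alternating structure described in the abstract can be illustrated with a minimal sketch for the AR($p$) case. This is not the authors' implementation. It assumes equal-length, zero-mean series, fits each cluster's model by pooled ordinary least squares, assigns each series to the model with the smallest mean squared one-step residual, and re-seeds an empty cluster with a randomly chosen series. The function names (`lagged_design`, `fit_ar`, `ar_loss`, `k_ar`) and the empty-cluster rule are hypothetical choices made for illustration only.

```python
import numpy as np

def lagged_design(x, p):
    """Return (X, y) for regressing x[t] on x[t-1], ..., x[t-p]."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    # Column k-1 holds the lag-k values aligned with the targets x[p:].
    X = np.column_stack([x[p - k : n - k] for k in range(1, p + 1)])
    return X, x[p:]

def fit_ar(series_list, p):
    """Pooled least-squares AR(p) coefficients for a group of series."""
    parts = [lagged_design(x, p) for x in series_list]
    X = np.vstack([X_i for X_i, _ in parts])
    y = np.concatenate([y_i for _, y_i in parts])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

def ar_loss(x, coef):
    """Mean squared one-step residual of series x under AR coefficients coef."""
    X, y = lagged_design(x, len(coef))
    r = y - X @ coef
    return float(np.mean(r ** 2))

def k_ar(series, k, p, n_iter=50, seed=0):
    """Alternate between fitting one AR(p) model per cluster and reassigning series."""
    rng = np.random.default_rng(seed)
    labels = rng.integers(k, size=len(series))  # random initial partition
    models = []
    for _ in range(n_iter):
        # Model step: refit each cluster's AR(p) model on its current members.
        models = []
        for j in range(k):
            members = [x for x, lab in zip(series, labels) if lab == j]
            if not members:
                # Assumed rule for empty clusters: re-seed with a random series.
                members = [series[rng.integers(len(series))]]
            models.append(fit_ar(members, p))
        # Assignment step: each series moves to its best-fitting model.
        new_labels = np.array(
            [int(np.argmin([ar_loss(x, m) for m in models])) for x in series]
        )
        if np.array_equal(new_labels, labels):
            break  # partition unchanged, so the models match the final labels
        labels = new_labels
    return labels, models
```

Calling `k_ar(series, k=2, p=1)` on a list of one-dimensional arrays returns an array of cluster labels and a list of coefficient vectors, one per cluster. Replacing the squared residual in `ar_loss` and the least-squares fit in `fit_ar` with absolute-deviation counterparts would be one way to approach the robust variant mentioned in the abstract, though that is not shown here.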
