Paper Title

A Review and Evaluation of Elastic Distance Functions for Time Series Clustering

Authors

Chris Holder, Matthew Middlehurst, Anthony Bagnall

Abstract

Time series clustering is the act of grouping time series data without recourse to a label. Algorithms that cluster time series can be classified into two groups: those that employ a time-series-specific distance measure, and those that derive features from time series. Both approaches usually rely on traditional clustering algorithms such as $k$-means. Our focus is on distance-based time series clustering that employs elastic distance measures, i.e. distances that perform some kind of realignment whilst measuring distance. We describe nine commonly used elastic distance measures and compare their performance with k-means and k-medoids clustering. Our findings are surprising. The most popular technique, dynamic time warping (DTW), performs worse than Euclidean distance with k-means, and even when tuned, is no better. Using k-medoids rather than k-means improved the clusterings for all nine distance measures. DTW is not significantly better than Euclidean distance with k-medoids. Generally, distance measures that employ editing in conjunction with warping perform better, and one distance measure, the move-split-merge (MSM) method, is the best performing measure of this study. We also compare to clustering with DTW using barycentre averaging (DBA). We find that DBA does improve DTW k-means, but that the standard DBA is still worse than using MSM. Our conclusion is to recommend MSM with k-medoids as the benchmark algorithm for clustering time series with elastic distance measures. We provide implementations in the aeon toolkit, along with results and guidance on reproducing them, in the associated GitHub repository.
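
To make the recommended combination concrete, the following is a minimal sketch of MSM used inside a k-medoids loop. It is not the aeon implementation referenced in the abstract. The function names (`msm_cost`, `msm_distance`, `k_medoids`), the cost parameter `c=1.0`, the toy series, and the fixed initial medoids are all assumptions chosen for illustration, and the k-medoids routine is a simple alternating variant that may differ from the one evaluated in the paper.

```python
def msm_cost(new, a, b, c):
    """Cost of a split or merge step that introduces `new` next to a and b."""
    if a <= new <= b or a >= new >= b:
        return c
    return c + min(abs(new - a), abs(new - b))


def msm_distance(x, y, c=1.0):
    """Move-split-merge distance via dynamic programming (c is an assumed cost)."""
    n, m = len(x), len(y)
    cost = [[0.0] * m for _ in range(n)]
    cost[0][0] = abs(x[0] - y[0])
    for i in range(1, n):
        cost[i][0] = cost[i - 1][0] + msm_cost(x[i], x[i - 1], y[0], c)
    for j in range(1, m):
        cost[0][j] = cost[0][j - 1] + msm_cost(y[j], x[0], y[j - 1], c)
    for i in range(1, n):
        for j in range(1, m):
            cost[i][j] = min(
                cost[i - 1][j - 1] + abs(x[i] - y[j]),                    # move
                cost[i - 1][j] + msm_cost(x[i], x[i - 1], y[j], c),       # split/merge in x
                cost[i][j - 1] + msm_cost(y[j], x[i], y[j - 1], c),       # split/merge in y
            )
    return cost[n - 1][m - 1]


def k_medoids(series, k, distance, init, max_iter=100):
    """Alternating k-medoids: assign to nearest medoid, then re-pick medoids."""
    n = len(series)
    # Precompute all pairwise distances once.
    d = [[distance(series[i], series[j]) for j in range(n)] for i in range(n)]
    medoids = list(init)
    labels = [0] * n
    for _ in range(max_iter):
        labels = [min(range(k), key=lambda cl: d[i][medoids[cl]]) for i in range(n)]
        new_medoids = []
        for cl in range(k):
            members = [i for i in range(n) if labels[i] == cl]
            if not members:
                new_medoids.append(medoids[cl])  # keep medoid of an empty cluster
                continue
            # The medoid is the member with the smallest total distance to the rest.
            best = min(members, key=lambda cand: sum(d[cand][i] for i in members))
            new_medoids.append(best)
        if new_medoids == medoids:
            break
        medoids = new_medoids
    return labels, medoids


# Hypothetical toy data: three shifted "bump" series and three shifted "dip" series.
series = [
    [0, 0, 1, 2, 1, 0, 0, 0],
    [0, 0, 0, 1, 2, 1, 0, 0],
    [0, 1, 2, 1, 0, 0, 0, 0],
    [2, 2, 1, 0, 0, 1, 2, 2],
    [2, 2, 2, 1, 0, 0, 1, 2],
    [2, 1, 0, 0, 1, 2, 2, 2],
]
labels, medoids = k_medoids(series, k=2, distance=msm_distance, init=[0, 3])
print(labels, medoids)
```

Because medoids are always actual series from the data, this loop needs only pairwise distances and no averaging step, which is one reason k-medoids pairs naturally with elastic distances. For real experiments, the aeon toolkit mentioned in the abstract is the appropriate starting point rather than this sketch.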
