Scalable Classifier-Agnostic Channel Selection for Multivariate Time Series Classification

Title

Scalable Classifier-Agnostic Channel Selection for Multivariate Time Series Classification

Authors

Bhaskar Dhariyal, Thach Le Nguyen, Georgiana Ifrim

Abstract

Accuracy is a key focus of current work in time series classification. However, speed and data reduction are equally important in many applications, especially as data scale and storage requirements grow rapidly. Current multivariate time series classification (MTSC) algorithms need hundreds of compute hours to complete training and prediction. This is due to the nature of multivariate time series data, which grows with the number of time series, their length and the number of channels. In many applications, not all channels are useful for the classification task; hence we require methods that can efficiently select useful channels and thus save computational resources. We propose and evaluate two methods for channel selection. Our techniques represent each class by a prototype time series and perform channel selection based on the prototype distance between classes. The main hypothesis is that useful channels enable better separation between classes; hence, channels with a higher distance between class prototypes are more useful. On the UEA Multivariate Time Series Classification (MTSC) benchmark, we show that these techniques achieve significant data reduction and classifier speedup for similar levels of classification accuracy. Applied as a pre-processing step before training state-of-the-art MTSC algorithms, channel selection saves about 70% of computation time and data storage while preserving accuracy. Furthermore, our methods enable even efficient classifiers, such as ROCKET, to achieve better accuracy than with no channel selection or with forward channel selection. To further study the impact of our techniques, we present experiments on classifying synthetic multivariate time series datasets with more than 100 channels, as well as a real-world case study on a dataset with 50 channels. Our channel selection methods lead to significant data reduction with preserved or improved accuracy.
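The abstract describes the prototype-distance idea only at a high level. Below is a minimal Python sketch of that idea under explicit assumptions, not the authors' implementation. Each class prototype is assumed to be the per-timestep mean of its training series, the distance is assumed to be Euclidean, a channel's score is assumed to be the sum of prototype distances over all class pairs, and the number of kept channels `k` is a hypothetical parameter (the abstract does not state how the cut-off is chosen). All function names are illustrative.

```python
from itertools import combinations
from math import sqrt


def class_prototypes(X, y):
    """Assumed prototype: per-channel, per-timestep mean of each class.

    X[i][c][t] is the value of instance i, channel c, time step t.
    """
    groups = {}
    for series, label in zip(X, y):
        groups.setdefault(label, []).append(series)
    protos = {}
    for label, members in groups.items():
        n = len(members)
        n_channels, length = len(members[0]), len(members[0][0])
        protos[label] = [
            [sum(m[c][t] for m in members) / n for t in range(length)]
            for c in range(n_channels)
        ]
    return protos


def euclidean(a, b):
    """Euclidean distance between two equal-length sequences (an assumed metric)."""
    return sqrt(sum((u - v) ** 2 for u, v in zip(a, b)))


def channel_scores(X, y):
    """Score each channel by summing prototype distances over all class pairs."""
    protos = class_prototypes(X, y)
    n_channels = len(X[0])
    scores = [0.0] * n_channels
    for a, b in combinations(sorted(protos), 2):
        for c in range(n_channels):
            scores[c] += euclidean(protos[a][c], protos[b][c])
    return scores


def select_channels(scores, k):
    """Keep the k highest-scoring channels (hypothetical fixed cut-off), in index order."""
    ranked = sorted(range(len(scores)), key=lambda c: scores[c], reverse=True)
    return sorted(ranked[:k])
```

On a toy dataset where channel 0 shifts strongly between classes, channel 2 shifts slightly, and channel 1 varies within each class but has the same class mean, the scores are `[10.0, 0.0, 2.0]` and `select_channels(scores, 2)` keeps channels `[0, 2]`. The selected indices would then be used to slice the data before training a classifier such as ROCKET. Note that mean prototypes ignore within-class variability, which is why channel 1 scores zero here.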
