Change-point Detection and Segmentation of Discrete Data using Bayesian Context Trees

Paper title

Change-point Detection and Segmentation of Discrete Data using Bayesian Context Trees

Authors

Valentinian Lungu, Ioannis Papageorgiou, Ioannis Kontoyiannis

Abstract

A new Bayesian modelling framework is introduced for piece-wise homogeneous variable-memory Markov chains, along with a collection of effective algorithmic tools for change-point detection and segmentation of discrete time series. Building on the recently introduced Bayesian Context Trees (BCT) framework, the distributions of different segments in a discrete time series are described as variable-memory Markov chains. Inference for the presence and location of change-points is then performed via Markov chain Monte Carlo sampling. The key observation that facilitates effective sampling is that, using one of the BCT algorithms, the prior predictive likelihood of the data can be computed exactly, integrating out all the models and parameters in each segment. This makes it possible to sample directly from the posterior distribution of the number and location of the change-points, leading to accurate estimates and providing a natural quantitative measure of uncertainty in the results. Estimates of the actual model in each segment can also be obtained, at essentially no additional computational cost. Results on both simulated and real-world data indicate that the proposed methodology performs better than or as well as state-of-the-art techniques.
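The sampling idea in the abstract can be illustrated with a much simpler stand-in. The sketch below is not the BCT method: it assumes each segment is a memoryless (i.i.d.) categorical source with a symmetric Dirichlet prior, so that the marginal likelihood of a segment has a closed form, and it assumes exactly one change-point with a uniform prior on its location. In the paper's framework, the exact prior predictive likelihood over variable-memory models would come from a BCT algorithm instead. The names `log_marginal`, `log_posterior`, and `sample_change_point` are hypothetical and chosen only for illustration.

```python
import math
import random
from collections import Counter


def log_marginal(segment, alphabet_size, alpha=0.5):
    """Exact log marginal likelihood of a segment under an i.i.d. categorical
    model with a symmetric Dirichlet(alpha) prior (parameters integrated out).
    Assumption: this memoryless model stands in for the BCT computation."""
    n = len(segment)
    out = math.lgamma(alphabet_size * alpha) - math.lgamma(n + alphabet_size * alpha)
    for count in Counter(segment).values():
        out += math.lgamma(count + alpha) - math.lgamma(alpha)
    return out


def log_posterior(x, tau, alphabet_size):
    """Unnormalised log posterior of a single change-point at position tau,
    assuming a uniform prior on tau in 1..len(x)-1."""
    return (log_marginal(x[:tau], alphabet_size)
            + log_marginal(x[tau:], alphabet_size))


def sample_change_point(x, alphabet_size, n_iter=5000, step=5, seed=0):
    """Random-walk Metropolis sampler over the change-point location."""
    rng = random.Random(seed)
    n = len(x)
    tau = n // 2
    cur = log_posterior(x, tau, alphabet_size)
    samples = []
    for _ in range(n_iter):
        prop = tau + rng.randint(-step, step)  # symmetric proposal
        if 1 <= prop <= n - 1:                 # out-of-range proposals are rejected
            new = log_posterior(x, prop, alphabet_size)
            if new >= cur or rng.random() < math.exp(new - cur):
                tau, cur = prop, new
        samples.append(tau)
    return samples


if __name__ == "__main__":
    # Toy binary sequence whose distribution changes at position 100.
    data = [0] * 100 + [1] * 100
    draws = sample_change_point(data, alphabet_size=2)
    estimate, _ = Counter(draws).most_common(1)[0]
    print("posterior mode of the change-point location:", estimate)
```

The histogram of `draws` approximates the posterior over the change-point location, so its spread gives the kind of uncertainty measure the abstract mentions. Sampling the number of change-points as well would require additional move types, which this sketch omits.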
