Paper title
Hypergraphs for multiscale cycles in structured data
Paper authors
Abstract
Scientific data has been growing in both size and complexity across the modern physical, engineering, life and social sciences. Spatial structure, for example, is a hallmark of many of the most important real-world complex systems, but its analysis is fraught with statistical challenges. Topological data analysis can provide a powerful computational window on complex systems. Here we present a framework to extend and interpret persistent homology summaries to analyse spatial data across multiple scales. We introduce hyperTDA, a topological pipeline that unifies local (e.g. geodesic) and global (e.g. Euclidean) metrics without losing spatial information, even in the presence of noise. Homology generators offer an elegant and flexible description of spatial structures and can capture the information computed by persistent homology in an interpretable way. In hyperTDA, this information is transformed into a weighted hypergraph whose hyperedges correspond to homology generators. We consider different choices of generators (e.g. matroid or minimal) and find that centrality and community detection are robust to either choice. We compare hyperTDA to existing geometric measures and validate its robustness to noise. We demonstrate the power of computing higher-order topological structures on spatial curves that arise frequently in ecology, biophysics, and biology, as well as in high-dimensional financial datasets. We find that hyperTDA can select between synthetic trajectories from the landmark 2020 AnDi challenge and quantify the movements of different animal species, even when data is limited.
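The step from homology generators to a weighted hypergraph can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes that each generator is already available as a set of point indices with a birth and a death value, that the hyperedge weight is the persistence (death minus birth), and that a normalised weighted degree stands in for whichever centrality measure the paper uses. The function names and the example generators are hypothetical.

```python
from collections import defaultdict


def build_hypergraph(generators):
    """Turn (vertices, birth, death) triples into weighted hyperedges.

    Assumption: the weight of a hyperedge is the persistence of its
    generator; generators with zero or negative persistence are dropped.
    """
    hyperedges = []
    for vertices, birth, death in generators:
        weight = death - birth
        if weight <= 0:
            continue
        hyperedges.append((frozenset(vertices), weight))
    return hyperedges


def weighted_degree_centrality(hyperedges):
    """Sum, per vertex, the weights of the hyperedges containing it.

    Scores are normalised to sum to 1. This is a simple stand-in, not
    necessarily the centrality used in the paper.
    """
    scores = defaultdict(float)
    for vertices, weight in hyperedges:
        for v in vertices:
            scores[v] += weight
    total = sum(scores.values())
    if total == 0:
        return {}
    return {v: s / total for v, s in scores.items()}


# Hypothetical generators on points indexed 0..7 of a spatial curve.
generators = [
    ([0, 1, 2, 3], 0.2, 1.0),   # long-lived loop
    ([2, 3, 4], 0.5, 0.75),     # shorter-lived loop sharing points 2 and 3
    ([5, 6, 7], 0.3, 0.3),      # zero persistence, dropped
]

hyperedges = build_hypergraph(generators)
centrality = weighted_degree_centrality(hyperedges)

for vertex in sorted(centrality):
    print(vertex, round(centrality[vertex], 3))
```

In this toy input, points 2 and 3 lie on both surviving loops and therefore receive the highest scores, which is the kind of information a centrality measure on the hypergraph is meant to expose.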