Paper title
Data, geometry and homology
Authors
Abstract
Homology-based invariants can be used to characterize the geometry of datasets and thereby gain some understanding of the processes generating those datasets. In this work we investigate how the geometry of a dataset changes when it is subsampled in various ways. In our framework the dataset serves as a reference object; we then consider different points in the ambient space and endow them with a geometry defined in relation to the reference object, for instance by subsampling the dataset proportionally to the distance between its elements and the point under consideration. We illustrate how this process can be used to extract rich geometrical information, making it possible, for example, to classify points coming from different data distributions.
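
The subsampling idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the Euclidean metric, the exact weighting (probability proportional to distance), and the choice of 0-dimensional persistence as the homology-based invariant are all assumptions made for illustration. The sketch relies on the standard fact that the death times of 0-dimensional Vietoris-Rips persistence coincide with the edge lengths of a minimum spanning tree (up to the factor-of-two convention some libraries use).

```python
import numpy as np


def distance_weighted_subsample(reference, point, size, rng):
    """Draw `size` distinct elements of `reference`, each chosen with
    probability proportional to its Euclidean distance from `point`.
    (Hypothetical helper; the weighting is an assumed reading of the abstract.)"""
    dists = np.linalg.norm(reference - point, axis=1)
    probs = dists / dists.sum()
    idx = rng.choice(len(reference), size=size, replace=False, p=probs)
    return reference[idx]


def h0_death_times(points):
    """Sorted death times of 0-dimensional Vietoris-Rips persistence,
    computed as minimum-spanning-tree edge lengths with Prim's algorithm.
    (Hypothetical helper standing in for a homology-based invariant.)"""
    n = len(points)
    # Full pairwise Euclidean distance matrix.
    dmat = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=2)
    in_tree = np.zeros(n, dtype=bool)
    in_tree[0] = True
    best = dmat[0].copy()  # cheapest known edge from the tree to each vertex
    deaths = []
    for _ in range(n - 1):
        best[in_tree] = np.inf  # never re-select vertices already in the tree
        j = int(np.argmin(best))
        deaths.append(best[j])
        in_tree[j] = True
        best = np.minimum(best, dmat[j])
    return np.sort(np.array(deaths))
```

As a usage example, one could take a noisy circle as the reference dataset, call `distance_weighted_subsample` for several points of the ambient space, and compare the resulting `h0_death_times` vectors; points located differently relative to the reference object would then receive different signatures, which could serve as features for a classifier.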