Paper Title
Reformulating Speaker Diarization as Community Detection With Emphasis On Topological Structure
Paper Authors
Abstract
Clustering-based speaker diarization remains one of the major approaches in practice, despite recent progress in end-to-end diarization. However, clustering methods have not been explored extensively for speaker diarization. Commonly used methods such as k-means, spectral clustering, and agglomerative hierarchical clustering take into account only properties such as proximity and relative density. In this paper, we propose to view clustering-based diarization as a community detection problem, so that the topological structure is also considered. This work makes four main contributions. First, we show that the Leiden community detection algorithm significantly outperforms previous methods at clustering speaker segments. Second, we propose using uniform manifold approximation to reduce dimensionality while retaining both global and local topological structure. Third, we introduce a masked filtering approach to extract "clean" speaker embeddings. Finally, the community structure is applied to an end-to-end post-processing network to obtain the diarization results. The final system achieves a relative diarization error rate (DER) reduction of up to 70 percent. The contribution of each component is analyzed.
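To make the community detection view concrete, the following minimal sketch treats speaker segments as graph nodes, connects segments whose embeddings are similar, and scores candidate partitions with weighted modularity, one of the quality functions that community detection algorithms such as Leiden can optimize. It is an illustration under stated assumptions, not the paper's pipeline: the Leiden algorithm itself is not implemented, and the toy 2-D embeddings, the 0.3 similarity threshold, and all function names are hypothetical.

```python
import math
from itertools import combinations


def cosine(u, v):
    """Cosine similarity between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def build_graph(embeddings, threshold):
    """Return weighted edges {(i, j): similarity} for pairs at or above the threshold."""
    edges = {}
    for i, j in combinations(range(len(embeddings)), 2):
        sim = cosine(embeddings[i], embeddings[j])
        if sim >= threshold:
            edges[(i, j)] = sim
    return edges


def modularity(edges, labels):
    """Weighted modularity Q = sum over communities of L_c/m - (d_c/(2m))^2."""
    total = sum(edges.values())  # m: total edge weight
    strength = [0.0] * len(labels)  # weighted degree of each node
    for (i, j), w in edges.items():
        strength[i] += w
        strength[j] += w
    q = 0.0
    for community in set(labels):
        internal = sum(
            w for (i, j), w in edges.items()
            if labels[i] == community and labels[j] == community
        )
        degree = sum(s for s, lab in zip(strength, labels) if lab == community)
        q += internal / total - (degree / (2 * total)) ** 2
    return q


# Hypothetical toy embeddings: three segments per speaker (made-up values).
embeddings = [
    (1.0, 0.1), (0.9, 0.2), (1.0, 0.0),  # speaker A
    (0.1, 1.0), (0.2, 0.9), (0.0, 1.0),  # speaker B
]
edges = build_graph(embeddings, threshold=0.3)  # threshold is an assumption

q_speakers = modularity(edges, [0, 0, 0, 1, 1, 1])  # partition by true speaker
q_mixed = modularity(edges, [0, 1, 0, 1, 0, 1])     # partition mixing speakers
q_single = modularity(edges, [0, 0, 0, 0, 0, 0])    # everything in one community

print(f"speaker partition: {q_speakers:.3f}")
print(f"mixed partition:   {q_mixed:.3f}")
print(f"single community:  {q_single:.3f}")
```

In this toy graph the speaker-consistent partition scores higher than the mixed one, which is the property a modularity-optimizing community detection algorithm exploits. A real system would presumably build the graph from learned speaker embeddings and delegate the optimization to an existing Leiden implementation.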