Paper Title
Unsupervised Hierarchical Semantic Segmentation with Multiview Cosegmentation and Clustering Transformers
Paper Authors
Paper Abstract
Unsupervised semantic segmentation aims to discover groupings within and across images that capture object- and view-invariance of a category without external supervision. Grouping naturally has levels of granularity, creating ambiguity in unsupervised segmentation. Existing methods avoid this ambiguity and treat it as a factor outside modeling, whereas we embrace it and desire hierarchical grouping consistency for unsupervised segmentation. We approach unsupervised segmentation as a pixel-wise feature learning problem. Our idea is that a good representation shall reveal not just a particular level of grouping, but any level of grouping in a consistent and predictable manner. We enforce spatial consistency of grouping and bootstrap feature learning with co-segmentation among multiple views of the same image, and enforce semantic consistency across the grouping hierarchy with clustering transformers between coarse- and fine-grained features. We deliver the first data-driven unsupervised hierarchical semantic segmentation method called Hierarchical Segment Grouping (HSG). Capturing visual similarity and statistical co-occurrences, HSG also outperforms existing unsupervised segmentation methods by a large margin on five major object- and scene-centric benchmarks. Our code is publicly available at https://github.com/twke18/HSG.
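
Below is a minimal, hypothetical sketch of the two consistency signals the abstract describes: spatial consistency of pixel groupings between two views of the same image, and semantic consistency between coarse- and fine-grained groupings. It substitutes plain soft k-means for the paper's clustering transformers; all names (soft_kmeans, cross_view_consistency, hierarchy_consistency) and the exact loss forms are illustrative assumptions, not the released HSG implementation.

# Hypothetical sketch of multiview and hierarchical consistency losses.
# Not the authors' code: soft k-means stands in for clustering transformers.
import torch
import torch.nn.functional as F


def soft_kmeans(features, num_clusters, iters=10, tau=0.1):
    """Soft k-means over pixel features.

    features: (N, D) L2-normalized pixel embeddings.
    Returns soft assignments (N, K) and centroids (K, D).
    """
    idx = torch.randperm(features.size(0))[:num_clusters]
    centroids = features[idx].clone()                       # (K, D) random init
    for _ in range(iters):
        logits = features @ centroids.t() / tau              # (N, K) similarities
        assign = logits.softmax(dim=1)
        centroids = (assign.t() @ features) / (assign.sum(0, keepdim=True).t() + 1e-6)
        centroids = F.normalize(centroids, dim=1)
    return assign, centroids


def cross_view_consistency(assign_a, assign_b):
    """Spatial consistency: corresponding pixels in two views of the same
    image (assumed pixel-aligned here) should receive matching groupings."""
    return F.kl_div(assign_a.log(), assign_b, reduction="batchmean")


def hierarchy_consistency(fine_assign, fine_centroids, coarse_centroids, tau=0.1):
    """Semantic consistency across granularity: map fine segments onto coarse
    clusters via centroid similarity and encourage confident coarse groupings
    (an entropy-style proxy, assumed for illustration)."""
    logits = fine_centroids @ coarse_centroids.t() / tau     # (K_fine, K_coarse)
    coarse_from_fine = fine_assign @ logits.softmax(dim=1)   # (N, K_coarse)
    return -(coarse_from_fine * (coarse_from_fine + 1e-6).log()).sum(1).mean()


if __name__ == "__main__":
    # Stand-in per-pixel features for two augmented views of one image.
    feats_a = F.normalize(torch.randn(64 * 64, 128), dim=1)
    feats_b = F.normalize(feats_a + 0.05 * torch.randn_like(feats_a), dim=1)

    fine_a, cent_fine_a = soft_kmeans(feats_a, num_clusters=32)
    fine_b, _ = soft_kmeans(feats_b, num_clusters=32)
    _, cent_coarse_a = soft_kmeans(feats_a, num_clusters=8)

    loss = cross_view_consistency(fine_a, fine_b) \
         + hierarchy_consistency(fine_a, cent_fine_a, cent_coarse_a)
    print(float(loss))  # would be backpropagated into the pixel encoder

In the full method, such losses would drive a pixel-wise feature encoder so that groupings remain consistent across views and across levels of the hierarchy; the actual objectives and clustering-transformer modules are defined in the repository linked above.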