Paper title

Improving Face Recognition by Clustering Unlabeled Faces in the Wild

Authors

Aruni RoyChowdhury, Xiang Yu, Kihyuk Sohn, Erik Learned-Miller, Manmohan Chandraker

Abstract

While deep face recognition has benefited significantly from large-scale labeled data, current research is focused on leveraging unlabeled data to further boost performance, reducing the cost of human annotation. Prior work has mostly been in controlled settings, where the labeled and unlabeled data sets have no overlapping identities by construction. This is not realistic in large-scale face recognition, where one must contend with such overlaps, the frequency of which increases with the volume of data. Ignoring identity overlap leads to significant labeling noise, as data from the same identity is split into multiple clusters. To address this, we propose a novel identity separation method based on extreme value theory. It is formulated as an out-of-distribution detection algorithm, and greatly reduces the problems caused by overlapping-identity label noise. Considering cluster assignments as pseudo-labels, we must also overcome the labeling noise from clustering errors. We propose a modulation of the cosine loss, where the modulation weights correspond to an estimate of clustering uncertainty. Extensive experiments on both controlled and real settings demonstrate our method's consistent improvements over supervised baselines, e.g., 11.6% improvement on IJB-A verification.
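
The abstract does not spell out the modulated loss, so the following is only a minimal sketch of the general idea: a large-margin cosine loss in which each pseudo-labeled sample is scaled by a confidence weight. The function names, the margin form, the default `scale` and `margin` values, and the normalization by the sum of weights are all assumptions made for illustration, not the authors' formulation.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def weighted_cosine_loss(features, class_weights, labels, sample_weights,
                         scale=30.0, margin=0.35):
    """Hypothetical uncertainty-weighted large-margin cosine loss.

    features:       list of embedding vectors, one per sample
    class_weights:  list of class prototype vectors, one per identity
    labels:         (pseudo-)label index for each sample
    sample_weights: assumed confidence in [0, 1] for each label;
                    low values damp samples whose cluster assignment
                    is uncertain
    """
    total, weight_sum = 0.0, 0.0
    for x, y, w in zip(features, labels, sample_weights):
        logits = []
        for j, c in enumerate(class_weights):
            cos = cosine_similarity(x, c)
            if j == y:
                cos -= margin  # margin applied to the target class only
            logits.append(scale * cos)
        # Numerically stable log-sum-exp for the softmax cross-entropy.
        m = max(logits)
        log_z = m + math.log(sum(math.exp(l - m) for l in logits))
        total += w * (log_z - logits[y])
        weight_sum += w
    # Normalize by total weight so the loss scale does not depend on
    # how many samples were down-weighted.
    return total / weight_sum if weight_sum > 0 else 0.0


# Toy data: the third sample sits on class 0 but carries pseudo-label 1.
prototypes = [[1.0, 0.0], [0.0, 1.0]]
embeddings = [[1.0, 0.1], [0.1, 1.0], [1.0, 0.0]]
pseudo_labels = [0, 1, 1]

loss_uniform = weighted_cosine_loss(embeddings, prototypes, pseudo_labels,
                                    [1.0, 1.0, 1.0])
loss_damped = weighted_cosine_loss(embeddings, prototypes, pseudo_labels,
                                   [1.0, 1.0, 0.1])
print(loss_uniform > loss_damped)
```

In this toy setup, lowering the weight of the suspicious third sample reduces its contribution to the loss, which is the qualitative behavior the abstract describes for labels with high clustering uncertainty.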
