Paper Title

Deep Semantic Dictionary Learning for Multi-label Image Classification

Authors

Fengtao Zhou, Sheng Huang, Yun Xing

Abstract

Compared with single-label image classification, multi-label image classification is more practical and more challenging. Some recent studies have attempted to leverage the semantic information of categories to improve multi-label image classification performance. However, these semantic-based methods treat semantic information only as a complement to the visual representation, without exploiting it further. In this paper, we present an innovative path towards solving multi-label image classification by treating it as a dictionary learning task. We design a novel end-to-end model named Deep Semantic Dictionary Learning (DSDL). In DSDL, an auto-encoder generates the semantic dictionary from class-level semantics, and this dictionary is then used to represent the visual features extracted by a Convolutional Neural Network (CNN) with label embeddings. DSDL provides a simple but elegant way to exploit and reconcile the label, semantic, and visual spaces simultaneously by conducting dictionary learning among them. Moreover, inspired by the iterative optimization of traditional dictionary learning, we further devise a novel training strategy named Alternately Parameters Update Strategy (APUS) for optimizing DSDL, which alternately optimizes the representation coefficients and the semantic dictionary in forward and backward propagation. Extensive experimental results on three popular benchmarks demonstrate that our method achieves promising performance in comparison with state-of-the-art methods. Our code and models have been released at https://github.com/ZFT-CQU/DSDL.
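
The abstract says APUS is inspired by the iterative optimization of traditional dictionary learning, in which the representation coefficients and the dictionary are updated in turn. The following is a minimal sketch of that classical alternating scheme only, not the DSDL model or the released code: there is no CNN or auto-encoder, the dictionary is a plain list with one atom per class, and every name and value (`reconstruct`, `alternating_fit`, `lr`, `steps`, the toy feature and dictionary) is a hypothetical choice made for illustration.

```python
def reconstruct(dictionary, coeffs):
    """Return sum_c coeffs[c] * dictionary[c] (one dictionary row per class)."""
    dim = len(dictionary[0])
    return [sum(a * atom[j] for a, atom in zip(coeffs, dictionary))
            for j in range(dim)]


def squared_error(x, dictionary, coeffs):
    """Squared reconstruction error ||x - D^T a||^2."""
    recon = reconstruct(dictionary, coeffs)
    return sum((xi - ri) ** 2 for xi, ri in zip(x, recon))


def alternating_fit(x, dictionary, coeffs, steps=200, lr=0.05):
    """Alternate gradient steps on the coefficients and on the dictionary.

    Assumed toy objective: minimize ||x - D^T a||^2 over a and D.
    """
    dictionary = [atom[:] for atom in dictionary]  # copy, leave inputs untouched
    coeffs = coeffs[:]
    history = [squared_error(x, dictionary, coeffs)]
    for _ in range(steps):
        # (1) Dictionary fixed: gradient step on the coefficients.
        resid = [xi - ri for xi, ri in zip(x, reconstruct(dictionary, coeffs))]
        coeffs = [a + lr * 2.0 * sum(r * d for r, d in zip(resid, atom))
                  for a, atom in zip(coeffs, dictionary)]
        # (2) Coefficients fixed: gradient step on every dictionary atom.
        resid = [xi - ri for xi, ri in zip(x, reconstruct(dictionary, coeffs))]
        dictionary = [[d + lr * 2.0 * a * r for d, r in zip(atom, resid)]
                      for a, atom in zip(coeffs, dictionary)]
        history.append(squared_error(x, dictionary, coeffs))
    return dictionary, coeffs, history


# Toy data (hypothetical): one 4-dimensional "visual feature" and 3 class atoms.
x = [1.0, 0.5, -0.5, 0.2]
D0 = [[0.3, -0.1, 0.2, 0.0],
      [0.1, 0.4, -0.2, 0.1],
      [-0.2, 0.1, 0.3, -0.1]]
a0 = [0.0, 0.0, 0.0]

D, a, history = alternating_fit(x, D0, a0)
print("initial error:", history[0])
print("final error:  ", history[-1])
```

According to the abstract, DSDL instead generates the dictionary from class-level semantics with an auto-encoder and performs the alternation across forward and backward propagation; the sketch is meant only to make the "fix one, update the other" pattern concrete.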
