Paper Title
MetaFormer: A Unified Meta Framework for Fine-Grained Recognition
Paper Authors
Paper Abstract
Fine-Grained Visual Classification (FGVC) is the task of recognizing objects belonging to multiple subordinate categories of a super-category. Recent state-of-the-art methods usually design sophisticated learning pipelines to tackle this task. However, visual information alone is often not sufficient to accurately differentiate between fine-grained visual categories. Nowadays, meta-information (e.g., spatio-temporal priors, attributes, and text descriptions) usually appears along with the images. This inspires us to ask the question: is it possible to use a unified and simple framework to exploit various kinds of meta-information to assist fine-grained identification? To answer this question, we explore a unified and strong meta-framework (MetaFormer) for fine-grained visual classification. In practice, MetaFormer provides a simple yet effective approach to the joint learning of vision and various meta-information. Moreover, MetaFormer also provides a strong baseline for FGVC without bells and whistles. Extensive experiments demonstrate that MetaFormer can effectively use various meta-information to improve the performance of fine-grained recognition. In a fair comparison, MetaFormer outperforms the current SotA approaches using only vision information on the iNaturalist2017 and iNaturalist2018 datasets. Adding meta-information, MetaFormer exceeds the current SotA approaches by 5.9% and 5.3%, respectively. Moreover, MetaFormer achieves 92.3% and 92.7% on CUB-200-2011 and NABirds, significantly outperforming the SotA approaches. The source code and pre-trained models are released at https://github.com/dqshuai/MetaFormer.
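The abstract does not spell out the fusion mechanism, so the sketch below is only a rough illustration of the general idea of joint vision/meta-information learning in a transformer: the meta-information (e.g., an encoded spatio-temporal prior or attribute vector) is projected into the token embedding space and concatenated with the image patch tokens so that both modalities attend to each other. The class `JointVisionMetaClassifier`, all module names, dimensions, and the class count are placeholder assumptions, not the authors' actual architecture; see the released repository for the real implementation.

```python
# Minimal sketch of joint vision/meta-information learning with a transformer.
# This is NOT the authors' exact MetaFormer architecture; it only illustrates
# fusing meta-information as extra tokens alongside image patch tokens.
# All names and hyper-parameters below are illustrative assumptions.

import torch
import torch.nn as nn


class JointVisionMetaClassifier(nn.Module):
    def __init__(self, num_classes, meta_dim, embed_dim=384,
                 img_size=224, patch_size=16, depth=4, num_heads=6):
        super().__init__()
        num_patches = (img_size // patch_size) ** 2

        # Image patches -> token embeddings (ViT-style patchify).
        self.patch_embed = nn.Conv2d(3, embed_dim, kernel_size=patch_size,
                                     stride=patch_size)
        # Meta-information vector -> one extra token in the same embedding space.
        self.meta_embed = nn.Sequential(
            nn.Linear(meta_dim, embed_dim), nn.GELU(),
            nn.Linear(embed_dim, embed_dim),
        )
        self.cls_token = nn.Parameter(torch.zeros(1, 1, embed_dim))
        # Positional embedding covers: class token + patch tokens + meta token.
        self.pos_embed = nn.Parameter(torch.zeros(1, num_patches + 2, embed_dim))

        encoder_layer = nn.TransformerEncoderLayer(
            d_model=embed_dim, nhead=num_heads, dim_feedforward=4 * embed_dim,
            batch_first=True, norm_first=True)
        self.encoder = nn.TransformerEncoder(encoder_layer, num_layers=depth)
        self.head = nn.Linear(embed_dim, num_classes)

    def forward(self, images, meta):
        b = images.size(0)
        patches = self.patch_embed(images).flatten(2).transpose(1, 2)  # (B, N, D)
        meta_tok = self.meta_embed(meta).unsqueeze(1)                  # (B, 1, D)
        cls_tok = self.cls_token.expand(b, -1, -1)                     # (B, 1, D)
        # Vision and meta tokens attend to each other in every encoder layer.
        tokens = torch.cat([cls_tok, patches, meta_tok], dim=1) + self.pos_embed
        tokens = self.encoder(tokens)
        return self.head(tokens[:, 0])  # classify from the class token


if __name__ == "__main__":
    # Example: 8-dimensional meta vector, class count chosen arbitrarily.
    model = JointVisionMetaClassifier(num_classes=1000, meta_dim=8)
    logits = model(torch.randn(2, 3, 224, 224), torch.randn(2, 8))
    print(logits.shape)  # torch.Size([2, 1000])
```

Because the meta token participates in the same self-attention as the patch tokens, the model can learn when to rely on visual evidence and when to lean on the prior, which is one simple way to realize the "unified" joint learning described in the abstract.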