Title
Boosting Robustness of Image Matting with Context Assembling and Strong Data Augmentation
Authors
Abstract
Deep image matting methods have achieved increasingly better results on benchmarks (e.g., Composition-1k/alphamatting.com). However, their robustness, including robustness to trimaps and generalization to images from different domains, is still under-explored. Although some works propose to either refine the trimaps or adapt the algorithms to real-world images via extra data augmentation, none of them takes both into consideration, not to mention the significant performance deterioration on benchmarks when such data augmentation is used. To fill this gap, we propose an image matting method which achieves higher robustness (RMat) via multilevel context assembling and strong data augmentation targeting matting. Specifically, we first build a strong matting framework by modeling ample global information with transformer blocks in the encoder, and focusing on details in combination with convolution layers as well as a low-level feature assembling attention block in the decoder. Then, based on this strong baseline, we analyze current data augmentation and explore simple but effective strong data augmentation to boost the baseline model and contribute a more generalizable matting method. Compared with previous methods, the proposed method not only achieves state-of-the-art results on the Composition-1k benchmark (11% improvement on SAD and 27% improvement on Grad) with a smaller model size, but also shows more robust generalization on other benchmarks, on real-world images, and on varying coarse-to-fine trimaps in our extensive experiments.
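The benchmarks mentioned above (e.g., Composition-1k) are built on the standard matting equation I = αF + (1−α)B, which is also what composition-based data augmentation applies when pasting foregrounds onto new backgrounds. Below is a minimal, illustrative stdlib-Python sketch of that compositing step and of the SAD (Sum of Absolute Differences) metric cited in the results; the function names are our own, not from the paper, and images are flattened to 1-D lists for brevity.

```python
def composite(fg, bg, alpha):
    """Matting equation per pixel: I = alpha * F + (1 - alpha) * B.

    fg, bg, alpha are flat lists of per-pixel values in [0, 1].
    """
    return [a * f + (1.0 - a) * b for f, b, a in zip(fg, bg, alpha)]

def sad(pred_alpha, gt_alpha):
    """Sum of Absolute Differences between a predicted and ground-truth matte,
    one of the standard Composition-1k evaluation metrics."""
    return sum(abs(p - g) for p, g in zip(pred_alpha, gt_alpha))

# Toy 1-D "image": a soft foreground edge composited onto a new background,
# as done when synthesizing matting training data by augmentation.
alpha = [0.0, 0.25, 0.75, 1.0]   # ground-truth matte (soft transition)
fg    = [1.0, 1.0, 1.0, 1.0]     # white foreground
bg    = [0.0, 0.0, 0.0, 0.0]     # black background
image = composite(fg, bg, alpha)  # -> [0.0, 0.25, 0.75, 1.0]
```

A perfect prediction gives `sad(alpha, alpha) == 0.0`; larger values indicate a worse matte, which is why the reported 11% SAD reduction is an improvement.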