Paper Title
Dataset Distillation via Factorization
Paper Authors
Paper Abstract
In this paper, we study \xw{dataset distillation (DD)} from a novel perspective and introduce a \emph{dataset factorization} approach, termed \emph{HaBa}, which is a plug-and-play strategy portable to any existing DD baseline. Unlike conventional DD approaches that aim to produce distilled and representative samples, \emph{HaBa} explores decomposing a dataset into two components: data \emph{Ha}llucination networks and \emph{Ba}ses, where the latter is fed into the former to reconstruct image samples. The flexible combinations between bases and hallucination networks therefore equip the distilled data with an exponential gain in informativeness, which largely increases the representation capability of distilled datasets. To further improve the data efficiency of the compressed results, we introduce a pair of adversarial contrastive constraints on the resultant hallucination networks and bases, which increases the diversity of generated images and injects more discriminative information into the factorization. Extensive comparisons and experiments demonstrate that our method can yield significant improvement on downstream classification tasks compared with the previous state of the art, while reducing the total number of compressed parameters by up to 65\%. Moreover, datasets distilled by our approach also achieve \textasciitilde10\% higher accuracy than baseline methods in cross-architecture generalization. Our code is available \href{https://github.com/Huage001/DatasetFactorization}{here}.
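The combinatorial gain described above can be illustrated with a minimal sketch: storing B bases and H hallucinators lets every (base, hallucinator) pair reconstruct a distinct sample, so B + H stored components yield B × H images. All names, shapes, and the toy affine "hallucinator" below are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

num_bases, num_hallucinators = 5, 3
base_shape = (8, 8)  # tiny "images" for illustration

# Stored components: B bases and H hallucinator parameter sets.
bases = [rng.standard_normal(base_shape) for _ in range(num_bases)]
# Each "hallucinator" here is just a pixel-wise affine transform, a toy
# stand-in for the paper's lightweight hallucination networks.
hallucinators = [
    {"scale": rng.standard_normal(base_shape),
     "shift": rng.standard_normal(base_shape)}
    for _ in range(num_hallucinators)
]

def hallucinate(base, params):
    """Reconstruct one sample by feeding a base through a hallucinator."""
    return np.tanh(params["scale"] * base + params["shift"])

# Flexible combination: every (base, hallucinator) pair is a distinct sample,
# so 5 + 3 = 8 stored components reconstruct 5 * 3 = 15 training images.
reconstructed = [hallucinate(b, h) for b in bases for h in hallucinators]
print(len(reconstructed))  # 15
```

The storage cost grows additively in B and H while the number of reconstructable samples grows multiplicatively, which is the "exponential informativeness gain" the abstract refers to.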