Few-shot Classification with Hypersphere Modeling of Prototypes

Paper title

Few-shot Classification with Hypersphere Modeling of Prototypes

Authors

Ning Ding, Yulin Chen, Ganqu Cui, Xiaobin Wang, Hai-Tao Zheng, Zhiyuan Liu, Pengjun Xie

Abstract

Metric-based meta-learning is one of the de facto standards in few-shot learning. It consists of representation learning and the design of metric calculations. Previous works construct class representations in different ways, ranging from mean output embeddings to covariances and distributions. However, using embeddings in space lacks expressivity and cannot capture class information robustly, while complex statistical modeling makes metric design difficult. In this work, we use tensor fields ("areas") to model classes from a geometrical perspective for few-shot learning. We present a simple and effective method, dubbed hypersphere prototypes (HyperProto), where class information is represented by hyperspheres of dynamic size with two sets of learnable parameters: the hypersphere's center and its radius. Extending from points to areas, hyperspheres are much more expressive than embeddings. Moreover, it is more convenient to perform metric-based classification with hypersphere prototypes than with statistical modeling, as we only need to calculate the distance from a data point to the surface of the hypersphere. Following this idea, we also develop two variants of prototypes under other measurements. Extensive experiments and analysis on few-shot learning tasks across NLP and CV, and comparisons with 20+ competitive baselines, demonstrate the effectiveness of our approach.
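
The distance-to-surface idea in the abstract can be illustrated with a minimal NumPy sketch. The abstract does not give the exact formula, so this sketch assumes the metric is the Euclidean distance to the center minus the radius, and that class scores are a softmax over negative distances. The function names `distance_to_surface` and `classify` and the toy centers and radii are hypothetical, chosen for illustration only. In the method itself the centers and radii are learnable parameters; here they are fixed by hand.

```python
import numpy as np


def distance_to_surface(x, centers, radii):
    """Signed distance from each point to each hypersphere surface.

    x:       (n, d) embedded query points
    centers: (k, d) one hypersphere center per class
    radii:   (k,)   one radius per class
    Assumption: distance = ||x - center|| - radius, so negative values
    mean the point lies inside the sphere.
    """
    diff = x[:, None, :] - centers[None, :, :]    # (n, k, d)
    center_dist = np.linalg.norm(diff, axis=-1)   # (n, k)
    return center_dist - radii[None, :]


def classify(x, centers, radii):
    """Predict the class whose hypersphere surface is closest (signed)."""
    logits = -distance_to_surface(x, centers, radii)
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    probs = np.exp(logits)
    probs /= probs.sum(axis=1, keepdims=True)
    return probs.argmax(axis=1), probs


# Toy 2-D example with two classes of different size (hypothetical values).
centers = np.array([[0.0, 0.0], [4.0, 0.0]])
radii = np.array([1.0, 2.5])
queries = np.array([[0.5, 0.0], [1.8, 0.0]])

labels, probs = classify(queries, centers, radii)
print(labels)
```

In this toy setup the second query is nearer to the first center (1.8 versus 2.2) but falls inside the larger second sphere, so it is assigned to the second class. A point prototype alone could not express that difference in class extent.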
