Paper Title


Learning Hyper Label Model for Programmatic Weak Supervision

Paper Authors

Renzhi Wu, Shen-En Chen, Jieyu Zhang, Xu Chu

Abstract


To reduce human annotation effort, the programmatic weak supervision (PWS) paradigm abstracts weak supervision sources as labeling functions (LFs) and uses a label model to aggregate the outputs of multiple LFs into training labels. Most existing label models require a parameter-learning step for each dataset. In this work, we present a hyper label model that, once learned, infers the ground-truth labels for each dataset in a single forward pass, without dataset-specific parameter learning. The hyper label model approximates an optimal analytical (yet computationally intractable) solution for the ground-truth labels. We train the model on synthetic data generated in a way that ensures it approximates this analytical optimum, and we build the model on a Graph Neural Network (GNN) so that its predictions are invariant (or equivariant) to permutations of the LFs (or data points). On 14 real-world datasets, our hyper label model outperforms the best existing methods in both accuracy (by 1.4 points on average) and efficiency (six times faster on average). Our code is available at https://github.com/wurenzhi/hyper_label_model.
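To make the PWS setup concrete, the sketch below shows the input a label model consumes (a matrix of LF votes, one column per LF, with abstentions) and a simple majority-vote baseline for aggregating it. This is an illustrative baseline, not the paper's hyper label model; the function name and the abstain encoding (-1) are our own assumptions.

```python
import numpy as np

def majority_vote(lf_outputs, abstain=-1):
    """Aggregate binary labeling-function outputs by majority vote.

    lf_outputs: (n_points, n_lfs) array with entries in {0, 1, abstain}.
    Returns one aggregated label per data point (abstain if no LF fired).
    """
    labels = []
    for votes in lf_outputs:
        active = votes[votes != abstain]       # drop abstentions
        if active.size == 0:
            labels.append(abstain)             # no LF voted on this point
        else:
            # count votes for class 0 and class 1; ties go to class 0
            labels.append(int(np.bincount(active, minlength=2).argmax()))
    return np.array(labels)

# Three LFs voting on four data points (-1 = abstain)
L = np.array([
    [ 1,  1, -1],
    [ 0,  1,  0],
    [-1, -1, -1],
    [ 1,  0,  1],
])
print(majority_vote(L))  # [ 1  0 -1  1]
```

Note that the result is unchanged if the LF columns are reordered and is reordered in lockstep if the data-point rows are permuted; these are exactly the invariance and equivariance properties the paper's GNN architecture enforces for its learned aggregator.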
