Title

Efficiently Controlling Multiple Risks with Pareto Testing

Authors

Bracha Laufer-Goldshtein, Adam Fisch, Regina Barzilay, Tommi Jaakkola

Abstract

Machine learning applications frequently come with multiple diverse objectives and constraints that can change over time. Accordingly, trained models can be tuned with sets of hyper-parameters that affect their predictive behavior (e.g., their run-time efficiency versus error rate). As the number of constraints and hyper-parameter dimensions grow, naively selected settings may lead to sub-optimal and/or unreliable results. We develop an efficient method for calibrating models such that their predictions provably satisfy multiple explicit and simultaneous statistical guarantees (e.g., upper-bounded error rates), while also optimizing any number of additional, unconstrained objectives (e.g., total run-time cost). Building on recent results in distribution-free, finite-sample risk control for general losses, we propose Pareto Testing: a two-stage process which combines multi-objective optimization with multiple hypothesis testing. The optimization stage constructs a set of promising combinations on the Pareto frontier. We then apply statistical testing to this frontier only to identify configurations that have (i) high utility with respect to our objectives, and (ii) guaranteed risk levels with respect to our constraints, with specifiable high probability. We demonstrate the effectiveness of our approach to reliably accelerate the execution of large-scale Transformer models in natural language processing (NLP) applications. In particular, we show how Pareto Testing can be used to dynamically configure multiple inter-dependent model attributes -- including the number of layers computed before exiting, number of attention heads pruned, or number of text tokens considered -- to simultaneously control and optimize various accuracy and cost metrics.
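
To make the two-stage procedure concrete, below is a minimal sketch in Python. It assumes the simplest setting: a single constrained 0/1 risk with target level alpha, a single cost objective to minimize, and a binomial tail bound for the per-configuration p-values. All names here (pareto_testing, pareto_front, binom_pvalue) are illustrative, not the authors' implementation.

```python
import numpy as np
from scipy.stats import binom


def binom_pvalue(losses, alpha):
    """P-value for H0: E[loss] > alpha, via the binomial tail.

    Valid for Bernoulli (0/1) losses; general bounded losses would
    instead need e.g. a Hoeffding-style bound."""
    n = len(losses)
    k = int(np.sum(losses))
    return binom.cdf(k, n, alpha)


def pareto_front(points):
    """Indices of non-dominated rows of `points` (minimizing all columns)."""
    front = []
    for i, p in enumerate(points):
        dominated = any(
            np.all(q <= p) and np.any(q < p)
            for j, q in enumerate(points)
            if j != i
        )
        if not dominated:
            front.append(i)
    return front


def pareto_testing(configs, risk_opt, cost_opt, loss_test, alpha, delta):
    """Two-stage sketch. Stage 1: keep only the empirical Pareto frontier
    of (risk, cost) estimated on an optimization split. Stage 2: apply
    fixed-sequence testing on a held-out split, from safest (lowest
    estimated risk) to cheapest, stopping at the first non-rejection.

    configs   -- sequence of candidate hyper-parameter configurations
    risk_opt  -- estimated risk per config on the optimization split
    cost_opt  -- estimated cost per config on the optimization split
    loss_test -- array (n_configs, n_test) of 0/1 losses on the test split

    Returns the cheapest configuration whose risk is certified <= alpha
    with probability >= 1 - delta, or None if no test rejects."""
    points = np.stack([risk_opt, cost_opt], axis=1)
    front = pareto_front(points)
    # On the frontier, risk increases as cost decreases, so ordering by
    # increasing estimated risk makes each successive test harder;
    # fixed-sequence testing then controls family-wise error at delta.
    order = sorted(front, key=lambda i: risk_opt[i])
    selected = None
    for i in order:
        if binom_pvalue(loss_test[i], alpha) <= delta:
            selected = configs[i]  # certified; continue toward lower cost
        else:
            break  # first non-rejection ends the fixed sequence
    return selected
```

Restricting the testing stage to the frontier is what makes the simple fixed-sequence correction both valid and powerful in this sketch: the frontier induces a one-dimensional ordering in which risk rises as cost falls, so each rejected hypothesis safely unlocks a cheaper configuration without any further multiplicity penalty.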
