Fast online inference for nonlinear contextual bandit based on Generative Adversarial Network

Paper title

Fast online inference for nonlinear contextual bandit based on Generative Adversarial Network

Authors

Yun-Da Tsai, Shou-De Lin

Abstract

This work addresses the efficiency of inference in nonlinear contextual bandits when the number of arms $n$ is very large. We propose a neural bandit model with an end-to-end training process that efficiently performs bandit algorithms such as Thompson Sampling and UCB during inference. We advance the state-of-the-art time complexity to $O(\log n)$ using approximate Bayesian inference, neural random feature mapping, approximate global maxima, and approximate nearest neighbor search. We further propose a generative adversarial network that shifts the bottleneck of maximizing the objective for selecting optimal arms from inference time to training time, yielding a significant speedup with the additional advantage of enabling batch and parallel processing. The generative model can infer an approximate argmax of the posterior sample in logarithmic time with the help of approximate nearest neighbor search. Extensive experiments on classification and recommendation tasks demonstrate an order-of-magnitude improvement in inference time with no significant degradation in performance.
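To make the inference bottleneck concrete, here is a minimal sketch of Thompson Sampling over a fixed feature map. It is not the authors' model. Random Fourier features stand in for the neural random feature mapping, a Bayesian linear regression stands in for the approximate Bayesian inference, and all names (`random_features`, `pair_features`, `LinearThompsonSampler`) are hypothetical. The arm selection here is a brute-force scan that costs $O(n)$ per decision; that scan is the step the abstract says is replaced by approximate nearest neighbor search and a generative model.

```python
import numpy as np


def random_features(inputs, W, b):
    """Random Fourier features for a batch of inputs: (n, d) -> (n, D).

    Assumption: a simple stand-in for a neural random feature mapping.
    """
    D = W.shape[0]
    return np.sqrt(2.0 / D) * np.cos(inputs @ W.T + b)


def pair_features(context, arm_embeddings, W, b):
    """Features for every (context, arm) pair: concatenate, then map."""
    n = arm_embeddings.shape[0]
    pairs = np.hstack([np.tile(context, (n, 1)), arm_embeddings])
    return random_features(pairs, W, b)


class LinearThompsonSampler:
    """Thompson Sampling with a Gaussian posterior over linear weights."""

    def __init__(self, dim, prior_precision=1.0, noise_var=1.0, seed=0):
        self.A = prior_precision * np.eye(dim)  # posterior precision matrix
        self.b = np.zeros(dim)                  # precision-weighted mean
        self.noise_var = noise_var
        self.rng = np.random.default_rng(seed)

    def sample_weights(self):
        """Draw theta ~ N(A^{-1} b, A^{-1}) via a Cholesky factor of A."""
        mean = np.linalg.solve(self.A, self.b)
        L = np.linalg.cholesky(self.A)          # A = L @ L.T
        z = self.rng.standard_normal(self.b.shape[0])
        return mean + np.linalg.solve(L.T, z)   # covariance is A^{-1}

    def select_arm(self, features):
        """Score all n arms under one posterior sample: an O(n) scan."""
        theta = self.sample_weights()
        return int(np.argmax(features @ theta))

    def update(self, phi, reward):
        """Standard Bayesian linear regression update for one observation."""
        self.A += np.outer(phi, phi) / self.noise_var
        self.b += phi * reward / self.noise_var
```

Because the score of each arm is an inner product between the sampled weights and that arm's feature vector, `select_arm` is a maximum inner product search, which is the kind of query an approximate nearest neighbor index can answer in sublinear time.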
