Paper Title
Hypermodels for Exploration
Paper Authors
Paper Abstract
We study the use of hypermodels to represent epistemic uncertainty and guide exploration. This generalizes and extends the use of ensembles to approximate Thompson sampling. The computational cost of training an ensemble grows with its size, and as such, prior work has typically been limited to ensembles with tens of elements. We show that alternative hypermodels can enjoy dramatic efficiency gains, enabling behavior that would otherwise require hundreds or thousands of elements, and even succeed in situations where ensemble methods fail to learn regardless of size. This allows more accurate approximation of Thompson sampling as well as use of more sophisticated exploration schemes. In particular, we consider an approximate form of information-directed sampling and demonstrate performance gains relative to Thompson sampling. As alternatives to ensembles, we consider linear and neural network hypermodels, also known as hypernetworks. We prove that, with neural network base models, a linear hypermodel can represent essentially any distribution over functions, and as such, hypernetworks are no more expressive.
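The abstract does not include code, but the core construction it describes is compact: a linear hypermodel maps an index z drawn from N(0, I) to base-model parameters theta = a + Bz, so that each sampled z yields one hypothesis f_theta, which can then drive a Thompson-sampling action choice. The sketch below illustrates this idea under stated assumptions; the names (LinearHypermodel, base_forward, scale) and the tiny MLP base model are illustrative choices, not the paper's implementation, and training of the hypermodel parameters (a, B) is omitted.

```python
import numpy as np

rng = np.random.default_rng(0)

def base_forward(theta, x, hidden=16):
    """Tiny one-hidden-layer MLP base model f_theta(x); theta is a flat vector."""
    d = x.shape[-1]
    n1 = d * hidden + hidden               # first-layer weights + biases
    W1 = theta[:d * hidden].reshape(d, hidden)
    b1 = theta[d * hidden:n1]
    W2 = theta[n1:n1 + hidden]             # output weights
    b2 = theta[n1 + hidden]                # output bias
    h = np.tanh(x @ W1 + b1)
    return h @ W2 + b2

class LinearHypermodel:
    """Maps an index z ~ N(0, I_k) to base-model parameters theta = a + B z.

    Sampling z and evaluating f_{a + Bz} plays the role of drawing one
    hypothesis from an (approximate) posterior over functions.
    """

    def __init__(self, n_params, index_dim, scale=0.1):
        self.a = np.zeros(n_params)                        # mean parameters
        self.B = scale * rng.standard_normal((n_params, index_dim))
        self.index_dim = index_dim

    def sample_theta(self):
        z = rng.standard_normal(self.index_dim)            # index z ~ N(0, I)
        return self.a + self.B @ z

# Thompson-sampling-style action selection with a single posterior sample:
d, hidden, index_dim = 4, 16, 8
n_params = d * hidden + hidden + hidden + 1
hyper = LinearHypermodel(n_params, index_dim)

actions = rng.standard_normal((5, d))                      # candidate actions
theta = hyper.sample_theta()                               # one sampled model
values = np.array([base_forward(theta, a, hidden) for a in actions])
print("chosen action:", int(np.argmax(values)))
```

Contrast this with an ensemble, where drawing a hypothesis means selecting one of a fixed set of independently trained networks: the hypermodel instead indexes a continuum of base models through z, which is what allows it to mimic the behavior of very large ensembles at much lower training cost.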