Paper Title
On the Complexity of Representation Learning in Contextual Linear Bandits
Paper Authors
Paper Abstract
In contextual linear bandits, the reward function is assumed to be a linear combination of an unknown reward vector and a given embedding of context-arm pairs. In practice, the embedding is often learned at the same time as the reward vector, thus leading to an online representation learning problem. Existing approaches to representation learning in contextual bandits are either very generic (e.g., model-selection techniques or algorithms for learning with arbitrary function classes) or specialized to particular structures (e.g., nested features or representations with certain spectral properties). As a result, the understanding of the cost of representation learning in contextual linear bandits is still limited. In this paper, we take a systematic approach to the problem and provide a comprehensive study through an instance-dependent perspective. We show that representation learning is fundamentally more complex than linear bandits (i.e., learning with a given representation). In particular, learning with a given set of representations is never simpler than learning with the worst realizable representation in the set, while we show cases where it can be arbitrarily harder. We complement this result with an extensive discussion of how it relates to existing literature and we illustrate positive instances where representation learning is as complex as learning with a fixed representation and where sub-logarithmic regret is achievable.
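As a point of reference for the first sentence of the abstract, the reward model of a contextual linear bandit is typically written as follows (a minimal sketch in standard notation; the symbols $\theta^\star$, $\phi$, $x_t$, $a_t$, and $\eta_t$ are notational assumptions for illustration and are not taken from the paper itself):

$$
r_t \;=\; \big\langle \phi(x_t, a_t),\, \theta^\star \big\rangle \;+\; \eta_t ,
$$

where $x_t$ is the context observed at round $t$, $a_t$ is the arm played, $\phi$ is the embedding of context-arm pairs, $\theta^\star$ is the unknown reward vector, and $\eta_t$ is zero-mean noise. In the representation learning setting studied in the paper, $\phi$ itself is not given: the learner must select it from a set of candidate representations while simultaneously estimating $\theta^\star$.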