Paper title
Learning Team Decisions
Paper authors
Paper abstract
In this paper, we treat linear quadratic team decision problems, where a team of agents minimizes a convex quadratic cost function over $T$ time steps subject to possibly distinct linear measurements of the state of nature. We assume that the state of nature is a Gaussian random variable and that the agents know neither the cost function nor the linear functions mapping the state of nature to their measurements. We present a gradient-descent-based algorithm with an expected regret of $O(\log(T))$ for full-information gradient feedback and $O(\sqrt{T})$ for bandit feedback. In the case of bandit feedback, the expected regret has an additional multiplicative term $O(d)$, where $d$ reflects the number of learned parameters.
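To make the full-information setting concrete, here is a minimal sketch of online gradient descent on a toy two-agent team problem. It is an illustration under stated assumptions, not the algorithm from the paper: the cost function, the identity measurement maps, the step-size schedule, and the name `run_online_gradient_descent` are all hypothetical choices made for this example. Each agent applies a scalar linear policy to its own measurement and updates its gain using only the gradient of the cost with respect to its own decision.

```python
import random


def run_online_gradient_descent(num_steps=20000, seed=0):
    """Toy two-agent team problem (illustrative assumptions throughout).

    State of nature: x = (x0, x1), independent standard Gaussians.
    Measurements:    agent i sees only y_i = x_i (assumed measurement map).
    Decisions:       u_i = gains[i] * y_i (linear policy per agent).
    Cost (assumed):  (u0 - x0)^2 + (u1 - x1)^2 + 0.5 * (u0 - u1)^2.
    """
    rng = random.Random(seed)
    gains = [0.0, 0.0]

    # The expected cost in this toy problem is 3-strongly convex in the gains.
    strong_convexity = 3.0
    # Hypothetical offset that damps the first few steps.
    offset = 10

    for t in range(1, num_steps + 1):
        x = [rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)]
        y = x  # agent i only uses y[i]
        u = [gains[0] * y[0], gains[1] * y[1]]

        # Full-information gradient feedback: d(cost)/d(u_i) for each agent.
        grad_u = [
            2.0 * (u[0] - x[0]) + (u[0] - u[1]),
            2.0 * (u[1] - x[1]) - (u[0] - u[1]),
        ]

        # Decaying step size, the usual choice for strongly convex problems.
        step = 1.0 / (strong_convexity * (t + offset))
        for i in range(2):
            # Chain rule: d(cost)/d(gain_i) = d(cost)/d(u_i) * y_i.
            gains[i] -= step * grad_u[i] * y[i]

    return gains


if __name__ == "__main__":
    print(run_online_gradient_descent())
```

In this toy setup the expected cost is $(k_0 - 1)^2 + (k_1 - 1)^2 + 0.5(k_0^2 + k_1^2)$, which is minimized at $k_0 = k_1 = 2/3$, so the learned gains should drift toward that value. A bandit-feedback variant would have to estimate the gradient from observed cost values alone, which is where the abstract's weaker $O(\sqrt{T})$ bound and the extra $O(d)$ factor apply.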