Paper Title

Learning Team Decisions

Authors

Olle Kjellqvist, Ather Gattami

Abstract

In this paper, we treat linear quadratic team decision problems, where a team of agents minimizes a convex quadratic cost function over $T$ time steps subject to possibly distinct linear measurements of the state of nature. We assume that the state of nature is a Gaussian random variable and that the agents know neither the cost function nor the linear functions mapping the state of nature to their measurements. We present a gradient-descent-based algorithm with an expected regret of $O(\log T)$ for full-information gradient feedback and $O(\sqrt{T})$ for bandit feedback. In the case of bandit feedback, the expected regret has an additional multiplicative term $O(d)$, where $d$ reflects the number of learned parameters.
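
To make the full-information setting concrete, here is a minimal sketch of projected online gradient descent on a toy two-agent team problem. It is not the authors' algorithm. The matrices `C`, `Qxx`, `Qxu`, `Quu`, the projection `radius`, and the function names are all hypothetical choices made for illustration. The sketch also assumes that each agent uses a scalar linear policy $u_i = k_i y_i$ and that the strong-convexity constant `mu` is available to set the step size $1/(\mu t)$. The agents in the abstract would not know `mu`, since it depends on the unknown cost and measurement maps.

```python
import numpy as np

def expected_cost(k, C, Qxx, Qxu, Quu):
    """Expected cost E[x'Qxx x + 2 x'Qxu u + u'Quu u] for x ~ N(0, I), u = diag(k) C x."""
    K = np.diag(k) @ C
    return np.trace(Qxx) + 2.0 * np.trace(Qxu @ K) + np.trace(K.T @ Quu @ K)

def run_ogd(T=5000, seed=0, radius=5.0):
    rng = np.random.default_rng(seed)
    # Hypothetical problem data; used only to simulate feedback and to score the result.
    C = np.array([[1.0, 0.5], [0.2, 1.0]])    # row i is agent i's measurement map
    Qxx = np.eye(2)
    Qxu = -np.eye(2)
    Quu = np.array([[2.0, 0.5], [0.5, 2.0]])

    # In the gains k the expected cost is const + 2 b'k + k'Mk.
    M = Quu * (C @ C.T)                       # elementwise (Hadamard) product
    b = np.diag(C @ Qxu)
    mu = 2.0 * np.linalg.eigvalsh(M).min()    # strong-convexity constant (assumed known)
    k_star = np.linalg.solve(M, -b)           # minimizer, for evaluation only
    J_star = expected_cost(k_star, C, Qxx, Qxu, Quu)

    k = np.zeros(2)                           # one scalar gain per agent
    regret = 0.0
    for t in range(1, T + 1):
        regret += expected_cost(k, C, Qxx, Qxu, Quu) - J_star
        x = rng.standard_normal(2)            # Gaussian state of nature
        y = C @ x                             # agent i sees only y[i]
        u = k * y                             # team decision
        grad_u = 2.0 * (Qxu.T @ x + Quu @ u)  # gradient feedback w.r.t. the decisions
        grad_k = grad_u * y                   # chain rule: du_i/dk_i = y_i
        k = np.clip(k - grad_k / (mu * t), -radius, radius)  # projected step
    return k, k_star, regret

if __name__ == "__main__":
    k, k_star, regret = run_ogd()
    print("learned gains:", k)
    print("optimal gains:", k_star)
    print("cumulative expected regret:", regret)
```

Each agent updates its own gain using only its own measurement and its own component of the gradient feedback, which is the decentralized structure the abstract describes. The decaying step size $1/(\mu t)$ is the standard choice for strongly convex online problems and is the usual route to logarithmic regret. The sketch does not verify that rate.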
