Learning Team Decisions

Paper Title

Learning Team Decisions

Authors

Olle Kjellqvist, Ather Gattami

Abstract

In this paper, we treat linear quadratic team decision problems, where a team of agents minimizes a convex quadratic cost function over $T$ time steps subject to possibly distinct linear measurements of the state of nature. We assume that the state of nature is a Gaussian random variable and that the agents know neither the cost function nor the linear functions mapping the state of nature to their measurements. We present a gradient-descent-based algorithm with an expected regret of $O(\log(T))$ for full information gradient feedback and $O(\sqrt{T})$ for bandit feedback. In the case of bandit feedback, the expected regret has an additional multiplicative term $O(d)$, where $d$ reflects the number of learned parameters.
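
To make the full-information setting in the abstract concrete, here is a minimal sketch of online gradient descent on a toy two-agent team problem. It is not the authors' algorithm. The cost matrices `R` and `L`, the choice that agent i measures only coordinate i of the state, the linear decision rule `u[i] = k[i] * y[i]`, and the step-size schedule are all assumptions chosen for illustration. After each round, each agent receives the gradient of the realized cost with respect to its own gain.

```python
import numpy as np


def run_team_ogd(T=20000, seed=0):
    """Toy online gradient descent for a two-agent quadratic team problem.

    Illustrative assumptions (none of them from the paper):
      * state of nature x ~ N(0, I) in R^2
      * agent i measures y[i] = x[i] and plays u[i] = k[i] * y[i]
      * realized cost is (u - L x)^T R (u - L x) with R positive definite
      * after each round, agent i sees the gradient with respect to k[i]
    """
    rng = np.random.default_rng(seed)
    R = np.array([[1.0, 0.5], [0.5, 1.0]])   # hypothetical cost weights
    L = np.array([[1.0, 0.5], [-0.5, 1.0]])  # hypothetical target map
    k = np.zeros(2)                          # one scalar gain per agent
    mu = 2.0  # strong-convexity constant of the expected cost for this R
    costs = np.empty(T)

    for t in range(1, T + 1):
        x = rng.standard_normal(2)  # Gaussian state of nature
        y = x.copy()                # agent i only uses y[i]
        u = k * y                   # decentralized linear decisions
        e = u - L @ x
        costs[t - 1] = e @ R @ e    # realized convex quadratic cost

        grad_u = 2.0 * R @ e        # gradient with respect to the decisions
        grad_k = grad_u * y         # chain rule: d u[i] / d k[i] = y[i]

        # Decaying step of order 1 / (mu * t); the offset of 10 damps
        # early transients and is an arbitrary choice.
        k = k - grad_k / (mu * (t + 10))

    return k, costs
```

Calling `run_team_ogd()` returns the learned gains and the per-round costs. For these particular matrices, the best diagonal gains can be derived by hand as (0.75, 1.25), so the returned `k` can be compared against that. The step size decaying like 1/t is the standard choice for strongly convex online problems and is the regime in which logarithmic regret is typically obtained.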
