Adaptive Risk-Tendency: Nano Drone Navigation in Cluttered Environments with Distributional Reinforcement Learning

Authors

Cheng Liu, Erik-Jan van Kampen, Guido C. H. E. de Croon

Abstract

The ability to assess risk and make risk-aware decisions is essential to applying reinforcement learning to safety-critical robots such as drones. In this paper, we investigate a specific case in which a nano quadcopter robot learns to navigate an a priori unknown cluttered environment under partial observability. We present a distributional reinforcement learning framework to generate adaptive risk-tendency policies. Specifically, we propose to use the lower tail conditional variance of the learnt return distribution as an intrinsic uncertainty estimate, and to use exponentially weighted average forecasting (EWAF) to adapt the risk-tendency in accordance with the estimated uncertainty. In simulation and real-world empirical results, we show that (1) the most effective risk-tendency varies across states, and (2) the agent with adaptive risk-tendency achieves superior performance compared with risk-neutral or risk-averse baseline policies.
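
The two ingredients named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not give the exact formulas, so everything below is an assumption made for illustration. In particular, the return distribution is assumed to be available as a list of quantile estimates, the "lower tail" is taken to be the lowest half of those estimates, and the EWAF step is reduced to two hypothetical experts (a risk-averse and a risk-neutral level `alpha`) with a simple 0/1 loss driven by a hypothetical uncertainty `threshold`.

```python
import math


def lower_tail_variance(quantiles, tail=0.5):
    """Variance of the lowest `tail` fraction of return-quantile estimates.

    Assumption: the tail is measured around its own mean; the paper may
    define the reference point differently.
    """
    ordered = sorted(quantiles)
    k = max(1, math.ceil(tail * len(ordered)))  # number of samples in the tail
    lower = ordered[:k]
    mean = sum(lower) / k
    return sum((q - mean) ** 2 for q in lower) / k


class EwafRiskAdapter:
    """Blends a risk-averse and a risk-neutral level with exponential weights."""

    def __init__(self, alphas=(0.1, 1.0), eta=1.0, threshold=1.0):
        self.alphas = alphas        # hypothetical expert levels: (averse, neutral)
        self.eta = eta              # learning rate of the forecaster
        self.threshold = threshold  # hypothetical cut-off for "high" uncertainty
        self.cum_loss = [0.0, 0.0]  # cumulative loss per expert

    def alpha(self):
        # Exponentially weighted average of the expert levels.
        base = min(self.cum_loss)   # shift losses for numerical stability
        weights = [math.exp(-self.eta * (loss - base)) for loss in self.cum_loss]
        total = sum(weights)
        return sum(w * a for w, a in zip(weights, self.alphas)) / total

    def update(self, uncertainty):
        # Assumed loss: the averse expert is penalised when uncertainty is low,
        # the neutral expert when uncertainty is high.
        high = 1.0 if uncertainty > self.threshold else 0.0
        self.cum_loss[0] += 1.0 - high
        self.cum_loss[1] += high
        return self.alpha()


adapter = EwafRiskAdapter()
quantiles = [-4.0, -1.0, 0.5, 1.0, 1.2, 1.5]  # made-up quantile estimates
alpha = adapter.update(lower_tail_variance(quantiles))
```

In this sketch a wide lower tail moves the blended `alpha` toward the risk-averse level, and a narrow one moves it back toward the risk-neutral level. How `alpha` would then enter action selection, for example as a CVaR level, is outside what the abstract specifies.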
