Paper Title

Autonomous Six-Degree-of-Freedom Spacecraft Docking Maneuvers via Reinforcement Learning

Authors

Charles E. Oestreich, Richard Linares, Ravi Gondhalekar

Abstract

A policy for six-degree-of-freedom docking maneuvers is developed through reinforcement learning and implemented as a feedback control law. Reinforcement learning provides a potential framework for robust, autonomous maneuvers in uncertain environments with low on-board computational cost. Specifically, proximal policy optimization is used to produce a docking policy that is valid over a portion of the six-degree-of-freedom state space while striving to minimize performance and control costs. Experiments using the simulated Apollo transposition and docking maneuver exhibit the policy's capabilities and provide a comparison with standard optimal control techniques. Furthermore, specific challenges and work-arounds, as well as the benefits and disadvantages of reinforcement learning for docking policies, are discussed to facilitate future research. As such, this work will serve as a foundation for further investigation of learning-based control laws for spacecraft proximity operations in uncertain environments.
