Paper Title


What is the Reward for Handwriting? -- Handwriting Generation by Imitation Learning

Authors

Keisuke Kanda, Brian Kenji Iwana, Seiichi Uchida

Abstract


Analyzing the handwriting generation process is an important issue and has been tackled by various generative models, such as kinematics-based models and stochastic models. In this study, we use a reinforcement learning (RL) framework to realize handwriting generation with a careful future-planning ability. In fact, the handwriting process of human beings is also supported by their future-planning ability; for example, this ability is necessary to generate a closed trajectory like '0', because any shortsighted model, such as a Markovian model, cannot generate it. For the algorithm, we employ generative adversarial imitation learning (GAIL). Typical RL algorithms require the manual definition of a reward function, which is crucial for controlling the generation process. In contrast, GAIL trains the reward function along with the other modules of the framework. In other words, through GAIL, we can understand the reward of the handwriting generation process from handwriting examples. Our experimental results qualitatively and quantitatively show that the learned reward captures the trends in handwriting generation, and thus GAIL is well suited to the acquisition of handwriting behavior.
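The abstract's key technical point is that GAIL learns the reward rather than requiring it to be hand-defined: a discriminator is trained to separate expert (state, action) pairs from the policy's, and its score is turned into a reward signal. The toy sketch below illustrates only that mechanism with a logistic discriminator over synthetic feature vectors; the feature dimensions, data, and `LinearDiscriminator` class are illustrative assumptions, not the paper's actual handwriting model.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class LinearDiscriminator:
    """Toy logistic discriminator over concatenated (state, action) features."""
    def __init__(self, dim, lr=0.1, seed=0):
        rng = np.random.default_rng(seed)
        self.w = rng.normal(scale=0.01, size=dim)
        self.b = 0.0
        self.lr = lr

    def prob_expert(self, sa):
        # D(s, a): probability the pair came from the expert demonstrations.
        return sigmoid(sa @ self.w + self.b)

    def reward(self, sa):
        # A common GAIL-style learned reward: -log(1 - D(s, a)),
        # large when the discriminator judges the pair "expert-like".
        return -np.log(1.0 - self.prob_expert(sa) + 1e-8)

    def update(self, expert_sa, policy_sa):
        # One logistic-regression step: push D -> 1 on expert data, D -> 0 on policy data.
        for sa, label in [(expert_sa, 1.0), (policy_sa, 0.0)]:
            p = self.prob_expert(sa)
            grad = p - label
            self.w -= self.lr * (grad[:, None] * sa).mean(axis=0)
            self.b -= self.lr * grad.mean()

# Synthetic stand-ins: expert pairs cluster in one region of feature space,
# an untrained policy's pairs in another.
rng = np.random.default_rng(1)
expert = rng.normal(loc=1.0, scale=0.3, size=(256, 4))
policy = rng.normal(loc=-1.0, scale=0.3, size=(256, 4))

disc = LinearDiscriminator(dim=4)
for _ in range(200):
    disc.update(expert, policy)

# The learned reward now prefers expert-like behavior, with no hand-designed reward.
print(disc.reward(expert).mean() > disc.reward(policy).mean())
```

In full GAIL this learned reward is fed to a policy-gradient RL step (the paper's policy generates pen strokes), and discriminator and policy are trained alternately; the sketch isolates only the "reward comes from the discriminator" half of that loop.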
