Paper Title
Reinforcement-Learning based Portfolio Management with Augmented Asset Movement Prediction States
Paper Authors
Paper Abstract
Portfolio management (PM) is a fundamental financial planning task that aims to achieve investment goals such as maximal profits or minimal risks. Its decision process involves continuous derivation of valuable information from various data sources and sequential decision optimization, which is a prospective research direction for reinforcement learning (RL). In this paper, we propose SARL, a novel State-Augmented RL framework for PM. Our framework aims to address two unique challenges in financial PM: (1) data heterogeneity -- the collected information for each asset is usually diverse, noisy and imbalanced (e.g., news articles); and (2) environment uncertainty -- the financial market is versatile and non-stationary. To incorporate heterogeneous data and enhance robustness against environment uncertainty, our SARL augments the asset information with their price movement prediction as additional states, where the prediction can be solely based on financial data (e.g., asset prices) or derived from alternative sources such as news. Experiments on two real-world datasets, (i) Bitcoin market and (ii) HighTech stock market with 7-year Reuters news articles, validate the effectiveness of SARL over existing PM approaches, both in terms of accumulated profits and risk-adjusted profits. Moreover, extensive simulations are conducted to demonstrate the importance of our proposed state augmentation, providing new insights and boosting performance significantly over standard RL-based PM method and other baselines.
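The core idea of state augmentation described above can be illustrated with a minimal sketch: the RL agent's observation is formed by concatenating the usual price-based state with each asset's predicted price-movement probability (which, per the abstract, could come from a price-based or news-based classifier). The function name `augment_state` and the array shapes below are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

def augment_state(price_state: np.ndarray, movement_probs: np.ndarray) -> np.ndarray:
    """Form an augmented RL state for portfolio management.

    price_state:    (n_assets, window) array of recent normalized prices.
    movement_probs: (n_assets,) array of predicted up-movement probabilities,
                    e.g. from a classifier trained on prices or news text.
    Returns a flat state vector combining both information sources.
    """
    return np.concatenate([price_state.ravel(), movement_probs])

# Hypothetical example: 3 assets, a 4-step price window
prices = np.random.rand(3, 4)
probs = np.array([0.6, 0.4, 0.7])   # placeholder predictions
state = augment_state(prices, probs)
assert state.shape == (15,)          # 3*4 price features + 3 predictions
```

The augmented vector would then be fed to the policy network in place of the raw price state, which is how the abstract's "additional states" enter the standard RL pipeline.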