Paper Title
Novel Reinforcement Learning Algorithm for Suppressing Synchronization in Closed Loop Deep Brain Stimulators
Paper Authors
Paper Abstract
Parkinson's disease is marked by altered and increased firing characteristics of pathological oscillations in the brain. In other words, it causes abnormal synchronous oscillations and suppression during neurological processing. To examine and regulate the synchronization and pathological oscillations in motor circuits, deep brain stimulators (DBS) are used. Although machine learning methods have been applied to investigate suppression, these models require large amounts of training data and computational power, both of which pose challenges for resource-constrained DBS. This research proposes a novel reinforcement learning (RL) framework that suppresses the synchronization of neuronal activity during episodes of neurological disorders while consuming less power. The proposed RL algorithm comprises an ensemble of a temporal representation of stimuli and the twin delayed deep deterministic policy gradient (TD3) algorithm. We quantify the robustness of the proposed framework to noise and demonstrate reduced synchrony for three pathological signaling regimes: regular, chaotic, and bursting, while further eliminating the undesirable oscillations. Furthermore, metrics such as evaluation rewards, energy supplied to the ensemble, and the mean point of convergence were used to compare the framework against other RL algorithms, specifically advantage actor-critic (A2C), actor-critic with Kronecker-factored trust region (ACKTR), and proximal policy optimization (PPO).
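The abstract does not include code, but the setup it describes (an RL agent learning a stimulation signal that desynchronizes a neuronal ensemble while penalizing energy use) can be sketched concretely. The following is a minimal, illustrative sketch only, assuming a Gymnasium-style environment and the stable-baselines3 implementation of TD3; the toy Kuramoto phase-oscillator ensemble, the reward shaping (negative order parameter minus an energy penalty), and every name and hyperparameter here are assumptions, not the authors' model.

```python
# Minimal sketch (not the authors' code): a toy Kuramoto phase-oscillator
# ensemble wrapped as a Gymnasium environment, with TD3 learning a common
# stimulation current that reduces synchrony at low energy cost.
import numpy as np
import gymnasium as gym
from gymnasium import spaces
from stable_baselines3 import TD3
from stable_baselines3.common.noise import NormalActionNoise


class ToySynchronyEnv(gym.Env):
    """Hypothetical stand-in for the neuronal-ensemble model: N coupled
    phase oscillators; the action is a common stimulation current, and the
    reward penalizes both synchrony (the Kuramoto order parameter) and the
    energy delivered to the ensemble."""

    def __init__(self, n=100, coupling=1.5, dt=0.01, horizon=500):
        self.n, self.k, self.dt, self.horizon = n, coupling, dt, horizon
        self.action_space = spaces.Box(-5.0, 5.0, shape=(1,), dtype=np.float32)
        # Observation: real and imaginary parts of the ensemble mean field.
        self.observation_space = spaces.Box(-1.0, 1.0, shape=(2,), dtype=np.float32)
        self.rng = np.random.default_rng(0)

    def _mean_field(self):
        return np.exp(1j * self.theta).mean()

    def _obs(self):
        z = self._mean_field()
        return np.array([z.real, z.imag], dtype=np.float32)

    def reset(self, seed=None, options=None):
        super().reset(seed=seed)
        self.theta = self.rng.uniform(0.0, 2.0 * np.pi, self.n)
        self.omega = self.rng.normal(1.0, 0.1, self.n)  # natural frequencies
        self.t = 0
        return self._obs(), {}

    def step(self, action):
        u = float(action[0])  # common stimulation current
        z = self._mean_field()
        # Kuramoto dynamics plus a phase-dependent stimulation term.
        dtheta = (self.omega
                  + self.k * abs(z) * np.sin(np.angle(z) - self.theta)
                  + u * np.sin(self.theta))
        self.theta = (self.theta + self.dt * dtheta) % (2.0 * np.pi)
        self.t += 1
        # Reward: negative synchrony minus an energy penalty on the stimulus.
        reward = float(-abs(self._mean_field()) - 0.01 * u ** 2)
        return self._obs(), reward, False, self.t >= self.horizon, {}


env = ToySynchronyEnv()
noise = NormalActionNoise(mean=np.zeros(1), sigma=0.2 * np.ones(1))
model = TD3("MlpPolicy", env, action_noise=noise, verbose=0)
model.learn(total_timesteps=20_000)
```

For the baseline comparisons named in the abstract, the same environment instance can be passed to A2C and PPO from stable_baselines3 in place of TD3; ACKTR is only available in the older stable-baselines (v2) package and is omitted here.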