Paper Title
Deep Reinforcement Learning for IoT Networks: Age of Information and Energy Cost Tradeoff
Paper Authors
Paper Abstract
In most Internet of Things (IoT) networks, edge nodes are commonly used as relays to cache sensing data generated by IoT sensors and to provide communication services for data consumers. However, a critical issue in IoT sensing is that data are usually transient, which necessitates timely updates of cached content items, while frequent cache updates can incur considerable energy cost and threaten the lifetime of IoT sensors. To address this issue, we adopt the Age of Information (AoI) to quantify data freshness and propose an online cache update scheme that strikes an effective tradeoff between the average AoI and the energy cost. Specifically, we first characterize the transmission energy consumption of IoT sensors by incorporating a successful-transmission condition. We then model cache updating as a Markov decision process to minimize the average weighted cost, with judicious definitions of state, action, and reward. Since user preferences for content items are usually unknown and often evolve over time, we develop a deep reinforcement learning (DRL) algorithm to enable intelligent cache updates. Through trial-and-error exploration, an effective caching policy can be learned without exact knowledge of content popularity. Simulation results demonstrate the superiority of the proposed framework.
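For concreteness, the standard discrete-time AoI dynamics underlying the abstract can be written out; the notation below ($A_f$, $u_f$, $p_f$, weights $\omega_1, \omega_2$) is ours and not necessarily the paper's:

$$
A_f(t+1) =
\begin{cases}
1, & u_f(t) = 1 \quad \text{(item } f \text{ successfully updated in slot } t\text{)},\\
A_f(t) + 1, & u_f(t) = 0,
\end{cases}
$$

so a per-slot weighted cost of the kind the abstract trades off takes the form $c(t) = \omega_1 \sum_f p_f(t)\, A_f(t) + \omega_2 E(t)$, where $p_f(t)$ is the (unknown) request popularity of item $f$ and $E(t)$ is the energy spent on cache updates in slot $t$.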
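The pipeline the abstract describes (an MDP over state, action, and reward, solved with DRL) can be illustrated with a minimal sketch. Everything below is hypothetical scaffolding rather than the authors' design: the toy environment, the cost weights W_AOI and W_ENERGY, the fixed ENERGY_COST, and the choice of a small DQN with experience replay are our assumptions; the paper may use a different state encoding, network architecture, or DRL variant.

```python
# Minimal DQN sketch for AoI-vs-energy cache updating (illustrative only).
import random
from collections import deque

import numpy as np
import torch
import torch.nn as nn

N_ITEMS = 4                   # number of cached content items (hypothetical)
ENERGY_COST = 5.0             # energy charged per cache update (hypothetical)
W_AOI, W_ENERGY = 1.0, 0.1    # weights of the two cost terms (hypothetical)

class CacheEnv:
    """Toy cache: every item's AoI grows by 1 per slot; an update resets it to 1."""
    def __init__(self):
        # Popularity is fixed but hidden from the agent, mirroring the
        # "unknown user preference" assumption in the abstract.
        self.popularity = np.random.dirichlet(np.ones(N_ITEMS))
        self.aoi = np.ones(N_ITEMS)

    def observe(self, requests):
        # State: per-item AoI concatenated with the latest request counts.
        return np.concatenate([self.aoi, requests]).astype(np.float32)

    def step(self, action):
        self.aoi += 1.0
        energy = 0.0
        if action < N_ITEMS:          # action == N_ITEMS means "no update"
            self.aoi[action] = 1.0
            energy = ENERGY_COST
        requests = np.random.multinomial(10, self.popularity).astype(float)
        cost = W_AOI * float(requests @ self.aoi) + W_ENERGY * energy
        return self.observe(requests), -cost   # reward = negative weighted cost

# Q-network: observation (AoI vector + request counts) -> Q-value per action.
qnet = nn.Sequential(nn.Linear(2 * N_ITEMS, 64), nn.ReLU(),
                     nn.Linear(64, N_ITEMS + 1))
opt = torch.optim.Adam(qnet.parameters(), lr=1e-3)
buffer = deque(maxlen=10_000)         # experience replay buffer
gamma, eps, batch_size = 0.95, 0.2, 64

env = CacheEnv()
obs = env.observe(np.zeros(N_ITEMS))
for t in range(5_000):
    # Epsilon-greedy exploration: the "trial and error" in the abstract.
    if random.random() < eps:
        action = random.randrange(N_ITEMS + 1)
    else:
        with torch.no_grad():
            action = int(qnet(torch.from_numpy(obs)).argmax())
    next_obs, reward = env.step(action)
    buffer.append((obs, action, reward, next_obs))
    obs = next_obs

    if len(buffer) >= batch_size:
        s, a, r, s2 = map(np.array, zip(*random.sample(buffer, batch_size)))
        s, s2 = torch.from_numpy(s), torch.from_numpy(s2)
        q = qnet(s).gather(1, torch.from_numpy(a).unsqueeze(1)).squeeze(1)
        with torch.no_grad():
            target = torch.from_numpy(r).float() + gamma * qnet(s2).max(1).values
        loss = nn.functional.mse_loss(q, target)
        opt.zero_grad()
        loss.backward()
        opt.step()
```

Under these assumptions the agent learns, from observed requests alone, which items are popular enough to justify paying ENERGY_COST for an update, which is the AoI-energy tradeoff the abstract targets.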