Paper Title

Data augmentation through multivariate scenario forecasting in Data Centers using Generative Adversarial Networks

Authors

Pérez, Jaime; Arroba, Patricia; Moya, José M.

Abstract

The Cloud paradigm is at a critical point at which existing energy-efficiency techniques are reaching a plateau, while the demand for computing resources at Data Center (DC) facilities continues to grow exponentially. The main challenge in achieving a global energy-efficiency strategy based on Artificial Intelligence is that the algorithms must be fed massive amounts of data. This paper proposes a time-series data augmentation methodology based on synthetic scenario forecasting within the Data Center. To this end, we implement Generative Adversarial Networks (GANs), a powerful class of generative algorithms. Specifically, our work combines the disciplines of GAN-based data augmentation and scenario forecasting, filling a gap in synthetic data generation for DCs. Furthermore, we propose a methodology that increases the variability and heterogeneity of the generated data by introducing on-demand anomalies, without additional effort or expert knowledge. We also propose Kullback-Leibler Divergence and Mean Squared Error as new metrics for validating synthetic time-series generation, as they provide a better overall comparison of multivariate data distributions. We validate our approach using real data collected in an operating Data Center and successfully generate synthetic data that is useful for prediction and optimization models. Our research will help optimize the energy consumed in Data Centers, and the proposed methodology can also be applied to other, similar time-series problems.
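The abstract does not specify how the two validation metrics are computed. As a minimal sketch, assuming each variable's real and synthetic values are binned into normalized histograms P and Q on shared bin edges, one can compute the Kullback-Leibler divergence D_KL(P || Q) = sum_i p_i * log(p_i / q_i) and the mean squared error between the two histograms per variable, then average over variables. The function name `histogram_kl_mse`, the bin count, the smoothing constant `eps`, and the averaging over variables are hypothetical choices for illustration, not the authors' implementation.

```python
import numpy as np


def histogram_kl_mse(real, synth, bins=20, eps=1e-10):
    """Hypothetical helper: compare real vs. synthetic multivariate series.

    real, synth: arrays of shape (n_timesteps, n_variables); lengths may differ.
    Returns (mean KL divergence, mean MSE) over variables, both computed on
    normalized histograms that share the same bin edges (an assumption, not
    necessarily how the paper defines the metrics).
    """
    real = np.asarray(real, dtype=float)
    synth = np.asarray(synth, dtype=float)
    if real.ndim != 2 or synth.ndim != 2 or real.shape[1] != synth.shape[1]:
        raise ValueError("expected 2-D arrays with the same number of variables")

    kls, mses = [], []
    for j in range(real.shape[1]):
        # Shared bin edges so both histograms describe the same support.
        lo = min(real[:, j].min(), synth[:, j].min())
        hi = max(real[:, j].max(), synth[:, j].max())
        if hi == lo:  # constant column: widen the range so the bins are valid
            hi = lo + 1.0
        edges = np.linspace(lo, hi, bins + 1)
        p, _ = np.histogram(real[:, j], bins=edges)
        q, _ = np.histogram(synth[:, j], bins=edges)
        # Counts -> probabilities; eps keeps log() and the ratio finite
        # when a bin is empty, then renormalize so each sums to 1.
        p = p / p.sum() + eps
        q = q / q.sum() + eps
        p /= p.sum()
        q /= q.sum()
        kls.append(float(np.sum(p * np.log(p / q))))  # KL(P_real || Q_synth)
        mses.append(float(np.mean((p - q) ** 2)))      # bin-wise squared error
    return float(np.mean(kls)), float(np.mean(mses))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    real = rng.normal(size=(1000, 3))
    close = rng.normal(size=(1000, 3))           # same distribution
    shifted = rng.normal(loc=3.0, size=(1000, 3))  # drifted distribution
    print("close:  ", histogram_kl_mse(real, close))
    print("shifted:", histogram_kl_mse(real, shifted))
```

Because KL divergence is asymmetric, the sketch fixes its direction as KL(real || synthetic), and the `eps` smoothing keeps it finite when the synthetic data leaves some bins empty. Identical inputs yield (0.0, 0.0), and both values typically grow as the synthetic distribution drifts away from the real one.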