Training Data Augmentation for Deep Learning Radio Frequency Systems

Paper Title

Training Data Augmentation for Deep Learning Radio Frequency Systems

Authors

Clark IV, William H., Hauser, Steven, Headley, William C., Michaels, Alan J.

Abstract

Applications of machine learning are subject to three major components that contribute to the final performance metrics. Within the category of neural networks, and deep learning specifically, the first two are the architecture of the model being trained and the training approach used. This work focuses on the third component, the data used during training. The primary questions that arise are "what is in the data?" and "what within the data matters?" Taking Automatic Modulation Classification (AMC) within the Radio Frequency Machine Learning (RFML) field as an example of a tool used for situational awareness, the use of synthetic, captured, and augmented data is examined and compared to provide insights into the quantity and quality of available data necessary to achieve desired performance levels. Three questions are discussed in this work: (1) how useful a synthetically trained system is expected to be when deployed without considering the environment within the synthesis, (2) how augmentation can be leveraged within the RFML domain, and (3) what impact knowledge of the signal degradations caused by the transmission channel has on the performance of a system. In general, the examined data types each make useful contributions to a final application, but captured data germane to the intended use case will always provide more significant information and enable the greatest performance. Despite the benefit of captured data, the difficulties and costs of live collection often make the quantity of data needed to achieve peak performance impractical. This paper helps quantify the balance between real and synthetic data, offering concrete examples where training data is parametrically varied in size and source.
