Paper Title

Synthetic Data: Opening the data floodgates to enable faster, more directed development of machine learning methods

Authors

James Jordon, Alan Wilson, Mihaela van der Schaar

Abstract

Many ground-breaking advancements in machine learning can be attributed to the availability of a large volume of rich data. Unfortunately, many large-scale datasets, such as healthcare data, are highly sensitive and are not widely available to the machine learning community. Generating synthetic data with privacy guarantees provides one solution, allowing meaningful research to be carried out "at scale" by allowing the entire machine learning community to work on the data, potentially accelerating progress within a given field. In this article, we provide a high-level view of synthetic data: what it means, how we might evaluate it and how we might use it.