Paper Title
Fantastic Data and How to Query Them
Paper Authors
Paper Abstract
It is commonly acknowledged that the availability of huge amounts of (training) data is one of the most important factors behind many recent advances in Artificial Intelligence (AI). However, datasets are often designed for specific tasks in narrow AI subareas, and there is no unified way to manage and access them. This not only creates unnecessary overhead when training or deploying Machine Learning models but also limits the understanding of the data, which is very important for data-centric AI. In this paper, we present our vision of a unified framework for different datasets so that they can be integrated and queried easily, e.g., using standard query languages. We demonstrate this in our ongoing work to create a framework for datasets in Computer Vision and show its advantages in different scenarios. Our demonstration is available at https://vision.semkg.org.