Paper title
Tensor Fields for Data Extraction from Chart Images: Bar Charts and Scatter Plots
Paper authors
Paper abstract
Charts are an essential part of both graphicacy (graphical literacy) and statistical literacy. As chart understanding has become increasingly relevant in data science, automating chart analysis by processing raster images of charts has become a significant problem. Automated chart reading involves extracting data from chart images and understanding that data in context. In this paper, we take the first step of determining a computational model of chart images for data extraction for selected chart types, namely bar charts and scatter plots. We demonstrate that positive semidefinite second-order tensor fields are an effective model. We identify an appropriate tensor field as the model and propose a methodology that uses extraction of its degenerate points to extract data from chart images. Our results show that tensor voting is effective for data extraction from bar charts and scatter plots, and from histograms as a special case of bar charts.
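To make the idea of a positive semidefinite second-order tensor field and its degenerate points concrete, here is a minimal sketch. It is an illustration under stated assumptions, not the paper's pipeline: it builds a gradient-based structure tensor rather than performing tensor voting, and the function names, the box smoothing, and both thresholds are hypothetical choices. A degenerate point is taken here to be a pixel where the two eigenvalues of the 2x2 tensor are nearly equal while the tensor itself is not negligible.

```python
import numpy as np

def box_smooth(a, r=1):
    """Average each pixel over a (2r+1)x(2r+1) window, using edge padding."""
    p = np.pad(a, r, mode="edge")
    h, w = a.shape
    out = np.zeros((h, w), dtype=float)
    for dy in range(2 * r + 1):
        for dx in range(2 * r + 1):
            out += p[dy:dy + h, dx:dx + w]
    return out / (2 * r + 1) ** 2

def degenerate_points(img, r=1, min_energy=1e-3, max_anisotropy=0.3):
    """Return (mask, l1, l2) for a grayscale image.

    Assumption: the tensor field is the smoothed structure tensor
    T = avg(grad * grad^T), which is positive semidefinite by construction.
    The thresholds are illustrative, not taken from the paper.
    """
    gy, gx = np.gradient(img.astype(float))  # derivatives along rows, columns
    txx = box_smooth(gx * gx, r)
    txy = box_smooth(gx * gy, r)
    tyy = box_smooth(gy * gy, r)

    # Closed-form eigenvalues of the symmetric 2x2 tensor at every pixel.
    half_trace = 0.5 * (txx + tyy)
    root = np.sqrt(0.25 * (txx - tyy) ** 2 + txy ** 2)
    l1, l2 = half_trace + root, half_trace - root

    energy = l1 + l2                             # trace of the tensor
    anisotropy = (l1 - l2) / (energy + 1e-12)    # 0 = isotropic, 1 = edge-like
    mask = (energy > min_energy) & (anisotropy < max_anisotropy)
    return mask, l1, l2

# Synthetic "bar": a filled rectangle on an empty background.
img = np.zeros((20, 20))
img[5:15, 8:12] = 1.0
mask, l1, l2 = degenerate_points(img)
```

On this synthetic bar, pixels at the rectangle's corners are flagged because gradients in both directions contribute equally there, whereas pixels along the middle of an edge have one dominant eigenvalue and are not flagged. This suggests why near-isotropic tensors could help locate bar corners, from which bar extents might then be read off.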