Paper title

Worldwide city transport typology prediction with sentence-BERT based supervised learning via Wikipedia

Authors

Srushti Rath, Joseph Y. J. Chow

Abstract

An overwhelming majority of the world's human population lives in urban areas and cities. Understanding a city's transportation typology is immensely valuable for planners and policy makers whose decisions can potentially impact millions of city residents. Despite the value of understanding a city's typology, labeled data (a city and its typology) is scarce, and spans at most a few hundred cities in the current transportation literature. To break this barrier, we propose a supervised machine learning approach to predict a city's typology given the information in its Wikipedia page. Our method leverages recent breakthroughs in natural language processing, namely sentence-BERT, and shows how the text-based information from Wikipedia can be effectively used as a data source for city typology prediction tasks that can be applied to over 2000 cities worldwide. We propose a novel method for low-dimensional city representation using a city's Wikipedia page, which makes supervised learning of city typology labels tractable even with a few hundred labeled samples. These features are used with labeled city samples to train binary classifiers (logistic regression) for four different city typologies: (i) congestion, (ii) auto-heavy, (iii) transit-heavy, and (iv) bike-friendly cities, resulting in reasonably high AUC scores of 0.87, 0.86, 0.61, and 0.94, respectively. Our approach provides sufficient flexibility for incorporating additional variables in the city typology models and can be applied to study other city typologies as well. Our findings can assist a diverse group of stakeholders in transportation and urban planning fields, and open up new opportunities for using text-based information from Wikipedia (or similar platforms) as data sources in such fields.
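To make the classification step concrete, the following is a minimal sketch of training one binary logistic regression classifier on low-dimensional city features and scoring it with AUC. It is not the authors' implementation: the three-dimensional feature vectors are synthetic stand-ins for the Wikipedia-derived sentence-BERT representation, the labels are simulated, and all function names (`train_logistic_regression`, `predict_proba`, `auc`, `make_cities`) are hypothetical. Only the Python standard library is used.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic_regression(X, y, lr=0.5, epochs=300):
    """Fit weights and bias by batch gradient descent on the log loss."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict_proba(X, w, b):
    """Predicted probability of the positive class for each row of X."""
    return [sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) for xi in X]


def auc(y_true, scores):
    """AUC: chance that a random positive outranks a random negative (ties count 0.5)."""
    pos = [s for s, t in zip(scores, y_true) if t == 1]
    neg = [s for s, t in zip(scores, y_true) if t == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def make_cities(rng, n):
    """Synthetic data (assumption): 3 features per city; the label depends on feature 0 plus noise."""
    X = [[rng.gauss(0, 1) for _ in range(3)] for _ in range(n)]
    y = [1 if x[0] + 0.5 * rng.gauss(0, 1) > 0 else 0 for x in X]
    return X, y


rng = random.Random(0)
X_train, y_train = make_cities(rng, 200)  # "a few hundred" labeled cities
X_test, y_test = make_cities(rng, 100)    # held-out cities

w, b = train_logistic_regression(X_train, y_train)
test_auc = auc(y_test, predict_proba(X_test, w, b))
print(f"held-out AUC: {test_auc:.2f}")
```

In a setup like the one the abstract describes, one such classifier would be trained per typology (congestion, auto-heavy, transit-heavy, bike-friendly), each with its own binary labels over the same city features.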
