Paper Title
A Deep Generative Model for Feasible and Diverse Population Synthesis
Authors
Abstract
An ideal synthetic population, a key input to activity-based models, mimics the distribution of individual- and household-level attributes in the actual population. Since the attributes of the entire population are generally unavailable, household travel survey (HTS) samples are used for population synthesis. Synthesizing a population by sampling directly from the HTS ignores attribute combinations that are unobserved in the HTS samples but exist in the population, called 'sampling zeros'. A deep generative model (DGM) can potentially synthesize the sampling zeros, but at the expense of generating 'structural zeros' (i.e., infeasible attribute combinations that do not exist in the population). This study proposes a novel method to minimize structural zeros while preserving sampling zeros. Two regularizations are devised to customize the training of the DGM and are applied to a generative adversarial network (GAN) and a variational autoencoder (VAE). The adopted metrics for the feasibility and diversity of the synthetic population measure how many structural and sampling zeros a model generates: fewer structural zeros indicate higher feasibility, and fewer sampling zeros indicate lower diversity. Results show that the proposed regularizations achieve considerable improvement in the feasibility and diversity of the synthesized population over traditional models. The proposed VAE additionally generated 23.5% of the population ignored by the sample with 79.2% precision (i.e., a structural zero rate of 20.8%), while the proposed GAN generated 18.3% of the ignored population with 89.0% precision. The proposed improvement to the DGM generates a more feasible and diverse synthetic population, which is critical for the accuracy of an activity-based model.
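The relationship between sampling zeros, structural zeros, and precision described in the abstract can be illustrated with a small set-based sketch. This is not the authors' evaluation code: the function name `zero_metrics`, the toy attributes, and the choice to count distinct attribute combinations (rather than weighting by how often each is generated) are all assumptions made for illustration.

```python
def zero_metrics(population, sample, generated):
    """Toy feasibility/diversity metrics over distinct attribute combinations.

    Hypothetical helper: counting distinct combinations is an assumption,
    not necessarily how the paper computes its metrics.
    """
    population, sample, generated = set(population), set(sample), set(generated)

    # Sampling zeros: combinations that exist in the population
    # but were never observed in the survey sample.
    sampling_zeros = population - sample

    # Structural zeros: generated combinations that do not exist
    # in the population at all (infeasible).
    structural_zeros = generated - population

    n_gen = len(generated)
    n_sz = len(sampling_zeros)
    return {
        # Share of generated combinations that are feasible.
        "precision": len(generated & population) / n_gen if n_gen else 0.0,
        # Complement of precision.
        "structural_zero_rate": len(structural_zeros) / n_gen if n_gen else 0.0,
        # Share of the sample-ignored population that the model recovered.
        "sampling_zero_recall": len(generated & sampling_zeros) / n_sz if n_sz else 0.0,
    }


# Toy data: (age_group, has_license) combinations, purely illustrative.
population = {("child", "no"), ("adult", "no"), ("adult", "yes"),
              ("senior", "no"), ("senior", "yes")}
sample = {("child", "no"), ("adult", "yes"), ("senior", "no")}
generated = {("child", "no"), ("adult", "yes"),
             ("adult", "no"),    # a recovered sampling zero
             ("child", "yes")}   # a structural zero

print(zero_metrics(population, sample, generated))
```

In this toy case, three of the four generated combinations are feasible (precision 0.75, structural zero rate 0.25), and one of the two sampling zeros is recovered (0.5). Under this reading, the reported 79.2% precision and 20.8% structural zero rate are complements of each other.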