Movesim
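• Realistic simulation of human mobility data matters for epidemic-spread modeling and for the health policies built on it
• Contributions of this work:
– Proposes a framework that combines the strengths of model-based and model-free methods
– Uses prior knowledge of human mobility to improve the model's convergence rate and performance
– Evaluates on two public datasets and outperforms existing SOTA models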
[1] Shan Jiang, Yingxiang Yang, Siddharth Gupta, Daniele Veneziano, Shounak Athavale, and Marta C González. 2016. The TimeGeo modeling
framework for urban mobility without travel surveys. PNAS 113, 37 (2016).
[2] Mogeng Yin, Madeleine Sheehan, Sidney Feygin, Jean-François Paiement, and Alexei Pozdnoukhov. 2017. A generative model of urban activities
from cellular data. IEEE Transactions on Intelligent Transportation Systems 19, 6 (2017).
Problem definition
• Given a real-world dataset, generate a mobility trajectory that is as realistic as possible
– In mathematical form:
$$p_{\theta}(\hat{S}) = \prod_{i=1}^{n} p_{\theta}(\hat{x}_i \mid \hat{x}_1, \dots, \hat{x}_{i-1})$$
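A minimal sketch of how this factorization is evaluated: the probability of a trajectory is the product of per-step conditionals, accumulated in log space. The conditional model `toy_cond_prob` and its `TRANSITIONS` table are hypothetical stand-ins that look only at the previous location; they are not the generator described in these slides.

```python
import math

def trajectory_log_prob(trajectory, cond_prob):
    """Return log p(S) = sum_i log p(x_i | x_1, ..., x_{i-1})."""
    log_p = 0.0
    for i, x in enumerate(trajectory):
        history = tuple(trajectory[:i])  # everything generated before step i
        log_p += math.log(cond_prob(x, history))
    return log_p

# Hypothetical toy conditional over two locations (illustration only).
TRANSITIONS = {
    "home": {"home": 0.5, "work": 0.5},
    "work": {"home": 0.25, "work": 0.75},
}

def toy_cond_prob(x, history):
    if not history:
        return 0.5  # uniform choice of the first location
    return TRANSITIONS[history[-1]][x]

log_p = trajectory_log_prob(["home", "work", "work"], toy_cond_prob)
prob = math.exp(log_p)  # 0.5 * 0.5 * 0.75
```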
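Preliminaries
• Generative adversarial network (GAN)
– Has two parts: a generator and a discriminator
– The generator is trained to fool the discriminator
– The discriminator is trained to identify fake data produced by the generator
– Training continues until the two reach a Nash equilibrium
– In mathematical form: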
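$$\min_{\theta} \max_{\phi} \; \mathbb{E}_{\mathbf{x} \sim p_d}\left[\log D_{\phi}(\mathbf{x})\right] + \mathbb{E}_{\mathbf{x} \sim p_{\theta}}\left[\log\left(1 - D_{\phi}(\mathbf{x})\right)\right]$$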
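As a rough illustration of the standard GAN value function, the sketch below estimates it from discriminator scores on a batch of real and generated samples. The function name `gan_value` and the score lists are assumptions made for this example; the discriminator tries to raise this value while the generator tries to lower it.

```python
import math

def gan_value(d_real, d_fake):
    """Monte Carlo estimate of E[log D(x)] + E[log(1 - D(x_hat))]."""
    real_term = sum(math.log(d) for d in d_real) / len(d_real)
    fake_term = sum(math.log(1.0 - d) for d in d_fake) / len(d_fake)
    return real_term + fake_term

# Hypothetical discriminator scores in (0, 1).
confident = gan_value(d_real=[0.9, 0.8], d_fake=[0.1, 0.2])  # discriminator separates well
confused = gan_value(d_real=[0.5, 0.5], d_fake=[0.5, 0.5])   # discriminator cannot tell
```

A discriminator that outputs 0.5 everywhere yields a value of -2 log 2, which is the value of the standard objective at equilibrium.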
2 Method
Overall architecture
Discriminator
Generator
Training
Overall architecture
Generator
SeqNet
1: Sample a location l0 and set a fixed start time t0 = 0
2: Encode the location and the time as one-hot vectors
3: Concatenate the two vectors into a dense representation vector x_e
4: Project x_e with non-linear operations into three vectors: query vector Q_i, key vector K_i, and value vector V_i
5: Apply scaled dot-product attention to them to obtain the weighted sum of the value vectors
6: Use multi-head attention and stacking to model relations in different subspaces
7: Apply a linear layer to the features, then the softmax function, to obtain the probability distribution of the next location
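Steps 5 and 7 above can be sketched in plain Python. This is a single-head illustration under simplifying assumptions: the learned projections, the multiple heads, and the stacking are omitted, and the final linear layer is replaced by the identity. The toy input `x` is hypothetical.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def scaled_dot_product_attention(queries, keys, values):
    """Each output row is a softmax-weighted sum of the value vectors."""
    d_k = len(keys[0])
    outputs = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k)
                  for k in keys]
        weights = softmax(scores)
        outputs.append([sum(w * v[j] for w, v in zip(weights, values))
                        for j in range(len(values[0]))])
    return outputs

# Toy input: two steps with 2-dimensional features, used as Q, K and V.
x = [[1.0, 0.0], [0.0, 1.0]]
attended = scaled_dot_product_attention(x, x, x)
# Step 7 with an identity "linear layer": softmax over the last step's features.
next_location_probs = softmax(attended[-1])
```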
RegNet
1: Build three N×N location relation matrices: physical distance, functional similarity, and historical transition
2: Use the probability vector of the last generated location to select the corresponding relation vector R from each of these matrices
3: Fuse the feature vector of the currently generated location with the multi-view relation vectors R by element-wise multiplication
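The selection and fusion steps above might look as follows. This sketch assumes that "select" means a probability-weighted combination of matrix rows (which reduces to picking one row when the probability vector is one-hot) and returns one fused vector per view; how the views are combined afterwards is not specified here. The matrices and vectors are hypothetical toy values.

```python
def select_relation(prob, matrix):
    """Soft row selection: R[j] = sum_i prob[i] * matrix[i][j]."""
    n = len(matrix)
    return [sum(prob[i] * matrix[i][j] for i in range(n)) for j in range(n)]

def fuse(feature, relation_vectors):
    """Element-wise product of the feature vector with each relation vector."""
    return [[f * r for f, r in zip(feature, rel)] for rel in relation_vectors]

# Hypothetical 3-location relation matrices (two of the three views, for brevity).
distance = [[1.0, 0.5, 0.2], [0.5, 1.0, 0.4], [0.2, 0.4, 1.0]]
transition = [[0.1, 0.6, 0.3], [0.7, 0.1, 0.2], [0.5, 0.4, 0.1]]

last_prob = [0.0, 1.0, 0.0]   # last generated location is location 1
feature = [0.2, 0.3, 0.5]     # current location feature vector

relations = [select_relation(last_prob, m) for m in (distance, transition)]
fused = fuse(feature, relations)
```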
Discriminator
CNN-based discriminator
1: A spatial-temporal embedding module converts the raw trajectory sequence S = [x1, …, xn] into a 2D matrix X_e ∈ R^{n×d}
2: Several convolution layers extract features from this feature matrix
3: A linear layer with a sigmoid activation produces the final score from the flattened features of the convolution layers
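The three discriminator steps can be sketched without any deep learning library. Everything concrete here is an assumption for illustration: the toy sizes, the random untrained weights, a single convolution filter spanning two consecutive steps, and a ReLU after the convolution.

```python
import math
import random

random.seed(0)
N_LOC, N_SLOT, D = 5, 4, 3  # toy sizes (assumed)

def rand_matrix(rows, cols):
    return [[random.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

LOC_EMB = rand_matrix(N_LOC, D)    # location embedding table
TIME_EMB = rand_matrix(N_SLOT, D)  # time-slot embedding table
KERNEL = rand_matrix(2, 2 * D)     # one filter covering 2 consecutive steps

def embed(trajectory):
    """Step 1: map (location, time slot) pairs to rows of X_e (n x 2D)."""
    return [LOC_EMB[loc] + TIME_EMB[t] for loc, t in trajectory]  # list concatenation

def conv_features(x_e):
    """Step 2: slide the filter along the time axis, then apply ReLU."""
    width = len(KERNEL)
    feats = []
    for start in range(len(x_e) - width + 1):
        s = sum(KERNEL[i][j] * x_e[start + i][j]
                for i in range(width) for j in range(len(x_e[0])))
        feats.append(max(0.0, s))
    return feats

def discriminate(trajectory, out_weights, bias=0.0):
    """Step 3: linear layer plus sigmoid on the flattened features."""
    feats = conv_features(embed(trajectory))
    z = sum(w * f for w, f in zip(out_weights, feats)) + bias
    return 1.0 / (1.0 + math.exp(-z))

trajectory = [(0, 0), (0, 1), (3, 2), (0, 3)]  # (location id, time slot)
out_weights = [random.uniform(-0.5, 0.5) for _ in range(len(trajectory) - 1)]
score = discriminate(trajectory, out_weights)  # a value strictly between 0 and 1
```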
Training
• Periodicity loss
• Pre-training mechanism
Experimental results
Baselines
Evaluation metrics
Datasets
• Dataset
– Mobile Operator
• Covers a period of 1 week
• Covers 100,000 mobile users in a major city in China
• Train/validation/test split is 1:1:1
– GeoLife
• Contains 17,621 trajectories
• Covers the trajectories of 182 users over a period of more than 5 years
• Train/validation/test split is 7:2:1
• The basic time slot used in modeling is half an hour
Baselines
• Baselines
– Markov
• Treats all locations visited by users as states and builds a transition matrix to capture first-order transition probabilities
– DeepMove
• Combines neural attention with a recurrent network to capture periodic patterns in mobility
– TimeGeo
• Defines the weekly home-based tour number, dwell rate, and burst rate to model temporal choices, and uses an r-EPR mechanism to model the spatial choices of human mobility
– IO-HMM
• First annotates user activities in the trajectory with IO-HMM, then generates mobility sequences for each user with manually assigned home and work locations
– GAN
• Follows the general GAN design, with an LSTM as the generator and a CNN as the discriminator
– TrajGAN
• Flattens and embeds a trajectory into a 2D matrix, then uses a standard GAN to generate trajectories in matrix form
– SeqGAN
• Combines reinforcement learning with a GAN to solve the discrete sequence (location sequence) generation problem
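Of the baselines above, the Markov model is simple enough to sketch directly: count first-order transitions between visited locations and normalize each row. The function name `fit_transition_matrix` and the toy trajectories are hypothetical.

```python
from collections import defaultdict

def fit_transition_matrix(trajectories):
    """Estimate first-order transition probabilities from location sequences."""
    counts = defaultdict(lambda: defaultdict(int))
    for traj in trajectories:
        for prev, nxt in zip(traj, traj[1:]):
            counts[prev][nxt] += 1
    matrix = {}
    for prev, row in counts.items():
        total = sum(row.values())
        matrix[prev] = {nxt: c / total for nxt, c in row.items()}
    return matrix

trajs = [
    ["home", "work", "home"],
    ["home", "gym", "home"],
    ["home", "work", "work"],
]
P = fit_transition_matrix(trajs)
# From "home": 2 of 3 transitions go to "work", 1 of 3 to "gym".
```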
Evaluation metrics
• Metrics
– Distance
• Cumulative travel distance per user within a fixed time interval
– Radius
• Spatial range of a user's daily movement
– Duration
• Stay duration per location visit
– DailyLoc
• Number of locations visited per day by each user
– G-rank
• Number of visits per location, calculated as the visiting frequency of the top-100 locations
– I-rank
• An individual (per-user) version of G-rank
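Three of these metrics can be sketched for a single user-day. The sketch assumes planar coordinates with Euclidean distance, takes Radius to be the radius of gyration (a common choice, not confirmed by the slides), and counts distinct location IDs for DailyLoc. All inputs are toy values.

```python
import math

def travel_distance(points):
    """Cumulative distance between consecutive points."""
    return sum(math.dist(a, b) for a, b in zip(points, points[1:]))

def radius_of_gyration(points):
    """Root-mean-square distance of the points from their centroid."""
    cx = sum(p[0] for p in points) / len(points)
    cy = sum(p[1] for p in points) / len(points)
    return math.sqrt(sum((x - cx) ** 2 + (y - cy) ** 2 for x, y in points) / len(points))

def daily_loc(location_ids):
    """Number of distinct locations visited in one day."""
    return len(set(location_ids))

points = [(0.0, 0.0), (3.0, 4.0), (0.0, 0.0)]  # toy coordinates for one day
distance = travel_distance(points)             # 5 out plus 5 back
radius = radius_of_gyration(points)
n_locations = daily_loc([7, 7, 3, 7])
```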
• Evaluation method
– JSD (Jensen–Shannon divergence)
– JSD measures the similarity between the mobility-pattern distributions of the generated trajectories and the real trajectory data
– $\mathrm{JSD}(p; q) = h\left(\frac{p + q}{2}\right) - \frac{h(p) + h(q)}{2}$
– $h$ is the Shannon entropy, and $p$ and $q$ are the distributions being compared
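The JSD formula translates almost line for line into code. This sketch assumes discrete distributions given as equal-length lists of probabilities and uses base-2 logarithms, so the result lies between 0 and 1; the slides do not state the base.

```python
import math

def entropy(p):
    """Shannon entropy in bits; zero-probability bins contribute nothing."""
    return -sum(x * math.log2(x) for x in p if x > 0)

def jsd(p, q):
    """JSD(p; q) = h((p + q) / 2) - (h(p) + h(q)) / 2."""
    m = [(a + b) / 2 for a, b in zip(p, q)]
    return entropy(m) - (entropy(p) + entropy(q)) / 2

identical = jsd([0.3, 0.7], [0.3, 0.7])  # same distribution
disjoint = jsd([1.0, 0.0], [0.0, 1.0])   # no overlap at all
```

Lower values mean the generated distribution is closer to the real one.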
Experimental results
• Result
– On the GeoLife-GPS dataset, mobility regularity is less pronounced and therefore less important for mobility modeling
• Result
– Global spatial population distribution on the Mobile Operator dataset: panel (a) shows the real distribution and panel (b) the generated distribution
• Application
– Simulate the spread of COVID-19 with an SEIR model to test the utility of synthetic mobility trajectories generated from the Mobile Operator dataset
– SEIR refers to four states: Susceptible, Exposed, Infectious, and Recovered
– Run at least 10 simulations for each experiment
– Compute the MAPE (mean absolute percentage error) of each synthetic dataset when estimating the size of each population group, and average the results over 7 days
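To make the evaluation loop concrete, the sketch below runs a simple deterministic compartmental SEIR model twice and compares the infectious curves with MAPE. It is not the trajectory-driven simulation described above: the rates `beta`, `sigma`, `gamma`, the population of 1,000, and the use of a slightly different `beta` as a stand-in for synthetic mobility are all assumptions for illustration.

```python
def seir_step(s, e, i, r, beta, sigma, gamma):
    """Advance the four compartments by one day."""
    n = s + e + i + r
    new_exposed = beta * s * i / n
    new_infectious = sigma * e
    new_recovered = gamma * i
    return (s - new_exposed,
            e + new_exposed - new_infectious,
            i + new_infectious - new_recovered,
            r + new_recovered)

def simulate(days, state, beta, sigma=0.2, gamma=0.1):
    """Return the list of (S, E, I, R) states for day 0 through `days`."""
    curve = [state]
    for _ in range(days):
        state = seir_step(*state, beta, sigma, gamma)
        curve.append(state)
    return curve

def mape(real, synthetic):
    """Mean absolute percentage error, in percent."""
    return 100.0 * sum(abs(a - b) / abs(a) for a, b in zip(real, synthetic)) / len(real)

start = (990.0, 0.0, 10.0, 0.0)
real = simulate(7, start, beta=0.30)   # stand-in for the real-data run
synth = simulate(7, start, beta=0.33)  # stand-in for a synthetic-data run
infectious_mape = mape([st[2] for st in real], [st[2] for st in synth])
```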
• Result
– As Figure 5 shows, synthetic data from our framework produces spreading curves closer to those of the real data, reducing MAPE from 5%–10% to 2%
4 Conclusion
• Contributions
– Proposes a novel framework based on generative adversarial networks
– Combines the advantages of model-based and model-free methods
– Outperforms seven state-of-the-art baselines on two real-life mobility datasets
– Achieves better performance in simulating the spread of COVID-19