Movesim

The document proposes a framework that uses generative adversarial networks to simulate human mobility patterns by combining model-based and model-free methods, leveraging prior knowledge about human mobility to improve model convergence and performance. Experiments on two public datasets show the proposed method outperforms existing state-of-the-art models in evaluating metrics like travel distance and location visits. The framework is also applied to simulate the spread of COVID-19 using mobility data.

Uploaded by

xiang lee

KDD 2020

Learning to Simulate Human Mobility


Jie Feng, Yong Li∗
Department of Electronic Engineering, Tsinghua University
CONTENTS

PART 01 PART 02 PART 03 PART 04

Introduction & Background Methodology Experiment Summary
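Introduction and Background
1 Introduction Problem Definition Preliminaries
Introduction 04

• Realistic simulation of human mobility data is important for modeling epidemic spread and for making related health policy

• Existing methods fall into two categories: model-based methods [1][2] and model-free methods

• Contributions of this paper:
– Proposes a framework that combines the advantages of model-based and model-free methods
– Uses prior knowledge of human mobility to improve the model's convergence rate and performance
– Runs experiments on two public datasets and outperforms existing SOTA models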

[1] Shan Jiang, Yingxiang Yang, Siddharth Gupta, Daniele Veneziano, Shounak Athavale, and Marta C González. 2016. The TimeGeo modeling
framework for urban mobility without travel surveys. PNAS 113, 37 (2016).
[2] Mogeng Yin, Madeleine Sheehan, Sidney Feygin, Jean-François Paiement, and Alexei Pozdnoukhov. 2017. A generative model of urban activities
from cellular data. IEEE Transactions on Intelligent Transportation Systems 19, 6 (2017).
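Problem Definition 05

• Mobility trajectory data: S = [x_1, x_2, …, x_n]

– x_i = (l_i, t_i)
– where l_i is usually expressed as a GPS coordinate (lat, lon)
– t_i is the timestamp of the i-th point

• Given a real-world dataset, generate a mobility trajectory that is as realistic as possible
– Expressed mathematically:
$$p_\theta(\hat{S}) = \prod_{i=1}^{n} p_\theta(\hat{x}_i \mid \hat{x}_1, \ldots, \hat{x}_{i-1})$$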
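
The chain-rule factorization of the trajectory probability can be illustrated with a minimal sketch. The conditional model `toy_cond_prob` below is a hypothetical stand-in (uniform first point, then a fixed "stay" probability), not the generator from the paper; only the product-of-conditionals structure comes from the slide.

```python
import math

def sequence_log_prob(seq, cond_prob):
    """Sum of log p(x_i | x_1..x_{i-1}) over the sequence (chain rule)."""
    return sum(math.log(cond_prob(seq[:i], seq[i])) for i in range(len(seq)))

def toy_cond_prob(prefix, x, num_locations=3, stay=0.6):
    """Hypothetical conditional: uniform start, then stay put with prob `stay`."""
    if not prefix:
        return 1.0 / num_locations
    if x == prefix[-1]:
        return stay
    return (1.0 - stay) / (num_locations - 1)

# Trajectory over location IDs: start at 0, stay at 0, then move to 1.
trajectory = [0, 0, 1]
log_p = sequence_log_prob(trajectory, toy_cond_prob)
```

Working in log space avoids numerical underflow when the product runs over long trajectories.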
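Preliminaries 06

• Generative adversarial network (GAN)
– Has two parts: a generator and a discriminator
– The generator is trained to fool the discriminator
– The discriminator is trained to identify fake data produced by the generator
– Training continues until the two reach a Nash equilibrium
– Expressed mathematically:

$$\min_{\theta} \max_{\phi} \; \mathbb{E}_{\mathbf{x} \sim p_d}[\log D_\phi(\mathbf{x})] + \mathbb{E}_{\mathbf{z} \sim p_z}[\log(1 - D_\phi(G_\theta(\mathbf{z})))]$$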
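
As a rough illustration, the standard GAN value function V = E[log D(x)] + E[log(1 − D(G(z)))] can be estimated from discriminator outputs on a batch. The function name `gan_value` and the sample scores are assumptions made for this sketch; the discriminator maximizes V while the generator minimizes it.

```python
import math

def gan_value(d_real, d_fake):
    """Monte Carlo estimate of V = E[log D(x)] + E[log(1 - D(G(z)))]."""
    real_term = sum(math.log(d) for d in d_real) / len(d_real)
    fake_term = sum(math.log(1.0 - d) for d in d_fake) / len(d_fake)
    return real_term + fake_term

# A discriminator that cannot tell real from fake outputs 0.5 everywhere.
v_equilibrium = gan_value([0.5, 0.5], [0.5, 0.5])
# A confident, correct discriminator pushes V towards 0 from below.
v_confident = gan_value([0.99, 0.98], [0.01, 0.02])
```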
Methodology
2 Overall Architecture
Discriminator
Generator
Training
Overall Architecture 08
Generator 09
Generator 10

SeqNet
1: Sample a location l_0 and a fixed start time t_0 = 0
2: Encode space and time as one-hot vectors
3: Concatenate the two vectors into a dense representation vector x_e

4: Project x_e with non-linear operations into three vectors: query vector Q_i, key vector K_i, and value vector V_i
5: Apply scaled dot-product attention to them to obtain the weighted sum of the value vectors

6: Use multi-head and stacking operations to model relations in different subspaces
7: Use a linear layer to process the features and apply the softmax function to obtain the probability distribution of the next location
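
The attention step can be sketched in plain Python as a single head without learned projections. This is a simplified illustration, not the SeqNet implementation: the multi-head split, layer stacking, and the masking a generator would need to hide future positions are all omitted, and the input vectors are made-up values.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def scaled_dot_product_attention(queries, keys, values):
    """For each query, return the softmax-weighted sum of the value vectors."""
    d_k = len(keys[0])
    outputs = []
    for q in queries:
        # Similarity of the query to every key, scaled by sqrt(d_k).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k) for k in keys]
        weights = softmax(scores)
        outputs.append([sum(w * v[j] for w, v in zip(weights, values))
                        for j in range(len(values[0]))])
    return outputs

# A zero query scores every key equally, so the output is the mean of the values.
out = scaled_dot_product_attention(
    [[0.0, 0.0]], [[1.0, 0.0], [0.0, 1.0]], [[2.0, 0.0], [0.0, 4.0]])
```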
Generator 11

RegNet
1: Build three N×N location relation matrices: physical distance, functional similarity, and historical transition

2: Use the probability vector of the last generated location to select the corresponding relation vector R from each of these matrices

3: Use element-wise multiplication to fuse the feature vector of the currently generated location with the multi-view relation vectors R

4: Refine the intermediate output with a residual connection
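
The four RegNet steps can be sketched as follows, under simplifying assumptions: the feature vector has one entry per location, relation selection is the expected row of each matrix under the probability vector, and no learned weights are involved. The names `select_relation` and `regnet_refine` are hypothetical, and the paper's model may differ in how the views are combined.

```python
def select_relation(prob, matrix):
    """Expected row of `matrix` under `prob` (the vector-matrix product prob @ matrix)."""
    n = len(matrix[0])
    return [sum(prob[i] * matrix[i][j] for i in range(len(prob))) for j in range(n)]

def regnet_refine(feature, prob_last, relation_matrices):
    """Fuse `feature` with each relation view and add the result back (residual)."""
    refined = list(feature)  # residual path keeps the original feature
    for matrix in relation_matrices:
        r = select_relation(prob_last, matrix)
        # Element-wise multiplication fuses the feature with this relation view.
        refined = [acc + f * rj for acc, f, rj in zip(refined, feature, r)]
    return refined

# Two locations, one relation view; the last location was certainly location 0.
feature = [1.0, 2.0]
relation = [[0.5, 0.1], [0.2, 0.3]]
refined = regnet_refine(feature, [1.0, 0.0], [relation])
```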


Discriminator 12

• Monte Carlo search

– Maintain an assistant generator G_a, which is the previous-step version of the current generator G

• CNN-based Discriminator

• Mobility Regularity-Aware Loss


Discriminator 13

CNN-based discriminator
1: A spatial-temporal embedding module converts the raw trajectory sequence S = [x_1, …, x_n] into a 2D matrix X_e ∈ R^{n×d}
2: Several convolution layers extract features from this matrix
3: A linear layer with a sigmoid activation function produces the final score from the flattened convolution features
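
A toy version of the convolution and scoring steps is sketched below. It assumes the embedding has already been computed, slides each kernel only along the time axis, and uses hand-picked weights; the real discriminator's layer sizes and embedding module are not given on the slide.

```python
import math

def conv1d_relu(matrix, kernel):
    """Slide a (k x d) kernel over an (n x d) matrix along the time axis, then apply ReLU."""
    k = len(kernel)
    feats = []
    for start in range(len(matrix) - k + 1):
        s = sum(matrix[start + r][c] * kernel[r][c]
                for r in range(k) for c in range(len(kernel[0])))
        feats.append(max(0.0, s))
    return feats

def discriminator_score(embedded, kernels, weights, bias):
    """Flatten all convolution features, apply a linear layer, then a sigmoid."""
    flat = [f for kernel in kernels for f in conv1d_relu(embedded, kernel)]
    logit = sum(w * f for w, f in zip(weights, flat)) + bias
    return 1.0 / (1.0 + math.exp(-logit))

# Hypothetical embedded trajectory with n = 3 points and d = 2 dimensions.
embedded = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
kernel = [[1.0, 0.0], [0.0, 1.0]]
features = conv1d_relu(embedded, kernel)
score = discriminator_score(embedded, [kernel], [0.5, -1.0], 0.0)
```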
Discriminator 14

• Distance-aware loss:

– Works by limiting the travel distance between consecutive mobility transitions

• Periodicity loss:

– For example, with 1 hour as the basic time window, P is set to 24, which means the loss is calculated as the number of different locations visited on different days at the same hour of the day.
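
The exact loss formulas are not shown on the slide, so the following is only one plausible reading. It assumes planar coordinates with Euclidean distance for the distance term, and counts distinct locations per slot of the period for the periodicity term; both function names are hypothetical.

```python
import math

def distance_loss(locations, coords):
    """Mean Euclidean distance between consecutive visited locations."""
    hops = [math.dist(coords[a], coords[b]) for a, b in zip(locations, locations[1:])]
    return sum(hops) / len(hops)

def periodicity_counts(locations, period=24):
    """For each slot of the period, count the distinct locations seen across days."""
    return [len(set(locations[slot::period])) for slot in range(period)]

# Hypothetical coordinates for two location IDs.
coords = {0: (0.0, 0.0), 1: (3.0, 4.0)}
d_loss = distance_loss([0, 1, 1], coords)
# Period of 2 slots over two "days": slot 0 sees {0}, slot 1 sees {1, 2}.
p_counts = periodicity_counts([0, 1, 0, 2], period=2)
```

A perfectly periodic trajectory would give a count of 1 in every slot, so larger counts indicate weaker periodicity.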
Training 15

• The standard GAN training algorithm is not applicable because:

a) The discrete output of G blocks gradient back-propagation from D
b) The complicated transitions and various kinds of noise in trajectory sequences make it difficult to learn useful knowledge

• Reinforcement learning based training

• Pre-training mechanism
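
Reinforcement learning sidesteps the blocked gradient by treating the sampled location as an action and the discriminator's score as a reward. A minimal REINFORCE-style sketch for a softmax policy over logits follows; the reward value and learning rate are made up, and the Monte Carlo rollouts used to score partial sequences are not shown.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def reinforce_update(logits, action, reward, lr=0.1):
    """One gradient-ascent step on reward * log pi(action)."""
    probs = softmax(logits)
    # For a softmax policy, d log pi(a) / d logit_j = 1[j == a] - pi_j.
    grad_log_prob = [(1.0 if j == action else 0.0) - p for j, p in enumerate(probs)]
    return [z + lr * reward * g for z, g in zip(logits, grad_log_prob)]

# The generator sampled location 1 and received a (hypothetical) reward of 1.0.
new_logits = reinforce_update([0.0, 0.0], action=1, reward=1.0)
```

A positive reward raises the probability of the sampled location, which is the signal the generator would otherwise receive through back-propagation.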
Training 16

• Pre-train the generator with a mobility prediction task

– Predict the next location given the entire preceding trajectory
– For each trajectory with n points, use the first n − 1 points as the input and the last n − 1 points as the target

• Pre-train the discriminator with a mobility regularity-aware task

– A binary classification task that distinguishes whether the input mobility trajectory exhibits important mobility regularities, including temporal periodicity and spatial continuity
– Perturb the real trajectory in two ways:
• 1) replace one location in the real trajectory with another that is distant from it
• 2) randomly shuffle the order of locations
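
Both pre-training tasks reduce to simple data preparation, sketched here on lists of location IDs. The helper names are hypothetical, and the choice of which location counts as "distant" is left to the caller because the slide does not specify it.

```python
import random

def next_location_pairs(trajectory):
    """Inputs are the first n-1 points; targets are the last n-1 points."""
    return trajectory[:-1], trajectory[1:]

def replace_with_distant(trajectory, index, far_location):
    """Negative sample 1: swap one location for a caller-chosen distant one."""
    perturbed = list(trajectory)
    perturbed[index] = far_location
    return perturbed

def shuffle_order(trajectory, rng):
    """Negative sample 2: randomly shuffle the visiting order."""
    perturbed = list(trajectory)
    rng.shuffle(perturbed)
    return perturbed

real = [1, 2, 3, 4]
inputs, targets = next_location_pairs(real)
fake_far = replace_with_distant(real, index=2, far_location=99)
fake_shuffled = shuffle_order(real, random.Random(0))
```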
Experiment
3 Datasets

Results
Baselines Evaluation Metrics
Datasets 18

• Datasets
– Mobile Operator:
• Duration of 1 week
• Covers 100,000 mobile users in a major city in China
• The train, validation, and test split is 1:1:1
– GeoLife:
• Contains 17,621 trajectories
• Covers the trajectories of 182 users over a period of more than 5 years
• The train, validation, and test split is 7:2:1
• The basic time slot is set to half an hour of the day in modeling
Baselines 20

• Baselines
– Markov:
• Treats all locations visited by users as states and builds a transition matrix to capture the first-order transition probabilities.
– DeepMove:
• Combines neural attention with a recurrent network to capture the periodic patterns in mobility.
– TimeGeo:
• Defines the weekly home-based tour number, dwell rate, and burst rate to model temporal choices, and uses an r-EPR mechanism to model the spatial choices of human mobility.
– IO-HMM:
• First annotates user activities from trajectories with IO-HMM, then generates mobility sequences for each user with manually assigned home and work locations.
– GAN:
• Based on the general GAN design, with an LSTM as the generator and a CNN as the discriminator.
– TrajGAN:
• Flattens and embeds a trajectory in 2D matrix form, then uses a standard GAN to generate the matrix-form trajectory.
– SeqGAN:
• Combines reinforcement learning with GAN to solve the discrete sequence generation (location sequence) problem.
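
The simplest baseline, the first-order Markov model, can be sketched in a few lines: count observed transitions between location IDs and normalize each row. The sample trajectories are invented for illustration.

```python
from collections import Counter, defaultdict

def fit_transition_matrix(trajectories):
    """Estimate first-order transition probabilities from location sequences."""
    counts = defaultdict(Counter)
    for traj in trajectories:
        for src, dst in zip(traj, traj[1:]):
            counts[src][dst] += 1
    # Normalize each row of counts into a probability distribution.
    return {src: {dst: c / sum(row.values()) for dst, c in row.items()}
            for src, row in counts.items()}

# From location 0 we observe 0->1 twice and 0->0 once; from 1 only 1->0.
matrix = fit_transition_matrix([[0, 1, 0, 1], [0, 0]])
```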
Evaluation Metrics 21

• Metrics
– Distance
• the cumulative travel distance per user in a fixed time interval
– Radius
• the spatial range of a user's daily movement
– Duration
• the stay duration per visited location
– DailyLoc
• the number of visited locations per day for each user
– G-rank
• the number of visits per location, calculated as the visiting frequency of the top-100 locations
– I-rank
• an individual version of G-rank
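
Three of these metrics can be sketched for a single user as follows. The sketch assumes planar coordinates with Euclidean distance, interprets Radius as the radius of gyration, and counts distinct location IDs for DailyLoc; these are assumptions, since the slide gives only verbal definitions.

```python
import math

def travel_distance(points):
    """Cumulative distance along the sequence of visited points."""
    return sum(math.dist(a, b) for a, b in zip(points, points[1:]))

def radius_of_gyration(points):
    """Root-mean-square distance of the points from their centroid."""
    cx = sum(p[0] for p in points) / len(points)
    cy = sum(p[1] for p in points) / len(points)
    return math.sqrt(sum((p[0] - cx) ** 2 + (p[1] - cy) ** 2 for p in points) / len(points))

def daily_loc(location_ids):
    """Number of distinct locations visited within one day."""
    return len(set(location_ids))

distance = travel_distance([(0.0, 0.0), (3.0, 4.0), (0.0, 0.0)])
radius = radius_of_gyration([(0.0, 0.0), (2.0, 0.0)])
n_locations = daily_loc([5, 5, 7, 5, 9])
```

Each metric is computed per user, and the resulting distributions over users are what get compared between real and generated data.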
Evaluation Metrics 22

• Evaluation Method
– JSD (Jensen–Shannon divergence)
– Use JSD to measure the similarity between the mobility pattern distributions of the generated trajectories and the real trajectory data
– $\mathrm{JSD}(p; q) = h\left(\frac{p + q}{2}\right) - \frac{h(p) + h(q)}{2}$
– h is the Shannon entropy; p and q are distributions
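
The JSD formula translates directly into code for discrete distributions given as probability lists. The base-2 logarithm is an assumption made here so that the divergence is bounded by 1; the slide does not state the base.

```python
import math

def entropy(p):
    """Shannon entropy in bits; zero-probability entries contribute nothing."""
    return -sum(x * math.log2(x) for x in p if x > 0)

def jsd(p, q):
    """JSD(p; q) = h((p + q) / 2) - (h(p) + h(q)) / 2."""
    m = [(a + b) / 2 for a, b in zip(p, q)]
    return entropy(m) - (entropy(p) + entropy(q)) / 2

# Identical distributions have zero divergence; disjoint ones reach the maximum.
same = jsd([0.3, 0.7], [0.3, 0.7])
disjoint = jsd([1.0, 0.0], [0.0, 1.0])
```

A lower JSD means the generated trajectories reproduce the real mobility-pattern distribution more closely.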
Results 23

• Result
– In the GeoLife-GPS dataset, mobility regularity is less pronounced and therefore less important in mobility modeling
Results 24

• Result
– The global spatial population distribution of the Mobile Operator dataset: panel (a) shows the real distribution and panel (b) shows the generated distribution.
Results 25

• Application
– Conduct a simulation experiment on the spread of COVID-19 with the SEIR model to test the utility of synthetic mobility trajectories generated from the Mobile Operator dataset.
– SEIR refers to four states: Susceptible, Exposed, Infectious, and Recovered
– Run at least 10 simulations for each experiment
– Calculate the MAPE (Mean Absolute Percentage Error) of different synthetic data on the estimated size of each population group, and average the results over 7 days
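
For orientation, a population-level SEIR update and the MAPE metric can be sketched as below. This is a textbook compartment model with made-up parameter values (`beta`, `sigma`, `gamma`), not the mobility-driven simulation used in the experiment, which is not described on the slide.

```python
def seir_step(s, e, i, r, beta, sigma, gamma):
    """Advance the four compartments by one day (discrete-time SEIR)."""
    n = s + e + i + r
    new_exposed = beta * s * i / n   # Susceptible -> Exposed
    new_infectious = sigma * e       # Exposed -> Infectious
    new_recovered = gamma * i        # Infectious -> Recovered
    return (s - new_exposed,
            e + new_exposed - new_infectious,
            i + new_infectious - new_recovered,
            r + new_recovered)

def mape(actual, predicted):
    """Mean Absolute Percentage Error, in percent."""
    return 100.0 * sum(abs(a - p) / abs(a) for a, p in zip(actual, predicted)) / len(actual)

# Simulate 7 days from a hypothetical initial state and record the infectious curve.
state = (9990.0, 0.0, 10.0, 0.0)
infectious_curve = []
for _ in range(7):
    state = seir_step(*state, beta=0.5, sigma=0.2, gamma=0.1)
    infectious_curve.append(state[2])

error = mape([100.0, 200.0], [110.0, 180.0])
```

In the experiment's setting, a curve simulated from real mobility would play the role of `actual` and a curve simulated from synthetic mobility the role of `predicted`.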
Results 26

• Result
– As Figure 5 shows, synthetic data from our framework produce spreading curves that are closer to those of the real data, reducing MAPE from 5% to 10% down to 2%.
Summary
4
Summary 28

• Contributions
– Proposes a novel generative adversarial framework
– Combines the advantages of model-based and model-free methods
– Performs better on two real-life mobility datasets than seven state-of-the-art baselines
– Achieves better performance in simulating the spread of COVID-19
