Trip distribution modeling with Twitter data

doi:10.1016/j.compenvurbsys.2019.101354

Computers, Environment and Urban Systems

Volume 77, September 2019, 101354


Highlights

• Traditional and social media data are integrated to predict commuting trip distribution.
• The predictive performance of the traditional gravity model and ML techniques is evaluated.
• The potential of Twitter data in trip distribution modeling at a fine spatial scale is explored.
• The most significant predictor variables are determined.

Abstract

Integrating both traditional and social media data, this study compares the performance of gravity, neural network, and random forest models of commuting trip distribution in New York City. Trip distribution modeling has primarily relied on traditional data sources and classical methods such as the gravity model. However, with the emergence of social media over the past decade, the potential of integrating traditional and social media data while using new techniques has been recognized. Our findings indicate that the random forest model outperforms the traditional gravity and neural network models. The random forest model identified population, distance, number of Twitter users, and employment as the four most influential predictors of trip distribution. While Twitter flows did not enhance the models' performance, the importance of the number of Twitter users at work destinations suggests that social media data could improve the predictive power and accuracy of travel demand models.

Introduction

Worldwide increases in traffic congestion and air pollution in urban areas present a need to better understand the mobility patterns of urban populations and their travel demands (Shirzadi Babakan, Alimohammadi, & Taleai, 2015; Yang, 2013). A number of studies have examined either individual or collective mobility patterns at different spatial scales (e.g., Beiró, Panisson, Tizzoni, & Cattuto, 2016; González, Hidalgo, & Barabási, 2008; Hawelka et al., 2014). Mobility information of individuals can be aggregated to study the frequency of travel between different regions, as represented by origin-destination (OD) matrices (Barbosa et al., 2018). An OD matrix captures population flow patterns (trip distribution), which are studied for diverse purposes such as traffic forecasting, resource allocation, prediction of migration flows, and modeling of epidemic spreading (Beiró et al., 2016; Pourebrahim, Sultana, Thill, & Mohanty, 2018). Improving flow estimation has therefore become critical across many application domains (Barbosa et al., 2018).
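The aggregation of individual trips into an OD matrix described above can be sketched in a few lines. This is a minimal illustration, not the authors' processing pipeline: the function name build_od_matrix, the zone labels, and the trips are hypothetical, and each trip is assumed to be already reduced to an (origin zone, destination zone) pair, such as a home tract and a work tract.

```python
from collections import Counter


def build_od_matrix(trips, zones):
    """Aggregate individual (origin, destination) trips into an OD count matrix.

    trips: iterable of (origin_zone, destination_zone) pairs (hypothetical input).
    zones: ordered list of zone identifiers, e.g. census tract IDs.
    Returns a list of lists where entry [i][j] counts trips from zones[i] to zones[j].
    """
    index = {zone: k for k, zone in enumerate(zones)}
    counts = Counter(trips)  # how many times each (origin, destination) pair occurs
    matrix = [[0] * len(zones) for _ in zones]
    for (origin, dest), n in counts.items():
        matrix[index[origin]][index[dest]] += n
    return matrix


# Hypothetical home-to-work trips among three illustrative zones A, B, C.
trips = [("A", "B"), ("A", "B"), ("A", "C"), ("B", "C"), ("C", "A")]
od = build_od_matrix(trips, ["A", "B", "C"])
for row in od:
    print(row)
```

Row sums of the resulting matrix give trips produced by each origin and column sums give trips attracted to each destination, which are the marginals that trip distribution models try to reproduce.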

Various trip distribution models have been developed over past decades to estimate population flows with greater accuracy (de Dios Ortuzar & Willumsen, 2011; Roy & Thill, 2003; Simini, González, Maritan, & Barabási, 2012; Wilson, 1970; Wilson, 1998; Zipf, 1946). Most of these models depend heavily on conventional data such as population censuses or travel diary surveys. The emergence of social media and location-based services in recent years has introduced new opportunities to the field of transportation (Yang, Jin, Cheng, Zhang, & Ran, 2015). Geospatial big data such as taxi trajectories, mobile phone records, and social media messages have attracted scholars seeking to observe, understand, and visualize (e.g., Karduni et al., 2017) human activities in cities at fine spatio-temporal scales (Liu et al., 2015). These data significantly improve the visualization of human mobility patterns, yet there is a need to better understand and contextualize them within the different steps of the travel demand modeling framework (Anda, Erath, & Fourie, 2017).

Traditionally, the gravity model and its derivatives have been used as the most reliable approach to predict trip distribution at fine spatial scales, such as commuting flows within cities (Lenormand, Bassolas, & Ramasco, 2016). The potential for developing hybrid approaches that integrate the vast volume of social media data with the gravity model has recently been noted (Beiró et al., 2016). While traditional models have used statistical methods rooted in sound mathematical foundations, they have been unable to account for nonlinearities and other irregularities in data (Golshani, Shabanpour, Mahmoudifard, Derrible, & Mohammadian, 2018). To alleviate these and other issues, Machine Learning (ML) techniques have been applied in different urban and transportation domains (e.g., Ghasri, Hossein Rashidi, & Waller, 2017; Karimi, Sultana, Shirzadi Babakan, & Suthaharan, 2019). A significant body of literature now evaluates the ability of various ML techniques to model travel demand, including artificial neural networks (ANNs) (e.g., Ding, Wang, Wang, & Baumann, 2013; Mozolin, Thill, & Lynn Usery, 2000; Pourebrahim et al., 2018; Tillema, van Zuilekom, & van Maarseveen, 2006) and tree-based ensemble methods (e.g., Ghasri et al., 2017; Rasouli & Timmermans, 2014). Although random forests (RFs) (Breiman, 2001) have been identified as among the most advanced and efficient ensemble methods for classification and regression (Ghasri et al., 2017), they have so far been used in only a few studies of travel demand. Ghasri et al. (2017) and Rasouli and Timmermans (2014) reported promising results with RF modeling of trip generation and modal split, yet the suitability and usefulness of RFs for trip distribution analysis remain to be thoroughly assessed.
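The snippets do not show the exact gravity specification used in this study. As an illustration only, a common power-law form is T_ij = k · O_i^α · D_j^β · d_ij^(−γ), where O_i is a mass at the origin (for example, population), D_j a mass at the destination (for example, employment), and d_ij the distance between them. Taking logarithms makes the model linear in its parameters, so it can be calibrated by ordinary least squares. The sketch below assumes that form; the function names fit_gravity and predict_gravity and the synthetic data are hypothetical.

```python
import numpy as np


def fit_gravity(flows, origin_mass, dest_mass, distance):
    """Fit ln T = ln k + a ln O + b ln D - g ln d by ordinary least squares.

    All inputs are 1-D arrays over OD pairs with strictly positive values
    (zero flows must be dropped or offset before taking logs).
    Returns the tuple (k, a, b, g).
    """
    X = np.column_stack([
        np.ones_like(flows, dtype=float),  # intercept, i.e. ln k
        np.log(origin_mass),
        np.log(dest_mass),
        -np.log(distance),                 # negated so the fitted coefficient is g
    ])
    coef, *_ = np.linalg.lstsq(X, np.log(flows), rcond=None)
    log_k, a, b, g = coef
    return float(np.exp(log_k)), float(a), float(b), float(g)


def predict_gravity(params, origin_mass, dest_mass, distance):
    """Evaluate the power-law gravity model for given parameters."""
    k, a, b, g = params
    return k * origin_mass**a * dest_mass**b / distance**g


# Synthetic OD pairs generated from known parameters (illustrative only).
rng = np.random.default_rng(0)
O = rng.uniform(1_000, 10_000, 200)   # hypothetical origin population
D = rng.uniform(500, 20_000, 200)     # hypothetical destination employment
d = rng.uniform(0.5, 30.0, 200)       # hypothetical network distance (km)
T = 0.01 * O**0.8 * D**0.6 / d**1.5
params = fit_gravity(T, O, D, d)
print([round(p, 3) for p in params])
```

Because the synthetic flows contain no noise, the fit recovers the generating parameters. With real OD data, pairs with zero observed flow cannot be log-transformed; Poisson regression on the raw counts is a common alternative that avoids this problem.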

Given the current state of research, our objective is to compare the performance of gravity, neural network, and random forest models of commuting trip distribution while combining both traditional and social media data. We also evaluate how information on personal mobility derived from social media affects commuting trip distribution by identifying the importance of different variables. To the best of our knowledge, this paper is one of the first to use machine learning approaches in trip distribution forecasting with social media data. The main contributions of this paper are threefold: (1) revealing the potential of social media data in trip distribution modeling at census tract level; (2) using machine learning techniques to predict trip distribution at census tract level; and (3) comparing the performance of gravity, neural network and random forest models to identify the best model for predicting trip distribution at census tract level. The paper is organized as follows. The review of related work is provided in section 2, followed by a presentation of the study area, data sources and methodology in section 3. Results are presented in section 4, with a discussion and concluding remarks in sections 5 and 6.

Section snippets

Travel demand modeling

The relationship between personal mobility flows and a range of personal and environmental factors has been studied to determine future travel demands within cities (Barbosa et al., 2018). Travel demand modeling has long been dominated by the four-step model, whose steps are trip generation, trip distribution, modal split, and traffic assignment (McNally, 2007). The objective of this model is to estimate traffic on transportation networks. The model first identifies the amount of
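In the four-step model, trip distribution takes the trip productions and attractions estimated in the trip generation step and allocates them to OD pairs. One textbook way to do this, shown here as an illustration rather than as this study's method, is a doubly constrained gravity model balanced with the Furness (iterative proportional fitting) procedure: a seed matrix based on a distance deterrence function is alternately rescaled so its row sums match productions and its column sums match attractions. The function name furness, the deterrence exponent gamma, and the three-zone inputs are all hypothetical.

```python
import numpy as np


def furness(productions, attractions, cost, gamma=2.0, tol=1e-9, max_iter=1000):
    """Doubly constrained gravity distribution balanced with the Furness method.

    productions[i]: positive trips produced in zone i (from trip generation).
    attractions[j]: positive trips attracted to zone j; totals must match productions.
    cost[i, j]: positive travel cost or distance; deterrence is cost**-gamma.
    """
    P = np.asarray(productions, dtype=float)
    A = np.asarray(attractions, dtype=float)
    if not np.isclose(P.sum(), A.sum()):
        raise ValueError("total productions and attractions must match")
    T = np.asarray(cost, dtype=float) ** -gamma  # seed matrix from deterrence
    for _ in range(max_iter):
        T *= (P / T.sum(axis=1))[:, None]  # scale rows to match productions
        T *= (A / T.sum(axis=0))[None, :]  # scale columns to match attractions
        if np.allclose(T.sum(axis=1), P, rtol=tol):
            break  # rows still match after column scaling: converged
    return T


P = np.array([400.0, 300.0, 300.0])   # hypothetical productions
A = np.array([500.0, 250.0, 250.0])   # hypothetical attractions
cost = np.array([[1.0, 3.0, 5.0],
                 [3.0, 1.0, 2.0],
                 [5.0, 2.0, 1.0]])
T = furness(P, A, cost)
print(np.round(T, 1))
```

The balancing factors absorb the masses, so only the deterrence function shapes how trips spread across destinations; changing gamma shifts trips toward nearer or farther zones while both sets of totals stay fixed.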

Methodology

We have selected New York City (NYC) as our study area (Fig. 1) due to the large volume of readily available Twitter data. We focused on commuting trips because they are temporally stable and account for the largest share of total flows in a population (Yang, Jin, et al., 2015). The census tracts are used as the geographic units for modeling commuting flows in NYC (Fig. 1).

Results

Our initial analysis was performed on 7445 ODs represented by both LODES flows and Twitter flows. The gravity, ANN, and RF models were first developed with a specification that excludes Twitter data but includes the following variables: 1) network distance between ODs; 2) population, household median income, household median size, and household median number of vehicles in the origin census tract; and 3) employment, sprawl, and POIs in the destination census tract. Then we added the Twitter flow to
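The variable-importance idea behind the RF results can be sketched with scikit-learn's RandomForestRegressor, which exposes impurity-based importances through its feature_importances_ attribute. This is a hypothetical illustration: the snippets do not say which software or hyperparameters the authors used, the feature names merely mirror the variables listed above, and the data are synthetic. The target is constructed so that only distance, population, and employment matter, so the resulting ranking reflects how the data were generated and says nothing about the study's findings.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

FEATURES = [
    "distance",         # network distance between origin and destination
    "population",       # origin tract
    "median_income",    # origin tract
    "median_hh_size",   # origin tract
    "median_vehicles",  # origin tract
    "employment",       # destination tract
    "sprawl",           # destination tract
    "pois",             # destination tract
]

rng = np.random.default_rng(42)
n = 1000
X = np.column_stack([
    rng.uniform(0.5, 20.0, n),        # distance (km)
    rng.uniform(1_000, 8_000, n),     # population
    rng.uniform(20_000, 150_000, n),  # median income
    rng.uniform(1.0, 4.0, n),         # household size
    rng.uniform(0.0, 2.0, n),         # vehicles
    rng.uniform(100, 5_000, n),       # employment
    rng.uniform(0.0, 1.0, n),         # sprawl index
    rng.integers(0, 200, n),          # POI count
])
# Synthetic log-flow target: only distance, population and employment matter.
y = (2.0 * np.log(X[:, 1]) + np.log(X[:, 5]) - 3.0 * np.log(X[:, 0])
     + rng.normal(0.0, 0.1, n))

model = RandomForestRegressor(n_estimators=100, random_state=0)
model.fit(X, y)
ranking = sorted(zip(FEATURES, model.feature_importances_),
                 key=lambda pair: pair[1], reverse=True)
for name, score in ranking:
    print(f"{name:16s} {score:.3f}")
```

Impurity-based importances can favor features with many distinct values; permutation importance on held-out data is a common cross-check.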

Discussion

We compared the performance of the gravity, ANN, and RF models in commuting trip distribution at the census tract level in NYC. RF is identified as the best model, with the highest R² and the lowest MSE. Travel demand studies have commonly used statistical methods to model different travel components (e.g., travel mode, departure time, and trip destination) (Golshani et al., 2018). Given the limited ability of these methods to capture nonlinearities in the data, machine learning
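The two comparison metrics named above can be computed as follows. This sketch assumes R² is the coefficient of determination, 1 − SS_res/SS_tot; the snippets do not state whether the authors instead used, for example, the squared Pearson correlation. The observed flows and the two sets of predictions are made-up numbers for illustration.

```python
import numpy as np


def mse(y_true, y_pred):
    """Mean squared error between observed and predicted flows."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return float(np.mean((y_true - y_pred) ** 2))


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)


observed = [10, 20, 30, 40]   # hypothetical OD flows
model_a = [12, 18, 33, 37]    # hypothetical predictions from one model
model_b = [15, 25, 25, 45]    # hypothetical predictions from another

for name, pred in [("model A", model_a), ("model B", model_b)]:
    print(f"{name}: MSE = {mse(observed, pred):.2f}, R2 = {r_squared(observed, pred):.3f}")
# model A: MSE = 6.50, R2 = 0.948
# model B: MSE = 25.00, R2 = 0.800
```

Because both metrics are computed against the same observed flows, a lower MSE implies a higher R² here; the two diverge only when models are evaluated on different samples.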

Conclusions

A primary aim of transportation policy makers is to achieve sustainable mobility in urban areas (Kepaptsoglou et al., 2012; May, 2013). However, the difficulty of collecting travel demand data at high spatio-temporal resolution is the major gap between the current state of the practice and an efficient, sustainable urban solution (Yang, Herrera, et al., 2015). Social media may be a useful source of data for achieving this goal, yet its potential remains insufficiently investigated. The primary focus in

Funding

This research did not receive any specific grant from funding agencies in the public, commercial, or not-for-profit sectors.

References (81)

M.W. Ahmad et al.
Trees vs neurons: Comparison between random forest and ANN for high-resolution prediction of building energy consumption
Energy and Buildings
(2017)
H. Barbosa et al.
Human mobility: Models and applications
Physics Reports
(2018)
W.R. Black
Spatial interaction modeling using artificial neural networks
Journal of Transport Geography
(1995)
M. Ghasri et al.
Developing a disaggregate travel demand system of models using data mining techniques
Transportation Research Part A: Policy and Practice
(2017)
N. Golshani et al.
Modeling travel mode and timing decisions: Comparison of artificial neural networks and copula-based joint model
Travel Behaviour and Society
(2018)
B. Jiang et al.
Volunteered geographic information: Towards the establishment of a new paradigm
Computers, Environment and Urban Systems
(2015)
F. Karimi et al.
An enhanced support vector machine model for urban expansion prediction
Computers, Environment and Urban Systems
(2019)
M. Lenormand et al.
Systematic comparison of trip distribution laws and models
Journal of Transport Geography
(2016)
M. Mozolin et al.
Trip distribution forecasting with multilayer perceptron neural networks: A critical evaluation
Transportation Research Part B: Methodological
(2000)
C.S. Pitombo et al.
Comparing decision tree algorithms to estimate intercity trip distribution
Transportation Research Part C: Emerging Technologies
(2017)
C.R. Sekhar et al.
Mode choice analysis using random forrest decision trees
Transportation Research Procedia
(2016)
F. Yang et al.
Origin-destination estimation for non-commuting trips using location-based social networking data
International Journal of Sustainable Transportation
(2015)
Y. Zhang et al.
A gradient boosting method to improve travel time prediction
Transportation Research Part C: Emerging Technologies
(2015)
J. Amita et al.
Prediction of bus travel time using artificial neural network
International Journal for Traffic and Transport Engineering
(2015)
C. Anda et al.
Transport modelling in the age of big data
International Journal of Urban Sciences
(2017)
D.T. Apronti et al.
Four-step travel demand model implementation for estimating traffic volumes on rural low-volume roads in Wyoming
Transportation Planning and Technology
(2018)
ArcGIS
North America detailed streets
M.H. Beale et al.
Neural network toolbox user's guide. MathWorks
(2015)
M.G. Beiró et al.
Predicting human mobility through the assimilation of social media traces into mobility models
EPJ Data Science
(2016)
A.D. Berger
A travel demand model for rural areas, (August)
(2012)
L. Breiman
Random forests
Machine Learning
(2001)
H.M. Celik
Modeling freight distribution using artificial neural networks
Journal of Transport Geography
(2004)
C. Ding et al.
A neural network model for driver's lane-changing trajectory prediction in urban traffic flow
Mathematical Problems in Engineering
(2013)
J. de Dios Ortuzar et al.
Modelling transport
(2011)
M. Eshghi et al.
An approach for safer navigation under severe hurricane damage
Journal of Reliable Intelligent Environments
(2018)
R. Ewing et al.
Measuring urban sprawl and validating sprawl measures
(2014)
M.C. González et al.
Understanding individual human mobility patterns
Nature
(2008)
B. Hamner
Predicting travel times with context-dependent random forests by modeling local and aggregate traffic flow
S. Hasan et al.
Understanding urban human activity and mobility patterns using large-scale location-based data from online social media
B. Hawelka et al.
Geo-located Twitter as proxy for global mobility patterns
Cartography and Geographic Information Science
(2014)
Internet Live Stats
Y. Jiang et al.
Social network, activity space, sentiment and evacuation: What can social media tell us?
Y. Jiang et al.
Understanding demographic and socioeconomic biases of geotagged twitter users at the county level
Cartography and Geographic Information Science
(2018)
L. Josephs
New York City needs foreign visitors because they spend four times more money than Americans. Quartz
A. Karduni et al.
Urban space explorer: A visual analytics system for urban planning
IEEE Computer Graphics and Applications
(2017)
K. Kepaptsoglou et al.
Quality Management in Mobility Management: A scheme for supporting sustainable transportation in cities
International Journal of Sustainable Transportation
(2012)
J. Kim et al.
Why do people move? Enhancing human mobility prediction using local functions based on public records and SNS data
PLoS One
(2018)
A. Kurkcu et al.
Evaluating the usability of geo-located twitter as a tool for human activity and mobility patterns: A case study for New York City
D. Lagomarsino et al.
A tool for classification and regression using random forest methodology: Applications to landslide susceptibility mapping and soil thickness modeling
Environmental Modeling and Assessment
(2017)
G. Leshem et al.
Traffic flow prediction using adaboost algorithm with random forests as a weak learner
International Journal of Mathematical and Computational Sciences
(2007)
