Trip distribution modeling with Twitter data
Introduction
Worldwide increases in traffic congestion and air pollution in urban areas present a need to better understand the mobility patterns of urban populations and their travel demands (Shirzadi Babakan, Alimohammadi, & Taleai, 2015; Yang, 2013). A number of studies have examined either individual or collective mobility patterns at different spatial scales (e.g., Beiró, Panisson, Tizzoni, & Cattuto, 2016; González, Hidalgo, & Barabási, 2008; Hawelka et al., 2014). Mobility information of individuals can be aggregated to study the frequency of travel between different regions, as represented by origin-destination (OD) matrices (Barbosa et al., 2018). An OD matrix describes population flow patterns (trip distribution), which are studied for diverse purposes such as traffic forecasting, resource allocation, prediction of migration flows, and epidemic spreading (Beiró et al., 2016; Pourebrahim, Sultana, Thill, & Mohanty, 2018). Therefore, improving flow estimations has become critical across various domains of application (Barbosa et al., 2018).
Various trip distribution models have been developed over past decades to estimate population flows with greater accuracy (de Dios Ortuzar & Willumsen, 2011; Roy & Thill, 2003; Simini, González, Maritan, & Barabási, 2012; Wilson, 1970; Wilson, 1998; Zipf, 1946). Most of these models are heavily dependent on conventional data such as population censuses or travel diary surveys. The emergence of social media and location-based services in recent years has introduced new opportunities to the field of transportation (Yang, Jin, Cheng, Zhang, & Ran, 2015). Geospatial big data such as taxi trajectories, mobile phone records, and social media messages have attracted scholars to observe, understand, and visualize (e.g., Karduni et al., 2017) human activities in cities at fine spatio-temporal scales (Liu et al., 2015). These data significantly improve the visualization of human mobility patterns, yet there is a need to better understand and contextualize them in different steps of the travel demand modeling framework (Anda, Erath, & Fourie, 2017).
Traditionally, the gravity model and its derivatives have been used as the most reliable approach to predict trip distribution at fine spatial scales, such as commuting flows within cities (Lenormand, Bassolas, & Ramasco, 2016). The potential for developing hybrid approaches that integrate the vast volume of social media data with the gravity model has recently been noted (Beiró et al., 2016). While traditional models have used statistical methods rooted in sound mathematical foundations, they have been unable to account for nonlinearities and other irregularities in data (Golshani, Shabanpour, Mahmoudifard, Derrible, & Mohammadian, 2018). To alleviate these and other issues, machine learning (ML) techniques have been applied in different urban and transportation domains (e.g., Ghasri, Hossein Rashidi, & Waller, 2017; Karimi, Sultana, Shirzadi Babakan, & Suthaharan, 2019). A significant body of literature now evaluates the ability of various ML techniques to model travel demand, such as artificial neural networks (ANNs) (e.g., Ding, Wang, Wang, & Baumann, 2013; Mozolin, Thill, & Lynn Usery, 2000; Pourebrahim et al., 2018; Tillema, van Zuilekom, & van Maarseveen, 2006) and tree-based ensemble methods (e.g., Ghasri et al., 2017; Rasouli & Timmermans, 2014). While random forests (RFs) (Breiman, 2001) have been identified among the most advanced and most efficient ensemble methods for data classification and regression (Ghasri et al., 2017), they have so far been used in only a few studies of travel demand. Ghasri et al. (2017) and Rasouli and Timmermans (2014) have reported promising results with RF modeling of trip generation and modal split, yet the suitability and usefulness of RFs in trip distribution analysis remain to be thoroughly assessed.
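For readers less familiar with it, the basic unconstrained gravity model mentioned above can be sketched as follows. This is only a minimal illustration and not the specification used in this study: the tract labels, populations, employment counts, distances, exponents, and observed flows are all hypothetical, and the scaling constant k is set by simply matching total observed flow rather than by a full calibration.

```python
def gravity_flows(origin_mass, dest_mass, distance,
                  alpha=1.0, beta=1.0, gamma=2.0, k=1.0):
    """Unconstrained gravity model: T_ij = k * O_i^alpha * D_j^beta / d_ij^gamma."""
    flows = {}
    for i, o_mass in origin_mass.items():
        for j, d_mass in dest_mass.items():
            if i == j:
                continue  # intra-zonal flows are skipped in this sketch
            flows[(i, j)] = (k * o_mass ** alpha * d_mass ** beta
                             / distance[(i, j)] ** gamma)
    return flows


def calibrate_k(unscaled_flows, observed_flows):
    """Choose k so that total predicted flow equals total observed flow."""
    return sum(observed_flows.values()) / sum(unscaled_flows.values())


# Hypothetical inputs for three zones (not data from the study).
population = {"A": 1000, "B": 2000, "C": 1500}   # origin mass
employment = {"A": 500, "B": 800, "C": 300}      # destination mass
pair_km = {("A", "B"): 2.0, ("A", "C"): 4.0, ("B", "C"): 3.0}

# Build a symmetric distance lookup from the unordered pairs.
distance = {}
for (i, j), d in pair_km.items():
    distance[(i, j)] = d
    distance[(j, i)] = d

# Hypothetical observed commuting flows between the zones.
observed = {("A", "B"): 120, ("A", "C"): 15, ("B", "A"): 140,
            ("B", "C"): 40, ("C", "A"): 30, ("C", "B"): 75}

unscaled = gravity_flows(population, employment, distance)
k = calibrate_k(unscaled, observed)
predicted = gravity_flows(population, employment, distance, k=k)
```

In this form the flow between two zones grows with the origin and destination masses and decays with distance; the exponents alpha, beta, and gamma would normally be estimated from data rather than fixed as they are here.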
Given the current state of research, our objective is to compare the performance of gravity, neural network, and random forest models of commuting trip distribution while combining both traditional and social media data. We also evaluate how information on personal mobility derived from social media affects commuting trip distribution by identifying the importance of different variables. To the best of our knowledge, this paper is one of the first to use machine learning approaches in trip distribution forecasting with social media data. The main contributions of this paper are threefold: (1) revealing the potential of social media data in trip distribution modeling at census tract level; (2) using machine learning techniques to predict trip distribution at census tract level; and (3) comparing the performance of gravity, neural network and random forest models to identify the best model for predicting trip distribution at census tract level. The paper is organized as follows. The review of related work is provided in section 2, followed by a presentation of the study area, data sources and methodology in section 3. Results are presented in section 4, with a discussion and concluding remarks in sections 5 and 6.
Travel demand modeling
The relationship between personal mobility flows and a range of personal and environmental factors has been studied to determine future travel demands within cities (Barbosa et al., 2018). Travel demand modeling has long been dominated by the four-step model with its steps being trip generation, trip distribution, modal split, and traffic assignment (McNally, 2007). The objective of this model is to estimate the traffic in the transportation networks. The model first identifies the amount of
Methodology
We have selected New York City (NYC) as our study area (Fig. 1) due to the large volume of readily available Twitter data. We focused on commuting trips because they are temporally stable and account for the largest share of total flows in a population (Yang, Jin, et al., 2015). The census tracts are used as the geographic units for modeling commuting flows in NYC (Fig. 1).
Results
Our initial analysis was performed based on 7445 ODs that were represented by both LODES flows and Twitter flows. The gravity, ANN, and RF models were developed first with a specification that excludes Twitter data but includes the following variables: 1) network distance between ODs; 2) population, household median income, household median size, and household median number of vehicles in origin census tract; and 3) employment, sprawl, and POIs in destination census tract. Then we added the Twitter flow to
Discussion
We compared the performance of the gravity, ANN, and RF models in commuting trip distribution at the census tract level in NYC. RF is identified as the best model, with the highest R² and the lowest MSE. Travel demand analysis studies have commonly used statistical methods to model different travel components (i.e., travel mode, departure time, and trip destination) (Golshani et al., 2018). With the limitation of these methods in capturing the nonlinearities in the data, machine learning
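The two fit statistics named above can be computed as in the short sketch below. It assumes that the reported R-squared is the coefficient of determination (1 minus the ratio of residual to total sum of squares); the snippet does not confirm which definition was used. The observed and predicted flow values and the model labels are hypothetical and serve only to show how two models would be ranked.

```python
def mse(observed, predicted):
    """Mean squared error between observed and predicted flows."""
    return sum((o - p) ** 2 for o, p in zip(observed, predicted)) / len(observed)


def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1 - ss_res / ss_tot


# Hypothetical flows for four OD pairs and two hypothetical models.
observed_flows = [10, 20, 30, 40]
model_a = [12, 18, 33, 39]
model_b = [20, 20, 20, 40]

scores = {
    "model_a": (r_squared(observed_flows, model_a), mse(observed_flows, model_a)),
    "model_b": (r_squared(observed_flows, model_b), mse(observed_flows, model_b)),
}

# The preferred model has the highest R-squared (and here also the lowest MSE).
best = max(scores, key=lambda name: scores[name][0])
```

On the same set of observations the two criteria always agree, because both are monotone functions of the residual sum of squares.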
Conclusions
A primary aim of transportation policy makers is to achieve sustainable mobility in urban areas (Kepaptsoglou et al., 2012; May, 2013). However, collecting travel demand data at high spatio-temporal resolution is the major gap that exists between the current state of the practice and an efficient urban sustainable solution (Yang, Herrera, et al., 2015). Social media may be a useful source of data to achieve this goal, yet their potential remains insufficiently investigated. The primary focus in
Funding
This research did not receive any specific grant from funding agencies in the public, commercial, or not-for-profit sectors.
References (81)
- et al. Trees vs neurons: Comparison between random forest and ANN for high-resolution prediction of building energy consumption. Energy and Buildings (2017).
- et al. Human mobility: Models and applications. Physics Reports (2018).
- Spatial interaction modeling using artificial neural networks. Journal of Transport Geography (1995).
- et al. Developing a disaggregate travel demand system of models using data mining techniques. Transportation Research Part A: Policy and Practice (2017).
- et al. Modeling travel mode and timing decisions: Comparison of artificial neural networks and copula-based joint model. Travel Behaviour and Society (2018).
- et al. Volunteered geographic information: Towards the establishment of a new paradigm. Computers, Environment and Urban Systems (2015).
- et al. An enhanced support vector machine model for urban expansion prediction. Computers, Environment and Urban Systems (2019).
- et al. Systematic comparison of trip distribution laws and models. Journal of Transport Geography (2016).
- et al. Trip distribution forecasting with multilayer perceptron neural networks: A critical evaluation. Transportation Research Part B: Methodological (2000).
- et al. Comparing decision tree algorithms to estimate intercity trip distribution. Transportation Research Part C: Emerging Technologies (2017).
- Mode choice analysis using random forrest decision trees. Transportation Research Procedia.
- Origin-destination estimation for non-commuting trips using location-based social networking data. International Journal of Sustainable Transportation.
- A gradient boosting method to improve travel time prediction. Transportation Research Part C: Emerging Technologies.
- Prediction of bus travel time using artificial neural network. International Journal for Traffic and Transport Engineering.
- Transport modelling in the age of big data. International Journal of Urban Sciences.
- Four-step travel demand model implementation for estimating traffic volumes on rural low-volume roads in Wyoming. Transportation Planning and Technology.
- North America detailed streets.
- Neural network toolbox user's guide. MathWorks.
- Predicting human mobility through the assimilation of social media traces into mobility models. EPJ Data Science.
- A travel demand model for rural areas, (August).
- Random forests. Machine Learning.
- Modeling freight distribution using artificial neural networks. Journal of Transport Geography.
- A neural network model for driver's lane-changing trajectory prediction in urban traffic flow. Mathematical Problems in Engineering.
- Modelling transport.
- An approach for safer navigation under severe hurricane damage. Journal of Reliable Intelligent Environments.
- Measuring urban sprawl and validating sprawl measures.
- Understanding individual human mobility patterns. Nature.
- Predicting travel times with context-dependent random forests by modeling local and aggregate traffic flow.
- Understanding urban human activity and mobility patterns using large-scale location-based data from online social media.
- Geo-located Twitter as proxy for global mobility patterns. Cartography and Geographic Information Science.
- Social network, activity space, sentiment and evacuation: What can social media tell us?
- Understanding demographic and socioeconomic biases of geotagged twitter users at the county level. Cartography and Geographic Information Science.
- New York City needs foreign visitors because they spend four times more money than Americans. Quartz.
- Urban space explorer: A visual analytics system for urban planning. IEEE Computer Graphics and Applications.
- Quality Management in Mobility Management: A scheme for supporting sustainable transportation in cities. International Journal of Sustainable Transportation.
- Why do people move? Enhancing human mobility prediction using local functions based on public records and SNS data. PLoS One.
- Evaluating the usability of geo-located twitter as a tool for human activity and mobility patterns: A case study for New York City.
- A tool for classification and regression using random forest methodology: Applications to landslide susceptibility mapping and soil thickness modeling. Environmental Modeling and Assessment.
- Traffic flow prediction using adaboost algorithm with random forests as a weak learner. International Journal of Mathematical and Computational Sciences.