Trip distribution modeling with Twitter data

https://doi.org/10.1016/j.compenvurbsys.2019.101354

Highlights

  • Traditional and social media data are integrated to predict commuting trip distribution.

  • The predictive performance of the traditional gravity model and ML techniques is evaluated.

  • The potential of Twitter data in trip distribution modeling at fine spatial scale is explored.

  • The most significant predictor variables are determined.

Abstract

Integrating both traditional and social media data, this study compares the performance of gravity, neural network, and random forest models of commuting trip distribution in New York City. Trip distribution modeling has primarily relied on traditional data sources and classical methods such as the gravity model. However, with the emergence of social media over the past decade, the potential for integrating traditional and social media data while applying new techniques has been recognized. Our findings indicate that the random forest model outperforms the traditional gravity and neural network models. The random forest model identified population, distance, number of Twitter users, and employment as the four most influential predictors of trip distribution. While Twitter flows did not enhance the models' performance, the importance of the number of Twitter users at work destinations suggests that social media data could improve the predictive power and accuracy of travel demand models.

Introduction

Worldwide increases in traffic congestion and air pollution in urban areas present a need to better understand the mobility patterns and travel demands of urban populations (Shirzadi Babakan, Alimohammadi, & Taleai, 2015; Yang, 2013). A number of studies have examined either individual or collective mobility patterns at different spatial scales (e.g., Beiró, Panisson, Tizzoni, & Cattuto, 2016; González, Hidalgo, & Barabási, 2008; Hawelka et al., 2014). Mobility information of individuals can be aggregated to study the frequency of travel between different regions, as represented by origin-destination (OD) matrices (Barbosa et al., 2018). An OD matrix describes population flow patterns (trip distribution), which are studied for diverse purposes such as traffic forecasting, resource allocation, prediction of migration flows, and modeling of epidemic spreading (Beiró et al., 2016; Pourebrahim, Sultana, Thill, & Mohanty, 2018). Therefore, improving flow estimations has become critical across various domains of application (Barbosa et al., 2018).
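A minimal sketch of the aggregation step described above, assuming each trip record is simply an (origin zone, destination zone) pair. The zone labels and the helper names `build_od_matrix` and `to_dense` are hypothetical and are not taken from the study.

```python
from collections import Counter


def build_od_matrix(trips):
    """Aggregate (origin, destination) trip records into OD flow counts.

    `trips` is an iterable of (origin_zone, destination_zone) tuples; the zone
    identifiers are hypothetical placeholders (e.g., census tract IDs).
    Returns a dict mapping (origin, destination) -> number of trips.
    """
    return dict(Counter((o, d) for o, d in trips))


def to_dense(od, zones):
    """Convert sparse OD counts into a dense row-major matrix ordered by `zones`.

    Row i holds trips leaving zones[i]; column j holds trips arriving at zones[j].
    """
    index = {z: i for i, z in enumerate(zones)}
    matrix = [[0] * len(zones) for _ in zones]
    for (o, d), n in od.items():
        matrix[index[o]][index[d]] += n
    return matrix


trips = [("A", "B"), ("A", "B"), ("B", "C"), ("C", "A"), ("A", "C")]
od = build_od_matrix(trips)
print(od[("A", "B")])                 # 2
print(to_dense(od, ["A", "B", "C"]))  # [[0, 2, 1], [0, 0, 1], [1, 0, 0]]
```

Keeping the sparse dictionary form is convenient at the census tract level, where most OD pairs carry no observed trips.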

Various trip distribution models have been developed over past decades to estimate population flows with greater accuracy (de Dios Ortuzar & Willumsen, 2011; Roy & Thill, 2003; Simini, González, Maritan, & Barabási, 2012; Wilson, 1970; Wilson, 1998; Zipf, 1946). Most of these models depend heavily on conventional data such as population censuses or travel diary surveys. The emergence of social media and location-based services in recent years has introduced new opportunities to the field of transportation (Yang, Jin, Cheng, Zhang, & Ran, 2015). Geospatial big data such as taxi trajectories, mobile phone records, and social media messages have enabled scholars to observe, understand, and visualize (e.g., Karduni et al., 2017) human activities in cities at fine spatio-temporal scales (Liu et al., 2015). These data significantly improve the visualization of human mobility patterns, yet there is a need to better understand and contextualize them in the different steps of the travel demand modeling framework (Anda, Erath, & Fourie, 2017).

Traditionally, the gravity model and its derivatives have been regarded as the most reliable approach for predicting trip distribution at fine spatial scales, such as commuting flows within cities (Lenormand, Bassolas, & Ramasco, 2016). The potential for developing hybrid approaches that integrate the vast volume of social media data with the gravity model has recently been noted (Beiró et al., 2016). While traditional models have used statistical methods rooted in sound mathematical foundations, they have been unable to account for nonlinearities and other irregularities in data (Golshani, Shabanpour, Mahmoudifard, Derrible, & Mohammadian, 2018). To alleviate these and other issues, Machine Learning (ML) techniques have been applied in different urban and transportation domains (e.g., Ghasri, Hossein Rashidi, & Waller, 2017; Karimi, Sultana, Shirzadi Babakan, & Suthaharan, 2019). A substantial body of literature has evaluated the ability of various ML techniques to model travel demand, including artificial neural networks (ANNs) (e.g., Ding, Wang, Wang, & Baumann, 2013; Mozolin, Thill, & Lynn Usery, 2000; Pourebrahim et al., 2018; Tillema, van Zuilekom, & van Maarseveen, 2006) and tree-based ensemble methods (e.g., Ghasri et al., 2017; Rasouli & Timmermans, 2014). Although random forests (RFs) (Breiman, 2001) have been identified as among the most advanced and efficient ensemble methods for data classification and regression (Ghasri et al., 2017), they have so far been used in only a few studies of travel demand. Ghasri et al. (2017) and Rasouli and Timmermans (2014) reported promising results with RF modeling of trip generation and modal split, yet the suitability and usefulness of RFs in trip distribution analysis remain to be thoroughly assessed.
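To make the gravity model concrete, the sketch below assumes a generic unconstrained power-law form, T_ij = k * P_i^alpha * E_j^beta / d_ij^gamma, where T_ij is the flow from origin i to destination j, P_i is origin population, E_j is destination employment, and d_ij is distance. This form is an illustrative assumption; the exact specification used in the study is not given in this excerpt. Taking logarithms makes the model linear in its parameters, so it can be calibrated by ordinary least squares with NumPy.

```python
import numpy as np


def gravity_predict(k, alpha, beta, gamma, pop, emp, dist):
    """Assumed unconstrained power-law gravity model: T = k * P^alpha * E^beta / d^gamma."""
    return k * pop**alpha * emp**beta / dist**gamma


def fit_gravity_loglinear(flows, pop, emp, dist):
    """Calibrate the assumed gravity model by OLS on its log-linear form:

        ln T = ln k + alpha * ln P + beta * ln E - gamma * ln d

    Only OD pairs with strictly positive flows are used, because ln(0) is undefined.
    Returns (k, alpha, beta, gamma).
    """
    flows, pop, emp, dist = map(np.asarray, (flows, pop, emp, dist))
    mask = flows > 0
    X = np.column_stack([
        np.ones(mask.sum()),   # intercept -> ln k
        np.log(pop[mask]),     # -> alpha
        np.log(emp[mask]),     # -> beta
        np.log(dist[mask]),    # -> -gamma
    ])
    y = np.log(flows[mask])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    ln_k, alpha, beta, neg_gamma = coef
    return float(np.exp(ln_k)), float(alpha), float(beta), float(-neg_gamma)


# Synthetic, noise-free check: generate flows from known parameters and recover them.
rng = np.random.default_rng(0)
pop = rng.uniform(1_000, 10_000, 200)
emp = rng.uniform(500, 5_000, 200)
dist = rng.uniform(0.5, 20.0, 200)
flows = gravity_predict(0.01, 0.8, 0.6, 1.5, pop, emp, dist)
print(tuple(round(v, 6) for v in fit_gravity_loglinear(flows, pop, emp, dist)))
# (0.01, 0.8, 0.6, 1.5)
```

Log-linear OLS discards zero-flow OD pairs and minimizes error in log space; Poisson regression is a common alternative when many observed flows are zero.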

Given the current state of research, our objective is to compare the performance of gravity, neural network, and random forest models of commuting trip distribution while combining both traditional and social media data. We also evaluate how information on personal mobility derived from social media affects commuting trip distribution by identifying the importance of different variables. To the best of our knowledge, this paper is one of the first to use machine learning approaches in trip distribution forecasting with social media data. The main contributions of this paper are threefold: (1) revealing the potential of social media data in trip distribution modeling at census tract level; (2) using machine learning techniques to predict trip distribution at census tract level; and (3) comparing the performance of gravity, neural network and random forest models to identify the best model for predicting trip distribution at census tract level. The paper is organized as follows. The review of related work is provided in section 2, followed by a presentation of the study area, data sources and methodology in section 3. Results are presented in section 4, with a discussion and concluding remarks in sections 5 and 6.
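The model comparison and variable-importance analysis outlined above can be sketched as follows. R² and MSE are standard regression metrics. For importance, the sketch uses permutation importance, a model-agnostic technique swapped in for illustration; the importance measure used in the study is not described in this excerpt. The `predict` callable, the toy model, and the feature names are hypothetical placeholders for any fitted gravity, ANN, or RF model.

```python
import random


def mse(y_true, y_pred):
    """Mean squared error."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def permutation_importance(predict, rows, y, feature, n_repeats=10, seed=0):
    """Average increase in MSE when one feature's values are shuffled across rows.

    `predict` maps a list of feature dicts to a list of predictions; `rows` is a
    list of dicts keyed by feature name. Larger values mean the model relies
    more heavily on `feature`.
    """
    rng = random.Random(seed)
    baseline = mse(y, predict(rows))
    increases = []
    for _ in range(n_repeats):
        shuffled = [r[feature] for r in rows]
        rng.shuffle(shuffled)
        permuted = [{**r, feature: v} for r, v in zip(rows, shuffled)]
        increases.append(mse(y, predict(permuted)) - baseline)
    return sum(increases) / n_repeats


# Hypothetical toy data: flows depend only on "population", not on "noise".
rows = [{"population": float(i), "noise": float(i % 3)} for i in range(20)]
y = [2.0 * r["population"] for r in rows]


def toy_predict(batch):
    # Hypothetical model that uses population and ignores noise.
    return [2.0 * r["population"] for r in batch]


print(r2(y, toy_predict(rows)))                                        # 1.0
print(permutation_importance(toy_predict, rows, y, "noise"))           # 0.0
print(permutation_importance(toy_predict, rows, y, "population") > 0)  # True
```

Because permutation importance only needs a prediction function, the same routine can rank predictors for all three model families, which keeps the comparison consistent across them.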

Section snippets

Travel demand modeling

The relationship between personal mobility flows and a range of personal and environmental factors has been studied to determine future travel demands within cities (Barbosa et al., 2018). Travel demand modeling has long been dominated by the four-step model, whose steps are trip generation, trip distribution, modal split, and traffic assignment (McNally, 2007). The objective of this model is to estimate traffic on transportation networks. The model first identifies the amount of

Methodology

We selected New York City (NYC) as our study area (Fig. 1) because of the large volume of readily available Twitter data. We focused on commuting trips because they are temporally stable and account for the largest share of total flows in a population (Yang, Jin, et al., 2015). Census tracts are used as the geographic units for modeling commuting flows in NYC (Fig. 1).

Results

Our initial analysis was performed on 7,445 OD pairs that were represented in both LODES flows and Twitter flows. The gravity, ANN, and RF models were first developed with a specification that excludes Twitter data but includes the following variables: 1) network distance between origin and destination; 2) population, household median income, household median size, and household median number of vehicles in the origin census tract; and 3) employment, sprawl, and POIs in the destination census tract. Then we added the Twitter flow to

Discussion

We compared the performance of the gravity, ANN, and RF models in commuting trip distribution at the census tract level in NYC. RF is identified as the best model, with the highest R2 and the lowest MSE. Travel demand analysis studies have commonly used statistical methods to model different travel components (e.g., travel mode, departure time, and trip destination) (Golshani et al., 2018). Given the limited ability of these methods to capture nonlinearities in the data, machine learning

Conclusions

A primary aim of transportation policy makers is to achieve sustainable mobility in urban areas (Kepaptsoglou et al., 2012; May, 2013). However, the difficulty of collecting travel demand data at high spatio-temporal resolution is the major gap between the current state of practice and efficient, sustainable urban solutions (Yang, Herrera, et al., 2015). Social media may be a useful source of data to achieve this goal, yet their potential remains insufficiently investigated. The primary focus in

Funding

This research did not receive any specific grant from funding agencies in the public, commercial, or not-for-profit sectors.

References (81)

  • C.R. Sekhar et al.

    Mode choice analysis using random forrest decision trees

    Transportation Research Procedia

    (2016)
  • F. Yang et al.

    Origin-destination estimation for non-commuting trips using location-based social networking data

    International Journal of Sustainable Transportation

    (2015)
  • Y. Zhang et al.

    A gradient boosting method to improve travel time prediction

    Transportation Research Part C: Emerging Technologies

    (2015)
  • J. Amita et al.

    Prediction of bus travel time using artificial neural network

    International Journal for Traffic and Transport Engineering

    (2015)
  • C. Anda et al.

    Transport modelling in the age of big data

    International Journal of Urban Sciences

    (2017)
  • D.T. Apronti et al.

    Four-step travel demand model implementation for estimating traffic volumes on rural low-volume roads in Wyoming

    Transportation Planning and Technology

    (2018)
  • ArcGIS

    North America detailed streets

  • M.H. Beale et al.

    Neural network toolbox user's guide. MathWorks

    (2015)
  • M.G. Beiró et al.

    Predicting human mobility through the assimilation of social media traces into mobility models

    EPJ Data Science

    (2016)
  • A.D. Berger

    A travel demand model for rural areas, (August)

    (2012)
  • L. Breiman

    Random forests

    Machine Learning

    (2001)
  • H.M. Celik

    Modeling freight distribution using artificial neural networks

    Journal of Transport Geography

    (2004)
  • C. Ding et al.

    A neural network model for driver's lane-changing trajectory prediction in urban traffic flow

    Mathematical Problems in Engineering

    (2013)
  • J. de Dios Ortuzar et al.

    Modelling transport

    (2011)
  • M. Eshghi et al.

    An approach for safer navigation under severe hurricane damage

    Journal of Reliable Intelligent Environments

    (2018)
  • R. Ewing et al.

    Measuring urban sprawl and validating sprawl measures

    (2014)
  • M.C. González et al.

    Understanding individual human mobility patterns

    Nature

    (2008)
  • B. Hamner

    Predicting travel times with context-dependent random forests by modeling local and aggregate traffic flow

  • S. Hasan et al.

    Understanding urban human activity and mobility patterns using large-scale location-based data from online social media

  • B. Hawelka et al.

    Geo-located Twitter as proxy for global mobility patterns

    Cartography and Geographic Information Science

    (2014)
  • Internet Live Stats
  • Y. Jiang et al.

    Social network, activity space, sentiment and evacuation: What can social media tell us?

  • Y. Jiang et al.

    Understanding demographic and socioeconomic biases of geotagged twitter users at the county level

    Cartography and Geographic Information Science

    (2018)
  • L. Josephs

    New York City needs foreign visitors because they spend four times more money than Americans. Quartz

  • A. Karduni et al.

    Urban space explorer: A visual analytics system for urban planning

    IEEE Computer Graphics and Applications

    (2017)
  • K. Kepaptsoglou et al.

    Quality Management in Mobility Management: A scheme for supporting sustainable transportation in cities

    International Journal of Sustainable Transportation

    (2012)
  • J. Kim et al.

    Why do people move? Enhancing human mobility prediction using local functions based on public records and SNS data

    PLoS One

    (2018)
  • A. Kurkcu et al.

    Evaluating the usability of geo-located twitter as a tool for human activity and mobility patterns: A case study for New York City

  • D. Lagomarsino et al.

    A tool for classification and regression using random forest methodology: Applications to landslide susceptibility mapping and soil thickness modeling

    Environmental Modeling and Assessment

    (2017)
  • G. Leshem et al.

    Traffic flow prediction using adaboost algorithm with random forests as a weak learner

    International Journal of Mathematical and Computational Sciences

    (2007)