Probabilistic routing using multimodal data
Introduction
With the rapid development of GPS-enabled mobile devices, intelligent transportation systems, and online mapping services, human-mobility tracking and prediction have become pervasive [35], [36], [37]. In many large cities such as Hong Kong, the number of commuters is very large at peak hours (e.g., millions of commuters travel between 7:00 am and 8:30 am). In a public transportation network, once a subway station is over-crowded, the transfer cost for commuters may increase. In this work, we integrate multi-source human-tracking data and location based social media data, comprising spatial data (locations), temporal data (time stamps), and textual data (geo-tagged tweets and geo-tagged microblogs), following the approaches introduced in [33], [34], to predict human mobility and detect over-crowded stations. We then define a set of probabilistic spatial metrics to describe the transfer cost between two different bus/subway lines, and we efficiently compute convenient bus/subway routes (transfer-cost threshold query) in such public transportation networks. This type of query is useful in many real applications, including convenient travel route recommendation and location based services in general.
We give an example in Fig. 1, where there exist 3 subway lines (blue, green, and red) and 9 stations p1, p2, ..., p9. Assume p1 and p4 are the source station and the destination station of a passenger, and r1 is a travel route from p1 to p4. The passenger may take the blue line and transfer to the red line at p5. The local travel time of r1 is 20 min, excluding transfer cost. At peak hours, p5 has a 35% over-crowded probability, and its transfer cost is 0.35 × 30 + 1 = 11.5 min (if a station is over-crowded, it may be closed for 30 min, and the transfer time at normal time is 1 min). Thus, the global travel time of r1 is 31.5 min. Route r2 is an alternative travel route from p1 to p4, and it has two transfer stations, p2 and p6. The local travel time of r2 is 30 min, excluding transfer cost. At peak hours, p2 has a 5% over-crowded probability, and its transfer cost is 0.05 × 30 + 1 = 2.5 min. Thus, the global travel time of r2 is 32.5 min. For a travel-time threshold routing query, if the travel time of r2 does not exceed the travel-time threshold τ.t, r2 is returned because it has a lower transfer cost than r1. For a transfer-cost threshold routing query, if the transfer cost of r1 does not exceed the transfer-cost threshold, r1 is returned because it has a lower travel time than r2.
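The arithmetic of the Fig. 1 example can be reproduced with a short sketch. It assumes that the transfer cost of a station is its over-crowded probability times the 30 min closure plus the 1 min normal transfer time, which is the reading that matches the stated totals of 31.5 min and 32.5 min. The function names are hypothetical, and only p2 is counted for r2 because the example quantifies no other transfer station.

```python
def expected_transfer_cost(p_overcrowded, closure_min=30.0, normal_transfer_min=1.0):
    """Expected transfer cost in minutes at one station.

    Assumption: cost = probability * closure time + normal transfer time,
    inferred from the totals given in the Fig. 1 example.
    """
    return p_overcrowded * closure_min + normal_transfer_min


def global_travel_time(local_min, transfer_probs):
    """Local travel time plus the expected transfer cost of each transfer station."""
    return local_min + sum(expected_transfer_cost(p) for p in transfer_probs)


# Route r1: 20 min local travel time, one transfer at p5 (35% over-crowded).
r1_global = global_travel_time(20.0, [0.35])
# Route r2: 30 min local travel time; only p2 (5% over-crowded) is quantified.
r2_global = global_travel_time(30.0, [0.05])

print(f"r1: {r1_global:.1f} min, r2: {r2_global:.1f} min")
```

Under this reading, r1 is faster overall while r2 carries the smaller transfer cost, which is the trade-off the two threshold queries are meant to resolve.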
Human-mobility tracking data, such as IC-card data of public transportation, are pervasive. Each record has the form (line, source, departure time), which includes spatial, temporal, and textual attributes. In the example of Fig. 1, a stored passenger record is (blue line, p1, 9:00 am). The drop-off data (station and time) are not recorded because, in many large cities, buses and subways use flat-fare tickets, so there is no need to record these data for fare calculation. Moreover, location based social media data (e.g., geo-tagged microblogs) are also useful for human-mobility prediction and over-crowded station detection. The main challenge is how to integrate these data to establish an effective human-mobility prediction model and then detect over-crowded stations effectively.
According to the approaches introduced in [33], [34], we use multi-source data (human-tracking data and location based social media data) for human-mobility prediction and over-crowded station detection. Stations are classified into several levels based on their attractiveness; a higher-level station is more attractive to passengers. If a station is over-crowded, its transfer cost may increase. We define a set of probabilistic spatial metrics to describe the transfer cost between different lines. Then, we propose two convenient route planning queries, a travel-time threshold query and a transfer-cost threshold query. A series of optimization techniques is developed to enhance query efficiency.
Note that we inherit the prediction model and over-crowded detection methods from existing studies [33], [34]. We improve the transfer cost model and define two novel routing queries, namely the travel-time threshold and transfer-cost threshold queries. Existing studies focus on computing congestion probability, and their techniques cannot be applied to the computation of transfer costs. To sum up, we make the following contributions in this work.
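As an illustration of the (line, source, departure time) record form, the following minimal sketch stores IC-card records and counts boardings per station in fixed time windows. It is not the prediction model of [33], [34]; the class name, the function name, the 30 min window, and the sample date are all assumptions made for the example.

```python
from collections import Counter
from datetime import datetime
from typing import NamedTuple


class ICCardRecord(NamedTuple):
    """One boarding record: (line, source, departure time). No drop-off data."""
    line: str
    source: str
    departure: datetime


def boardings_per_station(records, window_min=30):
    """Count boardings per (station, time-window index of the day)."""
    counts = Counter()
    for rec in records:
        minutes = rec.departure.hour * 60 + rec.departure.minute
        counts[(rec.source, minutes // window_min)] += 1
    return counts


# Hypothetical sample records; the first mirrors (blue line, p1, 9:00 am).
records = [
    ICCardRecord("blue line", "p1", datetime(2016, 1, 4, 9, 0)),
    ICCardRecord("blue line", "p1", datetime(2016, 1, 4, 9, 10)),
    ICCardRecord("red line", "p5", datetime(2016, 1, 4, 9, 40)),
]
counts = boardings_per_station(records)
```

Because drop-off stations are not recorded, such counts describe only where and when passengers enter the network; estimating where they transfer or leave requires a prediction model.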
- We propose and investigate a novel problem of planning convenient routes in public transportation networks. This study is useful in many real applications, such as travel planning and recommendation and location based services in general.
- We define two novel routing queries, the travel-time threshold query and the transfer-cost threshold query, together with the corresponding probabilistic spatial metrics.
- We develop two efficient algorithms to compute the travel-time threshold and transfer-cost threshold routing queries in public transportation networks.
- We conduct an experimental study to verify the performance of the developed algorithms.
Multimodal data
We model a public transportation network by a connected and undirected graph G(V, E), where V is a set of vertices (stations) and E is a set of edges (paths between two adjacent stations). We assign a weight to each edge to represent its travel time. A public transportation network includes several bus lines and subway lines.
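The graph model can be sketched as a weighted adjacency map. The example below builds an undirected graph from (station, station, travel time) triples and computes the smallest local travel time between two stations with Dijkstra's algorithm. This is a generic baseline rather than the algorithms developed in this work, and the edge weights and function names are hypothetical.

```python
import heapq
from collections import defaultdict


def build_graph(edges):
    """Undirected graph: each edge (u, v, minutes) is stored in both directions."""
    graph = defaultdict(dict)
    for u, v, minutes in edges:
        graph[u][v] = minutes
        graph[v][u] = minutes
    return graph


def shortest_travel_time(graph, source, target):
    """Dijkstra's algorithm; returns the minimum travel time, or inf if unreachable."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == target:
            return d
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry
        for v, w in graph[u].items():
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return float("inf")


# Hypothetical edge weights in minutes.
edges = [("p1", "p2", 4.0), ("p2", "p3", 6.0), ("p1", "p3", 12.0), ("p3", "p4", 5.0)]
graph = build_graph(edges)
print(shortest_travel_time(graph, "p1", "p4"))
```

Edge weights cover only travel between adjacent stations, so a result from this sketch corresponds to a local travel time that excludes transfer cost.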
IC card data are stored in the form of (line, source, departure time). Location based social media data (e.g., geo-tagged tweets, geo-tagged microblogs) are also useful in
Probabilistic routing
In this section, we propose two routing queries, a travel-time threshold query and a transfer-cost threshold query, and develop two efficient algorithms to compute them. A series of optimization techniques is developed to improve query efficiency.
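The semantics of the two queries, as illustrated by the Fig. 1 example, can be sketched by brute force over a given list of candidate routes. This is only an illustration of what each query returns, not the efficient algorithms developed here. It assumes that the travel-time threshold is compared against the global travel time; the names `Route`, `tau_t`, and `tau_c` and the threshold values are hypothetical.

```python
from typing import NamedTuple


class Route(NamedTuple):
    name: str
    local_min: float     # travel time excluding transfer cost
    transfer_min: float  # expected transfer cost

    @property
    def global_min(self) -> float:
        """Global travel time: local travel time plus transfer cost."""
        return self.local_min + self.transfer_min


def travel_time_threshold_query(routes, tau_t):
    """Among routes whose global travel time is within tau_t, pick the lowest transfer cost."""
    feasible = [r for r in routes if r.global_min <= tau_t]
    return min(feasible, key=lambda r: r.transfer_min, default=None)


def transfer_cost_threshold_query(routes, tau_c):
    """Among routes whose transfer cost is within tau_c, pick the lowest global travel time."""
    feasible = [r for r in routes if r.transfer_min <= tau_c]
    return min(feasible, key=lambda r: r.global_min, default=None)


# Candidate routes taken from the Fig. 1 example.
routes = [Route("r1", 20.0, 11.5), Route("r2", 30.0, 2.5)]

# Hypothetical thresholds chosen so that both routes are feasible.
by_time = travel_time_threshold_query(routes, tau_t=35.0)
by_cost = transfer_cost_threshold_query(routes, tau_c=12.0)
```

With these thresholds the first query selects r2 and the second selects r1, matching the outcome described for Fig. 1. Both functions return `None` when no route satisfies the threshold.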
Given a travel path P, we define the local travel time and the transfer cost of P. By combining the local travel time and the transfer cost, we define the global travel time of path P as the sum of the two.
Experiments
In this section, we conduct an experimental study to verify the performance of the developed TTR and TCR algorithms. We use two synthetic data sets of public transportation networks, namely “Network I” and “Network II”, which contain 600 subway and bus lines with 18,000 stations, and 300 subway and bus lines with 9,000 stations, respectively. The Network I data set has 80,000 IC-card records, while the Network II data set has 20,000 IC-card records. We also generate 100,000
Related work
Multimodal data processing has received extensive attention in recent years [3], [30], [31], [38], [41], [45], [54]. It is of interest to integrate multimodal data with the conventional routing problem [11], [26], [28], [39], [42], which motivates our study.
Path planning and location recommendation in spatial networks [22], [27], [32], [52], [55] are also well studied. Generally, spatial networks are classified into two categories: static spatial networks and dynamic spatial networks. In static
Conclusion
In this work, we integrated multi-source human-mobility tracking data and location based social media data, including spatial, temporal, and textual data, to predict human mobility and detect over-crowded stations. We defined a set of probabilistic spatial metrics and proposed two convenient route planning queries, a travel-time threshold query and a transfer-cost threshold query. This study is useful in many real applications, including convenient travel route recommendation and location based services
Acknowledgments
This work is supported by the National Natural Science Foundation of China (NSFC.61402532, 61373147, 61672442), the Science and Technology Planning Project of Fujian Province (No.2016Y0079), and the Science and Technology Project of SGCC (Grant No. SGITG-KJ-JSKF[2015]0012).
Shunzhi Zhu is a professor and dean at School of Computer and Information Engineering, Xiamen University of Technology, China. His research interests include data mining, information recommendation, and privacy protection.
References (55)
- et al., Building dynamic population graph for accurate correspondence detection, Med. Image Anal. (2015)
- et al., Measuring the impact of MVC attack in large complex networks, Inf. Sci. (2014)
- et al., Efficient algorithms for mining high-utility itemsets in uncertain databases, Knowl. Based Syst. (2016)
- et al., A novel hybrid shuffled frog leaping algorithm for vehicle routing problem with time windows, Inf. Sci. (2015)
- et al., Improving network topology-based protein interactome mapping via collaborative filtering, Knowl. Based Syst. (2015)
- et al., Prediction-based unobstructed route planning, Neurocomputing (2016)
- et al., Data-driven contextual modeling for 3D scene understanding, Comput. Graph. (2016)
- et al., Adaptive comprehensive learning bacterial foraging optimization and its application on vehicle routing problem with time windows, Neurocomputing (2015)
- et al., Pinning synchronization of nonlinearly coupled complex networks with time-varying delays using m-matrix strategies, Neurocomputing (2016)
- et al., Recommending high-utility search engine queries via a query-recommending model, Neurocomputing (2015)
- A rapid learning algorithm for vehicle classification, Inf. Sci.
- Large margin clustering on uncertain data by considering probability distribution similarity, Neurocomputing
- Elastic image registration using hierarchical spatially based mean shift, Comp. Bio. Med.
- A linear threshold-hurdle model for product adoption prediction incorporating social network effects, Inf. Sci.
- Content-based image retrieval using high-dimensional information geometry, SCIENCE CHINA Inf. Sci.
- Standard plane localization in fetal ultrasound via domain transferred deep neural networks, IEEE J. Biomed. Health Inf.
- Supervised regularization locality-preserving projection method for face recognition, IJWMIP
- Integrating spreadsheet data via accurate and low-effort extraction, Proceedings of the SIGKDD
- Long-tail vocabulary dictionary extraction from the web, Proceedings of the WSDM
- Achieving high diversity and multiplexing gains in the asynchronous parallel relay network, Trans. Emerg. Telecommun. Technol.
- A precise RFID indoor localization system with sensor network assistance, China Commun.
- A note on two problems in connection with graphs, Numerische Math
- Finding time-dependent shortest paths over large graphs, Proceedings of the EDBT
- Modloc: Localizing multiple objects in dynamic indoor environment, IEEE Trans. Parallel Distrib. Syst.
- Multiagent reinforcement social learning toward coordination in cooperative multiagent systems, TAAS
- A formal basis for the heuristic determination of minimum cost paths, IEEE Trans. Syst. Sci. Cybern.
- Probabilistic path queries in road networks: traffic uncertainty aware path selection, Proceedings of the EDBT
Yan Wang is an assistant professor at School of Computer and Information Engineering, Xiamen University of Technology, China. His research interests include data mining and recommendation systems.
Shuo Shang is a professor of computer science at China University of Petroleum, Beijing, China. His research interests include efficient query processing in spatio-temporal databases, spatial trajectory computing, and location based social media.
Guang Zhao received a master’s degree in cartography and GIS from Wuhan University, China. He is the chief engineer of Xiamen great power geo information technology limited company. His research interests include GIS technology and cloud computation.
Jiye Wang received a Ph.D. degree in electrical engineering from Tsinghua University, China. He is a professor-level senior engineer and the director of the SGCC Information and Communication department. His research interests include information technology on power systems and GIS technology.