DOI: 10.1145/3486183.3491000
research-article

HYPO: skew-resilient partitioning for trajectory datasets

Published: 19 November 2021

ABSTRACT

The rapid increase in the number of GPS-enabled devices has led to immense amounts of trajectory data being collected and analyzed. To provide insight into these datasets, several spatio-temporal queries must be executed efficiently and at scale. One important query is the Query by Path, which, given a series of road segments and a time interval, retrieves all trajectories that passed through those road segments within that interval. The Query by Path finds application in many areas, including traffic management, transportation planning, and fleet monitoring.
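To make the query semantics concrete, the following Python sketch implements Query by Path as a brute-force scan. It assumes that each trajectory has already been map-matched to a time-ordered list of (road segment id, timestamp) pairs, that the queried path must appear as a contiguous run of segments, and that every matched segment must be traversed inside the query interval. The names `Trajectory`, `matches_path` and `query_by_path` are hypothetical and do not come from the paper. A full scan like this is the cost that a partitioned, indexed approach aims to avoid.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class Trajectory:
    """Map-matched trajectory: (road_segment_id, timestamp) pairs in time order (assumed format)."""
    traj_id: int
    points: List[Tuple[int, float]]


def matches_path(traj: Trajectory, path: Sequence[int], t_start: float, t_end: float) -> bool:
    """True if `path` occurs as a contiguous run of segments in `traj` and every
    matched segment was traversed within [t_start, t_end] (assumed semantics)."""
    k = len(path)
    if k == 0:
        return False
    segs = [seg for seg, _ in traj.points]
    target = list(path)
    for i in range(len(segs) - k + 1):
        if segs[i:i + k] == target:
            times = [t for _, t in traj.points[i:i + k]]
            if t_start <= min(times) and max(times) <= t_end:
                return True
    return False


def query_by_path(trajs: List[Trajectory], path: Sequence[int], t_start: float, t_end: float) -> List[int]:
    """Brute-force Query by Path: scan every trajectory and return matching ids."""
    return [t.traj_id for t in trajs if matches_path(t, path, t_start, t_end)]


# Small example: three trajectories over road segments 10, 11 and 12.
trajs = [
    Trajectory(1, [(10, 0.0), (11, 5.0), (12, 9.0)]),
    Trajectory(2, [(11, 20.0), (12, 25.0)]),
    Trajectory(3, [(12, 3.0), (11, 4.0)]),
]
print(query_by_path(trajs, [11, 12], 0.0, 10.0))  # [1]
print(query_by_path(trajs, [11, 12], 0.0, 30.0))  # [1, 2]
```

Trajectory 3 visits the same two segments but in the opposite order, so it does not match the path [11, 12].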

In this paper, we develop an approach to partition and distribute trajectories across a cluster and to execute queries by path at scale. At the center of our approach is the partitioning of the entire dataset and the indexing of each partition with a Trie. We develop a basic set of partitioning approaches and show that each can be rendered inefficient by skew in the dataset. We consequently propose a HYbrid PartitiOning algorithm (HYPO) that performs robustly in the face of skew. We also provide cost models to configure HYPO. Finally, we assess its performance extensively using both real and synthetic datasets and demonstrate that it scales well in the face of skew.
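The sketch below shows one way a trie can index a partition. It assumes a *basic* scheme, not HYPO. Every suffix of every trajectory is routed to a partition by its first road segment. Each partition keeps a trie over the suffixes it receives, truncated to a hypothetical `max_depth`. A query is sent only to the partition that owns its first segment. Indexing suffixes means that any contiguous sub-path of a trajectory is a prefix of some indexed string. All class and parameter names are illustrative assumptions, and the paper's actual partitioning and Trie layout may differ.

```python
from typing import Dict, List, Sequence, Tuple

Point = Tuple[int, float]  # (road_segment_id, timestamp); map-matched input assumed


class TrieNode:
    def __init__(self) -> None:
        self.children: Dict[int, "TrieNode"] = {}
        # (trajectory id, start offset) of every suffix passing through this node
        self.postings: List[Tuple[int, int]] = []


class Partition:
    """One partition: a trie over the trajectory suffixes routed to it."""

    def __init__(self, max_depth: int) -> None:
        self.root = TrieNode()
        self.max_depth = max_depth  # hypothetical cap on indexed suffix length
        self.num_suffixes = 0

    def insert_suffix(self, traj_id: int, segs: Sequence[int], start: int) -> None:
        node = self.root
        for seg in segs[start:start + self.max_depth]:
            node = node.children.setdefault(seg, TrieNode())
            node.postings.append((traj_id, start))
        self.num_suffixes += 1

    def lookup(self, prefix: Sequence[int]) -> List[Tuple[int, int]]:
        """Postings of suffixes whose first len(prefix) segments equal prefix."""
        node = self.root
        for seg in prefix:
            node = node.children.get(seg)
            if node is None:
                return []
        return node.postings


class PathIndex:
    """Basic (non-HYPO) scheme: route each suffix by its first road segment."""

    def __init__(self, trajs: Dict[int, List[Point]], num_partitions: int, max_depth: int = 8) -> None:
        self.trajs = trajs
        self.parts = [Partition(max_depth) for _ in range(num_partitions)]
        for tid, pts in trajs.items():
            segs = [seg for seg, _ in pts]
            for start in range(len(segs)):
                self.parts[self._route(segs[start])].insert_suffix(tid, segs, start)

    def _route(self, seg: int) -> int:
        return seg % len(self.parts)

    def query(self, path: Sequence[int], t_start: float, t_end: float) -> List[int]:
        if not path:
            return []
        part = self.parts[self._route(path[0])]  # only one partition is probed
        target = list(path)
        hits = set()
        for tid, start in part.lookup(target[:part.max_depth]):
            pts = self.trajs[tid][start:start + len(target)]
            # Re-check segments beyond max_depth, then the time interval.
            if [seg for seg, _ in pts] != target:
                continue
            times = [t for _, t in pts]
            if t_start <= min(times) and max(times) <= t_end:
                hits.add(tid)
        return sorted(hits)

    def partition_sizes(self) -> List[int]:
        return [p.num_suffixes for p in self.parts]


trajs = {
    1: [(10, 0.0), (11, 5.0), (12, 9.0)],
    2: [(11, 20.0), (12, 25.0)],
    3: [(12, 3.0), (11, 4.0)],
}
index = PathIndex(trajs, num_partitions=2, max_depth=2)
print(index.query([10, 11, 12], 0.0, 10.0))  # [1]; path is longer than max_depth
print(index.partition_sizes())               # [4, 3]
```

Routing depends only on the first segment. As a result, a heavily travelled segment concentrates both its suffixes and every query that starts there in a single partition, and `partition_sizes()` exposes that imbalance. The abstract identifies this kind of skew as what renders basic partitioning approaches inefficient and what HYPO is designed to withstand.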


Published in

LocalRec '21: Proceedings of the 5th ACM SIGSPATIAL International Workshop on Location-based Recommendations, Geosocial Networks and Geoadvertising
November 2021
66 pages
ISBN: 9781450391009
DOI: 10.1145/3486183

Copyright © 2021 ACM

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher: Association for Computing Machinery, New York, NY, United States

Acceptance Rates

Overall acceptance rate: 17 of 26 submissions (65%)
