DOI: 10.1145/3430984.3430996
research-article

Inferring customer occupancy status in for-hire vehicles using PU Learning

Published: 02 January 2021

ABSTRACT

Data from Global Positioning Systems (GPS) and fare-meters in For-Hire Vehicles (FHVs) have been used for various applications, both in research and in organizational decision-making. The utility of such exercises depends largely on the accuracy of the data. This study looks at an environment where the data is partially mislabeled. Specifically, we take a common real-world setting where vehicle operators choose to render transportation services to customers without using the fare-meter, often by negotiating a fixed rate with the customer. This practice, which has been observed and documented to varying degrees in urban areas across the world, leads to various undesirable effects. In this study, we seek to identify cases of such behavior in the dataset. Typically, a supervised learning classifier could be built to predict the occupancy status from GPS traces, and its predictions could then be used to look for discrepancies between the predicted and stated behaviors. However, in our case the training dataset also contains instances of incorrect tagging. We address this problem by casting it as one of learning from Positive and Unlabeled instances (PU Learning). This is because we observe one-sided label noise: trips tagged ‘vacant’ by the taximeter could be truly vacant or occupied, whereas trips tagged ‘occupied’ are expected to be occupied in reality as well. To support this novel formulation, we apply three state-of-the-art PU Learning algorithms to a real-world trajectory data set from an organization plying 170 active vehicles over a period of two months. We compare these to standard supervised learning baselines. Validation is carried out by the organization through alternate channels of investigation that are not indicated in the data set. The results show that the PU Learners provide a significant improvement in classification across a range of metrics when compared to the baseline approaches. This translates to a significant increase in identifying or reclassifying the mislabeled rides.
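
The one-sided label noise described in the abstract maps onto PU Learning as follows: trips tagged ‘occupied’ act as the positive set, and trips tagged ‘vacant’ act as the unlabeled set. The sketch below illustrates that mapping with a bagging-style PU scorer. It is not the authors' method: the abstract does not name the three algorithms used, so this is an illustrative stand-in. The base learner (nearest centroid), the two features (mean speed, stops per km), and all data are hypothetical assumptions.

```python
import random
from statistics import mean


def centroid(rows):
    """Column-wise mean of a list of feature vectors."""
    return [mean(col) for col in zip(*rows)]


def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def pu_bagging_scores(positives, unlabeled, n_rounds=50, seed=0):
    """Score each unlabeled trip by how often it looks like a positive.

    Each round draws a bootstrap sample of the unlabeled trips, treats it
    as provisional negatives, and lets a nearest-centroid rule vote on the
    unlabeled trips left out of that sample (out-of-bag).
    """
    rng = random.Random(seed)
    pos_c = centroid(positives)
    k = min(len(positives), len(unlabeled))
    votes = [0] * len(unlabeled)
    seen = [0] * len(unlabeled)
    for _ in range(n_rounds):
        bag = [rng.randrange(len(unlabeled)) for _ in range(k)]
        in_bag = set(bag)
        neg_c = centroid([unlabeled[i] for i in bag])
        for i, row in enumerate(unlabeled):
            if i in in_bag:
                continue  # only out-of-bag trips receive a vote this round
            seen[i] += 1
            votes[i] += sq_dist(row, pos_c) < sq_dist(row, neg_c)
    # Fraction of out-of-bag rounds in which the trip looked 'occupied'.
    return [v / s if s else None for v, s in zip(votes, seen)]


# Synthetic trips with hypothetical features: [mean speed km/h, stops per km].
rng = random.Random(42)


def make_trips(n, speed, stops):
    return [[rng.gauss(speed, 2.0), rng.gauss(stops, 0.5)] for _ in range(n)]


metered_occupied = make_trips(60, 30.0, 1.0)    # tagged 'occupied': positives
truly_vacant = make_trips(80, 15.0, 5.0)        # tagged 'vacant', really vacant
hidden_occupied = make_trips(20, 30.0, 1.0)     # tagged 'vacant', really occupied
tagged_vacant = truly_vacant + hidden_occupied  # the unlabeled set

scores = pu_bagging_scores(metered_occupied, tagged_vacant)
vacant_scores = [s for s in scores[:80] if s is not None]
hidden_scores = [s for s in scores[80:] if s is not None]
print("mean score, truly vacant:   ", round(mean(vacant_scores), 3))
print("mean score, hidden occupied:", round(mean(hidden_scores), 3))
```

Trips tagged ‘vacant’ that receive a high score are candidates for the off-meter rides the study seeks to identify. In practice, features would need scaling and the base learner would likely be something stronger than a centroid rule.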


  • Published in

    CODS-COMAD '21: Proceedings of the 3rd ACM India Joint International Conference on Data Science & Management of Data (8th ACM IKDD CODS & 26th COMAD)
    January 2021
    453 pages

    Copyright © 2021 ACM


    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Qualifiers

    • research-article
    • Research
    • Refereed limited

    Acceptance Rates

    Overall Acceptance Rate: 197 of 680 submissions, 29%