Abstract
Cab services, present in most cities, are among the most widely used means of passenger transportation. Their business model is now being threatened by emerging third parties powered by modern technologies. Using New York cab data, we compare several machine learning techniques (linear regression, support vector machines and random forests) for forecasting the amount of dollars spent on cab services. The comparison focuses on the accuracy of their forecasts under three conditions: real data for all features; noisy data (real data with uniformly distributed noise added) for several key features; and estimated data (obtained from other statistical estimators) for the key features. The main goal of this comparison is to provide evidence on how these methods perform when they are used in conjunction with other estimators.
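The noisy-data condition described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' code and does not use the New York cab data: the trips are synthetic, the fare rule (2.5 dollars plus 2 dollars per mile) is a hypothetical stand-in, and a single-feature least-squares fit stands in for the linear regression model. All function and variable names are assumptions made for illustration. The sketch fits the model on clean data, then compares the forecast error on clean test features against the same features with uniformly distributed noise added.

```python
import math
import random


def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x (one feature)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def rmse(actual, predicted):
    """Root mean squared error between two equally long sequences."""
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def add_uniform_noise(values, half_width, rng):
    """Return a copy of values with noise drawn from U(-half_width, half_width)."""
    return [v + rng.uniform(-half_width, half_width) for v in values]


rng = random.Random(0)  # fixed seed so the sketch is repeatable

# Synthetic stand-in for trips (assumption): distance in miles is the key
# feature, and the fare in dollars follows a hypothetical linear rule.
distance = [rng.uniform(0.5, 15.0) for _ in range(500)]
fare = [2.5 + 2.0 * d + rng.gauss(0.0, 0.5) for d in distance]

train_x, test_x = distance[:400], distance[400:]
train_y, test_y = fare[:400], fare[400:]

intercept, slope = fit_line(train_x, train_y)


def predict(xs):
    """Forecast fares from distances with the fitted line."""
    return [intercept + slope * x for x in xs]


# Condition 1: real (clean) values of the key feature.
rmse_clean = rmse(test_y, predict(test_x))

# Condition 2: the same feature with uniformly distributed noise added.
noisy_test_x = add_uniform_noise(test_x, 2.0, rng)
rmse_noisy = rmse(test_y, predict(noisy_test_x))

print(f"RMSE with clean feature: {rmse_clean:.3f}")
print(f"RMSE with noisy feature: {rmse_noisy:.3f}")
```

The same evaluation loop could be repeated with a support vector machine or a random forest in place of `fit_line`, and with estimator outputs in place of `noisy_test_x`, to cover the other conditions the abstract lists.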
Acknowledgements
This research work was supported by Sungshin Women’s University. In addition, L.J.G.V. and A.L.S.O. thank the European Commission Horizon 2020 5G-PPP Programme (Grant Agreement number H2020-ICT-2014-2/671672-SELFNET - Framework for Self-Organized Network Management in Virtualized and Software-Defined Networks).
About this article
Cite this article
Turrado García, F., García Villalba, L.J., Sandoval Orozco, A.L. et al. A comparison of learning methods over raw data: forecasting cab services market share in New York City. Multimed Tools Appl 78, 29783–29804 (2019). https://doi.org/10.1007/s11042-018-6285-x
DOI: https://doi.org/10.1007/s11042-018-6285-x