Full Model Selection in Big Data

Díaz-Pacheco, Angel; Gonzalez-Bernal, Jesús A.; Reyes-García, Carlos Alberto; Escalante-Balderas, Hugo Jair

doi:10.1007/978-3-030-02837-4_23

Angel Díaz-Pacheco (ORCID: orcid.org/0000-0002-5978-0377),
Jesús A. Gonzalez-Bernal,
Carlos Alberto Reyes-García &
Hugo Jair Escalante-Balderas

Conference paper
First Online: 01 January 2019

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 10632)

Abstract

The increasingly large quantities of information generated in the world over the last few years have led to the emergence of the paradigm known as Big Data. The analysis of those vast quantities of data has become an important task in science and business in order to turn that information into a valuable asset. Many data analysis tasks involve the use of machine learning techniques during the model creation step. The goal of these predictive models is to achieve the highest possible accuracy when predicting new samples, and for this reason there is high interest in selecting the most suitable algorithm for a specific dataset. This task is known as model selection; it has been widely studied on datasets of common size but poorly explored in the Big Data context. As an effort to explore in this direction, this work proposes an algorithm for model selection in Big Data.
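To make the notion of model selection concrete, the following is a minimal sketch of choosing between candidate algorithms by cross-validated accuracy on one dataset. It is not the algorithm proposed in the paper, which the abstract does not describe. The two candidate classifiers, the synthetic data, and all function names (`select_model`, `cv_accuracy`, `make_toy_data`, and so on) are assumptions introduced only for illustration, and the sketch runs on a single machine rather than in a Big Data setting.

```python
import random
from collections import Counter


def majority_classifier(train_X, train_y):
    """Hypothetical baseline: always predict the most frequent training label."""
    label = Counter(train_y).most_common(1)[0][0]
    return lambda x: label


def nearest_centroid_classifier(train_X, train_y):
    """Hypothetical candidate: predict the label whose class mean is closest."""
    sums, counts = {}, Counter(train_y)
    for x, y in zip(train_X, train_y):
        acc = sums.setdefault(y, [0.0] * len(x))
        for i, v in enumerate(x):
            acc[i] += v
    centroids = {y: [s / counts[y] for s in acc] for y, acc in sums.items()}

    def predict(x):
        # Squared Euclidean distance to each class centroid.
        return min(
            centroids,
            key=lambda y: sum((a - b) ** 2 for a, b in zip(x, centroids[y])),
        )

    return predict


def cv_accuracy(fit, X, y, k=5):
    """Mean accuracy of the learner `fit` over k contiguous folds."""
    n = len(X)
    scores = []
    for f in range(k):
        test_idx = set(range(f * n // k, (f + 1) * n // k))
        tr_X = [X[i] for i in range(n) if i not in test_idx]
        tr_y = [y[i] for i in range(n) if i not in test_idx]
        model = fit(tr_X, tr_y)
        hits = sum(model(X[i]) == y[i] for i in test_idx)
        scores.append(hits / len(test_idx))
    return sum(scores) / k


def select_model(candidates, X, y, k=5):
    """Return the name of the best-scoring candidate and all scores."""
    scores = {name: cv_accuracy(fit, X, y, k) for name, fit in candidates.items()}
    best = max(scores, key=scores.get)
    return best, scores


def make_toy_data(n=200, seed=0):
    """Synthetic two-class data: Gaussian blobs centred at (0, 0) and (3, 3)."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        label = rng.randint(0, 1)
        center = 0.0 if label == 0 else 3.0
        X.append([rng.gauss(center, 1.0), rng.gauss(center, 1.0)])
        y.append(label)
    return X, y


CANDIDATES = {
    "majority": majority_classifier,
    "nearest_centroid": nearest_centroid_classifier,
}

if __name__ == "__main__":
    X, y = make_toy_data()
    best, scores = select_model(CANDIDATES, X, y)
    print(best, scores)
```

Full model selection, as the term is used in the cited literature, typically also searches over preprocessing and hyperparameter choices; the sketch covers only the choice of learning algorithm.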

Supported by CONACyT.

Author information

Authors and Affiliations

Computer Science Department, Instituto Nacional de Astrofísica, Óptica y Electrónica (INAOE), Luis Enrique Erro No. 1, Santa María Tonantzintla, 72840, Puebla, Mexico
Angel Díaz-Pacheco, Jesús A. Gonzalez-Bernal, Carlos Alberto Reyes-García & Hugo Jair Escalante-Balderas

Corresponding author

Correspondence to Angel Díaz-Pacheco.

Editor information

Editors and Affiliations

Universidad Autónoma del Estado de Hidalgo, Pachuca, Mexico
Félix Castro
INFOTEC Aguascalientes, Aguascalientes, Mexico
Sabino Miranda-Jiménez
Tecnológico de Monterrey, Atizapán de Zaragoza, Mexico
Miguel González-Mendoza

Cite this paper

Díaz-Pacheco, A., Gonzalez-Bernal, J.A., Reyes-García, C.A., Escalante-Balderas, H.J. (2018). Full Model Selection in Big Data. In: Castro, F., Miranda-Jiménez, S., González-Mendoza, M. (eds) Advances in Soft Computing. MICAI 2017. Lecture Notes in Computer Science, vol 10632. Springer, Cham. https://doi.org/10.1007/978-3-030-02837-4_23

DOI: https://doi.org/10.1007/978-3-030-02837-4_23
Published: 01 January 2019
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-02836-7
Online ISBN: 978-3-030-02837-4
eBook Packages: Computer Science, Computer Science (R0)