Automated Spark Clusters Deployment for Big Data with Standalone Applications Integration

Fernández, A. M.; Torres, J. F.; Troncoso, A.; Martínez-Álvarez, F.

doi:10.1007/978-3-319-44636-3_14

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 9868)

Included in the following conference series:

Conference of the Spanish Association for Artificial Intelligence

Abstract

The huge amount of data stored nowadays has turned big data analytics into a very active research field. Spark has emerged as a powerful and widely used framework for cluster deployment and big data management. However, getting started remains difficult because of the many requirements that every node must fulfil. This work therefore introduces a web service specifically designed for easy and efficient Spark cluster management. In particular, a service with a friendly graphical user interface has been developed to automate cluster deployment. Another relevant feature is the possibility of integrating any algorithm into the web service: the user only needs to provide the executable file and the number of required inputs for proper parametrization. Finally, an illustrative case study shows how an ad hoc algorithm (here, the MLlib implementation of k-means) is run across the nodes of the configured cluster.
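
The integration step described in the abstract (an executable plus a declared number of inputs) can be illustrated with a minimal sketch. This is not the authors' implementation: the `AppSpec` class, the `build_submit_command` function, the file name `kmeans_app.py`, and the master URL are all hypothetical names chosen for illustration. The only assumption about Spark itself is that applications are launched with the standard `spark-submit` tool and its `--master` option.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class AppSpec:
    """Hypothetical description of a user-supplied standalone application."""
    executable: str  # path to the uploaded script or JAR
    num_inputs: int  # number of input parameters the application expects


def build_submit_command(master_url: str, app: AppSpec, args: List[str]) -> List[str]:
    """Build a spark-submit invocation after checking the argument count."""
    if len(args) != app.num_inputs:
        raise ValueError(
            f"{app.executable} expects {app.num_inputs} inputs, got {len(args)}"
        )
    # --master is a standard spark-submit option. Everything after the
    # executable path is passed through to the application unchanged.
    # (A JAR without a Main-Class manifest entry would also need --class.)
    return ["spark-submit", "--master", master_url, app.executable, *args]


if __name__ == "__main__":
    # Illustrative values only: a k-means script taking an input path,
    # a number of clusters, and a number of iterations.
    kmeans = AppSpec(executable="kmeans_app.py", num_inputs=3)
    cmd = build_submit_command(
        "spark://master-host:7077", kmeans, ["data.csv", "4", "20"]
    )
    print(" ".join(cmd))
```

Validating the argument count before launching mirrors the role the abstract gives to "the number of required inputs": a service built this way could reject a badly parametrized job before anything is sent to the cluster.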

Acknowledgements

The authors would like to thank the Spanish Ministry of Economy and Competitiveness and the Junta de Andalucía for their support under projects TIN2014-55894-C2-R and P12-TIC-1728 and PRY153/14, respectively.

Author information

Authors and Affiliations

Division of Computer Science, Universidad Pablo de Olavide, 41013, Seville, Spain
A. M. Fernández, J. F. Torres, A. Troncoso & F. Martínez-Álvarez

Corresponding author

Correspondence to F. Martínez-Álvarez.

Editor information

Editors and Affiliations

Artificial Intelligence Center, University of Oviedo, Gijón, Spain
Oscar Luaces
University of Castilla-La Mancha, Albacete, Spain
José A. Gámez
Public University of Navarre, Pamplona, Spain
Edurne Barrenechea
Universidad Pablo de Olavide, Sevilla, Spain
Alicia Troncoso
Public University of Navarre, Pamplona, Navarra, Spain
Mikel Galar
University of Salamanca, Salamanca, Spain
Héctor Quintián
University of Salamanca, Salamanca, Spain
Emilio Corchado

About this paper

Cite this paper

Fernández, A.M., Torres, J.F., Troncoso, A., Martínez-Álvarez, F. (2016). Automated Spark Clusters Deployment for Big Data with Standalone Applications Integration. In: Luaces, O., et al. Advances in Artificial Intelligence. CAEPIA 2016. Lecture Notes in Computer Science, vol 9868. Springer, Cham. https://doi.org/10.1007/978-3-319-44636-3_14

DOI: https://doi.org/10.1007/978-3-319-44636-3_14
Published: 08 September 2016
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-44635-6
Online ISBN: 978-3-319-44636-3
eBook Packages: Computer Science, Computer Science (R0)
