A Comparison of Predictive Analytics Solutions on Hadoop

Norousi, Ramin; Bauer, Jan; Härting, Ralf-Christian; Reichstein, Christopher

doi:10.1007/978-3-319-59424-8_15

Part of the book series: Smart Innovation, Systems and Technologies (SIST, volume 73)

Included in the following conference series:

International Conference on Intelligent Decision Technologies

Abstract

New approaches to data streaming, data storage and data analysis have been developed in response to the huge volume and velocity of generated data. Enterprises are convinced that one of their key success factors is to analyse available data, searching for patterns and predicting future developments, in order to gain more insight into their business, optimize processes and save costs. Hence, predictive analytics has never been considered more important than it is now. Hadoop, a popular open-source framework, was introduced to store and process extremely large data sets. This paper presents various ways of carrying out predictive analytics based on a Hadoop ecosystem. We investigated different solutions from both commercial vendors and open-source communities that interoperate with Hadoop. Each scenario is described in terms of its technical implementation, features and restrictions. A concluding comparison summarizes the most important issues, giving deeper insight into how to optimize predictive analytics solutions based on Hadoop.
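The abstract says Hadoop stores and processes very large data sets. A predictive model can exploit this when its training step decomposes into per-partition work plus a cheap merge. The following minimal sketch illustrates that pattern in plain Python, with no Hadoop involved. It is not one of the solutions evaluated in the paper. The idea is to fit a simple least-squares regression y = a + b*x in map-reduce style:

- each "map" call summarises one partition (standing in for one HDFS block) as sufficient statistics;
- "reduce" adds those summaries together;
- the model is solved once from the totals.

All function names (`map_partition`, `combine`, `fit`, `train`) and the toy data are hypothetical.

```python
from functools import reduce
from typing import Iterable, List, Tuple

# Sufficient statistics for y = a + b*x: (n, sum_x, sum_y, sum_xy, sum_xx).
# Hypothetical illustration; not an API of Hadoop or of any tool in the paper.
Stats = Tuple[float, ...]


def map_partition(rows: Iterable[Tuple[float, float]]) -> Stats:
    """Map step: summarise one data partition locally, without moving raw rows."""
    n, sx, sy, sxy, sxx = 0, 0.0, 0.0, 0.0, 0.0
    for x, y in rows:
        n += 1
        sx += x
        sy += y
        sxy += x * y
        sxx += x * x
    return (n, sx, sy, sxy, sxx)


def combine(a: Stats, b: Stats) -> Stats:
    """Reduce step: statistics from different partitions simply add up."""
    return tuple(p + q for p, q in zip(a, b))


def fit(stats: Stats) -> Tuple[float, float]:
    """Solve the ordinary least-squares normal equations from the merged totals."""
    n, sx, sy, sxy, sxx = stats
    denom = n * sxx - sx * sx
    if denom == 0:
        raise ValueError("x has zero variance; slope is undefined")
    slope = (n * sxy - sx * sy) / denom
    intercept = (sy - slope * sx) / n
    return intercept, slope


def train(partitions: List[List[Tuple[float, float]]]) -> Tuple[float, float]:
    """Run map over every partition, reduce the results, then fit the model."""
    return fit(reduce(combine, map(map_partition, partitions)))


if __name__ == "__main__":
    # Toy data lying exactly on y = 1 + 2x, split across two "blocks".
    parts = [[(1.0, 3.0), (2.0, 5.0)], [(3.0, 7.0), (4.0, 9.0)]]
    print(train(parts))  # (1.0, 2.0)
```

The key design choice is that only five numbers per partition cross the network, not the raw rows. This is the property that lets such training scale out on a cluster. Models whose fitting cannot be reduced to additive statistics, such as many iterative learners, need repeated passes over the data. That is one reason in-memory engines such as Spark are often preferred for iterative machine-learning workloads.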

Author information

Authors and Affiliations

Business Field Advanced Analytics, MHP – A Porsche Company, Ludwigsburg, Germany
Ramin Norousi & Jan Bauer
Business Administration, Aalen University of Applied Sciences, Aalen, Germany
Ralf-Christian Härting & Christopher Reichstein

Corresponding author

Correspondence to Christopher Reichstein.

Editor information

Editors and Affiliations

Maritime University, Gdynia, Poland
Ireneusz Czarnowski
Bournemouth University and KES International, Poole, Dorset, United Kingdom
Robert J. Howlett
University of Canberra, Canberra, Australian Capital Territory, Australia
Lakhmi C. Jain

About this paper

Cite this paper

Norousi, R., Bauer, J., Härting, RC., Reichstein, C. (2018). A Comparison of Predictive Analytics Solutions on Hadoop. In: Czarnowski, I., Howlett, R., Jain, L. (eds) Intelligent Decision Technologies 2017. IDT 2017. Smart Innovation, Systems and Technologies, vol 73. Springer, Cham. https://doi.org/10.1007/978-3-319-59424-8_15

DOI: https://doi.org/10.1007/978-3-319-59424-8_15
Published: 26 May 2017
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-59423-1
Online ISBN: 978-3-319-59424-8
eBook Packages: Engineering (R0)