
A Research Study on Running Machine Learning Algorithms on Big Data with Spark

  • Conference paper
Knowledge Science, Engineering and Management (KSEM 2021)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 12815)


Abstract

Designing and implementing proactive fault diagnosis systems for bearings during their manufacturing process requires selecting robust representation learning techniques, which fall within the broader scope of machine learning. Some systems, such as those built on machine learning libraries like Scikit-learn, focus on the data processing itself while largely disregarding relevant computational parameters, such as processing speed, and do not treat scalability as an important design and implementation feature. This paper describes an integrated, machine learning-based data analytics system that processes the large amounts of data generated by bearing manufacturing processes on a multinode cluster infrastructure, using an optimally configured and deployed Spark environment. The system is thoroughly assessed on a large dataset of real manufacturing data generated by these bearing manufacturing processes. The performance assessment demonstrates that the approach processes the data in a timely and scalable manner. This result is relevant because it exceeds the processing capabilities of significant existing data analytics systems.
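The abstract states that the system runs on an optimally configured Spark environment but does not give the configuration. As a rough illustration of what sizing such an environment involves, the following Python sketch derives executor settings for a homogeneous cluster from widely used rule-of-thumb heuristics. All constants are assumptions, not the authors' settings: one core and 1 GB reserved per node for the operating system and cluster daemons, five cores per executor, and Spark's default memory overhead of 10% of executor memory with a 384 MiB minimum. It also assumes a YARN cluster manager with dynamic allocation disabled (where `--num-executors` applies) and one executor slot left for the application master. `plan_executors`, `to_spark_submit_args`, and `ExecutorPlan` are hypothetical names.

```python
import math
from dataclasses import dataclass

# Hypothetical sizing heuristics (assumptions, not taken from the paper).
RESERVED_CORES_PER_NODE = 1    # left for the OS and cluster daemons
RESERVED_MEM_GB_PER_NODE = 1   # likewise, in GB
CORES_PER_EXECUTOR = 5         # common rule of thumb for concurrent tasks per executor
OVERHEAD_FACTOR = 0.10         # Spark's default executor memory overhead factor
MIN_OVERHEAD_GB = 384 / 1024   # Spark's minimum executor overhead of 384 MiB


@dataclass(frozen=True)
class ExecutorPlan:
    num_executors: int
    executor_cores: int
    executor_memory_gb: int


def plan_executors(nodes: int, cores_per_node: int, mem_gb_per_node: float,
                   reserve_driver_slot: bool = True) -> ExecutorPlan:
    """Size Spark executors for a homogeneous cluster (hypothetical helper)."""
    usable_cores = cores_per_node - RESERVED_CORES_PER_NODE
    usable_mem_gb = mem_gb_per_node - RESERVED_MEM_GB_PER_NODE
    executors_per_node = usable_cores // CORES_PER_EXECUTOR
    if executors_per_node < 1:
        raise ValueError("each node needs more cores to host one executor")

    total = nodes * executors_per_node
    if reserve_driver_slot:
        total -= 1  # leave one slot for the YARN application master / driver
    if total < 1:
        raise ValueError("cluster too small after reserving the driver slot")

    # Each executor container needs heap + max(MIN_OVERHEAD_GB, OVERHEAD_FACTOR * heap).
    budget_gb = usable_mem_gb / executors_per_node
    heap_gb = budget_gb / (1 + OVERHEAD_FACTOR)
    if OVERHEAD_FACTOR * heap_gb < MIN_OVERHEAD_GB:
        heap_gb = budget_gb - MIN_OVERHEAD_GB  # the minimum overhead dominates
    heap_gb = math.floor(heap_gb)  # round down to whole GB for --executor-memory
    if heap_gb < 1:
        raise ValueError("not enough memory per executor")
    return ExecutorPlan(total, CORES_PER_EXECUTOR, heap_gb)


def to_spark_submit_args(plan: ExecutorPlan) -> list[str]:
    """Render the plan as spark-submit resource flags."""
    return [
        "--num-executors", str(plan.num_executors),
        "--executor-cores", str(plan.executor_cores),
        "--executor-memory", f"{plan.executor_memory_gb}g",
    ]


if __name__ == "__main__":
    # Hypothetical cluster: 6 worker nodes, 16 cores and 64 GB RAM each.
    plan = plan_executors(nodes=6, cores_per_node=16, mem_gb_per_node=64)
    print(" ".join(to_spark_submit_args(plan)))
```

For the hypothetical 6-node cluster above, the script prints `--num-executors 17 --executor-cores 5 --executor-memory 19g`. Such values are only a starting point; in practice they would be tuned against measured job run times on the actual workload.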



Acknowledgments

The authors wish to thank Siemens Industry Software Romania for its kind support and for providing the industrial experimental dataset, and Transilvania University of Brasov for providing the necessary hardware infrastructure.

Author information


Corresponding author

Correspondence to Razvan Bocu.


Copyright information

© 2021 Springer Nature Switzerland AG

About this paper


Cite this paper

Kerestely, A., Baicoianu, A., Bocu, R. (2021). A Research Study on Running Machine Learning Algorithms on Big Data with Spark. In: Qiu, H., Zhang, C., Fei, Z., Qiu, M., Kung, SY. (eds) Knowledge Science, Engineering and Management. KSEM 2021. Lecture Notes in Computer Science, vol 12815. Springer, Cham. https://doi.org/10.1007/978-3-030-82136-4_25


  • DOI: https://doi.org/10.1007/978-3-030-82136-4_25


  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-82135-7

  • Online ISBN: 978-3-030-82136-4

  • eBook Packages: Computer Science, Computer Science (R0)
