A Machine Learning Platform for NLP in Big Data

  • Conference paper
Intelligent Systems and Applications (IntelliSys 2020)

Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 1251)

Abstract

This work presents the design of an architecture for managing and storing data that grow exponentially and arrive from different sources. It focuses on showing how a service architecture can remain independent of whatever software produces or consumes the data, and can guarantee a correct, fault-tolerant data flow regardless of input speed or volume. A case study on processing and archiving natural-language text is presented, with fault tolerance, reliability, speed, and high storage capacity as requirements. The result is an architecture that processes text data and extracts the subject of each text.
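As a rough illustration of the idea in the abstract (a buffer that decouples data producers from consumers, a flow that survives bad records, and a subject extracted per text), the following minimal Python sketch may help. It is not the paper's implementation: the names `run_pipeline` and `extract_subject`, the in-process bounded queue, and the "most frequent non-stopword term" heuristic are all assumptions chosen for brevity, standing in for whatever messaging layer and NLP model the actual platform uses.

```python
import queue
import threading
from collections import Counter

# Hypothetical, tiny stopword list; a real system would use a proper NLP model.
STOPWORDS = {"the", "a", "an", "of", "and", "to", "in", "is", "are", "for", "on", "with"}
_SENTINEL = object()  # marks the end of the input stream


def extract_subject(text):
    """Naive stand-in for subject extraction: most frequent non-stopword token."""
    tokens = [t.strip(".,;:!?\"'()").lower() for t in text.split()]
    counts = Counter(t for t in tokens if t and t not in STOPWORDS)
    return counts.most_common(1)[0][0] if counts else None


def run_pipeline(documents, buffer_size=2):
    """Push (doc_id, text) pairs through a bounded buffer and label each one."""
    buffer = queue.Queue(maxsize=buffer_size)
    results = {}

    def producer():
        for doc_id, text in documents:
            # put() blocks while the buffer is full, so a fast producer
            # cannot overwhelm a slow consumer.
            buffer.put((doc_id, text))
        buffer.put(_SENTINEL)

    def consumer():
        while True:
            item = buffer.get()
            if item is _SENTINEL:
                break
            doc_id, text = item
            try:
                results[doc_id] = extract_subject(text)
            except Exception:
                # A malformed record is recorded as unlabeled; the flow continues.
                results[doc_id] = None

    threads = [threading.Thread(target=producer), threading.Thread(target=consumer)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


if __name__ == "__main__":
    docs = [(1, "Kafka streams feed Kafka topics"), (2, None)]
    print(run_pipeline(docs))
```

The producer never calls the consumer directly; both only see the buffer, which is the property the abstract describes as abstracting the architecture from the software that produces or receives data. In a distributed setting the in-process queue would be replaced by a durable message log, which this sketch does not attempt to model.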

Author information

Corresponding author

Correspondence to Mauro Mazzei.

Copyright information

© 2021 Springer Nature Switzerland AG

About this paper

Cite this paper

Mazzei, M. (2021). A Machine Learning Platform for NLP in Big Data. In: Arai, K., Kapoor, S., Bhatia, R. (eds) Intelligent Systems and Applications. IntelliSys 2020. Advances in Intelligent Systems and Computing, vol 1251. Springer, Cham. https://doi.org/10.1007/978-3-030-55187-2_21
