Abstract
The objective of this work is to design an architecture for managing and storing exponentially growing data coming from different sources. The work focuses on demonstrating how a service-based architecture can abstract itself from any software that produces or consumes data, guaranteeing a correct data flow regardless of the rate or volume of the input while also providing fault tolerance. A case study on processing and archiving natural-language text is presented, ensuring fault tolerance, reliability, speed, and high storage capacity. The result is an architecture that processes text data and extracts the subject of each text.
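The abstract does not give implementation details at this point. As a rough illustration of the decoupling and fault-tolerance properties it describes, the following is a minimal single-process Python sketch. It assumes a bounded in-memory buffer (`queue.Queue` with `maxsize`) between a producer and a consumer. Because `put` blocks while the buffer is full, a fast source is slowed down instead of overrunning the consumer, and each text is retried up to `max_attempts` times before a failure (`None`) is recorded. The names `run_pipeline`, `extract_subject`, and `STOP_WORDS` are hypothetical. The most-frequent-content-word heuristic is a deliberately naive stand-in for the paper's subject-extraction step, not the author's method.

```python
import queue
import threading
from collections import Counter

# Hypothetical stop-word list; a real system would use an NLP library or a trained model.
STOP_WORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "are",
              "for", "on", "with", "that", "this", "it"}

_END = object()  # unique end-of-stream marker, cannot collide with real input


def extract_subject(text):
    """Naive stand-in for subject extraction: the most frequent content word."""
    words = [w.strip(".,;:!?\"'()").lower() for w in text.split()]
    content = [w for w in words if w and w not in STOP_WORDS]
    if not content:
        return ""
    # Ties are broken by first occurrence (documented Counter behaviour).
    return Counter(content).most_common(1)[0][0]


def producer(texts, buf):
    for text in texts:
        buf.put(text)  # blocks while the buffer is full: simple backpressure
    buf.put(_END)


def consumer(buf, results, process, max_attempts):
    while True:
        item = buf.get()
        if item is _END:
            break
        outcome = None  # recorded if every attempt fails
        for _ in range(max_attempts):
            try:
                outcome = process(item)
                break
            except Exception:
                continue  # assume the failure is transient and retry
        results.append(outcome)


def run_pipeline(texts, buffer_size=2, process=extract_subject, max_attempts=3):
    """Run one producer and one consumer over a bounded buffer; keeps input order."""
    buf = queue.Queue(maxsize=buffer_size)
    results = []
    prod = threading.Thread(target=producer, args=(texts, buf))
    cons = threading.Thread(target=consumer, args=(buf, results, process, max_attempts))
    prod.start()
    cons.start()
    prod.join()
    cons.join()
    return results


if __name__ == "__main__":
    docs = [
        "Kafka stores streams of records. Kafka replicates each record across brokers.",
        "Spark processes data in memory; Spark jobs scale out.",
    ]
    print(run_pipeline(docs))  # → ['kafka', 'spark']
```

In a distributed deployment the in-memory queue would typically be replaced by a durable message broker, so that buffered data also survives process failures; the sketch only shows the flow-control and retry ideas.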
Copyright information
© 2021 Springer Nature Switzerland AG
About this paper
Cite this paper
Mazzei, M. (2021). A Machine Learning Platform for NLP in Big Data. In: Arai, K., Kapoor, S., Bhatia, R. (eds) Intelligent Systems and Applications. IntelliSys 2020. Advances in Intelligent Systems and Computing, vol 1251. Springer, Cham. https://doi.org/10.1007/978-3-030-55187-2_21
DOI: https://doi.org/10.1007/978-3-030-55187-2_21
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-55186-5
Online ISBN: 978-3-030-55187-2
eBook Packages: Intelligent Technologies and Robotics (R0)