Abstract
Variety is a property of Big Data Systems (BDS) related to data diversity. It comprises several cases, such as structural diversity (variety in data types) and source diversity (variety in the way data are produced). Recently, the addition of contextual information has enabled more complex analyses, which opens the possibility of modeling variety with reuse in mind. In this article, we introduce a proposal for modeling variety that follows a dual data-processing perspective. We exemplify one of these perspectives by identifying and modeling contextual variations in a particular domain problem.
Notes
- 3. Common services are services that will be part of every product derived from the SPL.
- 5. The <<require>> restriction implies that every new model is stored in the KB.
- 10. To do so, we used the pivot operation of Pandas (https://pandas.pydata.org/docs/index.html).
- 11. To refine the correlation analysis, the transformed dataset was treated to mitigate the influence of null values. We tried several replacement alternatives (zero, mean value, etc.) in order to choose the best one.
- 12. We used Keras (https://keras.io/).
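As an illustration of the steps mentioned in notes 10 and 11, the following is a minimal sketch of a Pandas pivot followed by two null-replacement alternatives (zero and column mean). The data, the column names (`date`, `variable`, `value`) and the variable names are hypothetical and are not taken from the dataset used in the paper; the sketch only shows how the choice of replacement can affect a correlation analysis.

```python
import pandas as pd

# Hypothetical long-format readings: one row per (date, variable) pair.
# These names and values are illustrative assumptions, not the paper's data.
long_df = pd.DataFrame({
    "date": ["2021-01-01", "2021-01-01", "2021-01-02", "2021-01-02", "2021-01-03"],
    "variable": ["temperature", "ph", "temperature", "ph", "temperature"],
    "value": [10.0, 7.0, 12.0, 8.0, 14.0],
})

# Pivot to wide format: one row per date, one column per variable.
# The missing "ph" reading on 2021-01-03 becomes NaN.
wide = long_df.pivot(index="date", columns="variable", values="value")

# Two of the replacement alternatives for null values: zero and column mean.
filled_zero = wide.fillna(0)
filled_mean = wide.fillna(wide.mean())

# Pearson correlation matrices under each alternative, for comparison.
corr_zero = filled_zero.corr()
corr_mean = filled_mean.corr()

print(corr_zero.loc["ph", "temperature"])
print(corr_mean.loc["ph", "temperature"])
```

In this toy example the two alternatives yield correlations of opposite sign between the two variables, which suggests why comparing several replacement strategies before choosing one can be worthwhile.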
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Osycka, L., Buccella, A., Cechich, A. (2022). Data Variety Modeling: A Case of Contextual Diversity Identification from a Bottom-up Perspective. In: Pesado, P., Gil, G. (eds) Computer Science – CACIC 2021. CACIC 2021. Communications in Computer and Information Science, vol 1584. Springer, Cham. https://doi.org/10.1007/978-3-031-05903-2_9
DOI: https://doi.org/10.1007/978-3-031-05903-2_9
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-05902-5
Online ISBN: 978-3-031-05903-2
eBook Packages: Computer Science, Computer Science (R0)