Data Variety Modeling: A Case of Contextual Diversity Identification from a Bottom-up Perspective

  • Conference paper
Computer Science – CACIC 2021 (CACIC 2021)

Abstract

Variety is a property related to data diversity in Big Data Systems (BDS). It comprises several cases, such as structural diversity (variety in data types) and source diversity (variety in the way data are produced). Recently, adding contextual information has enabled more complex analyses, which opens the possibility of modeling variety with reuse in mind. In this article, we introduce a proposal for modeling variety that follows a dual data processing perspective. We illustrate one of these perspectives by identifying and modeling contextual variations in a particular domain problem.

Notes

  1. https://hadoop.apache.org/

  2. https://spark.apache.org/

  3. Common services are services that will be part of every product derived from the SPL.

  4. https://www.json.org/json-es.html

  5. The «require» restriction implies that every new model is stored in the KB.

  6. https://data.kingcounty.gov/Environment-Waste-Management/Water-Quality/vwmt-pvjw

  7. https://flume.apache.org/

  8. https://hbase.apache.org/

  9. https://cassandra.apache.org/

  10. To do so, we used the pivot operation of Pandas (https://pandas.pydata.org/docs/index.html).

  11. To refine the correlation analysis, the transformed dataset was treated to mitigate the influence of null values. We tried several replacement alternatives (zero, mean value, etc.) and chose the best one.

  12. We used Keras (https://keras.io/).

  13. https://inta.gob.ar/altovalle
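
As an illustration of notes 10 and 11, the following is a minimal sketch of how a pivot followed by null-value replacement might look in Pandas. The column names (`sample_id`, `parameter`, `value`) and the measurements are hypothetical and are not taken from the paper or from the King County dataset; the sketch only shows the general technique, not the authors' actual pipeline.

```python
import pandas as pd

# Hypothetical long-format measurements: one row per (sample, parameter).
# Sample 3 has no oxygen reading, so the pivot will leave a null there.
records = pd.DataFrame(
    {
        "sample_id": [1, 1, 1, 2, 2, 2, 3, 3, 4, 4, 4],
        "parameter": [
            "pH", "temperature", "oxygen",
            "pH", "temperature", "oxygen",
            "pH", "temperature",
            "pH", "temperature", "oxygen",
        ],
        "value": [7.0, 10.0, 9.0, 7.5, 12.0, 8.0, 8.0, 14.0, 8.5, 16.0, 6.0],
    }
)

# Pivot to wide format: one row per sample, one column per parameter.
wide = records.pivot(index="sample_id", columns="parameter", values="value")

# Two of the replacement alternatives mentioned in note 11.
filled_zero = wide.fillna(0.0)          # replace nulls with zero
filled_mean = wide.fillna(wide.mean())  # replace nulls with the column mean

# Correlation matrices under each alternative, for side-by-side comparison.
corr_zero = filled_zero.corr()
corr_mean = filled_mean.corr()

print(corr_zero.round(3))
print(corr_mean.round(3))
```

Comparing the two correlation matrices shows how sensitive the analysis can be to the replacement strategy, which is presumably why several alternatives were tried before settling on one.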

Author information

Corresponding author

Correspondence to Líam Osycka.

Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Osycka, L., Buccella, A., Cechich, A. (2022). Data Variety Modeling: A Case of Contextual Diversity Identification from a Bottom-up Perspective. In: Pesado, P., Gil, G. (eds) Computer Science – CACIC 2021. CACIC 2021. Communications in Computer and Information Science, vol 1584. Springer, Cham. https://doi.org/10.1007/978-3-031-05903-2_9

  • DOI: https://doi.org/10.1007/978-3-031-05903-2_9

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-05902-5

  • Online ISBN: 978-3-031-05903-2

  • eBook Packages: Computer Science (R0)