A Hybrid Methodology Based on CRISP-DM and TDSP for the Execution of Preprocessing Tasks in Mexican Environmental Laws

  • Conference paper
Advances in Computational Intelligence (MICAI 2022)

Abstract

This article focuses, on the one hand, on techniques applied during the preprocessing of texts drawn from Mexico's environmental laws. This type of analysis is needed for several reasons: the large number of existing legislative documents (laws, programs, regulations, etc.), the amendments made to the legal system through reforms and decrees, and, above all, possible contradictions between two or more laws. On the other hand, selected tasks of the CRISP-DM methodology were applied, specifically the generic tasks of the data preparation phase: selection, cleaning, transformation, and formatting. These tasks were carried out with the NLTK library using the text preprocessing techniques of tokenization, segmentation, denoising, and normalization. Among the most notable results is a combination of CRISP-DM and Microsoft's Team Data Science Process (TDSP) oriented to the preprocessing of Mexican federal environmental laws. In addition, the article details an application of this hybrid methodology to a specialized task: extracting text from a PDF file with the PyPDF2 and pdfplumber libraries.
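The abstract names the preprocessing stages but not their code, so the following is only a hedged sketch of how such a pipeline could look, not the authors' implementation. It extracts page text with pdfplumber and falls back to PyPDF2's `PdfReader` (the abstract does not say how the two libraries were combined; this ordering is an illustrative assumption). It then applies denoising, sentence segmentation, tokenization, and normalization. To stay dependency-free, the text steps use Python's standard library instead of NLTK, whose `nltk.sent_tokenize` and `nltk.word_tokenize` would be the usual counterparts. The page-footer pattern, the small stopword list, and every function name here are hypothetical.

```python
from __future__ import annotations

import re
import unicodedata


def extract_pdf_text(path: str) -> str:
    """Return the text of every page in the PDF at `path`, joined by newlines."""
    try:
        import pdfplumber  # third-party; tried first (illustrative choice)
    except ImportError:
        pdfplumber = None
    if pdfplumber is not None:
        with pdfplumber.open(path) as pdf:
            pages = [page.extract_text() or "" for page in pdf.pages]
    else:
        from PyPDF2 import PdfReader  # fallback; assumes PyPDF2 >= 2.0
        pages = [page.extract_text() or "" for page in PdfReader(path).pages]
    return "\n".join(pages)


# Hypothetical pattern for page footers such as "3 de 120" on their own line.
PAGE_FOOTER = re.compile(r"^\s*\d+\s+de\s+\d+\s*$", re.MULTILINE)

# Tiny illustrative Spanish stopword list (NLTK ships a much fuller one).
STOPWORDS = {"a", "al", "con", "de", "del", "el", "en", "es", "la", "las",
             "los", "para", "por", "que", "se", "y"}


def remove_noise(text: str) -> str:
    """Denoising: drop page footers, re-join hyphenated line breaks, collapse whitespace."""
    text = PAGE_FOOTER.sub("", text)
    text = re.sub(r"(\w)-\n(\w)", r"\1\2", text)
    return re.sub(r"\s+", " ", text).strip()


def segment(text: str) -> list[str]:
    """Naive sentence segmentation: split after '.', ';' or ':' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.;:])\s+", text) if s.strip()]


def tokenize(sentence: str) -> list[str]:
    """Word tokenization: keep runs of Unicode word characters (accents included)."""
    return re.findall(r"\w+", sentence)


def normalize(token: str) -> str:
    """Normalization: lowercase and strip accents, but keep ñ."""
    kept = []
    for ch in unicodedata.normalize("NFD", token.lower()):
        is_mark = unicodedata.category(ch) == "Mn"
        is_enye_tilde = ch == "\u0303" and bool(kept) and kept[-1] == "n"
        if is_mark and not is_enye_tilde:
            continue  # drop combining accent marks (á -> a, ü -> u)
        kept.append(ch)
    return unicodedata.normalize("NFC", "".join(kept))


def preprocess(raw_text: str) -> list[list[str]]:
    """Denoise -> segment -> tokenize -> normalize -> drop stopwords."""
    result = []
    for sentence in segment(remove_noise(raw_text)):
        tokens = [normalize(t) for t in tokenize(sentence)]
        tokens = [t for t in tokens if t not in STOPWORDS]
        if tokens:
            result.append(tokens)
    return result


if __name__ == "__main__":
    sample = (
        "Artículo 1o.- La presente Ley es reglamentaria de la Constitu-\n"
        "ción.\n  3 de 120\nSus disposiciones son de orden público."
    )
    for tokens in preprocess(sample):
        print(tokens)
    # ['articulo', '1o', 'presente', 'ley', 'reglamentaria', 'constitucion']
    # ['sus', 'disposiciones', 'son', 'orden', 'publico']
```

Keeping ñ while stripping other accents is a deliberate choice in this sketch: in Spanish, ñ distinguishes words (año vs. ano), so removing it would merge unrelated tokens. For a real PDF, the two stages would be chained as `preprocess(extract_pdf_text("ley.pdf"))`, where the file name is a placeholder.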

Author information

Corresponding author

Correspondence to Miguel Ángel Hidalgo Reyes.

Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Díaz Álvarez, Y., Hidalgo Reyes, M.Á., Lagunes Barradas, V., Pichardo Lagunas, O., Martínez Seis, B. (2022). A Hybrid Methodology Based on CRISP-DM and TDSP for the Execution of Preprocessing Tasks in Mexican Environmental Laws. In: Pichardo Lagunas, O., Martínez-Miranda, J., Martínez Seis, B. (eds) Advances in Computational Intelligence. MICAI 2022. Lecture Notes in Computer Science, vol 13613. Springer, Cham. https://doi.org/10.1007/978-3-031-19496-2_6

  • DOI: https://doi.org/10.1007/978-3-031-19496-2_6

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-19495-5

  • Online ISBN: 978-3-031-19496-2

  • eBook Packages: Computer Science, Computer Science (R0)
