Abstract
This article presents techniques applied during the preprocessing of Mexican environmental laws. This type of analysis is needed for several reasons: the large number of existing legislative documents (laws, programs, regulations, etc.), the modifications made to the legal system through reforms and decrees, and, especially, the contradictions that may arise within or among laws. Certain tasks of the CRISP-DM methodology were selected, specifically the generic tasks of selection, cleaning, transformation, and formatting in the data preparation phase. These tasks were carried out with the NLTK library using the text preprocessing techniques of tokenization, segmentation, denoising, and normalization. The most notable result is a hybrid methodology that combines CRISP-DM with Microsoft's Team Data Science Process (TDSP), oriented to the preprocessing of Mexican federal environmental laws. In addition, the article shows a detailed application of the hybrid methodology to a specialized task: extracting text from a PDF file using the PyPDF2 and pdfplumber libraries.
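As a rough illustration of the four preprocessing techniques named above, the following minimal sketch chains denoising, segmentation, tokenization, and normalization on a short Spanish legal text. It is not the authors' implementation: it uses only the Python standard library as a stand-in for the NLTK tokenizers, it assumes the text has already been extracted from the PDF, and the function names, the sample text, and the "1 de 2" page-footer pattern are all hypothetical.

```python
import re
import unicodedata


def denoise(text):
    """Remove extraction noise from text already pulled out of a PDF."""
    # Rejoin words that were hyphenated across a line break.
    text = re.sub(r"(\w)-\n(\w)", r"\1\2", text)
    # Drop page footers of the assumed form "1 de 2" on a line of their own.
    text = re.sub(r"(?m)^[ \t]*\d+ de \d+[ \t]*$", "", text)
    # Collapse all remaining whitespace runs into single spaces.
    return re.sub(r"\s+", " ", text).strip()


def segment(text):
    """Split into sentences at ., ! or ? followed by an uppercase letter."""
    return re.split(r"(?<=[.!?])\s+(?=[A-ZÁÉÍÓÚÑ])", text)


def tokenize(sentence):
    """Split a sentence into word tokens, discarding punctuation."""
    return re.findall(r"\w+", sentence)


def normalize(token):
    """Lowercase a token and strip diacritics (an assumed, optional choice)."""
    decomposed = unicodedata.normalize("NFD", token.lower())
    return "".join(c for c in decomposed if not unicodedata.combining(c))


def preprocess(raw):
    """Return one list of normalized tokens per sentence."""
    clean = denoise(raw)
    return [[normalize(t) for t in tokenize(s)] for s in segment(clean)]


# Hypothetical sample imitating text extracted from a law in PDF form.
raw = (
    "La presente Ley es reglamen-\ntaria de la Constitución.\n"
    "1 de 2\n"
    "Sus disposiciones son de orden público."
)
result = preprocess(raw)
print(result)
```

Stripping diacritics also maps "ñ" to "n", which can merge distinct Spanish words, so a real pipeline may prefer to lowercase only. The regex-based sentence splitter is likewise a simplification that will mis-split abbreviations common in legal text.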
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Díaz Álvarez, Y., Hidalgo Reyes, M.Á., Lagunes Barradas, V., Pichardo Lagunas, O., Martínez Seis, B. (2022). A Hybrid Methodology Based on CRISP-DM and TDSP for the Execution of Preprocessing Tasks in Mexican Environmental Laws. In: Pichardo Lagunas, O., Martínez-Miranda, J., Martínez Seis, B. (eds) Advances in Computational Intelligence. MICAI 2022. Lecture Notes in Computer Science, vol 13613. Springer, Cham. https://doi.org/10.1007/978-3-031-19496-2_6
DOI: https://doi.org/10.1007/978-3-031-19496-2_6
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-19495-5
Online ISBN: 978-3-031-19496-2
eBook Packages: Computer Science, Computer Science (R0)