Abstract
This article applies several text-mining techniques for information extraction and compares their performance in analysing a dataset, so that the reader can identify the most appropriate technique for processing this type of data. It also presents an implementation of the K-means algorithm to determine the location of news items described in RSS format, and reports the results of this grouping through a descriptive analysis of the resulting clusters.
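To make the idea of clustering news text with K-means more concrete, the following is a minimal sketch and not the authors' implementation. It assumes that each RSS item is reduced to its headline, that headlines are represented as L2-normalised TF-IDF vectors, and that plain Lloyd-style K-means with k = 2 is used. The sample headlines, the function names (`tfidf_vectors`, `kmeans`, `describe`) and every parameter value are invented for illustration.

```python
import math
import random
import re
from collections import Counter


def tfidf_vectors(docs):
    """Return (vocabulary, list of L2-normalised TF-IDF vectors)."""
    tokenized = [re.findall(r"[a-z]+", d.lower()) for d in docs]
    vocab = sorted({t for toks in tokenized for t in toks})
    # Document frequency: in how many documents each term appears.
    df = Counter(t for toks in tokenized for t in set(toks))
    n = len(docs)
    vectors = []
    for toks in tokenized:
        tf = Counter(toks)
        # Smoothed IDF; one common variant among several.
        vec = [tf[t] * (math.log((1 + n) / (1 + df[t])) + 1.0) for t in vocab]
        norm = math.sqrt(sum(x * x for x in vec)) or 1.0
        vectors.append([x / norm for x in vec])
    return vocab, vectors


def kmeans(vectors, k, iters=50, seed=0):
    """Plain Lloyd iterations; returns (labels, centroids)."""
    rng = random.Random(seed)
    # Assumption: initial centroids are k randomly sampled documents.
    centroids = [list(v) for v in rng.sample(vectors, k)]
    labels = [-1] * len(vectors)
    for _ in range(iters):
        # Assignment step: nearest centroid by squared Euclidean distance.
        new_labels = [
            min(range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(v, centroids[c])))
            for v in vectors
        ]
        if new_labels == labels:
            break  # assignments are stable, so the centroids are too
        labels = new_labels
        # Update step: each centroid becomes the mean of its members.
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:  # keep the old centroid if a cluster goes empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


def describe(labels, centroids, vocab, top=3):
    """Descriptive summary: cluster size and highest-weight terms."""
    summary = {}
    for c, centroid in enumerate(centroids):
        ranked = sorted(zip(centroid, vocab), reverse=True)[:top]
        summary[c] = {"size": labels.count(c),
                      "top_terms": [term for _, term in ranked]}
    return summary


# Invented sample headlines standing in for RSS <title> fields.
headlines = [
    "Flood warning issued for the coastal city",
    "Coastal city prepares for heavy flood",
    "Heavy rain causes flood in the city centre",
    "Football team wins the national league title",
    "National league final draws record football crowd",
    "Team coach praises players after league win",
]

vocab, vectors = tfidf_vectors(headlines)
labels, centroids = kmeans(vectors, k=2)
summary = describe(labels, centroids, vocab)
for cluster, info in summary.items():
    print(cluster, info["size"], info["top_terms"])
```

The printed summary gives, per cluster, the number of headlines and the terms with the largest centroid weights, which is one simple form a descriptive analysis of the clusters could take. The result depends on the random initialisation, so in practice several seeds or values of k would normally be compared.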
Copyright information
© 2019 Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Ariza-Colpas, P., Oviedo-Carrascal, A.I., De-la-hoz-Franco, E. (2019). Using K-Means Algorithm for Description Analysis of Text in RSS News Format. In: Tan, Y., Shi, Y. (eds) Data Mining and Big Data. DMBD 2019. Communications in Computer and Information Science, vol 1071. Springer, Singapore. https://doi.org/10.1007/978-981-32-9563-6_17
DOI: https://doi.org/10.1007/978-981-32-9563-6_17
Publisher Name: Springer, Singapore
Print ISBN: 978-981-32-9562-9
Online ISBN: 978-981-32-9563-6
eBook Packages: Computer Science (R0)