Abstract
MEDLINE is the largest biomedical literature database. It grows by 200–4,000 citations every day. This continuous growth creates a need for effective clustering of MEDLINE abstracts to speed up literature search and information retrieval. Several approaches have been developed, but clustering MEDLINE abstracts remains an open area in which researchers continue to propose methods that improve clustering quality. Over the last few years, evolutionary algorithms have been widely applied to clustering problems because of their ability to avoid locally optimal solutions and converge to a global one. In this article, a new approach is proposed for clustering MEDLINE abstracts. It is based on an extension of the genetic algorithm, an evolutionary algorithm, combined with a Vector Space Model and an agglomerative algorithm.
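As a rough illustration of the Vector Space Model mentioned above, the sketch below represents each abstract as a sparse term-weight vector and compares two abstracts by cosine similarity. The abstract does not state which weighting scheme or similarity measure the chapter uses, so the TF-IDF weighting, the whitespace tokenizer, the function names, and the three toy abstracts are all assumptions made for this example only.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return one sparse TF-IDF vector (term -> weight) per document.

    Assumption: plain whitespace tokenization and tf * log(N / df) weighting;
    the chapter's actual preprocessing is not given in the abstract.
    """
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        vectors.append({t: c * math.log(n_docs / df[t]) for t, c in tf.items()})
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


# Hypothetical toy abstracts, not taken from MEDLINE.
abstracts = [
    "gene expression in breast cancer cells",
    "breast cancer gene mutation screening",
    "retinal image segmentation of blood vessels",
]
vecs = tfidf_vectors(abstracts)
sim_related = cosine(vecs[0], vecs[1])    # the two cancer abstracts share terms
sim_unrelated = cosine(vecs[0], vecs[2])  # no terms in common
```

A pairwise similarity of this kind is the sort of input that an agglomerative step or a genetic algorithm's fitness function could work from, although how the chapter actually combines the three components is not described in the abstract.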
© 2016 Springer International Publishing Switzerland
Cite this chapter
Karaa, W.B.A., Ashour, A.S., Sassi, D.B., Roy, P., Kausar, N., Dey, N. (2016). MEDLINE Text Mining: An Enhancement Genetic Algorithm Based Approach for Document Clustering. In: Hassanien, AE., Grosan, C., Fahmy Tolba, M. (eds) Applications of Intelligent Optimization in Biology and Medicine. Intelligent Systems Reference Library, vol 96. Springer, Cham. https://doi.org/10.1007/978-3-319-21212-8_12
DOI: https://doi.org/10.1007/978-3-319-21212-8_12
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-21211-1
Online ISBN: 978-3-319-21212-8
eBook Packages: Engineering, Engineering (R0)