MEDLINE Text Mining: An Enhancement Genetic Algorithm Based Approach for Document Clustering

Karaa, Wahiba Ben Abdessalem; Ashour, Amira S.; Sassi, Dhekra Ben; Roy, Payel; Kausar, Noreen; Dey, Nilanjan

doi:10.1007/978-3-319-21212-8_12

Chapter
First Online: 01 January 2015

Part of the book series: Intelligent Systems Reference Library (ISRL, volume 96)

Abstract

MEDLINE is the largest biomedical literature database and is updated daily with 200–4,000 citations. This continual growth creates a need for effective clustering of MEDLINE abstracts to speed up research and information retrieval. Several approaches have been developed in this context, but clustering MEDLINE abstracts remains an open area in which researchers continue to propose new methods to improve clustering quality. Over the last few years, evolutionary algorithms have been widely applied to clustering problems because of their ability to avoid local optima and converge toward a global one. In this article, a new approach is proposed for clustering MEDLINE abstracts, based on an extension of the genetic algorithm (an evolutionary algorithm) combined with a Vector Space Model and an agglomerative algorithm.
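To make the general idea concrete, the following is a minimal sketch of a toy TF-IDF vector space model paired with a plain genetic algorithm that searches over cluster-label assignments. It is not the chapter's method: the abstract does not specify the encoding, fitness function, or operators, and the agglomerative step is omitted entirely. The function names (tfidf_vectors, fitness, genetic_clustering), the fitness choice (mean cosine similarity to the cluster centroid), and all parameter values are assumptions made for illustration.

```python
import math
import random
from collections import Counter


def tfidf_vectors(docs):
    """Return one L2-normalised TF-IDF vector (term -> weight) per document."""
    tokenised = [doc.lower().split() for doc in docs]
    n_docs = len(tokenised)
    doc_freq = Counter(t for tokens in tokenised for t in set(tokens))
    vectors = []
    for tokens in tokenised:
        counts = Counter(tokens)
        # Smoothed IDF (an assumption) keeps every weight strictly positive.
        vec = {t: c * (math.log((1 + n_docs) / (1 + doc_freq[t])) + 1.0)
               for t, c in counts.items()}
        norm = math.sqrt(sum(w * w for w in vec.values())) or 1.0
        vectors.append({t: w / norm for t, w in vec.items()})
    return vectors


def fitness(labels, vectors, k):
    """Mean cosine similarity between each document and its cluster centroid."""
    centroids = [Counter() for _ in range(k)]
    for label, vec in zip(labels, vectors):
        for t, w in vec.items():
            centroids[label][t] += w
    total = 0.0
    for label, vec in zip(labels, vectors):
        c = centroids[label]
        norm = math.sqrt(sum(w * w for w in c.values())) or 1.0
        total += sum(w * c[t] for t, w in vec.items()) / norm
    return total / len(vectors)


def genetic_clustering(vectors, k, pop_size=40, generations=80,
                       mutation_rate=0.1, seed=0):
    """Evolve label assignments; needs at least 2 documents and pop_size >= 3."""
    rng = random.Random(seed)
    n = len(vectors)

    def score(labels):
        return fitness(labels, vectors, k)

    # Each chromosome assigns one cluster label to every document.
    population = [[rng.randrange(k) for _ in range(n)] for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=score, reverse=True)
        next_gen = [population[0][:], population[1][:]]  # elitism: keep top two
        while len(next_gen) < pop_size:
            # Tournament selection of two parents.
            p1 = max(rng.sample(population, 3), key=score)
            p2 = max(rng.sample(population, 3), key=score)
            cut = rng.randrange(1, n)  # one-point crossover
            child = p1[:cut] + p2[cut:]
            # Mutation: occasionally reassign a document to a random cluster.
            child = [rng.randrange(k) if rng.random() < mutation_rate else g
                     for g in child]
            next_gen.append(child)
        population = next_gen
    best = max(population, key=score)
    return best, score(best)


if __name__ == "__main__":
    abstracts = [  # made-up stand-ins for MEDLINE abstracts
        "heart cardiac artery", "cardiac heart valve", "artery heart cardiac",
        "gene protein dna", "dna gene sequence", "protein gene dna",
    ]
    labels, best_score = genetic_clustering(tfidf_vectors(abstracts), k=2)
    print(labels, round(best_score, 3))
```

Encoding a solution as one label per document is the simplest representation, but it lets the same partition appear under different label permutations, which weakens crossover; grouping-oriented encodings are one known way to address this.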

Author information

Authors and Affiliations

College of Computer and IT, Taif University, Taif, Saudi Arabia
Wahiba Ben Abdessalem Karaa
Department of Electronics and Electrical Communications Engineering, Faculty of Engineering, Tanta University, Tanta, Egypt
Amira S. Ashour
High Institute of Management of Tunis, Tunis, Tunisia
Dhekra Ben Sassi
Department of CA, JIS College of Engineering, Kalyani, West Bengal, India
Payel Roy
Malaysia University of Science and Technology, Petaling Jaya, Selangor, Malaysia
Noreen Kausar
Department of ETCE, Jadavpur University, Kolkata, West Bengal, India
Nilanjan Dey

Corresponding author

Correspondence to Nilanjan Dey.

Editor information

Editors and Affiliations

Information Technology Department, Cairo University, Giza, Egypt
Aboul-Ella Hassanien
Department of Information Systems and Computing, Brunel University, London, United Kingdom
Crina Grosan
Faculty of Computers and information, Ain Shams University, Cairo, Egypt
Mohamed Fahmy Tolba

About this chapter

Cite this chapter

Karaa, W.B.A., Ashour, A.S., Sassi, D.B., Roy, P., Kausar, N., Dey, N. (2016). MEDLINE Text Mining: An Enhancement Genetic Algorithm Based Approach for Document Clustering. In: Hassanien, AE., Grosan, C., Fahmy Tolba, M. (eds) Applications of Intelligent Optimization in Biology and Medicine. Intelligent Systems Reference Library, vol 96. Springer, Cham. https://doi.org/10.1007/978-3-319-21212-8_12

DOI: https://doi.org/10.1007/978-3-319-21212-8_12
Published: 19 July 2015
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-21211-1
Online ISBN: 978-3-319-21212-8
eBook Packages: Engineering, Engineering (R0)
