MEDLINE Text Mining: An Enhancement Genetic Algorithm Based Approach for Document Clustering

Part of the book series: Intelligent Systems Reference Library ((ISRL,volume 96))

Abstract

MEDLINE is the largest biomedical literature database, and it is updated daily with 200–4,000 citations. This continuous growth creates a need for effective clustering of MEDLINE abstracts to speed up research and information retrieval. Several approaches have been developed in this context, but clustering MEDLINE abstracts remains an open area in which researchers continue to propose new methods to improve clustering quality. Over the last few years, evolutionary algorithms have been widely applied to clustering problems because of their ability to avoid local optima and converge toward a global one. This chapter proposes a new approach for clustering MEDLINE abstracts, based on an extension of an evolutionary algorithm, the genetic algorithm, combined with a Vector Space Model and an agglomerative algorithm.
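To make the combination named in the abstract more concrete, the following is a minimal sketch, not the chapter's actual method. It assumes a TF-IDF weighting for the Vector Space Model, a chromosome that stores one cluster label per document, and a fitness equal to the mean cosine similarity between each document and its cluster centroid. The function names (`tfidf_vectors`, `fitness`, `genetic_clustering`), the parameter defaults, and the toy documents are all hypothetical. The agglomerative step is left out because the abstract does not say how it is combined with the genetic algorithm.

```python
import math
import random
from collections import Counter


def tfidf_vectors(docs):
    """Vector Space Model: one sparse, L2-normalised TF-IDF vector per document."""
    tokenised = [doc.lower().split() for doc in docs]
    doc_freq = Counter(term for tokens in tokenised for term in set(tokens))
    n_docs = len(docs)
    vectors = []
    for tokens in tokenised:
        counts = Counter(tokens)
        # Smoothed IDF (an assumption; the chapter's weighting is not given here).
        vec = {t: c * (math.log((1 + n_docs) / (1 + doc_freq[t])) + 1.0)
               for t, c in counts.items()}
        norm = math.sqrt(sum(w * w for w in vec.values()))
        vectors.append({t: w / norm for t, w in vec.items()} if norm else {})
    return vectors


def fitness(labels, vectors, k):
    """Mean cosine similarity between each document and its cluster centroid."""
    centroids = [dict() for _ in range(k)]
    for label, vec in zip(labels, vectors):
        for t, w in vec.items():
            centroids[label][t] = centroids[label].get(t, 0.0) + w
    norms = [math.sqrt(sum(w * w for w in c.values())) for c in centroids]
    total = 0.0
    for label, vec in zip(labels, vectors):
        if norms[label]:
            dot = sum(w * centroids[label].get(t, 0.0) for t, w in vec.items())
            total += dot / norms[label]
    return total / len(vectors)


def genetic_clustering(vectors, k, pop_size=30, generations=60,
                       mutation_rate=0.05, seed=0):
    """Evolve label assignments with truncation selection, one-point
    crossover, and per-gene random-reset mutation (all illustrative choices)."""
    rng = random.Random(seed)
    n = len(vectors)
    population = [[rng.randrange(k) for _ in range(n)] for _ in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(population, key=lambda ch: fitness(ch, vectors, k),
                        reverse=True)
        parents = ranked[: pop_size // 2]  # the better half survives unchanged
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n)  # one-point crossover position
            child = a[:cut] + b[cut:]
            child = [rng.randrange(k) if rng.random() < mutation_rate else g
                     for g in child]
            children.append(child)
        population = parents + children
    return max(population, key=lambda ch: fitness(ch, vectors, k))


# Toy stand-ins for abstracts; real MEDLINE records would need proper tokenisation.
docs = ["gene expression tumor", "tumor gene mutation",
        "heart blood pressure", "blood pressure artery"]
vectors = tfidf_vectors(docs)
print(genetic_clustering(vectors, k=2))
```

Because cluster labels are arbitrary, two chromosomes can encode the same partition with different label numbers, which weakens one-point crossover. A fuller implementation would likely use a grouping-aware encoding or seed the population from a hierarchical clustering.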

Author information

Corresponding author

Correspondence to Nilanjan Dey.

Copyright information

© 2016 Springer International Publishing Switzerland

About this chapter

Cite this chapter

Karaa, W.B.A., Ashour, A.S., Sassi, D.B., Roy, P., Kausar, N., Dey, N. (2016). MEDLINE Text Mining: An Enhancement Genetic Algorithm Based Approach for Document Clustering. In: Hassanien, AE., Grosan, C., Fahmy Tolba, M. (eds) Applications of Intelligent Optimization in Biology and Medicine. Intelligent Systems Reference Library, vol 96. Springer, Cham. https://doi.org/10.1007/978-3-319-21212-8_12

  • DOI: https://doi.org/10.1007/978-3-319-21212-8_12

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-21211-1

  • Online ISBN: 978-3-319-21212-8

  • eBook Packages: Engineering, Engineering (R0)
