Large image modality labeling initiative using semi-supervised and optimized clustering

Vajda, Szilárd; You, Daekeun; Antani, Sameer; Thoma, George

doi:10.1007/s13735-015-0078-z

Regular Paper
Published: 17 March 2015

Volume 4, pages 143–151, (2015)

International Journal of Multimedia Information Retrieval

Abstract

Medical image modality detection is a key step in indexing images from biomedical articles. Traditionally, complex supervised classification methods have been used for this task. However, they rely on labeled training samples whose size grows in proportion to the data. As more image data become available, it is increasingly difficult to obtain reasonably accurate manual labels for training classifiers. To address this shortcoming, we propose a semi-automatic labeling strategy that reduces the human annotator's effort. Each image is projected into several feature spaces, and the entries in each space are clustered in an unsupervised manner. The cluster centers for each feature representation are then labeled by a human annotator, and the labels are propagated through each cluster. To find the optimal number of clusters for each feature space, the so-called "jump" method is used. The final label of an image is decided by a voting scheme that summarizes the different opinions on the same image provided by the different feature representations. The proposed method was evaluated on the ImageCLEFmed2012 data set, which contains approximately 300,000 images; annotating <1% of the data was sufficient to correctly label 49.95% of the images. The method saved approximately 700 h of human annotation labor and the associated costs.
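The pipeline summarized in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The function names (`jump_method`, `propagate`, `vote`), the modality names, the toy numbers, and the rule that a tied vote leaves an image unlabeled are all assumptions made for illustration. The jump criterion follows the general form given by Sugar and James (2003), with the transformation power set to half the feature dimension, and it assumes the per-cluster-count distortions have already been computed by some clustering routine.

```python
from collections import Counter


def jump_method(distortions, dim):
    """Pick the cluster count with the largest jump in transformed distortion.

    distortions: dict mapping k (consecutive, starting at 1) to the
                 average per-dimension distortion d_k of a k-cluster fit.
    dim:         dimensionality of the feature space.
    """
    power = dim / 2.0            # common choice Y = p / 2 (assumption)
    previous = 0.0               # transformed distortion for k = 0 is defined as 0
    best_k, best_jump = None, float("-inf")
    for k in sorted(distortions):
        transformed = distortions[k] ** (-power)
        jump = transformed - previous
        if jump > best_jump:
            best_k, best_jump = k, jump
        previous = transformed
    return best_k


def propagate(assignments, center_labels):
    """Give each image the label the annotator assigned to its cluster center."""
    return [center_labels[cluster] for cluster in assignments]


def vote(labels_per_space):
    """Plurality vote across feature spaces; a tie yields None (assumed rule)."""
    final = []
    for opinions in zip(*labels_per_space):
        ranked = Counter(opinions).most_common(2)
        if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
            final.append(None)   # no clear winner, leave the image unlabeled
        else:
            final.append(ranked[0][0])
    return final


# Toy distortions for a 2-dimensional feature space (hypothetical numbers).
best_k = jump_method({1: 4.0, 2: 1.0, 3: 0.9, 4: 0.85}, dim=2)

# Four images, three hypothetical feature spaces. Each space has its own
# cluster assignments and its own annotator-provided center labels.
space_a = propagate([0, 0, 1, 1], {0: "CT", 1: "MRI"})
space_b = propagate([1, 0, 0, 1], {0: "CT", 1: "X-ray"})
space_c = propagate([0, 0, 0, 1], {0: "CT", 1: "MRI"})

final = vote([space_a, space_b, space_c])
print(best_k, final)
```

In this sketch the annotator only supplies one label per cluster center, which is what keeps the manual effort small relative to the number of images that end up labeled.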

Acknowledgments

This research is supported by the Intramural Research Program of the National Institutes of Health (NIH), National Library of Medicine, and Lister Hill National Center for Biomedical Communications (LHNCBC).

Author information

Authors and Affiliations

National Library of Medicine, National Institutes of Health, Maryland, USA
Szilárd Vajda, Daekeun You, Sameer Antani & George Thoma

Corresponding author

Correspondence to Szilárd Vajda.

About this article

Cite this article

Vajda, S., You, D., Antani, S. et al. Large image modality labeling initiative using semi-supervised and optimized clustering. Int J Multimed Info Retr 4, 143–151 (2015). https://doi.org/10.1007/s13735-015-0078-z

Received: 31 August 2014
Revised: 09 January 2015
Accepted: 27 February 2015
Published: 17 March 2015
Issue Date: June 2015
DOI: https://doi.org/10.1007/s13735-015-0078-z
