Abstract
In this paper, we present a system that uses computational linguistic techniques to extract metadata for image access. We discuss the implementation, functionality, and evaluation of an image catalogers’ toolkit developed in the Computational Linguistics for Metadata Building (CLiMB) research project. We have tested components of the system, including phrase finding for the art and architecture domain, functional semantic labeling using machine learning, and disambiguation of terms in domain-specific text vis-à-vis a rich thesaurus of subject terms, geographic names, and artist names. We present specific results on disambiguation techniques and on the nature of the ambiguity problem given the thesaurus, related resources, and domain-specific text, with a comparison to domain-general resources and text. Our primary user group for evaluation has been expert catalogers with specific expertise in the fields of painting, sculpture, and vernacular and landscape architecture.





Notes
Some examples include OntoImage’2006, the First International “Language Resources for Content-Based Image Retrieval” Workshop, held in conjunction with the Language Resources and Evaluation Conference (LREC) 2006, http://www.lrec-conf.org/lrec2006; OntoImage’2008, the Second International “Language Resources for Content-Based Image Retrieval” Workshop, held in conjunction with LREC’2008, http://www.dfki.de/~declerck/ontoimage.html; and workshops on computational linguistics for image access held at the Visual Resources Association annual meetings in 2006, 2007, and 2008, http://www.vraweb.org.
One such project, T3: Text, Tagging and Trust to Improve Image Access for Museums and Libraries, has just been funded by the Institute of Museum and Library Services, imls.gov.
Some metadata standards mentioned in Baca 2003 were: Categories for the Description of Works of Art (CDWA) from the Getty Research Institute and Cataloging Cultural Objects (CCO) from the Visual Resources Association.
Notable controlled vocabularies noted in Baca 2003 were: Library of Congress Subject Headings; Library of Congress Name Authority File; the Getty Vocabularies; Thesaurus for Graphic Materials I and II.
Both the tagger and parser are available at: http://nlp.stanford.edu/software.
Lucene is a search engine library: http://lucene.apache.org.
Getty resources can be accessed at: http://getty.edu/research/conducting_research/vocabularies/aat.
According to the documentation on the TGN, natural order refers to searching on the most common order of a name, e.g. Al-Hoceima, whereas inverted order would be Hoceima, Al-.
Steve: The Museum Social Tagging Project. http://www.steve.museum.
Luis von Ahn: The ESP Game at Games with a Purpose (GWAP).
Jennifer Golbeck: FilmTrust. http://www.mindswap.org.
References
Anderson JD, Perez-Carballo J (2001) The nature of indexing: how humans and machines analyze messages and texts for retrieval. Part I: research, and the nature of human indexing. Inf Process Manag 37:231–254
Anderson JD, Perez-Carballo J (2001) The nature of indexing: how humans and machines analyze messages and texts for retrieval. Part II: machine indexing, and the allocation of human versus machine effort. Inf Process Manag 37:255–277
Baca M (2003) Practical issues in applying metadata schemas and controlled vocabularies to cultural heritage information. Cat Classif Q 36(3/4):47–55
Banerjee S, Pedersen T (2003) Extended gloss overlaps as a measure of semantic relatedness. Proceedings of the Eighteenth International Joint Conference on Artificial Intelligence, pp 805–810
Barnard K, Forsyth DA (2001) Learning the semantics of words and pictures. Proceedings of International Conference on Computer Vision, pp 408–415
Brill E (1995) Transformation-based error-driven learning and natural language processing: a case study in part-of-speech tagging. Comput Linguist 21(4):543–565
Charniak E (1997) Statistical techniques for natural language parsing. AI Mag 18(4):33–44
Chen H (2001) An analysis of image retrieval tasks in the field of art history. Inf Process Manag 37:701–720
Choi Y, Rasmussen E (2003) Searching for images: the analysis of users’ queries for image retrieval in American history. J Am Soc Inf Sci Technol 54:498–511
Church KW (1988) A stochastic parts program and noun phrase parser for unrestricted text. Proceedings of the Second Conference on Applied Natural Language Processing, Austin, Texas, 9–12 February, pp 136–143
Collins K (1998) Providing subject access to images: a study of user queries. Am Arch 61:36–55
Datta R, Joshi D, Li J, Wang JZ (2008) Image retrieval: ideas, influences, and trends of the new age. ACM Comput Surv 40(2):5–60
Demner-Fushman D (2008) Combining medical domain ontological knowledge and low-level image features for multimedia indexing. OntoImage 2008: 2nd International Language Resources for Content-Based Image Retrieval Workshop in conjunction with LREC’2008, pp 18–23
Fellbaum C (ed) (1998) WordNet: an electronic lexical database. MIT, Cambridge, MA
Gale W, Church K, Yarowsky D (1993) A method for disambiguating word senses in a large corpus. Comput Humanit 26:415–439
Grishman R, Sundheim B (eds) (1995) Design of the MUC-6 evaluation. Sixth Message Understanding Conference (MUC-6), NIST, Morgan-Kaufmann, Columbia, MD, pp 1–11
Hatzivassiloglou V, Klavans JL, Eskin E (1999) Detecting text similarity over short passages: exploring linguistic feature combinations via machine learning. Proceedings of Empirical Methods in Natural Language Processing (EMNLP) and Very Large Corpora, MD, USA, pp 203–212
Hatzivassiloglou V, Gravano L, Maganti A (2000) An investigation of linguistic features and clustering algorithms for topical document clustering. Proceedings of the Annual Meeting of ACM-SIGIR, pp 224–231
Hearst M (1997) TextTiling: segmenting text into multi-paragraph subtopic passages. Comput Linguist 23(1):33–64
Kan M, Klavans JL, McKeown KR (1998) Linear segmentation and segment relevance. Proceedings of the 6th International Workshop of Very Large Corpora (WVLC-6), Montréal, Québec, Canada, pp 197–205
Keister LH (1994) User types and queries: impact on image access systems. In: Fidel R, Hahn TB, Rasmussen E, Smith PJ (eds) Challenges in indexing electronic text and images. Learned Information for the American Society of Information Science, Medford, pp 7–22
Klavans JL, Chodorow MS, Wacholder N (1990) From dictionary to knowledge base via taxonomy. Proceedings of the sixth conference of the University of Waterloo Centre for the New Oxford English Dictionary and Text Research: Electronic Text Research, University of Waterloo, Waterloo, Canada, pp 110–132
Klavans JL, Tzoukermann E (1996) Dictionaries and corpora: combining corpus and machine-readable dictionary data for building bilingual lexicons. Journal of Machine Translation 10(3–4):185–218
Klein S, Simmons RF (1963) A computational approach to grammatical coding of English words. J Assoc Comput Mach 10(3):334–347
Lesk M (1986) Automatic sense disambiguation: how to tell a pine cone from an ice cream cone. Proceedings of the 1986 ACM SIGDOC Conference, pp 24–26
Lew MS (2000) Next-generation web searches for visual content. IEEE Computer 33:46–53
Maron ME (1961) Automatic indexing: an experimental inquiry. J Assoc Comput Mach 8(3):404–417
Palmer M, Ng HT, Dang HT (2006) Evaluation. In: Edmonds P, Agirre E (eds) Word sense disambiguation: algorithms, applications, and trends. Text, Speech, and Language Technology series. Kluwer, The Netherlands
Panofsky E (1962) Studies in iconology: humanistic themes in the art of the renaissance. Harper & Row, New York
Passonneau R, Yano T, Lippincott T, Klavans J (2008) Functional semantic categories for art history text: human labeling and preliminary machine learning. Proceedings of the 3rd International Conference on Computer Vision Theory and Applications, Workshop on Metadata Mining for Image Understanding, pp 13–22
Pastra K, Saggion H, Wilks Y (2003) Intelligent indexing of crime-scene photographs. IEEE Intell Syst Their Appl 18(1):55–61
Patwardhan S, Banerjee S, Pedersen T (2003) Using measures of semantic relatedness for word sense disambiguation. Proceedings of the Fourth International Conference on Intelligent Text Processing and Computational Linguistics, Mexico City, pp 241–257
Rasmussen EM (1997) Indexing images. Annu Rev Inf Sci Technol 32:169–196
Resnik P (1999) Semantic similarity in a taxonomy: an information-based measure and its application to problems of ambiguity in natural language. J Artif Intell Res 11:95–130
Rorissa A, Iyer H (2008) Theories of cognition and image categorization: what category labels reveal about basic level theory. J Am Soc Inf Sci Technol 59(9):1383–1392
Shatford S (1986) Analyzing the subject of a picture: a theoretical approach. Cat Classif Q 6(3):39–62
Sidhu T, Klavans JL, Lin J (2007) Concept disambiguation for improved subject access using multiple knowledge sources. Proceedings of the Workshop on Language Technology for Cultural Heritage Data (LaTech 2007), 45th Annual Meeting of the Association for Computational Linguistics, Prague, Czech Republic, pp 25–32
Tibbo HR (1994) Indexing for the humanities. J Am Soc Inf Sci 45(8):607–619
Wilks Y, Catizone R (2002) What is lexical tuning? J Semant 19(2):167–190
Yang Y, Liu X (1999) A re-examination of text categorization methods. Proceedings of the 22nd Annual International ACM SIGIR, pp 42–49
Yarowsky D (1992) Word-sense disambiguation using statistical models of Roget’s categories trained on large corpora. Proceedings of COLING’92 Conference, pp 454–460
Yarowsky D (1994) Decision lists for lexical ambiguity resolution. Proceedings of ACL-94, Las Cruces, NM, pp 88–95
Acknowledgements
We acknowledge the Program Office for Scholarly Communications of the Andrew W. Mellon Foundation, especially Don Waters and Suzanne Lodato; Dr. Murtha Baca, director of the Getty Vocabulary Program and Digital Resource Management, Getty Research Institute, for providing us with research access to resources; cataloging and domain expert Angela Giral; our collections partners, including Jeff Cohen, Bryn Mawr College and University of Pennsylvania, for the vernacular architecture collection; Jack Sullivan, University of Maryland, for landscape architecture; the Senate Museum and Library; and ARTstor. Finally, Joan Beaudoin (Drexel), Laura Jaeneman (Drexel), and Brooke Rosenblatt (the Phillips Gallery) helped with annotation, collections, and user studies.
Additional information
This project, funded by the Andrew W. Mellon Foundation, was initiated at the Center for Research on Information Access at Columbia University and is currently based at the University of Maryland.
Cite this article
Klavans, J.L., Sheffield, C., Abels, E. et al. Computational linguistics for metadata building (CLiMB): using text mining for the automatic identification, categorization, and disambiguation of subject terms for image metadata. Multimed Tools Appl 42, 115–138 (2009). https://doi.org/10.1007/s11042-008-0253-9
DOI: https://doi.org/10.1007/s11042-008-0253-9