DOI: 10.1145/1743384.1743441
poster

Wikipedia-assisted concept thesaurus for better web media understanding

Published: 29 March 2010

Abstract

Concept ontologies have been used in artificial intelligence, biomedical informatics, and library science, and have been shown to be an effective approach to better understanding data in the respective domains. One main difficulty that hinders the development of ontology approaches is the extra work required for ontology construction and annotation. With the emergence of lexical dictionaries and encyclopedias such as WordNet and Wikipedia, innovations from different directions have been proposed to automatically extract concept ontologies. Unfortunately, many of the proposed ontologies do not fully exploit general human knowledge. We study the various knowledge sources and aim to construct, from Wikipedia, a scalable concept thesaurus suitable for better understanding of media on the World Wide Web. With its wide concept coverage, finely organized categories, diverse concept relations, and up-to-date information, the collaborative encyclopedia Wikipedia has almost all the requisite attributes to contribute to a well-defined concept ontology. Besides explicit concept relations such as disambiguation and synonymy, Wikipedia also provides implicit concept relations through cross-references between articles. In our previous work, we built an ontology with explicit relations from Wikipedia page contents. Even though the method works, mining explicit semantic relations from the content of every Wikipedia concept page has unresolved scalability issues when more concepts are involved. This paper describes our attempt to automatically build a concept thesaurus that encodes both explicit and implicit semantic relations for a large number of concepts from Wikipedia. Our proposed thesaurus construction takes advantage of both structure and content features of the downloaded Wikipedia database, and defines each concept entry with its related concepts and relations.
This thesaurus is further used to exploit semantics from web page context to build a more semantically meaningful space. We go a step further and combine it with the similarity distance from the image feature space to boost performance. We evaluate our approach by applying the constructed concept thesaurus to web image retrieval. The results show that it is possible to use implicit semantic relations to improve retrieval performance.
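
The abstract does not give implementation details, so the following is only a minimal illustrative sketch of the two ideas it names: a concept entry that records related concepts with their relation types, and a combination of a semantic distance with an image-feature distance. The toy `PAGES` data, the function names `build_thesaurus` and `combined_distance`, the relation labels, and the mixing weight `alpha` are all assumptions made for illustration and are not taken from the paper.

```python
from collections import defaultdict

# Toy stand-in for a parsed Wikipedia dump (hypothetical data, not from the paper).
PAGES = {
    "Jaguar": {"synonyms": ["Panthera onca"], "links": ["Cat", "Amazon rainforest"]},
    "Cat": {"synonyms": ["Felis catus"], "links": ["Jaguar"]},
    "Amazon rainforest": {"synonyms": [], "links": ["Jaguar"]},
}


def build_thesaurus(pages):
    """Map each concept to a dict of {related concept: relation type}."""
    thesaurus = defaultdict(dict)
    for title, page in pages.items():
        for synonym in page["synonyms"]:
            # Explicit relation, stated directly by the source.
            thesaurus[title][synonym] = "synonym"
        for target in page["links"]:
            # Implicit relation from a cross-reference between articles;
            # setdefault keeps an explicit relation if one was already recorded.
            thesaurus[title].setdefault(target, "cross-reference")
    return dict(thesaurus)


def combined_distance(semantic_dist, visual_dist, alpha=0.5):
    """Weighted sum of two distances; alpha is an assumed mixing weight in [0, 1]."""
    return alpha * semantic_dist + (1 - alpha) * visual_dist


thesaurus = build_thesaurus(PAGES)
```

A weighted sum is only one simple way to fuse the two spaces; the paper may combine them differently.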

    Published In

    MIR '10: Proceedings of the international conference on Multimedia information retrieval
    March 2010
    600 pages
    ISBN:9781605588155
    DOI:10.1145/1743384

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. semantic salient concept
    2. web image retrieval

    Qualifiers

    • Poster

    Conference

    MIR '10
    MIR '10: International Conference on Multimedia Information Retrieval
    March 29 - 31, 2010
    Philadelphia, Pennsylvania, USA

    Cited By

    • (2020) Multimedia context interpretation: a semantics-based cooperative indexing approach. New Review of Hypermedia and Multimedia 26:1-2 (24-54). DOI: 10.1080/13614568.2020.1745904. Online publication date: 31-Mar-2020
    • (2016) Multiple Ontology-Based Indexing of Multimedia Documents on the World Wide Web. Intelligent Decision Technologies 2016 (51-62). DOI: 10.1007/978-3-319-39627-9_5. Online publication date: 9-Jun-2016
    • (2013) Bilddiskurse in den Wikimedia Commons. Die Dynamik sozialer und sprachlicher Netzwerke (285-310). DOI: 10.1007/978-3-531-93336-8_13. Online publication date: 2013
    • (2011) Multipedia. Proceedings of the sixth international conference on Knowledge capture (137-144). DOI: 10.1145/1999676.1999701. Online publication date: 26-Jun-2011
