DOI: 10.1145/3014812.3014867
research-article

Crowd-annotation and LoD-based semantic indexing of content in multi-disciplinary web repositories to improve search results

Published: 31 January 2017

Abstract

Searching for relevant information in multi-disciplinary web repositories is a topic of increasing interest in the computer science research community. To date, methods for extracting useful and relevant information from online repositories of research data have largely relied on static full-text indexing, which entails a 'produce once and use forever' strategy. That strategy is fast becoming insufficient because of increasing data volume, concept obsolescence, and the complexity and heterogeneity of content types in web repositories. We propose that automatic semantic annotation of content in web repositories, using Linked Open Data (LoD) sources and no domain-specific ontologies, can sustain search performance by retrieving highly relevant results. We further claim that expert crowd-annotation of content, layered on top of automatic semantic annotation, can enrich the semantic index over time and augment the contextual value of content in web repositories, so that the content remains findable despite changes in language, terminology and scientific concepts. We deployed a custom-built annotation, indexing and searching environment on a web repository website, where expert annotators used it to annotate webpages with free text and vocabulary terms. We present our findings based on the annotation and tagging data collected on top of the LoD-based annotations, and describe the overall modus operandi. We also demonstrate that adding expert annotations to the existing semantic index improves the relationship between query and documents, as measured using Cosine Similarity Measures (CSM).
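To make the cosine similarity comparison in the abstract concrete, the following is a minimal sketch, not the authors' implementation. It assumes plain term-frequency vectors over whitespace tokens. The query, the document text, the expert tags and the helper names `tf_vector` and `cosine_similarity` are all hypothetical, chosen only to show how adding annotation terms to an indexed document can move it closer to a query.

```python
import math
from collections import Counter


def tf_vector(text: str) -> Counter:
    """Build a sparse term-frequency vector from whitespace tokens (assumed tokenisation)."""
    return Counter(text.lower().split())


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine of the angle between two sparse term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty vector has no direction; treat as unrelated
    return dot / (norm_a * norm_b)


# Hypothetical toy data: none of these strings come from the paper.
query = tf_vector("panel data attrition")
doc_indexed = tf_vector("longitudinal survey data methods")  # stands in for the existing index
expert_tags = tf_vector("panel data attrition")              # stands in for expert annotations

# Enriching the index is modelled here as adding the tag counts to the document vector.
doc_enriched = doc_indexed + expert_tags

score_before = cosine_similarity(query, doc_indexed)
score_after = cosine_similarity(query, doc_enriched)
print(f"before: {score_before:.4f}  after: {score_after:.4f}")
```

In this toy case the enriched document scores higher because the expert tags contribute terms that the query also uses. A production index would more likely use weighted vectors (for example TF-IDF) than raw counts.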

Cited By

  • (2018) Heterogeneous Database System for Faster Data Querying Using Elasticsearch. 2018 Fourth International Conference on Computing Communication Control and Automation (ICCUBEA), pp. 1-4. DOI: 10.1109/ICCUBEA.2018.8697437. Online publication date: Aug-2018

    Published In

    ACSW '17: Proceedings of the Australasian Computer Science Week Multiconference
    January 2017
    615 pages
    ISBN:9781450347686
    DOI:10.1145/3014812
    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. crowd-annotation
    2. elasticsearch
    3. linked open data
    4. semantic annotations
    5. semantic search
    6. tagging and annotation
    7. web repositories search

    Qualifiers

    • Research-article

    Funding Sources

    • Economic & Social Research Council (ESRC) and ReStore project

    Conference

    ACSW 2017
    ACSW 2017: Australasian Computer Science Week 2017
    January 30 - February 3, 2017
    Geelong, Australia

    Acceptance Rates

    ACSW '17 Paper Acceptance Rate 78 of 156 submissions, 50%;
    Overall Acceptance Rate 204 of 424 submissions, 48%
