ABSTRACT
Far from eliminating documents, as some expected, the Internet has led to a proliferation of digital documents without centralized control or indexing. Identifying relevant documents has thus become both more important and much harder, since what users require may be dispersed across many documents and many repositories. This paper describes Ontological Anchoring, a technique that uses named entity recognition (a natural-language processing approach) and semantic annotation to relate individual documents to elements in domain ontologies. This approach allows document retrieval using domain-level inferences, and integration of repositories with heterogeneous media, languages and structure. Ontological anchoring is a two-way street: ontologies allow semantic indexing of documents, and at the same time new documents enrich ontologies. The approach is illustrated with an initial deployment for heritage documents in Spanish.
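As a rough illustration of the anchoring idea, the sketch below links documents to ontology elements and then retrieves them through a domain-level inference (subclass reasoning). It is not the authors' implementation: the toy ontology, the label table, the sample documents and every function name are hypothetical, and a simple dictionary lookup stands in for a real named entity recognizer.

```python
import re
from collections import defaultdict

# Hypothetical toy ontology: each concept maps to its parent (subclass-of).
PARENT = {
    "Valparaiso": "ChileanCity",
    "Santiago": "ChileanCity",
    "ChileanCity": "Place",
    "GabrielaMistral": "Poet",
    "Poet": "Person",
}

# Hypothetical label table: lowercase surface form -> ontology concept.
# A real system would use a named entity recognizer instead of this lookup.
LABELS = {
    "valparaíso": "Valparaiso",
    "santiago": "Santiago",
    "gabriela mistral": "GabrielaMistral",
}


def ancestors(concept):
    """Return the concept followed by all of its ancestors in the ontology."""
    chain = [concept]
    while chain[-1] in PARENT:
        chain.append(PARENT[chain[-1]])
    return chain


def anchor(documents):
    """Build an index from each concept to the documents anchored to it.

    A document that mentions an entity is anchored to that entity's concept
    and to every ancestor concept, so queries on general concepts also
    return documents that only mention specific instances.
    """
    index = defaultdict(set)
    for doc_id, text in documents.items():
        lowered = text.lower()
        for label, concept in LABELS.items():
            if re.search(r"\b" + re.escape(label) + r"\b", lowered):
                for c in ancestors(concept):
                    index[c].add(doc_id)
    return index


# Hypothetical sample documents in Spanish.
docs = {
    "d1": "Gabriela Mistral vivió en Santiago.",
    "d2": "El puerto de Valparaíso en 1900.",
}
index = anchor(docs)
print(sorted(index["ChileanCity"]))
print(sorted(index["Person"]))
```

Neither document contains the words "city" or "person", yet both can be retrieved through those concepts because the index follows the ontology's hierarchy. The reverse direction described in the abstract, where new documents enrich the ontology, is not covered by this sketch.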
Index Terms
- No mining, no meaning: relating documents across repositories with ontology-driven information extraction