DOI: 10.1145/1410140.1410164
research-article

No mining, no meaning: relating documents across repositories with ontology-driven information extraction

Published: 16 September 2008

ABSTRACT

Far from eliminating documents as some expected, the Internet has led to a proliferation of digital documents, without centralized control or indexing. Thus, identifying relevant documents becomes simultaneously more important and much harder, since what users require may be dispersed across many documents and many repositories. This paper describes Ontologic Anchoring, a technique to relate documents in domain ontologies, using named entity recognition (a natural-language processing approach) and semantic annotation to relate individual documents to elements in ontologies. This approach allows document retrieval using domain-level inferences, and integration of repositories with heterogeneous media, languages and structure. Ontological anchoring is a two-way street: ontologies allow semantic indexing of documents, and simultaneously new documents enrich ontologies. The approach is illustrated with an initial deployment for heritage documents in Spanish.
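
As a rough illustration of the idea in the abstract, the following minimal sketch anchors documents to ontology elements and then retrieves them through a class hierarchy. It is not the authors' implementation: the toy ontology, the gazetteer, the sample documents, and all function names are hypothetical, and a simple dictionary lookup stands in for a real named entity recognizer. It covers only one direction of the "two-way street" (ontology-based indexing of documents), not the enrichment of ontologies by new documents.

```python
import re
from collections import defaultdict

# Hypothetical toy ontology: element -> parent concept (None for roots).
PARENT = {
    "Place": None,
    "City": "Place",
    "Valparaiso": "City",
    "Person": None,
    "Poet": "Person",
    "Pablo_Neruda": "Poet",
}

# Hypothetical gazetteer: lowercase surface form -> ontology element.
# A real system would use a trained or rule-based NER component instead.
GAZETTEER = {
    "valparaíso": "Valparaiso",
    "pablo neruda": "Pablo_Neruda",
}

# Hypothetical sample documents in Spanish.
DOCUMENTS = {
    "doc1": "Carta de Pablo Neruda a un amigo.",
    "doc2": "Fotografía del puerto de Valparaíso.",
    "doc3": "Receta de empanadas.",
}


def recognize_entities(text):
    """Return the ontology elements whose surface forms occur in the text."""
    lowered = text.lower()
    found = set()
    for surface, element in GAZETTEER.items():
        # Match whole words only, so partial-word hits are ignored.
        if re.search(r"\b" + re.escape(surface) + r"\b", lowered):
            found.add(element)
    return found


def ancestors(element):
    """Yield the element itself and every concept above it in the hierarchy."""
    while element is not None:
        yield element
        element = PARENT[element]


def build_index(documents):
    """Anchor each document to the elements it mentions and to their ancestors."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for element in recognize_entities(text):
            for concept in ancestors(element):
                index[concept].add(doc_id)
    return index


INDEX = build_index(DOCUMENTS)
print(sorted(INDEX["Poet"]))   # documents anchored under the concept Poet
print(sorted(INDEX["Place"]))  # documents anchored under the concept Place
```

Because each document is indexed under the ancestors of the entities it mentions, a query for a general concept such as Place can return a document that only names a specific city. This is a very simple stand-in for the domain-level inference the abstract refers to.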

            • Published in

              DocEng '08: Proceedings of the eighth ACM symposium on Document engineering
              September 2008
              312 pages
              ISBN:9781605580814
              DOI:10.1145/1410140

              Copyright © 2008 ACM

              Publisher

              Association for Computing Machinery

              New York, NY, United States

              Acceptance Rates

              DocEng '08 Paper Acceptance Rate: 21 of 62 submissions, 34%
              Overall Acceptance Rate: 178 of 537 submissions, 33%
