Abstract
This work presents an experimental semantic approach to mining knowledge from the World Wide Web (WWW). The main goal is to build a context-specific knowledge base from web documents. The basic idea is to use the reference knowledge provided by a dictionary as the indexing structure for domain-specific computed knowledge instances, which are organised as interlinked text words. The WordNet lexical database is used as the reference knowledge for English web documents. Both the reference and the computed knowledge are conceived as word graphs, since graphs are considered here a powerful way to represent structured knowledge. This assumption has many consequences for how knowledge can be explored and how similar knowledge patterns can be identified. To identify context-specific elements in knowledge graphs, the novel semantic concept of “minutia” is introduced. A preliminary evaluation of the efficacy of the proposed approach has been carried out. A fair strategy for comparison with competing non-semantic approaches is currently under investigation.
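The idea of using a dictionary to index interlinked text words can be illustrated with a minimal sketch. It is not the authors' implementation: the toy `REFERENCE_DICTIONARY` is a hypothetical stand-in for WordNet, and the function name `build_word_graph`, the `window` parameter, and the choice of linking words by proximity are all assumptions made for illustration only.

```python
import re
from collections import defaultdict

# Hypothetical stand-in for a reference dictionary such as WordNet.
REFERENCE_DICTIONARY = {"web", "knowledge", "graph", "mining", "document", "word"}


def build_word_graph(text, dictionary=REFERENCE_DICTIONARY, window=2):
    """Build an undirected word graph from text.

    Only words found in the reference dictionary are kept. Two kept words
    are linked when they lie within `window` positions of each other in
    the filtered word sequence (an assumed linking rule).
    """
    tokens = [t for t in re.findall(r"[a-z]+", text.lower()) if t in dictionary]
    graph = defaultdict(set)
    for i, word in enumerate(tokens):
        for other in tokens[i + 1 : i + 1 + window]:
            if other != word:
                graph[word].add(other)
                graph[other].add(word)
    return dict(graph)


if __name__ == "__main__":
    sample = "Mining the web builds a knowledge graph"
    for word, neighbours in sorted(build_word_graph(sample).items()):
        print(word, "->", sorted(neighbours))
```

In this sketch the dictionary acts as a filter and index: words outside it never become graph nodes, so the computed graph stays aligned with the reference vocabulary.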
© 2008 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Di Lecce, V., Calabrese, M., Soldo, D. (2008). Mining Context-Specific Web Knowledge: An Experimental Dictionary-Based Approach. In: Huang, DS., Wunsch, D.C., Levine, D.S., Jo, KH. (eds) Advanced Intelligent Computing Theories and Applications. With Aspects of Artificial Intelligence. ICIC 2008. Lecture Notes in Computer Science, vol 5227. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-85984-0_108
DOI: https://doi.org/10.1007/978-3-540-85984-0_108
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-85983-3
Online ISBN: 978-3-540-85984-0
eBook Packages: Computer Science (R0)