An Intelligent System for Semantic Information Retrieval Information from Textual Web Documents

Karthik, Mukundan; Marikkannan, Mariappan; Kannan, Arputharaj

doi:10.1007/978-3-540-85303-9_13

Conference paper

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 5158)

Abstract

Text data on the World Wide Web (WWW) are represented as free text and are inherently unstructured, which makes them difficult for computer programs to process directly. Text mining techniques have recently attracted great interest as a way to help users quickly gain knowledge from the Web. Text mining usually involves two tasks: text refining, which transforms free text into a machine-processable intermediate representation, and knowledge distillation, which deduces patterns or knowledge from that intermediate form. Existing text representation methodologies, however, consider documents as bags of words and ignore the meanings and ideas their authors want to convey. Because terms are treated as individual items in such simplistic representations, they lose their semantic relations and the texts lose their original meanings. In this paper, we propose a system that overcomes these limitations of the existing technologies by retrieving information from the knowledge discovered through data mining, based on the detailed meaning of the text. For this, we propose a knowledge representation technique that uses Resource Description Framework (RDF) metadata to represent the semantic relations extracted from textual web documents using natural language processing techniques. The main objective of creating RDF metadata in this system is to allow the semantic information to be retrieved easily and effectively. We also propose an effective SEMantic INformation RETrieval algorithm, called the SEMINRET algorithm. The experimental results obtained from this system show that Precision and Recall computed on RDF databases are highly accurate compared with those computed on XML databases. Moreover, our experiments show that document retrieval from the RDF database is more efficient than document retrieval from XML databases.
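The contrast the abstract draws between bag-of-words matching and retrieval over semantic relations can be illustrated with a minimal sketch. It is not the SEMINRET algorithm, whose details the abstract does not give. The `Triple` type, the `match` and `precision_recall` helpers, and the three sample documents are all hypothetical, and plain Python tuples stand in for a real RDF store.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """One RDF-style statement: subject, predicate, object (hypothetical type)."""
    subject: str
    predicate: str
    obj: str


def match(triples, subject=None, predicate=None, obj=None):
    """Return the triples that fit the pattern; None acts as a wildcard."""
    return [
        t for t in triples
        if (subject is None or t.subject == subject)
        and (predicate is None or t.predicate == predicate)
        and (obj is None or t.obj == obj)
    ]


def precision_recall(retrieved, relevant):
    """Standard definitions: hits / retrieved and hits / relevant."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Made-up relations, as an NLP extraction step might emit them per document.
store = {
    "doc1": [Triple("Acme", "acquired", "Widgets")],
    "doc2": [Triple("Acme", "hired", "Jane")],
    "doc3": [Triple("Globex", "acquired", "Acme")],
}

# Information need: documents stating that Acme acquired something.
relevant = ["doc1"]

# Relation-aware retrieval keeps the roles of subject and object apart.
semantic_hits = [doc for doc, triples in store.items()
                 if match(triples, subject="Acme", predicate="acquired")]

# Bag-of-words retrieval only checks that both terms occur somewhere.
query_terms = {"acme", "acquired"}
bag_hits = []
for doc, triples in store.items():
    words = {w.lower() for t in triples for w in t}
    if query_terms <= words:
        bag_hits.append(doc)

print("semantic:", semantic_hits, precision_recall(semantic_hits, relevant))
print("bag of words:", bag_hits, precision_recall(bag_hits, relevant))
```

On this toy data the bag-of-words query also returns `doc3`, where Acme is the object of the acquisition rather than the subject, so its precision drops while recall stays the same. This only illustrates the kind of effect the abstract describes; it does not reproduce the paper's experiments.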

Author information

Authors and Affiliations

Dept. of Computer Science, College of Engineering, Anna University, Chennai, 600025
Mukundan Karthik & Arputharaj Kannan
Dept. of Computer Science and Engineering, I.R.T.T, Erode, 638 316
Mariappan Marikkannan

Editor information

Sargur N. Srihari, Katrin Franke

Cite this paper

Karthik, M., Marikkannan, M., Kannan, A. (2008). An Intelligent System for Semantic Information Retrieval Information from Textual Web Documents. In: Srihari, S.N., Franke, K. (eds) Computational Forensics. IWCF 2008. Lecture Notes in Computer Science, vol 5158. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-85303-9_13

DOI: https://doi.org/10.1007/978-3-540-85303-9_13
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-85302-2
Online ISBN: 978-3-540-85303-9
eBook Packages: Computer Science, Computer Science (R0)
