Ontology-Based Information Extraction for Populating the Intelligent Scientific Internet Resources

Akhmadeeva, Irina R.; Zagorulko, Yury A.; Mouromtsev, Dmitry I.

doi:10.1007/978-3-319-45880-9_10

Part of the book series: Communications in Computer and Information Science (CCIS, volume 649)

Included in the following conference series:

International Conference on Knowledge Engineering and the Semantic Web

Abstract

The paper addresses ontology-based collection of information about scientific activity from the Internet in order to populate an Intelligent Scientific Internet Resource. It proposes an approach to automating this process that combines metasearch with information extraction methods based on an ontology, a thesaurus, and pattern techniques. Under this approach, a specific information extraction method, adjustable to the knowledge area and to the types of information resources, is developed for every type of entity (ontology class). Each method comprises a set of query templates and a set of information extraction patterns. The query templates, constructed from the description of an ontology class, are used to generate search engine queries that collect web documents containing information about individuals of this class. The web documents gathered by metasearch are then analyzed by applying the information extraction patterns. For every kind of information to be extracted, these patterns specify text markers that define its position in a web document. The patterns are generated from the ontology, taking into account the structure of web documents. Several patterns can be combined to extract information about related entities. To improve the recall of information extraction, the patterns describe the markers using alternative terms in different languages taken from the thesaurus (synonyms and hyponyms). Experiments showed that the proposed approach achieves an acceptable recall when extracting information about scientific activity from the Internet.
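
The two building blocks named in the abstract, query templates and marker-based extraction patterns, can be illustrated with a minimal Python sketch. It is not the authors' implementation: the thesaurus entries, the ontology class description, the template syntax, the "marker: value" page layout, and the function names generate_queries, build_pattern and extract are all assumptions made for illustration only.

```python
import re

# Hypothetical thesaurus: concept -> alternative terms (synonyms, other languages).
THESAURUS = {
    "conference": ["conference", "symposium", "конференция"],
    "date": ["date", "dates", "held on", "дата проведения"],
    "location": ["location", "venue", "место проведения"],
}

# Hypothetical ontology class description: class name plus attributes to fill.
ONTOLOGY_CLASS = {"name": "conference", "attributes": ["date", "location"]}


def generate_queries(template, ontology_class, topic):
    """Expand one query template with every thesaurus term for the class name."""
    terms = THESAURUS.get(ontology_class["name"], [ontology_class["name"]])
    return [template.format(term=term, topic=topic) for term in terms]


def build_pattern(attribute):
    """Build a regex: any marker term, then ':', then the value up to end of line."""
    markers = THESAURUS.get(attribute, [attribute])
    # Try longer markers first so that 'dates' is preferred over 'date'.
    ordered = sorted(markers, key=len, reverse=True)
    alternation = "|".join(re.escape(marker) for marker in ordered)
    return re.compile(rf"(?:{alternation})\s*:\s*(?P<value>[^\n]+)", re.IGNORECASE)


def extract(text, ontology_class):
    """Apply one pattern per attribute and collect the values found in the text."""
    result = {}
    for attribute in ontology_class["attributes"]:
        match = build_pattern(attribute).search(text)
        if match:
            result[attribute] = match.group("value").strip()
    return result


# Made-up page text mixing English and Russian markers.
SAMPLE_PAGE = "Sample Workshop\nVenue: Example City\nДата проведения: 1-2 January 2000\n"

queries = generate_queries('"{term}" {topic}', ONTOLOGY_CLASS, "semantic web")
extracted = extract(SAMPLE_PAGE, ONTOLOGY_CLASS)
```

In this sketch the multilingual thesaurus terms act as interchangeable markers, so the Russian marker for the date and the English synonym "Venue" for the location are both recognized on the same made-up page. Real web documents would additionally require handling of page structure, which the sketch leaves out.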

Acknowledgments

The authors are grateful to the Russian Foundation for Basic Research (grant № 16-07-00569) for financial support of this work.

Author information

Authors and Affiliations

A.P. Ershov Institute of Informatics Systems, Siberian Branch of the Russian Academy of Sciences, Novosibirsk, Russia
Irina R. Akhmadeeva & Yury A. Zagorulko
ITMO University, St. Petersburg, Russia
Dmitry I. Mouromtsev

Corresponding author

Correspondence to Irina R. Akhmadeeva.

Editor information

Editors and Affiliations

Leipzig University, Leipzig, Germany
Axel-Cyrille Ngonga Ngomo
Czech Technical University in Prague, Praha, Czech Republic
Petr Křemen

About this paper

Cite this paper

Akhmadeeva, I.R., Zagorulko, Y.A., Mouromtsev, D.I. (2016). Ontology-Based Information Extraction for Populating the Intelligent Scientific Internet Resources. In: Ngonga Ngomo, AC., Křemen, P. (eds) Knowledge Engineering and Semantic Web. KESW 2016. Communications in Computer and Information Science, vol 649. Springer, Cham. https://doi.org/10.1007/978-3-319-45880-9_10

DOI: https://doi.org/10.1007/978-3-319-45880-9_10
Published: 08 September 2016
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-45879-3
Online ISBN: 978-3-319-45880-9
eBook Packages: Computer Science, Computer Science (R0)
