An Integrated Approach for Large-Scale Relation Extraction from the Web

Takhirov, Naimdjon; Duchateau, Fabien; Aalberg, Trond; Sølvberg, Ingeborg

doi:10.1007/978-3-642-37401-2_18

Conference paper

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 7808)

Abstract

Deriving knowledge from information stored in unstructured documents is a major challenge. More specifically, binary relationships representing facts between entities can be extracted to populate semantic triple stores or large knowledge bases. The main constraint on all knowledge extraction approaches is the need to find a trade-off between quality and scalability. In this paper, we therefore propose SPIDER, a novel integrated system for extracting binary relationships at large scale. Through a series of experiments, we show the benefit of our approach, which in general outperforms existing systems in terms of both quality (precision and the number of discovered facts) and scalability.

Author information

Authors and Affiliations

Norwegian University of Science and Technology, 7491, Trondheim, Norway
Naimdjon Takhirov, Trond Aalberg & Ingeborg Sølvberg
LIRIS, UMR5205, Université Lyon 1, Lyon, France
Fabien Duchateau

Editor information

Editors and Affiliations

Department of Information Engineering, Nagoya University, 464-8601, Nagoya, Japan
Yoshiharu Ishikawa
Department of Computer Science and Technology, Harbin Institute of Technology, 150006, Harbin, China
Jianzhong Li
School of Computer Science and Engineering, University of New South Wales, 2031, Sydney, NSW, Australia
Wei Wang & Wenjie Zhang
Department of Computing and Information Systems, University of Melbourne, 3052, Melbourne, VIC, Australia
Rui Zhang

About this paper

Cite this paper

Takhirov, N., Duchateau, F., Aalberg, T., Sølvberg, I. (2013). An Integrated Approach for Large-Scale Relation Extraction from the Web. In: Ishikawa, Y., Li, J., Wang, W., Zhang, R., Zhang, W. (eds) Web Technologies and Applications. APWeb 2013. Lecture Notes in Computer Science, vol 7808. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-37401-2_18

DOI: https://doi.org/10.1007/978-3-642-37401-2_18
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-37400-5
Online ISBN: 978-3-642-37401-2
eBook Packages: Computer Science, Computer Science (R0)
