An Integrated Approach for Large-Scale Relation Extraction from the Web

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 7808)

Abstract

Deriving knowledge from information stored in unstructured documents is a major challenge. In particular, binary relationships representing facts between entities can be extracted to populate semantic triple stores or large knowledge bases. The main difficulty for any knowledge extraction approach is finding a trade-off between quality and scalability. We therefore propose SPIDER, a novel integrated system for extracting binary relationships at large scale. Through a series of experiments, we show the benefit of our approach, which generally outperforms existing systems both in quality (precision and the number of discovered facts) and in scalability.
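The abstract relies on two terms. A binary relationship is a fact linking two entities, commonly stored as a (subject, predicate, object) triple. Precision is the fraction of extracted facts that are correct. The short Python sketch below illustrates both. It is a hypothetical example and not SPIDER's implementation: the `Fact` type, the `precision` helper, and the sample facts are all assumptions made for this illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Fact:
    """One binary relationship between two entities (a single triple).

    Hypothetical type for illustration; not taken from SPIDER.
    """
    subject: str
    predicate: str
    obj: str


def precision(extracted: set[Fact], correct: set[Fact]) -> float:
    """Return the fraction of extracted facts that are judged correct."""
    if not extracted:
        return 0.0  # no extractions: define precision as 0 to avoid dividing by zero
    return len(extracted & correct) / len(extracted)


# Hypothetical extractor output and hypothetical correctness judgements.
extracted = {
    Fact("Agatha Christie", "wrote", "Murder on the Orient Express"),
    Fact("Agatha Christie", "wrote", "Hamlet"),  # an incorrect extraction
}
correct = {Fact("Agatha Christie", "wrote", "Murder on the Orient Express")}

print(len(extracted), precision(extracted, correct))  # prints: 2 0.5
```

Precision alone does not measure how many facts a system finds. A system could score perfectly by extracting almost nothing. For this reason the abstract reports precision and the number of discovered facts side by side.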

Copyright information

© 2013 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Takhirov, N., Duchateau, F., Aalberg, T., Sølvberg, I. (2013). An Integrated Approach for Large-Scale Relation Extraction from the Web. In: Ishikawa, Y., Li, J., Wang, W., Zhang, R., Zhang, W. (eds) Web Technologies and Applications. APWeb 2013. Lecture Notes in Computer Science, vol 7808. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-37401-2_18

  • DOI: https://doi.org/10.1007/978-3-642-37401-2_18

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-37400-5

  • Online ISBN: 978-3-642-37401-2

  • eBook Packages: Computer Science, Computer Science (R0)