research-article

"Linked data as background knowledge for information extraction on the web" by Ziqi Zhang, Anna Lisa Gentile and Isabelle Augenstein with Martin Vesely as coordinator

Published: 01 July 2014

Abstract

Information Extraction (IE) is the technique of transforming textual data into a structured representation that machines can understand. It is crucial to enabling the Semantic Web, which has attracted increasing interest in recent years. This article reports recent progress in the LODIE project (Linked Open Data for Information Extraction), which aims to advance Web IE to a new frontier by exploiting widely available, semantically annotated Linked Open Data as background knowledge. We cover wrapper induction, IE from semi-structured content such as tables and lists, and IE from free text. We describe the new research challenges and the methods proposed to address them, together with summaries of recent evaluations showing encouraging results.

Cited By

  • (2016) Semantic Web in data mining and knowledge discovery. Web Semantics: Science, Services and Agents on the World Wide Web 36:C (1-22). DOI: 10.1016/j.websem.2016.01.001. Online publication date: 1-Jan-2016

Published In

ACM SIGWEB Newsletter  Volume 2014, Issue Summer
Summer 2014
30 pages
ISSN:1931-1745
EISSN:1931-1435
DOI:10.1145/2656899
Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author.

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 01 July 2014
Published in SIGWEB Volume 2014, Issue Summer

Qualifiers

  • Research-article
