DOI: 10.1145/2509558.2509569
poster

Using natural language to integrate, evaluate, and optimize extracted knowledge bases

Published: 27 October 2013

ABSTRACT

Web Information Extraction (WIE) systems extract billions of unique facts, but integrating these assertions into a coherent knowledge base and evaluating them across different WIE techniques remain challenges. We propose a framework that uses natural language to integrate and evaluate extracted knowledge bases (KBs). In the framework, KBs are integrated by exchanging probability distributions over natural language, and are evaluated by how well their output distributions predict held-out text. We describe the advantages of this approach and detail the remaining research challenges.
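The abstract does not specify a model family or a scoring rule, so the following is only a minimal sketch under our own assumptions. Suppose each KB's assertions are verbalized as text and summarized as an add-one-smoothed unigram distribution over a shared vocabulary. Two KBs are then integrated by a weighted linear mixture of their distributions (a simple stand-in for "exchanging probability distributions"), and each KB, or the mixture, is evaluated by its per-token perplexity on held-out text. The functions `unigram_model`, `mixture`, and `perplexity`, and the toy sentences, are hypothetical illustrations, not the authors' method.

```python
import math
from collections import Counter


def unigram_model(tokens, vocab, alpha=1.0):
    # Hypothetical: add-alpha smoothed unigram distribution over a fixed vocabulary,
    # standing in for a KB's "distribution over natural language".
    counts = Counter(tokens)
    total = len(tokens) + alpha * len(vocab)
    return {w: (counts[w] + alpha) / total for w in vocab}


def mixture(dists, weights):
    # Hypothetical integration step: weighted linear mixture of distributions.
    # Assumes the weights are non-negative and sum to 1.
    vocab = set().union(*dists)
    return {w: sum(wt * d.get(w, 0.0) for d, wt in zip(dists, weights)) for w in vocab}


def perplexity(dist, held_out):
    # Per-token perplexity of held-out text; lower means better prediction.
    log_prob = sum(math.log(dist[w]) for w in held_out)
    return math.exp(-log_prob / len(held_out))


# Toy "knowledge bases", each verbalized as a single sentence (illustrative only).
kb_a = "paris is the capital of france".split()
kb_b = "berlin is the capital of germany".split()
held_out = "the capital of germany is berlin".split()

# A shared vocabulary (including held-out words) keeps every probability non-zero.
vocab = set(kb_a) | set(kb_b) | set(held_out)
p_a = unigram_model(kb_a, vocab)
p_b = unigram_model(kb_b, vocab)
p_mix = mixture([p_a, p_b], [0.5, 0.5])

for name, dist in [("kb_a", p_a), ("kb_b", p_b), ("mixture", p_mix)]:
    print(f"{name}: perplexity {perplexity(dist, held_out):.2f}")
```

On this toy input, `kb_b`, whose assertion matches the held-out sentence, receives the lowest perplexity, and the mixture falls between the two KBs. Because the logarithm is concave, a linear mixture never scores worse than its worst component under this metric, which makes it a convenient baseline; the framework described in the paper may combine distributions differently.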


          Published in

          AKBC '13: Proceedings of the 2013 Workshop on Automated Knowledge Base Construction
          October 2013
          124 pages
          ISBN: 9781450324113
          DOI: 10.1145/2509558

          Copyright © 2013 ACM

          Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

          Publisher

          Association for Computing Machinery

          New York, NY, United States


          Acceptance Rates

          AKBC '13 Paper Acceptance Rate: 9 of 19 submissions, 47%. Overall Acceptance Rate: 9 of 19 submissions, 47%.
