Harvesting, searching, and ranking knowledge on the web: invited talk

Author:

Gerhard Weikum

WSDM '09: Proceedings of the Second ACM International Conference on Web Search and Data Mining

Pages 3 - 4

https://doi.org/10.1145/1498759.1498763

Published: 09 February 2009

Abstract

There are major trends to advance the functionality of search engines to a more expressive semantic level (e.g., [2, 4, 6, 7, 8, 9, 13, 14, 18]). This is enabled by employing large-scale information extraction [1, 11, 20] of entities and relationships from semistructured as well as natural-language Web sources. In addition, harnessing Semantic-Web-style ontologies [22] and reaching into Deep-Web sources [16] can contribute towards a grand vision of turning the Web into a comprehensive knowledge base that can be efficiently searched with high precision.

This talk presents ongoing research towards this objective, with emphasis on our work on the YAGO knowledge base [23, 24] and the NAGA search engine [14] but also covering related projects. YAGO is a large collection of entities and relational facts that are harvested from Wikipedia and WordNet with high accuracy and reconciled into a consistent RDF-style "semantic" graph. For further growing YAGO from Web sources while retaining its high quality, pattern-based extraction is combined with logic-based consistency checking in a unified framework [25]. NAGA provides graph-template-based search over this data, with powerful ranking capabilities based on a statistical language model for graphs. Advanced queries and the need for ranking approximate matches pose efficiency and scalability challenges that are addressed by algorithmic and indexing techniques [15, 17].
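To make the idea of graph-template-based search over an RDF-style fact graph concrete, here is a minimal sketch. It is not NAGA's implementation: the facts, the relation names, and the helper functions `unify` and `match` are all hypothetical, and the sketch omits the language-model-based ranking and the indexing techniques mentioned above. It only shows how a template with variables (terms starting with `?`) can be matched against subject-predicate-object triples.

```python
from typing import Dict, Iterator, List, Optional, Tuple

Triple = Tuple[str, str, str]
Binding = Dict[str, str]

# Toy facts in subject-predicate-object form (illustrative, not taken from YAGO).
FACTS: List[Triple] = [
    ("Albert_Einstein", "bornIn", "Ulm"),
    ("Albert_Einstein", "type", "physicist"),
    ("Max_Planck", "bornIn", "Kiel"),
    ("Max_Planck", "type", "physicist"),
    ("Ulm", "locatedIn", "Germany"),
    ("Kiel", "locatedIn", "Germany"),
]


def is_var(term: str) -> bool:
    """Template terms that start with '?' are variables."""
    return term.startswith("?")


def unify(pattern: Triple, fact: Triple, binding: Binding) -> Optional[Binding]:
    """Extend `binding` so that `pattern` equals `fact`, or return None."""
    new = dict(binding)
    for p, f in zip(pattern, fact):
        if is_var(p):
            # Bind the variable, or reject if it is already bound differently.
            if new.setdefault(p, f) != f:
                return None
        elif p != f:
            return None
    return new


def match(template: List[Triple], facts: List[Triple]) -> Iterator[Binding]:
    """Yield every variable binding that satisfies all template edges."""
    def go(i: int, binding: Binding) -> Iterator[Binding]:
        if i == len(template):
            yield binding
            return
        for fact in facts:
            extended = unify(template[i], fact, binding)
            if extended is not None:
                yield from go(i + 1, extended)

    yield from go(0, {})


# Template: physicists born in a city that is located in Germany.
query = [
    ("?p", "type", "physicist"),
    ("?p", "bornIn", "?c"),
    ("?c", "locatedIn", "Germany"),
]
results = sorted((b["?p"], b["?c"]) for b in match(query, FACTS))
print(results)
```

The backtracking search is exhaustive and unranked; a system at Web scale would need indexes over the triples and a scoring model to order approximate matches, which this sketch deliberately leaves out.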

YAGO is publicly available and has been imported into various other knowledge-management projects including DBpedia. YAGO shares many of its goals and methodologies with parallel projects along related lines. These include Avatar [19], Cimple/DBlife [10, 21], DBpedia [3], KnowItAll/TextRunner [12, 5], Kylin/KOG [26, 27], and the Libra technology [18, 28] (and more). Together they form an exciting trend towards providing comprehensive knowledge bases with semantic search capabilities.

References

[1] Eugene Agichtein: Scaling Information Extraction to Large Document Collections. IEEE Data Eng. Bull. 28(4), 2005

[2] Kemafor Anyanwu, Angela Maduko, Amit P. Sheth: SemRank: Ranking Complex Relationship Search Results on the Semantic Web. WWW 2005

[3] Sören Auer, Christian Bizer, Georgi Kobilarov, Jens Lehmann, Richard Cyganiak, Zachary G. Ives: DBpedia: A Nucleus for a Web of Open Data. ISWC/ASWC 2007

[4] Ricardo A. Baeza-Yates, Massimiliano Ciaramita, Peter Mika, Hugo Zaragoza: Towards Semantic Search. NLDB 2008

[5] Michele Banko, Michael J. Cafarella, Stephen Soderland, Matthew Broadhead, Oren Etzioni: Open Information Extraction from the Web. IJCAI 2007

[6] Holger Bast, Alexandru Chitea, Fabian M. Suchanek, Ingmar Weber: ESTER: Efficient Search on Text, Entities, and Relations. SIGIR 2007

[7] Michael J. Cafarella: Extracting and Querying a Comprehensive Web Database. CIDR 2009

[8] Soumen Chakrabarti: Breaking Through the Syntax Barrier: Searching with Entities and Relations. ECML 2004

[9] Tao Cheng, Xifeng Yan, Kevin Chen-Chuan Chang: EntityRank: Searching Entities Directly and Holistically. VLDB 2007

[10] Pedro DeRose, Warren Shen, Fei Chen, AnHai Doan, Raghu Ramakrishnan: Building Structured Web Community Portals: A Top-Down, Compositional, and Incremental Approach. VLDB 2007

[11] AnHai Doan, Luis Gravano, Raghu Ramakrishnan, Shivakumar Vaithyanathan (Editors): Special Issue on Information Extraction. SIGMOD Record 37(4), December 2008

[12] Oren Etzioni, Michael J. Cafarella, Doug Downey, Ana-Maria Popescu, Tal Shaked, Stephen Soderland, Daniel S. Weld, Alexander Yates: Unsupervised Named-Entity Extraction from the Web: An Experimental Study. Artif. Intell. 165(1), 2005

[13] Jens Graupmann, Ralf Schenkel, Gerhard Weikum: The SphereSearch Engine for Unified Ranked Retrieval of Heterogeneous XML and Web Documents. VLDB 2005

[14] Gjergji Kasneci, Fabian M. Suchanek, Georgiana Ifrim, Maya Ramanath, Gerhard Weikum: NAGA: Searching and Ranking Knowledge. ICDE 2008

[15] Gjergji Kasneci, Maya Ramanath, Mauro Sozio, Fabian M. Suchanek, Gerhard Weikum: STAR: Steiner Tree Approximation in Relationship-Graphs. ICDE 2009

[16] Jayant Madhavan, Loredana Afanasiev, Lyublena Antova, Alon Halevy: Harnessing the Deep Web: Present and Future. CIDR 2009

[17] Thomas Neumann, Gerhard Weikum: RDF-3X: a RISC-style Engine for RDF. PVLDB 1(1), 2008

[18] Zaiqing Nie, Yunxiao Ma, Shuming Shi, Ji-Rong Wen, Wei-Ying Ma: Web Object Retrieval. WWW 2007

[19] Frederick Reiss, Sriram Raghavan, Rajasekar Krishnamurthy, Huaiyu Zhu, Shivakumar Vaithyanathan: An Algebraic Approach to Rule-Based Information Extraction. ICDE 2008

[20] Sunita Sarawagi: Information Extraction. Foundations and Trends in Databases 2(1), 2008

[21] Warren Shen, AnHai Doan, Jeffrey F. Naughton, Raghu Ramakrishnan: Declarative Information Extraction Using Datalog with Embedded Extraction Predicates. VLDB 2007

[22] Steffen Staab, Rudi Studer: Handbook on Ontologies, 2nd Edition. Springer 2008

[23] Fabian M. Suchanek, Gjergji Kasneci, Gerhard Weikum: YAGO: a Core of Semantic Knowledge. WWW 2007

[24] Fabian Suchanek, Gjergji Kasneci, Gerhard Weikum: YAGO: A Large Ontology from Wikipedia and WordNet. Journal of Web Semantics 6(3), 2008

[25] Fabian Suchanek, Mauro Sozio, Gerhard Weikum: SOFIE: a Self-Organizing Framework for Information Extraction. Technical Report MPI-I-2008-5-004, 2008

[26] Fei Wu, Daniel S. Weld: Autonomously Semantifying Wikipedia. CIKM 2007

[27] Fei Wu, Daniel S. Weld: Automatically Refining the Wikipedia Infobox Ontology. WWW 2008

[28] Jun Zhu, Zaiqing Nie, Ji-Rong Wen, Bo Zhang, Wei-Ying Ma: Simultaneous Record Detection and Attribute Labeling in Web Data Extraction. KDD 2006

Cited By

Rosaci D (2015). Finding semantic associations in hierarchically structured groups of Web data. Formal Aspects of Computing 27:5-6 (867-884). Online publication date: 9-Jul-2015
https://doi.org/10.1007/s00165-015-0337-z

Weikum G, Theobald M, Paredaens J, Van Gucht D (2010). From information to knowledge. Proceedings of the twenty-ninth ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems (65-76). Online publication date: 6-Jun-2010
https://dl.acm.org/doi/10.1145/1807085.1807097

Tellioğlu H (2009). Knowledge Management with Snapshots. Leveraging Knowledge for Innovation in Collaborative Networks (293-300). Online publication date: 2009
https://doi.org/10.1007/978-3-642-04568-4_31

Index Terms

Harvesting, searching, and ranking knowledge on the web: invited talk
1. Computing methodologies
  1. Modeling and simulation
    1. Simulation theory
      1. Systems theory
2. Mathematics of computing
  1. Information theory

Published In

WSDM '09: Proceedings of the Second ACM International Conference on Web Search and Data Mining

February 2009

314 pages

ISBN:9781605583907

DOI:10.1145/1498759

Editors:
Ricardo Baeza-Yates (Yahoo! Research, Spain)
Paolo Boldi (Universita degli Studi di Milano, Italy)
Berthier Ribeiro-Neto (Google Engineering, Brazil & CS Dept., Univ. Fed. de Minas Gerais, Brazil)
B. Barla Cambazoglu (Yahoo! Research)

Copyright © 2009 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

SIGMOD: ACM Special Interest Group on Management of Data
SIGWEB: ACM Special Interest Group on Hypertext, Hypermedia, and Web
Yahoo! Research
SIGKDD: ACM Special Interest Group on Knowledge Discovery in Data
Nokia
Google Inc.
SIGIR: ACM Special Interest Group on Information Retrieval
Microsoft

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Invited-talk

Conference

WSDM'09

WSDM'09: Second ACM International Conference on Web Search and Web Data Mining

February 9 - 12, 2009

Barcelona, Spain

Acceptance Rates

Overall Acceptance Rate 498 of 2,863 submissions, 17%
