Abstract:
Information extraction at Web scale has become one of the most important research topics in data management since major commercial search engines started incorporating kn...Show MoreMetadata
Abstract:
Information extraction at Web scale has become one of the most important research topics in data management since major commercial search engines started incorporating knowledge in their search results a couple of years ago [1]. Users increasingly expect structured knowledge as answers to their search needs. Using Bing as an example, the result page for “Lionel Messi” is full of structured knowledge facts, such as his birthday and awards. The research efforts towards improving the accuracy and coverage of such knowledge bases have led to significant advances in Information Extraction techniques [2], [3]. As the initial challenge of accurately extracting facts for popular entities are being addressed, more difficult challenges have emerged such as extending knowledge coverage to long tail entities and domains, understanding interestingness and usefulness of facts within a given context, and addressing information-seeking needs more directly and accurately. In this tutorial, we will survey the recent research efforts and provide an introduction to the techniques that address those challenges, and the applications that benefit from the adoption of those techniques. In particular, this tutorial will focus on a variety of techniques that can be broadly viewed as knowledge inferencing, i.e., combining multiple data sources and extraction techniques to verify existing knowledge and derive new knowledge. More specifically, we focus on four main categories of inferencing techniques: 1) deep natural language processing using machine learning techniques, 2) data cleaning using integrity constraints, 3) large-scale probabilistic reasoning, and 4) leveraging human expertise for domain knowledge extraction.
Date of Conference: 13-17 April 2015
Date Added to IEEE Xplore: 01 June 2015
Electronic ISBN:978-1-4799-7964-6