ABSTRACT
Businesses and professional organizations across domains such as finance, weather, healthcare, and social networks produce massive amounts of unstructured, semi-structured, and structured data. Knowledge bases enable querying and analysis of integrated content derived from such data, which is available as open, third-party, and proprietary data sets. Many knowledge bases today provide an entity-centric view over the integrated content by using domain-specific ontologies. These entity-centric views enable querying individual real-world entities, as well as retrieving exact information (such as the address or net revenue of a company) through explicit queries in languages such as SQL or SPARQL. Although very useful for many business and commercial applications, explicit querying may not be sufficient for exploring the relevant, context-specific information associated with the real-world entities stored in these knowledge bases. Users often have to resort to a tedious manual process of exploration with ad hoc queries to gather the required information.
To enhance the user experience and ease the problem of relevant data exploration, we propose the concept of Rich Entities. A rich entity comprises all the relevant, context-specific information grouped around a real-world entity, and is served as an efficient and meaningful response to user queries against that entity in a knowledge base. Rich entities are created by grouping information not only from a single entity represented as an ontology concept, but also from related concepts and properties as specified by the domain ontology. In this paper, we propose several novel techniques and algorithms to automatically detect, learn, and create domain-specific rich entities. We use query patterns from existing query workloads against knowledge bases as input, and leverage the structure and relationships between entities defined in the domain ontology. Our techniques are effective and can be applied to a wide variety of application domains, adding value to data exploration and information extraction from entity-centric real-world knowledge bases.
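The idea of combining workload query patterns with ontology structure can be illustrated with a minimal sketch. It is not the paper's algorithm: the function name `rich_entity`, the `min_support` threshold, and the toy ontology and workload are all assumptions made for illustration. The sketch keeps a concept in the rich entity for an anchor concept only if the ontology relates the two and the workload queries them together often enough.

```python
from collections import Counter


def rich_entity(anchor, ontology, workload, min_support=0.5):
    """Group concepts around `anchor` (hypothetical helper, not from the paper).

    ontology: dict mapping a concept to the set of concepts it is related to.
    workload: list of sets, each holding the concepts one query touched.
    min_support: minimum fraction of anchor queries a concept must share.
    """
    # Queries in the workload that touch the anchor concept.
    anchor_queries = [q for q in workload if anchor in q]
    if not anchor_queries:
        return {anchor}
    # Count how often each other concept is queried together with the anchor.
    co_counts = Counter(c for q in anchor_queries for c in q if c != anchor)
    neighbors = ontology.get(anchor, set())
    # Keep ontology neighbors whose co-occurrence support meets the threshold.
    members = {
        c for c, n in co_counts.items()
        if c in neighbors and n / len(anchor_queries) >= min_support
    }
    return {anchor} | members


# Toy inputs, assumed for illustration only.
ontology = {
    "Company": {"Address", "Revenue", "Officer", "Industry"},
    "Officer": {"Company", "Compensation"},
}
workload = [
    {"Company", "Revenue"},
    {"Company", "Revenue", "Officer"},
    {"Company", "Address"},
    {"Company", "Officer", "Compensation"},
    {"Officer", "Compensation"},
]

print(sorted(rich_entity("Company", ontology, workload)))
# prints ['Company', 'Officer', 'Revenue']
```

Lowering `min_support` widens the group (here, 0.25 would also admit `Address`), while concepts the ontology does not relate to the anchor, such as `Compensation`, stay out regardless of how often they are co-queried.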