ABSTRACT
Entity information management (EIM) deals with organizing, processing and delivering information about entities. It has emerged to satisfy more sophisticated information needs that go beyond document search. In recent years, entity retrieval has attracted much attention in the IR community. INEX has run the XML Entity Ranking track since 2007, and TREC has run the Entity track since 2009 to investigate the problem of related entity finding. Some EIM problems go beyond retrieval and ranking, such as: 1) entity profiling, which is about characterizing a specific entity, and 2) entity distillation, which is about discovering trends about an entity. These problems have received less attention even though they have many important applications.
On the other hand, entities in the real world or in the Web environment are usually not isolated. They are connected or related to each other in one way or another. For example, coauthorship connects authors with similar research interests. The emergence of social media such as Facebook, Twitter and YouTube has further interwoven related entities on a much larger scale. Millions of users of these sites can become friends, fans or followers of others, or taggers or commenters of different types of entities (e.g., bookmarks, photos and videos). These networks are complex in the sense that they are heterogeneous, with multiple types of entities and interactions; they are large-scale; they are multi-lingual; and they are dynamic. These features of complex networks go beyond traditional social network analysis and require further research.
In this proposed research, I investigate entity information management in the environment of complex networks. The main research question is: how can EIM tasks be facilitated by modeling the content and structure of complex networks? The research lies at the intersection of content-based information retrieval and complex network analysis, and deals with both unstructured text data and structured networks. The specific EIM tasks targeted are entity retrieval, entity profiling and entity distillation. In addition to the main research question, the following questions are considered: How can we accomplish an EIM task involving diverse entity and interaction types? How can we model the evolution of entity profiles as well as of the underlying complex networks? How can existing cross-language IR work be leveraged to build entity profiles with multi-lingual evidence?
I propose to use probabilistic models, and discriminative models in particular, to address the above research questions. In my research, I have developed discriminative models for expert search that integrate arbitrary document features [3] and that learn flexible combination strategies to rank experts in heterogeneous information sources [1]. I have proposed discriminative graphical models that jointly discover homepages by inference on the homepage dependence network [2]. The dependence among table elements is exploited to perform the entity retrieval task collectively [4]. This work has shown the power of discriminative models for entity search and the benefits of utilizing the dependencies among related entities. What I would like to do next is to develop a unified probabilistic framework to investigate the research questions raised in this proposal.
- [1] Y. Fang, L. Si, and A. Mathur. Ranking experts with discriminative probabilistic models. In Proceedings of SIGIR Workshops, 2009.
- [2] Y. Fang, L. Si, and A. Mathur. Discriminative graphical models for faculty homepage discovery. Information Retrieval, 2010.
- [3] Y. Fang, L. Si, and A. Mathur. Discriminative models of integrating document evidence and document-candidate associations for expert search. In Proceedings of SIGIR, 2010.
- [4] Y. Fang, L. Si, Z. Yu, Y. Xian, and Y. Xu. Entity retrieval by hierarchical relevance model, exploiting the structure of tables and learning homepage classifiers. In Proceedings of TREC-18, 2009.
Index Terms
- Entity information management in complex networks