Abstract
Most entity ranking research aims to retrieve a ranked list of entities from a Web corpus given a user query. The rank order of entities is determined by the relevance between the query and the contexts of the entities. However, entities can also be ranked directly by their relative importance in a document collection, independent of any query. In this paper, we introduce an entity ranking algorithm named NERank+. Given a document collection, NERank+ first constructs a graph model called the Topical Tripartite Graph, consisting of document, topic and entity nodes. We design separate ranking functions to compute the prior ranks of entities and topics, respectively. A meta-path constrained random walk algorithm is proposed to propagate the prior entity and topic ranks over the graph model. We evaluate NERank+ on real-life datasets and compare it with baselines. Experimental results illustrate the effectiveness of our approach.
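The propagation idea described above can be illustrated with a small sketch. The abstract does not give the actual ranking functions, meta-paths or parameters of NERank+, so everything here is an assumption made for illustration: the single meta-path entity → document → topic → entity, the uniform entity prior, the PageRank-style damping factor, the toy graph, and all function and variable names are hypothetical. It is a generic meta-path constrained random walk with restart on a tripartite graph, not the authors' algorithm.

```python
from collections import defaultdict


def normalize(weights):
    """Scale non-negative weights so they sum to 1 (empty dict if the total is 0)."""
    total = sum(weights.values())
    return {k: v / total for k, v in weights.items()} if total > 0 else {}


def step(rank, edges):
    """Move rank mass one hop along row-normalized edges {source: {target: weight}}."""
    out = defaultdict(float)
    for src, mass in rank.items():
        # Nodes without outgoing edges drop their mass; the caller renormalizes.
        for dst, p in normalize(edges.get(src, {})).items():
            out[dst] += mass * p
    return dict(out)


def metapath_walk(entity_prior, hops, damping=0.85, iterations=50):
    """Propagate entity ranks along a fixed sequence of hops (an assumed meta-path).

    `hops` is a list of edge maps whose node types chain back to entities,
    for example [entity->document, document->topic, topic->entity].
    With probability (1 - damping) the walker restarts from the prior.
    """
    prior = normalize(entity_prior)
    rank = dict(prior)
    for _ in range(iterations):
        walked = rank
        for edges in hops:  # the walk may only follow the prescribed type sequence
            walked = step(walked, edges)
        rank = normalize({
            e: damping * walked.get(e, 0.0) + (1 - damping) * prior[e]
            for e in prior
        })
    return rank


# Hypothetical toy graph: 3 entities, 2 documents, 2 topics.
entity_to_doc = {"Obama": {"d1": 1, "d2": 1}, "Chicago": {"d1": 1}, "Paris": {"d2": 1}}
doc_to_topic = {"d1": {"politics": 1.0}, "d2": {"politics": 0.5, "travel": 0.5}}
topic_to_entity = {"politics": {"Obama": 2, "Chicago": 1}, "travel": {"Obama": 1, "Paris": 1}}

uniform_prior = {"Obama": 1.0, "Chicago": 1.0, "Paris": 1.0}  # assumed, not from the paper
ranks = metapath_walk(uniform_prior, [entity_to_doc, doc_to_topic, topic_to_entity])

for entity, score in sorted(ranks.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{entity}: {score:.4f}")
```

In this toy setting the entity connected to both documents and both topics receives the largest share of the rank mass. Replacing the uniform prior with a non-uniform one shifts the result toward entities with higher prior rank, which is the role the abstract assigns to the separately computed prior ranks.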
Acknowledgements
This work was partially supported by the National Key Research and Development Program of China (2016YFB1000904), Shanghai Agriculture Applied Technology Development Program, China (T20150302) and NSFC-Zhejiang Joint Fund for the Integration of Industrialization and Informatization (U1509219). Chengyu Wang would like to thank the East China Normal University Outstanding Doctoral Dissertation Cultivation Plan of Action (YB2016040) for the support of his research. An earlier version of this paper “NERank: Bringing Order to Named Entities from Texts” was presented at the 18th Asia-Pacific Web Conference.
Author information
Chengyu Wang is a PhD candidate in the School of Computer Science and Software Engineering, East China Normal University (ECNU), China. He received his BE degree in software engineering from ECNU in 2015. His research interests include Web data mining, information extraction, and natural language processing. He is working on the construction and application of large-scale knowledge graphs.
Guomin Zhou is an associate professor and vice director of the Department of Computer and Information Technology, Zhejiang Police College, China. His research interests include intelligent information processing and big data informatization.
Xiaofeng He is a professor of computer science in the School of Computer Science and Software Engineering, East China Normal University, China. He obtained his PhD degree from Pennsylvania State University, USA. His research interests include machine learning, data mining, and information retrieval. Prior to joining ECNU, he worked at Microsoft, Yahoo Labs and Lawrence Berkeley National Laboratory.
Aoying Zhou is a professor of computer science at East China Normal University (ECNU), China, where he heads the School of Data Science and Engineering. He received his bachelor's and master's degrees in computer science from Sichuan University, China in 1985 and 1988, respectively, and his PhD degree from Fudan University, China in 1993. Before joining ECNU in 2008, he worked in the Computer Science Department of Fudan University from 1993 to 2007, where he served as department chair from 1999 to 2002. He was a visiting scholar under the Berkeley Scholar Program at UC Berkeley, USA in 2005. He is a recipient of the National Science Fund for Distinguished Young Scholars supported by NSFC and of a professorship appointment under the Changjiang Scholars Program of the Ministry of Education. He currently serves as vice-director of ACM SIGMOD China and of the Technology Committee on Database of the China Computer Federation. He serves on the editorial boards of academic journals including VLDB Journal and WWW Journal. His research interests include Web data management, data management for data-intensive computing, and in-memory data analytics.
Cite this article
Wang, C., Zhou, G., He, X. et al. NERank+: a graph-based approach for entity ranking in document collections. Front. Comput. Sci. 12, 504–517 (2018). https://doi.org/10.1007/s11704-017-6471-4
DOI: https://doi.org/10.1007/s11704-017-6471-4