NERank+: a graph-based approach for entity ranking in document collections

Wang, Chengyu; Zhou, Guomin; He, Xiaofeng; Zhou, Aoying

doi:10.1007/s11704-017-6471-4

Research Article
Published: 11 May 2018

Volume 12, pages 504–517, (2018)
Frontiers of Computer Science

Chengyu Wang¹, Guomin Zhou², Xiaofeng He¹ & Aoying Zhou³

Abstract

Most entity ranking research aims to retrieve a ranked list of entities from a Web corpus given a user query. The rank order of entities is determined by the relevance between the query and the contexts of entities. However, entities can also be ranked directly by their relative importance in a document collection, independent of any query. In this paper, we introduce an entity ranking algorithm named NERank+. Given a document collection, NERank+ first constructs a graph model called the Topical Tripartite Graph, consisting of document, topic and entity nodes. We design separate ranking functions to compute the prior ranks of entities and topics, respectively. A meta-path constrained random walk algorithm is proposed to propagate prior entity and topic ranks based on the graph model. We evaluate NERank+ over real-life datasets and compare it with baselines. Experimental results illustrate the effectiveness of our approach.
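The abstract does not give the ranking functions or the meta-paths used by NERank+, so the following is only an illustrative sketch of the general idea: prior ranks propagated over a document-topic-entity graph by a walk restricted to one path pattern. The path entity → document → topic → document → entity, the uniform priors, the damping value, the toy data and all function names are assumptions made for this example and are not taken from the paper.

```python
# Illustrative sketch only: NOT the authors' NERank+ implementation.
# Assumed meta-path: entity -> document -> topic -> document -> entity.

def normalize(vec):
    """Scale a {node: score} dict so that the scores sum to 1."""
    total = sum(vec.values())
    return {k: v / total for k, v in vec.items()} if total else dict(vec)

def reverse(edges):
    """Turn {src: {dst: w}} into {dst: {src: w}}."""
    rev = {}
    for src, nbrs in edges.items():
        for dst, w in nbrs.items():
            rev.setdefault(dst, {})[src] = w
    return rev

def step(scores, edges):
    """Move each node's score to its neighbours, split by edge weight.
    Nodes without outgoing edges simply drop their score."""
    out = {}
    for src, mass in scores.items():
        nbrs = edges.get(src, {})
        total = sum(nbrs.values())
        if total == 0:
            continue
        for dst, w in nbrs.items():
            out[dst] = out.get(dst, 0.0) + mass * w / total
    return out

def mix(walk, prior, damping):
    """Blend propagated scores with the prior (restart) distribution."""
    keys = set(walk) | set(prior)
    return {k: damping * walk.get(k, 0.0) + (1 - damping) * prior.get(k, 0.0)
            for k in keys}

def rank_entities(doc_entity, doc_topic, entity_prior, topic_prior,
                  damping=0.85, iters=50):
    """Propagate prior entity and topic ranks along the assumed meta-path."""
    entity_doc = reverse(doc_entity)
    topic_doc = reverse(doc_topic)
    e_prior = normalize(entity_prior)
    t_prior = normalize(topic_prior)
    e = dict(e_prior)
    for _ in range(iters):
        d = step(e, entity_doc)                      # entity -> document
        t = mix(step(d, doc_topic), t_prior, damping)  # document -> topic
        d = step(t, topic_doc)                       # topic -> document
        e = mix(step(d, doc_entity), e_prior, damping)  # document -> entity
    return sorted(e.items(), key=lambda kv: kv[1], reverse=True)

# Toy data (hypothetical): entity mention counts and topic weights per document.
doc_entity = {
    "d1": {"Obama": 3, "Chicago": 1},
    "d2": {"Obama": 2, "Senate": 1},
    "d3": {"Chicago": 1},
}
doc_topic = {
    "d1": {"politics": 0.8, "cities": 0.2},
    "d2": {"politics": 0.9, "cities": 0.1},
    "d3": {"cities": 1.0},
}
entity_prior = {"Obama": 1.0, "Chicago": 1.0, "Senate": 1.0}
topic_prior = {"politics": 1.0, "cities": 1.0}

ranking = rank_entities(doc_entity, doc_topic, entity_prior, topic_prior)
for name, score in ranking:
    print(f"{name}: {score:.3f}")
```

In this sketch the priors act as restart distributions, so informative prior ranks (rather than the uniform ones used here) would pull the final scores toward entities and topics that the prior functions already favour.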

Acknowledgements

This work was partially supported by the National Key Research and Development Program of China (2016YFB1000904), Shanghai Agriculture Applied Technology Development Program, China (T20150302) and NSFC-Zhejiang Joint Fund for the Integration of Industrialization and Informatization (U1509219). Chengyu Wang would like to thank the East China Normal University Outstanding Doctoral Dissertation Cultivation Plan of Action (YB2016040) for the support of his research. An earlier version of this paper “NERank: Bringing Order to Named Entities from Texts” was presented at the 18th Asia-Pacific Web Conference.

Author information

Authors and Affiliations

Shanghai Key Laboratory of Trustworthy Computing, School of Computer Science and Software Engineering, East China Normal University, Shanghai, 200062, China
Chengyu Wang & Xiaofeng He
Department of Computer and Information Technology, Zhejiang Police College, Hangzhou, 310053, China
Guomin Zhou
School of Data Science and Engineering, East China Normal University, Shanghai, 200062, China
Aoying Zhou

Corresponding author

Correspondence to Xiaofeng He.

Additional information

Chengyu Wang is a PhD candidate in the School of Computer Science and Software Engineering, East China Normal University (ECNU), China. He received his BE degree in software engineering from ECNU in 2015. His research interests include Web data mining, information extraction, and natural language processing. He is working on the construction and application of large-scale knowledge graphs.

Guomin Zhou is an associate professor and vice director at the Department of Computer and Information Technology, Zhejiang Police College, China. His research interests include intelligent information processing and big data informatization.

Xiaofeng He is a professor of computer science at the School of Computer Science and Software Engineering, East China Normal University, China. He obtained his PhD degree from Pennsylvania State University, USA. His research interests include machine learning, data mining, and information retrieval. Prior to joining ECNU, he worked at Microsoft, Yahoo Labs and Lawrence Berkeley National Laboratory.

Aoying Zhou is a professor of computer science at East China Normal University (ECNU), China, where he heads the School of Data Science and Engineering. He received his master's and bachelor's degrees in computer science from Sichuan University, China in 1988 and 1985, respectively, and his PhD degree from Fudan University, China in 1993. Before joining ECNU in 2008, he worked in the Computer Science Department of Fudan University from 1993 to 2007, where he served as department chair from 1999 to 2002. He was a visiting scholar under the Berkeley Scholar Program at UC Berkeley, USA in 2005. He is a recipient of the National Science Fund for Distinguished Young Scholars supported by NSFC and of a professorship appointment under the Changjiang Scholars Program of the Ministry of Education. He currently serves as vice-director of ACM SIGMOD China and of the Technology Committee on Database of the China Computer Federation. He serves on the editorial boards of several academic journals, including VLDB Journal and WWW Journal. His research interests include Web data management, data management for data-intensive computing, and in-memory data analytics.

Electronic supplementary material

Supplementary material, approximately 250 KB.

About this article

Cite this article

Wang, C., Zhou, G., He, X. et al. NERank+: a graph-based approach for entity ranking in document collections. Front. Comput. Sci. 12, 504–517 (2018). https://doi.org/10.1007/s11704-017-6471-4

Received: 27 September 2016
Accepted: 22 February 2017
Published: 11 May 2018
Issue Date: June 2018
DOI: https://doi.org/10.1007/s11704-017-6471-4
