NERank+: a graph-based approach for entity ranking in document collections

  • Research Article
  • Frontiers of Computer Science

Abstract

Most entity ranking research aims to retrieve a ranked list of entities from a Web corpus given a user query. The rank order of entities is determined by the relevance between the query and the contexts of the entities. However, entities can also be ranked directly by their relative importance in a document collection, independent of any query. In this paper, we introduce an entity ranking algorithm named NERank+. Given a document collection, NERank+ first constructs a graph model called the Topical Tripartite Graph, consisting of document, topic and entity nodes. We design separate ranking functions to compute the prior ranks of entities and topics. A meta-path constrained random walk algorithm is proposed to propagate the prior entity and topic ranks over the graph model. We evaluate NERank+ on real-life datasets and compare it with baselines. Experimental results illustrate the effectiveness of our approach.
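
The pipeline outlined in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not specify the prior ranking functions, the edge weights, or the meta-paths, so everything below is an assumption made for illustration. The sketch assumes document-entity edges weighted by mention counts, document-topic edges weighted by topic proportions, uniform priors, a single cyclic meta-path (entity → document → topic → document → entity), and a PageRank-style damping factor that mixes the walked scores with the priors. All function and variable names are hypothetical.

```python
from collections import defaultdict


def reverse(edges):
    """Flip a weighted adjacency map {src: {dst: w}} into {dst: {src: w}}."""
    rev = defaultdict(dict)
    for src, nbrs in edges.items():
        for dst, w in nbrs.items():
            rev[dst][src] = w
    return dict(rev)


def normalize(scores):
    """Scale scores so they sum to 1 (left unchanged if the total is 0)."""
    total = sum(scores.values())
    return {k: v / total for k, v in scores.items()} if total > 0 else dict(scores)


def hop(scores, edges):
    """Move each node's score to its neighbours, proportionally to edge weight."""
    out = defaultdict(float)
    for src, mass in scores.items():
        nbrs = edges.get(src, {})
        total = sum(nbrs.values())
        if total <= 0:
            continue  # dangling node: its mass is dropped and renormalized later
        for dst, w in nbrs.items():
            out[dst] += mass * w / total
    return dict(out)


def mix(walked, prior, damping):
    """Blend walked scores with a prior, as in PageRank with restart."""
    keys = set(walked) | set(prior)
    blended = {k: damping * walked.get(k, 0.0) + (1 - damping) * prior.get(k, 0.0)
               for k in keys}
    return normalize(blended)


def rank_entities(doc_ent, doc_topic, entity_prior, topic_prior,
                  damping=0.85, iters=50):
    """Propagate priors along the assumed meta-path E -> D -> T -> D -> E."""
    ent_doc, topic_doc = reverse(doc_ent), reverse(doc_topic)
    entity_prior, topic_prior = normalize(entity_prior), normalize(topic_prior)
    scores = dict(entity_prior)
    for _ in range(iters):
        topics = hop(hop(scores, ent_doc), doc_topic)       # E -> D -> T
        topics = mix(normalize(topics), topic_prior, damping)
        walked = hop(hop(topics, topic_doc), doc_ent)       # T -> D -> E
        scores = mix(normalize(walked), entity_prior, damping)
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Toy collection (made-up data): two documents, two topics, three entities.
doc_ent = {"d1": {"Obama": 2, "Chicago": 1}, "d2": {"Obama": 1, "Senate": 1}}
doc_topic = {"d1": {"t_politics": 0.7, "t_city": 0.3}, "d2": {"t_politics": 1.0}}
entity_prior = {"Obama": 1.0, "Chicago": 1.0, "Senate": 1.0}
topic_prior = {"t_politics": 1.0, "t_city": 1.0}

ranking = rank_entities(doc_ent, doc_topic, entity_prior, topic_prior)
for entity, score in ranking:
    print(f"{entity}: {score:.4f}")
```

In this toy collection the entity mentioned most often across both documents ends up ranked first. Non-uniform priors or a different meta-path would change the ordering, which is why those choices are flagged as assumptions here.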

Acknowledgements

This work was partially supported by the National Key Research and Development Program of China (2016YFB1000904), Shanghai Agriculture Applied Technology Development Program, China (T20150302) and NSFC-Zhejiang Joint Fund for the Integration of Industrialization and Informatization (U1509219). Chengyu Wang would like to thank the East China Normal University Outstanding Doctoral Dissertation Cultivation Plan of Action (YB2016040) for the support of his research. An earlier version of this paper “NERank: Bringing Order to Named Entities from Texts” was presented at the 18th Asia-Pacific Web Conference.

Author information

Corresponding author

Correspondence to Xiaofeng He.

Additional information

Chengyu Wang is a PhD candidate in the School of Computer Science and Software Engineering, East China Normal University (ECNU), China. He received his BE degree in software engineering from ECNU in 2015. His research interests include Web data mining, information extraction, and natural language processing. He is working on the construction and application of large-scale knowledge graphs.

Guomin Zhou is an associate professor and vice director at the Department of Computer and Information Technology, Zhejiang Police College, China. His research interests include intelligent information processing and big data informatization.

Xiaofeng He is a professor of computer science at the School of Computer Science and Software Engineering, East China Normal University, China. He obtained his PhD degree from Pennsylvania State University, USA. His research interests include machine learning, data mining, and information retrieval. Prior to joining ECNU, he worked at Microsoft, Yahoo Labs and Lawrence Berkeley National Laboratory.

Aoying Zhou is a professor of computer science at East China Normal University (ECNU), China, where he heads the School of Data Science and Engineering. He received his bachelor's and master's degrees in computer science from Sichuan University, China in 1985 and 1988, respectively, and his PhD degree from Fudan University, China in 1993. Before joining ECNU in 2008, he worked in the Computer Science Department at Fudan University from 1993 to 2007, where he served as department chair from 1999 to 2002. He was a visiting scholar under the Berkeley Scholar Program at UC Berkeley, USA in 2005. He is a recipient of the National Science Fund for Distinguished Young Scholars supported by NSFC and holds a professorship under the Changjiang Scholars Program of the Ministry of Education. He is vice-director of ACM SIGMOD China and of the Technology Committee on Database of the China Computer Federation. He serves on the editorial boards of several academic journals, including VLDB Journal and WWW Journal. His research interests include Web data management, data management for data-intensive computing, and in-memory data analytics.

About this article

Cite this article

Wang, C., Zhou, G., He, X. et al. NERank+: a graph-based approach for entity ranking in document collections. Front. Comput. Sci. 12, 504–517 (2018). https://doi.org/10.1007/s11704-017-6471-4

  • DOI: https://doi.org/10.1007/s11704-017-6471-4