
NERank: Bringing Order to Named Entities from Texts

  • Conference paper
Web Technologies and Applications (APWeb 2016)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 9931)


Abstract

Most entity ranking research aims to retrieve a ranked list of entities from a Web corpus given a query. However, entities in plain documents can also be ranked directly by their relative importance, which supports entity-oriented Web applications. In this paper, we introduce NERank, an entity ranking algorithm that addresses this problem. NERank first constructs a graph model, called the Topical Tripartite Graph, from a document collection. A ranking function then computes the prior ranks of topics based on three quality metrics. We further propose a meta-path constrained random walk method to propagate the prior topic ranks to entities. We evaluate NERank on real-life datasets and compare it with baselines. The experimental results demonstrate the effectiveness of our approach.

A preliminary version of this paper was presented at WWW'16 [6]. This work is partially supported by NSFC under Grant No. 61402180, the Natural Science Foundation of Shanghai under Grant No. 14ZR1412600, the Shanghai Agriculture Science Program (2015) Number 3-2, and the NSFC-Zhejiang Joint Fund for the Integration of Industrialization and Informatization under Grant No. U1509219.


Notes

  1. See background information at: https://en.wikipedia.org/wiki/Egyptian_Revolution_of_2011

References

  1. Brin, S., Page, L.: The anatomy of a large-scale hypertextual web search engine. Comput. Netw. 30(1–7), 107–117 (1998)

  2. Ganesan, K., Zhai, C.: Opinion-based entity ranking. Inf. Retr. 15(2), 116–150 (2013)

  3. Mihalcea, R., Tarau, P.: TextRank: bringing order into text. In: EMNLP, pp. 404–411 (2004)

  4. Kaptein, R., Serdyukov, P., de Vries, A.P., Kamps, J.: Entity ranking using Wikipedia as a pivot. In: CIKM, pp. 69–78 (2010)

  5. de Vries, A.P., Vercoustre, A.-M., Thom, J.A., Craswell, N., Lalmas, M.: Overview of the INEX 2007 entity ranking track. In: Fuhr, N., Kamps, J., Lalmas, M., Trotman, A. (eds.) INEX 2007. LNCS, vol. 4862, pp. 245–251. Springer, Heidelberg (2008)

  6. Wang, C., Zhang, R., He, X., Zhou, A.: NERank: ranking named entities in document collections. In: WWW, pp. 123–124 (2016)

  7. Balog, K., de Rijke, M.: Determining expert profiles (with an application to expert finding). In: IJCAI, pp. 2657–2662 (2007)

  8. Nie, Z., Zhang, Y., Wen, J., Ma, W.: Object-level ranking: bringing order to web objects. In: WWW, pp. 567–574 (2005)

  9. Lee, S., Song, S., Kahng, M., Lee, D., Lee, S.: Random walk based entity ranking on graph for multidimensional recommendation. In: RecSys, pp. 93–100 (2011)

  10. Haveliwala, T.H.: Topic-sensitive PageRank. In: WWW, pp. 517–526 (2002)

  11. Ilieva, E., Michel, S., Stupar, A.: The essence of knowledge (bases) through entity rankings. In: CIKM, pp. 1537–1540 (2013)

  12. Meij, E., Weerkamp, W., de Rijke, M.: Adding semantics to microblog posts. In: WSDM, pp. 563–572 (2012)

  13. Cornolti, M., Ferragina, P., Ciaramita, M.: A framework for benchmarking entity-annotation systems. In: WWW, pp. 249–260 (2013)

  14. Usbeck, R., Röder, M., Ngomo, A.N., Baron, C., Both, A., Brümmer, M., Ceccarelli, D., Cornolti, M., Cherix, D., Eickmann, B., Ferragina, P., Lemke, C., Moro, A., Navigli, R., Piccinno, F., Rizzo, G., Sack, H., Speck, R., Troncy, R., Waitelonis, J., Wesemann, L.: GERBIL: general entity annotator benchmarking framework. In: WWW, pp. 1133–1143 (2015)

  15. Finkel, J.R., Grenager, T., Manning, C.D.: Incorporating non-local information into information extraction systems by Gibbs sampling. In: ACL (2005)

  16. Jijkoun, V., Khalid, M.A., Marx, M., de Rijke, M.: Named entity normalization in user generated content. In: AND, pp. 23–30 (2008)

  17. Blei, D.M., Ng, A.Y., Jordan, M.I.: Latent Dirichlet allocation. J. Mach. Learn. Res. 3, 993–1022 (2003)

  18. Shen, W., Wang, J., Luo, P., Wang, M.: LINDEN: linking named entities with knowledge base via semantic knowledge. In: WWW, pp. 449–458 (2012)

  19. Lao, N., Cohen, W.W.: Relational retrieval using a combination of path-constrained random walks. Mach. Learn. 81(1), 53–67 (2010)

  20. Tran, G.B., Alrifai, M., Nguyen, D.Q.: Predicting relevant news events for timeline summaries. In: WWW, pp. 91–92 (2013)

  21. Tran, G., Alrifai, M., Herder, E.: Timeline summarization from relevant headlines. In: Hanbury, A., Kazai, G., Rauber, A., Fuhr, N. (eds.) ECIR 2015. LNCS, vol. 9022, pp. 245–256. Springer, Heidelberg (2015)

  22. Zaragoza, H., Rode, H., Mika, P., Atserias, J., Ciaramita, M., Attardi, G.: Ranking very many typed entities on Wikipedia. In: CIKM, pp. 1015–1018 (2007)

  23. Erkan, G., Radev, D.R.: LexRank: graph-based lexical centrality as salience in text summarization. J. Artif. Intell. Res. (JAIR) 22, 457–479 (2004)

  24. Kim, Y., Kim, M., Cattle, A., Otmakhova, J., Park, S., Shin, H.: Applying graph-based keyword extraction to document retrieval. In: IJCNLP, pp. 864–868 (2013)


Author information


Corresponding author

Correspondence to Xiaofeng He.


Appendix: Mathematical Analysis of NERank


We prove that the random walk algorithm of NERank converges and derive its closed-form solution. Let \(\mathbf {T_n}\) denote the \(|T|\times 1\) matrix of topic ranks in the \(n^{th}\) iteration; in particular, \(\mathbf {T_0}\) is the prior rank matrix for topics. Let \(\mathbf {E_n}\) denote the \(|E|\times 1\) entity rank matrix in the \(n^{th}\) iteration. Following the random walk process, the topic rank update for the TDT (topic-document-topic) meta-path is \(\mathbf {T_{n}}=\mathbf {\Theta _R}^T\mathbf {\Theta }\cdot \mathbf {T_{n-1}}\), where \(\mathbf {\Theta _R}\) is the row-normalized matrix of \(\mathbf {\Theta }\). Similarly, for the TET (topic-entity-topic) meta-path, we have \(\mathbf {T_{n}}=\mathbf {\hat{\Phi }_C}\mathbf {\hat{\Phi }_R}^T\cdot \mathbf {T_{n-1}}\), where \(\mathbf {\hat{\Phi }_R}\) and \(\mathbf {\hat{\Phi }_C}\) are the row-normalized and column-normalized matrices of \(\mathbf {\hat{\Phi }}\), respectively. Combining both meta-paths with the prior ranks, the update rule in one iteration is:

$$\begin{aligned} \mathbf {T_{n}}=\alpha \cdot \mathbf {\Theta _R}^T\mathbf {\Theta }\cdot \mathbf {T_{n-1}}+\beta \cdot \mathbf {\hat{\Phi }_C}\mathbf {\hat{\Phi }_R}^T\cdot \mathbf {T_{n-1}}+(1-\alpha -\beta )\cdot \mathbf {T_{0}} \end{aligned}$$
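As a rough illustration of the update rule above, the following plain-Python sketch builds the two meta-path matrices and repeats the update until consecutive topic-rank vectors stop changing. It is a sketch under assumptions, not the authors' code: Θ is taken to be the |D|×|T| document-topic matrix and Φ̂ the |T|×|E| topic-entity matrix (shapes inferred from the matrix products above), and the helper names (`meta_path_matrices`, `update`, `topic_ranks`), the tolerance, the weights α = β = 0.3 and all toy values are hypothetical.

```python
# Minimal sketch of the NERank topic-rank update rule (pure Python).
# Assumptions (not from the paper): theta is the |D| x |T| document-topic
# matrix, phi_hat the |T| x |E| topic-entity matrix, both lists of lists;
# all helper names and toy values are hypothetical.

def transpose(a):
    return [list(col) for col in zip(*a)]


def matmul(a, b):
    cols = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def matvec(a, v):
    return [sum(x * y for x, y in zip(row, v)) for row in a]


def row_normalize(a):
    # Divide every row by its sum; all-zero rows stay zero.
    out = []
    for row in a:
        s = sum(row)
        out.append([x / s if s else 0.0 for x in row])
    return out


def col_normalize(a):
    # Column normalization = row normalization of the transpose.
    return transpose(row_normalize(transpose(a)))


def meta_path_matrices(theta, phi_hat):
    # TDT meta-path: Theta_R^T Theta, a |T| x |T| matrix.
    tdt = matmul(transpose(row_normalize(theta)), theta)
    # TET meta-path: Phi_C Phi_R^T, also |T| x |T|.
    tet = matmul(col_normalize(phi_hat), transpose(row_normalize(phi_hat)))
    return tdt, tet


def update(t_prev, t0, tdt, tet, alpha, beta):
    # T_n = alpha*TDT*T_{n-1} + beta*TET*T_{n-1} + (1 - alpha - beta)*T_0
    via_docs = matvec(tdt, t_prev)
    via_entities = matvec(tet, t_prev)
    return [alpha * d + beta * e + (1 - alpha - beta) * p
            for d, e, p in zip(via_docs, via_entities, t0)]


def topic_ranks(t0, theta, phi_hat, alpha, beta, tol=1e-12, max_iter=10_000):
    tdt, tet = meta_path_matrices(theta, phi_hat)
    t = list(t0)
    for _ in range(max_iter):
        t_next = update(t, t0, tdt, tet, alpha, beta)
        if max(abs(x - y) for x, y in zip(t_next, t)) < tol:
            return t_next
        t = t_next
    raise RuntimeError("no convergence: M may have spectral radius >= 1")


# Toy usage: 2 documents, 2 topics, 3 entities (invented numbers).
theta = [[0.9, 0.1], [0.2, 0.8]]
phi_hat = [[0.5, 0.3, 0.2], [0.1, 0.2, 0.7]]
t0 = [0.6, 0.4]
ranks = topic_ranks(t0, theta, phi_hat, alpha=0.3, beta=0.3)
print(ranks)
```

Stopping on a small change between consecutive iterates is one simple choice; raising an error instead of returning the last iterate makes a non-convergent configuration visible rather than silently producing ranks.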

For simplicity, we define \(\mathbf {M}=\alpha \cdot \mathbf {\Theta _R}^T\mathbf {\Theta }+\beta \cdot \mathbf {\hat{\Phi }_C}\mathbf {\hat{\Phi }_R}^T\), so that the update reads \(\mathbf {T_{n}}=\mathbf {M}\cdot \mathbf {T_{n-1}}+(1-\alpha -\beta )\cdot \mathbf {T_{0}}\). Unrolling this recursion gives \(\mathbf {T_{n}}=\mathbf {M}^n \cdot \mathbf {T_{0}}+(1-\alpha -\beta )\cdot \sum _{i=0}^{n-1}\mathbf {M}^i\cdot \mathbf {T_{0}}\). Provided that the spectral radius of \(\mathbf {M}\) is less than 1, we have \(\lim _ {n\rightarrow \infty }\mathbf {M}^n=\mathbf {0}\) and \(\lim _ {n\rightarrow \infty }\sum _{i=0}^{n-1}\mathbf {M}^i=(\mathbf {I}-\mathbf {M})^{-1}\), so the limit of the sequence \(\{\mathbf {T_n}\}\) is:

$$\begin{aligned} \lim _ {n\rightarrow \infty }\mathbf {T_{n}} =\lim _ {n\rightarrow \infty }\mathbf {M}^n \cdot \mathbf {T_{0}}+(1-\alpha -\beta )\lim _ {n\rightarrow \infty }\sum _{i=0}^{n-1}\mathbf {M}^i\cdot \mathbf {T_{0}}=(1-\alpha -\beta )(\mathbf {I}-\mathbf {M})^{-1}\mathbf {T_{0}} \end{aligned}$$

where \(\mathbf {I}\) is the \(|T|\times |T|\) identity matrix. Therefore, the topic ranks in NERank converge. Since the entity ranks are computed as \(\mathbf {E_n}=\mathbf {\hat{\Phi }_R}^T\cdot \mathbf {T_n}\), they converge as well. Denoting by \(\mathbf {E^{*}}\) the closed-form solution vector for entity ranks, we have

$$\begin{aligned} \mathbf {E^{*}}=(1-\alpha -\beta )\cdot \mathbf {\hat{\Phi }_R}^T(\mathbf {I}-\alpha \cdot \mathbf {\Theta _R}^T\mathbf {\Theta }-\beta \cdot \mathbf {\hat{\Phi }_C}\mathbf {\hat{\Phi }_R}^T)^{-1}\cdot \mathbf {T_{0}} \end{aligned}$$

where the rank of entity \(e_i\) (i.e., \(r(e_i)\)) is the \(i^{th}\) element in \(\mathbf {E^{*}}\).
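The closed form can be sanity-checked numerically. The plain-Python sketch below, offered under assumptions rather than as the authors' implementation, solves \((\mathbf {I}-\mathbf {M})\mathbf {x}=(1-\alpha -\beta )\mathbf {T_{0}}\) by Gaussian elimination instead of forming the inverse, maps the result to entities with \(\mathbf {\hat{\Phi }_R}^T\), and compares it with a fixed number of plain iterations. The matrix shapes (Θ as |D|×|T|, Φ̂ as |T|×|E|), the helper names (`build_m`, `closed_form_entity_ranks`, `iterative_entity_ranks`), the toy values and the iteration count are hypothetical. The check \(\|\mathbf {M}\|_1<1\) is an added, conservative sufficient condition for the spectral radius of \(\mathbf {M}\) to be below 1; it is not part of the derivation.

```python
# Minimal sketch: NERank closed-form entity ranks versus plain iteration.
# Assumptions (not from the paper): theta is |D| x |T|, phi_hat is |T| x |E|,
# matrices are lists of lists; helper names and toy values are hypothetical.

def transpose(a):
    return [list(col) for col in zip(*a)]


def matmul(a, b):
    cols = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def matvec(a, v):
    return [sum(x * y for x, y in zip(row, v)) for row in a]


def row_normalize(a):
    # Divide every row by its sum; all-zero rows stay zero.
    out = []
    for row in a:
        s = sum(row)
        out.append([x / s if s else 0.0 for x in row])
    return out


def solve(a, b):
    # Gaussian elimination with partial pivoting for a x = b.
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for k in range(n):
        p = max(range(k, n), key=lambda i: abs(m[i][k]))
        m[k], m[p] = m[p], m[k]
        for i in range(k + 1, n):
            f = m[i][k] / m[k][k]
            for j in range(k, n + 1):
                m[i][j] -= f * m[k][j]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def build_m(theta, phi_hat, alpha, beta):
    # M = alpha * Theta_R^T Theta + beta * Phi_C Phi_R^T
    phi_r = row_normalize(phi_hat)
    phi_c = transpose(row_normalize(transpose(phi_hat)))
    tdt = matmul(transpose(row_normalize(theta)), theta)
    tet = matmul(phi_c, transpose(phi_r))
    m = [[alpha * x + beta * y for x, y in zip(r1, r2)] for r1, r2 in zip(tdt, tet)]
    return m, phi_r


def closed_form_entity_ranks(t0, theta, phi_hat, alpha, beta):
    m, phi_r = build_m(theta, phi_hat, alpha, beta)
    # Conservative check: rho(M) <= ||M||_1 (max absolute column sum) < 1.
    if max(sum(abs(x) for x in col) for col in zip(*m)) >= 1:
        raise ValueError("||M||_1 >= 1: convergence is not guaranteed by this check")
    n = len(t0)
    i_minus_m = [[(1.0 if i == j else 0.0) - m[i][j] for j in range(n)] for i in range(n)]
    c = 1 - alpha - beta
    t_star = solve(i_minus_m, [c * v for v in t0])  # (1-alpha-beta)(I-M)^{-1} T_0
    return matvec(transpose(phi_r), t_star)          # E* = Phi_R^T T*


def iterative_entity_ranks(t0, theta, phi_hat, alpha, beta, n_iter=200):
    m, phi_r = build_m(theta, phi_hat, alpha, beta)
    c = 1 - alpha - beta
    t = list(t0)
    for _ in range(n_iter):
        # T_n = M T_{n-1} + (1-alpha-beta) T_0
        t = [mt + c * v for mt, v in zip(matvec(m, t), t0)]
    return matvec(transpose(phi_r), t)               # E_n = Phi_R^T T_n


# Toy usage: 2 documents, 2 topics, 3 entities (invented numbers).
theta = [[0.9, 0.1], [0.2, 0.8]]
phi_hat = [[0.5, 0.3, 0.2], [0.1, 0.2, 0.7]]
t0 = [0.6, 0.4]
e_closed = closed_form_entity_ranks(t0, theta, phi_hat, alpha=0.3, beta=0.3)
e_iter = iterative_entity_ranks(t0, theta, phi_hat, alpha=0.3, beta=0.3)
print(e_closed, e_iter)
```

Solving the linear system is generally preferred over computing \((\mathbf {I}-\mathbf {M})^{-1}\) explicitly, since it is cheaper and numerically more stable.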


Copyright information

© 2016 Springer International Publishing Switzerland

About this paper

Cite this paper

Wang, C., Zhang, R., He, X., Zhou, G., Zhou, A. (2016). NERank: Bringing Order to Named Entities from Texts. In: Li, F., Shim, K., Zheng, K., Liu, G. (eds) Web Technologies and Applications. APWeb 2016. Lecture Notes in Computer Science, vol 9931. Springer, Cham. https://doi.org/10.1007/978-3-319-45814-4_2


  • DOI: https://doi.org/10.1007/978-3-319-45814-4_2


  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-45813-7

  • Online ISBN: 978-3-319-45814-4

  • eBook Packages: Computer Science, Computer Science (R0)
