NERank: Bringing Order to Named Entities from Texts

Wang, Chengyu; Zhang, Rong; He, Xiaofeng; Zhou, Guomin; Zhou, Aoying

doi:10.1007/978-3-319-45814-4_2

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 9931)

Included in the following conference series:

Asia-Pacific Web Conference

Abstract

Most entity ranking research aims to retrieve a ranked list of entities from a Web corpus given a query. However, entities in plain documents can also be ranked directly by their relative importance, in order to support entity-oriented Web applications. In this paper, we introduce an entity ranking algorithm, NERank, to address this issue. NERank first constructs a graph model called the Topical Tripartite Graph from a document collection. A ranking function is designed to compute the prior ranks of topics based on three quality metrics. We further propose a meta-path constrained random walk method to propagate the prior topic ranks to entities. We evaluate NERank over real-life datasets and compare it with baselines. Experimental results illustrate the effectiveness of our approach.

A preliminary version of this paper was presented at WWW’16 [6]. This work is partially supported by NSFC under Grant No. 61402180, the Natural Science Foundation of Shanghai under Grant No. 14ZR1412600, Shanghai Agriculture Science Program (2015) Number 3-2 and NSFC-Zhejiang Joint Fund for the Integration of Industrialization and Informatization under Grant No. U1509219.

Notes

1. See background info at: https://en.wikipedia.org/wiki/Egyptian_Revolution_of_2011.

References

1. Brin, S., Page, L.: The anatomy of a large-scale hypertextual web search engine. Comput. Netw. 30(1–7), 107–117 (1998)
2. Ganesan, K., Zhai, C.: Opinion-based entity ranking. Inf. Retr. 15(2), 116–150 (2013)
3. Mihalcea, R., Tarau, P.: Textrank: bringing order into text. In: EMNLP, pp. 404–411 (2004)
4. Kaptein, R., Serdyukov, P., de Vries, A.P., Kamps, J.: Entity ranking using wikipedia as a pivot. In: CIKM, pp. 69–78 (2010)
5. de Vries, A.P., Vercoustre, A.-M., Thom, J.A., Craswell, N., Lalmas, M.: Overview of the INEX 2007 entity ranking track. In: Fuhr, N., Kamps, J., Lalmas, M., Trotman, A. (eds.) INEX 2007. LNCS, vol. 4862, pp. 245–251. Springer, Heidelberg (2008)
6. Wang, C., Zhang, R., He, X., Zhou, A.: NERank: ranking named entities in document collections. In: WWW, pp. 123–124 (2016)
7. Balog, K., de Rijke, M.: Determining expert profiles (with an application to expert finding). In: IJCAI, pp. 2657–2662 (2007)
8. Nie, Z., Zhang, Y., Wen, J., Ma, W.: Object-level ranking: bringing order to web objects. In: WWW, pp. 567–574 (2005)
9. Lee, S., Song, S., Kahng, M., Lee, D., Lee, S.: Random walk based entity ranking on graph for multidimensional recommendation. In: RecSys, pp. 93–100 (2011)
10. Haveliwala, T.H.: Topic-sensitive pagerank. In: WWW, pp. 517–526 (2002)
11. Ilieva, E., Michel, S., Stupar, A.: The essence of knowledge (bases) through entity rankings. In: CIKM, pp. 1537–1540 (2013)
12. Meij, E., Weerkamp, W., de Rijke, M.: Adding semantics to microblog posts. In: WSDM, pp. 563–572 (2012)
13. Cornolti, M., Ferragina, P., Ciaramita, M.: A framework for benchmarking entity-annotation systems. In: WWW, pp. 249–260 (2013)
14. Usbeck, R., Röder, M., Ngomo, A.N., Baron, C., Both, A., Brümmer, M., Ceccarelli, D., Cornolti, M., Cherix, D., Eickmann, B., Ferragina, P., Lemke, C., Moro, A., Navigli, R., Piccinno, F., Rizzo, G., Sack, H., Speck, R., Troncy, R., Waitelonis, J., Wesemann, L.: GERBIL: general entity annotator benchmarking framework. In: WWW, pp. 1133–1143 (2015)
15. Finkel, J.R., Grenager, T., Manning, C.D.: Incorporating non-local information into information extraction systems by gibbs sampling. In: ACL (2005)
16. Jijkoun, V., Khalid, M.A., Marx, M., de Rijke, M.: Named entity normalization in user generated content. In: AND, pp. 23–30 (2008)
17. Blei, D.M., Ng, A.Y., Jordan, M.I.: Latent dirichlet allocation. J. Mach. Learn. Res. 3, 993–1022 (2003)
18. Shen, W., Wang, J., Luo, P., Wang, M.: LINDEN: linking named entities with knowledge base via semantic knowledge. In: WWW, pp. 449–458 (2012)
19. Lao, N., Cohen, W.W.: Relational retrieval using a combination of path-constrained random walks. Mach. Learn. 81(1), 53–67 (2010)
20. Tran, G.B., Alrifai, M., Nguyen, D.Q.: Predicting relevant news events for timeline summaries. In: WWW, pp. 91–92 (2013)
21. Tran, G., Alrifai, M., Herder, E.: Timeline summarization from relevant headlines. In: Hanbury, A., Kazai, G., Rauber, A., Fuhr, N. (eds.) ECIR 2015. LNCS, vol. 9022, pp. 245–256. Springer, Heidelberg (2015)
22. Zaragoza, H., Rode, H., Mika, P., Atserias, J., Ciaramita, M., Attardi, G.: Ranking very many typed entities on wikipedia. In: CIKM, pp. 1015–1018 (2007)
23. Erkan, G., Radev, D.R.: Lexrank: Graph-based lexical centrality as salience in text summarization. J. Artif. Intell. Res. (JAIR) 22, 457–479 (2004)
24. Kim, Y., Kim, M., Cattle, A., Otmakhova, J., Park, S., Shin, H.: Applying graph-based keyword extraction to document retrieval. In: IJCNLP, pp. 864–868 (2013)

Author information

Authors and Affiliations

Institute for Data Science and Engineering, East China Normal University, Shanghai, China
Chengyu Wang, Rong Zhang, Xiaofeng He & Aoying Zhou
Zhejiang Police College, Hangzhou, Zhejiang Province, China
Guomin Zhou

Corresponding author

Correspondence to Xiaofeng He.

Editor information

Editors and Affiliations

School of Computing, University of Utah, Salt Lake City, Utah, USA
Feifei Li
School of Electrical Engineering, Seoul National University, Seoul, Korea (Republic of)
Kyuseok Shim
Soochow University, Suzhou, China
Kai Zheng
Soochow University, Suzhou, China
Guanfeng Liu

Appendix: Mathematical Analysis of NERank

We prove that the random walk algorithm of NERank converges and derive its closed-form solution. Let $\mathbf {T_n}$ denote the $|T|\times 1$ vector that holds the ranks of topics in the $n^{th}$ iteration. In particular, $\mathbf {T_0}$ is the prior rank vector for topics. Let $\mathbf {E_n}$ denote the $|E|\times 1$ entity rank vector in the $n^{th}$ iteration. Based on the random walk process, the rank update of topics along the TDT meta-path is formulated as $\mathbf {T_{n}}=\mathbf {\Theta _R}^T\mathbf {\Theta }\cdot \mathbf {T_{n-1}}$, where $\mathbf {\Theta _R}$ is the row-normalized matrix of $\mathbf {\Theta }$. Similarly, for the TET meta-path, we have $\mathbf {T_{n}}=\mathbf {\hat{\Phi }_C}\mathbf {\hat{\Phi }_R}^T\cdot \mathbf {T_{n-1}}$, where $\mathbf {\hat{\Phi }_R}$ and $\mathbf {\hat{\Phi }_C}$ are the row-normalized and column-normalized matrices of $\mathbf {\hat{\Phi }}$, respectively. Weighting the two meta-paths by $\alpha$ and $\beta$ and the prior ranks by $1-\alpha -\beta$, the update rule for one iteration is formulated as:

$$\begin{aligned} \mathbf {T_{n}}=\alpha \cdot \mathbf {\Theta _R}^T\mathbf {\Theta }\cdot \mathbf {T_{n-1}}+\beta \cdot \mathbf {\hat{\Phi }_C}\mathbf {\hat{\Phi }_R}^T\cdot \mathbf {T_{n-1}}+(1-\alpha -\beta )\cdot \mathbf {T_{0}} \end{aligned}$$
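The update rule above can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the toy matrices, and the values of `alpha` and `beta` are hypothetical. We assume $\mathbf {\Theta }$ is a document-by-topic matrix and $\mathbf {\hat{\Phi }}$ a $|T|\times |E|$ topic-by-entity matrix (which is what the stated shapes of $\mathbf {T_n}$ and $\mathbf {E_n}$ imply), and we additionally assume that the columns of $\mathbf {\Theta }$ sum to 1 and that $\alpha +\beta <1$, so that the iteration is guaranteed to settle.

```python
import numpy as np


def row_normalize(a):
    """Return a float copy of `a` whose rows each sum to 1."""
    a = np.asarray(a, dtype=float)
    return a / a.sum(axis=1, keepdims=True)


def col_normalize(a):
    """Return a float copy of `a` whose columns each sum to 1."""
    a = np.asarray(a, dtype=float)
    return a / a.sum(axis=0, keepdims=True)


def iterate_topic_ranks(theta, phi_hat, t0, alpha, beta, n_iter=200):
    """Repeatedly apply the one-iteration update rule, starting from t0."""
    theta = np.asarray(theta, dtype=float)
    t0 = np.asarray(t0, dtype=float)
    tdt = row_normalize(theta).T @ theta                      # TDT term, |T| x |T|
    tet = col_normalize(phi_hat) @ row_normalize(phi_hat).T   # TET term, |T| x |T|
    t = t0.copy()
    for _ in range(n_iter):
        t = alpha * (tdt @ t) + beta * (tet @ t) + (1 - alpha - beta) * t0
    return t


# Toy inputs (assumed, not from the paper): 5 documents, 3 topics, 4 entities.
rng = np.random.default_rng(0)
theta = col_normalize(rng.random((5, 3)))  # assumption: columns of Theta sum to 1
phi_hat = rng.random((3, 4))
t0 = np.full(3, 1.0 / 3.0)                 # uniform prior topic ranks
alpha, beta = 0.4, 0.3                     # assumed weights with alpha + beta < 1

t_final = iterate_topic_ranks(theta, phi_hat, t0, alpha, beta)
print(t_final)
```

Under these assumptions every column of both $|T|\times |T|$ matrices sums to 1, so a prior that sums to 1 keeps a total rank mass of 1 in every iteration.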

For simplicity, we define $\mathbf {M}=\alpha \cdot \mathbf {\Theta _R}^T\mathbf {\Theta }+\beta \cdot \mathbf {\hat{\Phi }_C}\mathbf {\hat{\Phi }_R}^T$, so that $\mathbf {T_{n}}=\mathbf {M}\cdot \mathbf {T_{n-1}}+(1-\alpha -\beta )\cdot \mathbf {T_{0}}$. Unrolling this recursion gives $\mathbf {T_{n}}=\mathbf {M}^n \cdot \mathbf {T_{0}}+(1-\alpha -\beta )\cdot \sum _{i=0}^{n-1}\mathbf {M}^i\cdot \mathbf {T_{0}}$. Because $\lim _ {n\rightarrow \infty }\mathbf {M}^n=\mathbf {0}$ (equivalently, the spectral radius of $\mathbf {M}$ is less than 1) and $\lim _ {n\rightarrow \infty }\sum _{i=0}^{n-1}\mathbf {M}^i=(\mathbf {I}-\mathbf {M})^{-1}$, the limit of the sequence $\{\mathbf {T_n}\}$ is derived as:

$$\begin{aligned} \lim _ {n\rightarrow \infty }\mathbf {T_{n}} =\lim _ {n\rightarrow \infty }\mathbf {M}^n \cdot \mathbf {T_{0}}+(1-\alpha -\beta )\lim _ {n\rightarrow \infty }\sum _{i=0}^{n-1}\mathbf {M}^i\cdot \mathbf {T_{0}}=(1-\alpha -\beta )(\mathbf {I}-\mathbf {M})^{-1}\mathbf {T_{0}} \end{aligned}$$

where $\mathbf {I}$ is the $|T|\times |T|$ identity matrix. Therefore, the ranks of topics converge in NERank. Because the entity ranks $\mathbf {E_n}$ are computed by $\mathbf {E_n}=\mathbf {\hat{\Phi }_R}^T\cdot \mathbf {T_n}$, they converge as well. Let $\mathbf {E^{*}}$ denote the closed-form solution vector for entity ranks. We have

$$\begin{aligned} \mathbf {E^{*}}=(1-\alpha -\beta )\cdot \mathbf {\hat{\Phi }_R}^T(\mathbf {I}-\alpha \cdot \mathbf {\Theta _R}^T\mathbf {\Theta }-\beta \cdot \mathbf {\hat{\Phi }_C}\mathbf {\hat{\Phi }_R}^T)^{-1}\cdot \mathbf {T_{0}} \end{aligned}$$

where the rank of entity $e_i$ (i.e., $r(e_i)$) is the $i^{th}$ element in $\mathbf {E^{*}}$.
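The closed-form expression for $\mathbf {E^{*}}$ can be evaluated directly, as in the NumPy sketch below. As before, this is an illustration rather than the authors' code: the function name, the toy data, and the weights are hypothetical, $\mathbf {\Theta }$ is assumed to be document-by-topic with columns summing to 1, $\mathbf {\hat{\Phi }}$ is assumed to be $|T|\times |E|$, and $\alpha +\beta <1$ is assumed so that $\mathbf {I}-\mathbf {M}$ is invertible.

```python
import numpy as np


def closed_form_entity_ranks(theta, phi_hat, t0, alpha, beta):
    """Evaluate E* = (1 - alpha - beta) * Phi_R^T (I - M)^{-1} T_0."""
    theta = np.asarray(theta, dtype=float)
    phi_hat = np.asarray(phi_hat, dtype=float)
    t0 = np.asarray(t0, dtype=float)

    theta_r = theta / theta.sum(axis=1, keepdims=True)    # row-normalized Theta
    phi_r = phi_hat / phi_hat.sum(axis=1, keepdims=True)  # row-normalized Phi
    phi_c = phi_hat / phi_hat.sum(axis=0, keepdims=True)  # column-normalized Phi

    m = alpha * (theta_r.T @ theta) + beta * (phi_c @ phi_r.T)
    identity = np.eye(m.shape[0])
    # Solve (I - M) x = T_0 rather than forming the inverse explicitly.
    t_star = (1 - alpha - beta) * np.linalg.solve(identity - m, t0)
    return phi_r.T @ t_star


# Toy inputs (assumed, not from the paper): 5 documents, 3 topics, 4 entities.
rng = np.random.default_rng(0)
theta = rng.random((5, 3))
theta = theta / theta.sum(axis=0, keepdims=True)  # assumption: columns sum to 1
phi_hat = rng.random((3, 4))
t0 = np.full(3, 1.0 / 3.0)                        # uniform prior topic ranks
alpha, beta = 0.4, 0.3                            # assumed, alpha + beta < 1

e_star = closed_form_entity_ranks(theta, phi_hat, t0, alpha, beta)
ranking = np.argsort(-e_star)                     # entity indices, best first
print(e_star, ranking)
```

Using `np.linalg.solve` avoids computing $(\mathbf {I}-\mathbf {M})^{-1}$ explicitly, which is usually the numerically safer choice; the result should agree with running the iterative update until it stops changing.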

About this paper

Cite this paper

Wang, C., Zhang, R., He, X., Zhou, G., Zhou, A. (2016). NERank: Bringing Order to Named Entities from Texts. In: Li, F., Shim, K., Zheng, K., Liu, G. (eds) Web Technologies and Applications. APWeb 2016. Lecture Notes in Computer Science, vol 9931. Springer, Cham. https://doi.org/10.1007/978-3-319-45814-4_2

DOI: https://doi.org/10.1007/978-3-319-45814-4_2
Published: 17 September 2016
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-45813-7
Online ISBN: 978-3-319-45814-4
eBook Packages: Computer Science, Computer Science (R0)
