Abstract
Most entity ranking research aims to retrieve a ranked list of entities from a Web corpus given a query. However, entities in plain documents can be ranked directly based on their relative importance, in order to support entity-oriented Web applications. In this paper, we introduce an entity ranking algorithm NERank to address this issue. NERank first constructs a graph model called Topical Tripartite Graph from a document collection. A ranking function is designed to compute the prior ranks of topics based on three quality metrics. We further propose a meta-path constrained random walk method to propagate prior topic ranks to entities. We evaluate NERank over real-life datasets and compare it with baselines. Experimental results illustrate the effectiveness of our approach.
A preliminary version of this paper was presented at WWW’16 [6]. This work is partially supported by NSFC under Grant No. 61402180, the Natural Science Foundation of Shanghai under Grant No. 14ZR1412600, Shanghai Agriculture Science Program (2015) Number 3-2 and NSFC-Zhejiang Joint Fund for the Integration of Industrialization and Informatization under Grant No. U1509219.
Notes
- 1.
See background info at:
References
Brin, S., Page, L.: The anatomy of a large-scale hypertextual web search engine. Comput. Netw. 30(1–7), 107–117 (1998)
Ganesan, K., Zhai, C.: Opinion-based entity ranking. Inf. Retr. 15(2), 116–150 (2013)
Mihalcea, R., Tarau, P.: TextRank: bringing order into text. In: EMNLP, pp. 404–411 (2004)
Kaptein, R., Serdyukov, P., de Vries, A.P., Kamps, J.: Entity ranking using Wikipedia as a pivot. In: CIKM, pp. 69–78 (2010)
de Vries, A.P., Vercoustre, A.-M., Thom, J.A., Craswell, N., Lalmas, M.: Overview of the INEX 2007 entity ranking track. In: Fuhr, N., Kamps, J., Lalmas, M., Trotman, A. (eds.) INEX 2007. LNCS, vol. 4862, pp. 245–251. Springer, Heidelberg (2008)
Wang, C., Zhang, R., He, X., Zhou, A.: NERank: ranking named entities in document collections. In: WWW, pp. 123–124 (2016)
Balog, K., de Rijke, M.: Determining expert profiles (with an application to expert finding). In: IJCAI, pp. 2657–2662 (2007)
Nie, Z., Zhang, Y., Wen, J., Ma, W.: Object-level ranking: bringing order to web objects. In: WWW, pp. 567–574 (2005)
Lee, S., Song, S., Kahng, M., Lee, D., Lee, S.: Random walk based entity ranking on graph for multidimensional recommendation. In: RecSys, pp. 93–100 (2011)
Haveliwala, T.H.: Topic-sensitive PageRank. In: WWW, pp. 517–526 (2002)
Ilieva, E., Michel, S., Stupar, A.: The essence of knowledge (bases) through entity rankings. In: CIKM, pp. 1537–1540 (2013)
Meij, E., Weerkamp, W., de Rijke, M.: Adding semantics to microblog posts. In: WSDM, pp. 563–572 (2012)
Cornolti, M., Ferragina, P., Ciaramita, M.: A framework for benchmarking entity-annotation systems. In: WWW, pp. 249–260 (2013)
Usbeck, R., Röder, M., Ngomo, A.N., Baron, C., Both, A., Brümmer, M., Ceccarelli, D., Cornolti, M., Cherix, D., Eickmann, B., Ferragina, P., Lemke, C., Moro, A., Navigli, R., Piccinno, F., Rizzo, G., Sack, H., Speck, R., Troncy, R., Waitelonis, J., Wesemann, L.: GERBIL: general entity annotator benchmarking framework. In: WWW, pp. 1133–1143 (2015)
Finkel, J.R., Grenager, T., Manning, C.D.: Incorporating non-local information into information extraction systems by Gibbs sampling. In: ACL (2005)
Jijkoun, V., Khalid, M.A., Marx, M., de Rijke, M.: Named entity normalization in user generated content. In: AND, pp. 23–30 (2008)
Blei, D.M., Ng, A.Y., Jordan, M.I.: Latent Dirichlet allocation. J. Mach. Learn. Res. 3, 993–1022 (2003)
Shen, W., Wang, J., Luo, P., Wang, M.: LINDEN: linking named entities with knowledge base via semantic knowledge. In: WWW, pp. 449–458 (2012)
Lao, N., Cohen, W.W.: Relational retrieval using a combination of path-constrained random walks. Mach. Learn. 81(1), 53–67 (2010)
Tran, G.B., Alrifai, M., Nguyen, D.Q.: Predicting relevant news events for timeline summaries. In: WWW, pp. 91–92 (2013)
Tran, G., Alrifai, M., Herder, E.: Timeline summarization from relevant headlines. In: Hanbury, A., Kazai, G., Rauber, A., Fuhr, N. (eds.) ECIR 2015. LNCS, vol. 9022, pp. 245–256. Springer, Heidelberg (2015)
Zaragoza, H., Rode, H., Mika, P., Atserias, J., Ciaramita, M., Attardi, G.: Ranking very many typed entities on Wikipedia. In: CIKM, pp. 1015–1018 (2007)
Erkan, G., Radev, D.R.: LexRank: graph-based lexical centrality as salience in text summarization. J. Artif. Intell. Res. (JAIR) 22, 457–479 (2004)
Kim, Y., Kim, M., Cattle, A., Otmakhova, J., Park, S., Shin, H.: Applying graph-based keyword extraction to document retrieval. In: IJCNLP, pp. 864–868 (2013)
Appendix: Mathematical Analysis of NERank
We prove that the random walk algorithm of NERank converges and derive its closed-form solution. Let \(\mathbf {T_n}\) denote the \(|T|\times 1\) matrix that holds the ranks of topics in the \(n^{th}\) iteration. In particular, \(\mathbf {T_0}\) is the prior rank matrix for topics. Let \(\mathbf {E_n}\) denote the \(|E|\times 1\) entity rank matrix in the \(n^{th}\) iteration. Based on the random walk process, the rank update of topics for the TDT meta-path is formulated as \(\mathbf {T_{n}}=\mathbf {\Theta _R}^T\mathbf {\Theta }\cdot \mathbf {T_{n-1}}\), where \(\mathbf {\Theta _R}\) is the row-normalized matrix of \(\mathbf {\Theta }\). Similarly, for the TET meta-path, we have \(\mathbf {T_{n}}=\mathbf {\hat{\Phi }_C}\mathbf {\hat{\Phi }_R}^T\cdot \mathbf {T_{n-1}}\), where \(\mathbf {\hat{\Phi }_R}\) and \(\mathbf {\hat{\Phi }_C}\) are the row-normalized and column-normalized matrices of \(\mathbf {\hat{\Phi }}\), respectively. The update rule in one iteration is formulated as:
$$\mathbf {T_{n}}=\alpha \cdot \mathbf {\Theta _R}^T\mathbf {\Theta }\cdot \mathbf {T_{n-1}}+\beta \cdot \mathbf {\hat{\Phi }_C}\mathbf {\hat{\Phi }_R}^T\cdot \mathbf {T_{n-1}}+(1-\alpha -\beta )\cdot \mathbf {T_{0}}$$
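For simplicity, we define \(\mathbf {M}=\alpha \cdot \mathbf {\Theta _R}^T\mathbf {\Theta }+\beta \cdot \mathbf {\hat{\Phi }_C}\mathbf {\hat{\Phi }_R}^T\), so that \(\mathbf {T_{n}}=\mathbf {M}\cdot \mathbf {T_{n-1}}+(1-\alpha -\beta )\cdot \mathbf {T_{0}}\). By iteration, we have \(\mathbf {T_{n}}=\mathbf {M}^n \cdot \mathbf {T_{0}}+(1-\alpha -\beta )\cdot \sum _{i=0}^{n-1}\mathbf {M}^i\cdot \mathbf {T_{0}}\). Because \(\lim _ {n\rightarrow \infty }\mathbf {M}^n=\mathbf {0}\) and \(\lim _ {n\rightarrow \infty }\sum _{i=0}^{n-1}\mathbf {M}^i=(\mathbf {I}-\mathbf {M})^{-1}\), both of which hold when the spectral radius of \(\mathbf {M}\) is less than 1, the limit of the matrix sequence \(\{\mathbf {T_n}\}\) is derived as:
$$\mathbf {T^{*}}=\lim _ {n\rightarrow \infty }\mathbf {T_{n}}=(1-\alpha -\beta )\cdot (\mathbf {I}-\mathbf {M})^{-1}\cdot \mathbf {T_{0}}$$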
where \(\mathbf {I}\) is the \(|T|\times |T|\) identity matrix. Therefore, the ranks of topics converge in NERank. The rank of entities \(\mathbf {E_n}\) can be computed by \(\mathbf {E_n}=\mathbf {\hat{\Phi }_R}^T\cdot \mathbf {T_n}\), so the entity ranks converge as well. Denote by \(\mathbf {E^{*}}\) the closed-form solution vector for entity ranks. We have
$$\mathbf {E^{*}}=(1-\alpha -\beta )\cdot \mathbf {\hat{\Phi }_R}^T\cdot (\mathbf {I}-\mathbf {M})^{-1}\cdot \mathbf {T_{0}}$$
where the rank of entity \(e_i\) (i.e., \(r(e_i)\)) is the \(i^{th}\) element in \(\mathbf {E^{*}}\).
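The convergence argument can be checked numerically. The following is a minimal sketch, not the authors' implementation: it iterates the recurrence \(\mathbf {T_{n}}=\mathbf {M}\cdot \mathbf {T_{n-1}}+(1-\alpha -\beta )\cdot \mathbf {T_{0}}\) implied by the unrolled expression in this appendix and compares the result with the closed form \((1-\alpha -\beta )\cdot (\mathbf {I}-\mathbf {M})^{-1}\cdot \mathbf {T_{0}}\). The toy matrices, the values of \(\alpha \) and \(\beta \), and all function names are hypothetical. It also assumes that \(\mathbf {\Theta }\) is a \(|D|\times |T|\) matrix whose columns sum to one and that \(\mathbf {\hat{\Phi }}\) is \(|T|\times |E|\); under these assumptions every column of \(\mathbf {M}\) sums to \(\alpha +\beta <1\), which guarantees a spectral radius below 1.

```python
import numpy as np


def row_normalize(a):
    """Scale each row of a non-negative matrix to sum to one."""
    return a / a.sum(axis=1, keepdims=True)


def col_normalize(a):
    """Scale each column of a non-negative matrix to sum to one."""
    return a / a.sum(axis=0, keepdims=True)


def transition_matrix(theta, phi, alpha, beta):
    """M = alpha * Theta_R^T Theta + beta * Phi_C Phi_R^T, as defined in the appendix."""
    tdt = row_normalize(theta).T @ theta             # TDT meta-path, |T| x |T|
    tet = col_normalize(phi) @ row_normalize(phi).T  # TET meta-path, |T| x |T|
    return alpha * tdt + beta * tet


def iterate_topic_ranks(m, t0, alpha, beta, steps):
    """Apply T_n = M T_{n-1} + (1 - alpha - beta) T_0 for `steps` iterations."""
    t = t0.copy()
    for _ in range(steps):
        t = m @ t + (1.0 - alpha - beta) * t0
    return t


def closed_form_topic_ranks(m, t0, alpha, beta):
    """Solve (I - M) T* = (1 - alpha - beta) T_0 rather than forming the inverse."""
    identity = np.eye(m.shape[0])
    return np.linalg.solve(identity - m, (1.0 - alpha - beta) * t0)


# Hypothetical toy data: 3 documents, 2 topics, 4 entities.
# Assumption: theta's columns sum to one, so each column of M sums to alpha + beta.
theta = col_normalize(np.array([[0.7, 0.3], [0.2, 0.8], [0.5, 0.5]]))  # |D| x |T|
phi = np.array([[0.4, 0.3, 0.2, 0.1], [0.1, 0.2, 0.3, 0.4]])           # |T| x |E|
alpha, beta = 0.4, 0.4
t0 = np.array([0.6, 0.4])  # prior topic ranks T_0

m = transition_matrix(theta, phi, alpha, beta)
spectral_radius = max(abs(np.linalg.eigvals(m)))
t_iter = iterate_topic_ranks(m, t0, alpha, beta, steps=200)
t_star = closed_form_topic_ranks(m, t0, alpha, beta)
e_star = row_normalize(phi).T @ t_star  # E* = Phi_R^T T*
```

If the spectral radius of `m` were 1 or larger, the iteration would not be expected to settle and the closed form would not apply, so checking `spectral_radius` before trusting `t_star` is a reasonable safeguard on real data.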
Copyright information
© 2016 Springer International Publishing Switzerland
About this paper
Cite this paper
Wang, C., Zhang, R., He, X., Zhou, G., Zhou, A. (2016). NERank: Bringing Order to Named Entities from Texts. In: Li, F., Shim, K., Zheng, K., Liu, G. (eds) Web Technologies and Applications. APWeb 2016. Lecture Notes in Computer Science, vol 9931. Springer, Cham. https://doi.org/10.1007/978-3-319-45814-4_2
DOI: https://doi.org/10.1007/978-3-319-45814-4_2
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-45813-7
Online ISBN: 978-3-319-45814-4
eBook Packages: Computer Science (R0)