Abstract
Linked Data (LD) has become a valuable source of factual records, and entity search is a fundamental task in LD. Given a query consisting of a set of keywords, the task is to retrieve a set of relevant entities in LD. The state-of-the-art approaches to entity search are based on information retrieval techniques. This paper first examines these approaches with a traditional evaluation metric, recall@k, to reveal their room for improvement. To find evidence for this room, the relationship between queries and answer entities is investigated in terms of path lengths on a graph of LD. On the basis of this investigation, the paper addresses the learning of entity representations. Existing entity search methods rely on heuristics that determine which fields (i.e., predicates and related entities) constitute an entity representation. Since these heuristics require burdensome human decisions, this paper aims to remove that burden by using a graph proximity measurement. To this end, RWRDoc is proposed. It is an RWR (random walk with restart)-based representation learning method that learns the representation of an entity as a weighted combination of the representations of the entities reachable from it, with weights given by RWR. RWRDoc is mainly designed to improve recall scores; as the experiments show, it therefore lacks ranking capability. To improve ranking quality, this paper also proposes PPRSD (Personalized PageRank-based Score Distribution), a re-ranking method for the retrieved results. PPRSD distributes the relevance scores calculated by text-based entity search methods in a personalized PageRank manner. Experimental evaluations show that RWRDoc improves search quality in terms of recall@1000 and that PPRSD compensates for RWRDoc's insufficient ranking capability.
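The two ideas above can be illustrated with a minimal sketch. It is not the paper's implementation: the function names, the adjacency-list graph format, the bag-of-words representation, the restart probability of 0.15, and the handling of dangling nodes are all assumptions made for illustration. The sketch computes RWR proximities by power iteration, builds an entity representation as an RWR-weighted sum of term counts, and spreads text-based relevance scores over the graph using the scores as the restart distribution.

```python
from collections import Counter


def rwr_scores(graph, source, restart=0.15, iters=100):
    """Random walk with restart from `source`, by power iteration.

    graph: dict mapping every node to a list of its out-neighbours
           (assumption: every neighbour is also a key of `graph`).
    Returns a dict node -> visiting probability (sums to 1).
    """
    p = {n: 0.0 for n in graph}
    p[source] = 1.0
    for _ in range(iters):
        nxt = {n: 0.0 for n in graph}
        for n, out in graph.items():
            if not out:
                # Assumption: a dangling node sends its mass back to the source.
                nxt[source] += (1 - restart) * p[n]
                continue
            share = (1 - restart) * p[n] / len(out)
            for m in out:
                nxt[m] += share
        nxt[source] += restart  # restart mass always returns to the source
        p = nxt
    return p


def combine_representations(graph, docs, source, restart=0.15):
    """Hypothetical RWRDoc-style representation: RWR-weighted sum of term counts.

    docs: dict node -> Counter of term frequencies for that entity.
    """
    rep = Counter()
    for node, weight in rwr_scores(graph, source, restart).items():
        for term, tf in docs.get(node, Counter()).items():
            rep[term] += weight * tf
    return rep


def distribute_scores(graph, scores, restart=0.15, iters=100):
    """Hypothetical PPRSD-style step: spread text-based scores over the graph.

    Each scored entity restarts a walk with mass proportional to its score;
    by linearity this equals personalized PageRank with the normalized
    scores as the personalization vector.
    """
    total = sum(scores.values())
    spread = {n: 0.0 for n in graph}
    for src, score in scores.items():
        if score <= 0:
            continue
        for n, w in rwr_scores(graph, src, restart, iters).items():
            spread[n] += (score / total) * w
    return spread


# Toy graph with three entities; all names and values are made up.
graph = {"a": ["b"], "b": ["a", "c"], "c": []}
docs = {
    "a": Counter({"tokyo": 1}),
    "b": Counter({"japan": 1}),
    "c": Counter({"city": 1}),
}
rep_a = combine_representations(graph, docs, "a")
reranked = distribute_scores(graph, {"a": 2.0, "c": 1.0})
```

In this toy setting, the representation of `a` contains the terms of `b` and `c` with smaller weights than its own term, which is the sense in which no hand-picked fields are needed. Iterating one walk per scored entity is chosen for readability; a single walk with a score-weighted restart vector would give the same result with less work.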
Acknowledgements
This work was partly supported by JSPS KAKENHI Grant Number JP18K18056 and the Artificial Intelligence Research Promotion Foundation.
Cite this article
Komamizu, T. Random walk-based entity representation learning and re-ranking for entity search. Knowl Inf Syst 62, 2989–3013 (2020). https://doi.org/10.1007/s10115-020-01445-4
DOI: https://doi.org/10.1007/s10115-020-01445-4