Document Ranking for Curated Document Databases Using BERT and Knowledge Graph Embeddings: Introducing GRAB-Rank

Muhammad, Iqra; Bollegala, Danushka; Coenen, Frans; Gamble, Carrol; Kearney, Anna; Williamson, Paula

doi:10.1007/978-3-030-86534-4_10

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 12925)

Included in the following conference series:

International Conference on Big Data Analytics and Knowledge Discovery

Abstract

Curated Document Databases (CDDs) play an important role in helping researchers find relevant articles in the scientific literature. Considerable recent attention has been given to the use of document ranking algorithms to support the maintenance of CDDs. The typical approach is to represent the update document collection using a form of word embedding and to input this into a ranking model; the resulting document rankings can then be used to decide which documents should be added to the CDD and which should be rejected. The hypothesis considered in this paper is that a better ranking model can be produced if a hybrid embedding is used. To this end, the Knowledge Graph And BERT Ranking (GRAB-Rank) approach is presented. The Online Resource for Recruitment research in Clinical trials (ORRCA) CDD was used as a focus for the work and as a means of evaluating the proposed technique. The GRAB-Rank approach is fully described and evaluated in the context of learning to rank for the purpose of maintaining CDDs. The evaluation indicates that the hypothesis is correct: a hybrid embedding outperforms the individual embeddings used in isolation. The evaluation also indicates that GRAB-Rank outperforms both a traditional approach based on BM25 and an n-gram-based SVR document ranking approach.
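The general idea of ranking documents from a hybrid embedding can be illustrated with a minimal sketch. The abstract does not say how the BERT and knowledge graph embeddings are combined or which ranking model is used, so everything here is an assumption made for illustration: the combination is plain concatenation, the ranker is a fixed linear scorer, and the function names (`hybrid_embedding`, `linear_score`, `rank_documents`), the toy vectors and the weights are all hypothetical.

```python
from typing import Dict, List, Sequence, Tuple


def hybrid_embedding(text_vec: Sequence[float], graph_vec: Sequence[float]) -> List[float]:
    """Combine a text embedding and a graph embedding into one vector.

    Assumption: the combination is simple concatenation.
    """
    return list(text_vec) + list(graph_vec)


def linear_score(features: Sequence[float], weights: Sequence[float]) -> float:
    """Score a feature vector with a linear model (a stand-in for a trained ranker)."""
    if len(features) != len(weights):
        raise ValueError("feature and weight vectors must have the same length")
    return sum(f * w for f, w in zip(features, weights))


def rank_documents(
    docs: Dict[str, Tuple[Sequence[float], Sequence[float]]],
    weights: Sequence[float],
) -> List[Tuple[str, float]]:
    """Return (doc_id, score) pairs sorted from highest to lowest score."""
    scored = [
        (doc_id, linear_score(hybrid_embedding(text_vec, graph_vec), weights))
        for doc_id, (text_vec, graph_vec) in docs.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy data: each document has a 2-d "text" vector and a 1-d "graph" vector.
# Real embeddings would come from a language model and a graph embedding method.
docs = {
    "doc_a": ([0.9, 0.1], [0.8]),
    "doc_b": ([0.2, 0.7], [0.1]),
    "doc_c": ([0.5, 0.5], [0.9]),
}
weights = [1.0, 0.0, 1.0]  # hypothetical weights, not learned from data

ranking = rank_documents(docs, weights)
for doc_id, score in ranking:
    print(doc_id, round(score, 2))
```

In a CDD maintenance setting, the top of such a ranking would be the candidates for inclusion, with the cut-off left to the curators.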

Notes

1. https://www.orrca.org.uk/
2. https://www.nltk.org/

Author information

Authors and Affiliations

Department of Computer Science, The University of Liverpool, Liverpool, L69 3BX, UK
Iqra Muhammad, Danushka Bollegala & Frans Coenen
Department of Biostatistics, Institute of Translational Medicine, The University of Liverpool, Liverpool, L69 3BX, UK
Carrol Gamble, Anna Kearney & Paula Williamson

Corresponding author

Correspondence to Iqra Muhammad.

Editor information

Editors and Affiliations

University of Bologna, Bologna, Forli/Cesena, Italy
Matteo Golfarelli
Poznań University of Technology, Poznan, Poland
Robert Wrembel
Johannes Kepler University Linz, Linz, Austria
Gabriele Kotsis
TU Wien, Vienna, Austria
A Min Tjoa
Johannes Kepler University Linz, Linz, Austria
Ismail Khalil

About this paper

Cite this paper

Muhammad, I., Bollegala, D., Coenen, F., Gamble, C., Kearney, A., Williamson, P. (2021). Document Ranking for Curated Document Databases Using BERT and Knowledge Graph Embeddings: Introducing GRAB-Rank. In: Golfarelli, M., Wrembel, R., Kotsis, G., Tjoa, A.M., Khalil, I. (eds) Big Data Analytics and Knowledge Discovery. DaWaK 2021. Lecture Notes in Computer Science, vol 12925. Springer, Cham. https://doi.org/10.1007/978-3-030-86534-4_10

DOI: https://doi.org/10.1007/978-3-030-86534-4_10
Published: 05 September 2021
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-86533-7
Online ISBN: 978-3-030-86534-4
eBook Packages: Computer Science (R0)
