A qualitative study of large-scale recommendation algorithms for biomedical knowledge bases

International Journal on Digital Libraries

Abstract

The rate at which new research documents are published poses a challenge for researchers, who increasingly need access to relevant documents in order to conduct their research. Searching across a variety of databases and browsing millions of documents to find semantically relevant material is time-consuming. Recent work has focused on recommendation algorithms that suggest relevant documents based on researchers' current interests. In this paper, we describe the implementation of seven commonly used recommendation algorithms and three aggregation algorithms. We evaluate these algorithms on a large-scale biomedical knowledge base to identify the relative strengths and weaknesses of each. We analyze each algorithm's recommendations using assessments of its output by 14 biomedical researchers. Our results provide insights into how well recommendation algorithms meet the needs of modern-day biomedical researchers.
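The abstract does not say which three aggregation algorithms were evaluated. As a hedged illustration only, the sketch below shows one classic way to merge ranked lists produced by several recommenders, the Borda count; it is not necessarily one of the paper's methods. The function name `borda_aggregate`, the document IDs, and the three input rankings are all hypothetical.

```python
from __future__ import annotations

from collections import defaultdict
from typing import Optional


def borda_aggregate(rankings: list[list[str]], top_k: Optional[int] = None) -> list[str]:
    """Merge ranked recommendation lists with a Borda count (illustrative sketch).

    In a list of length n, the item at 0-based position i earns n - i points.
    Items absent from a list earn nothing from that list. Ties are broken by
    document ID so the merged order is deterministic.
    """
    scores: dict[str, int] = defaultdict(int)
    for ranking in rankings:
        n = len(ranking)
        for position, doc_id in enumerate(ranking):
            scores[doc_id] += n - position
    # Highest total score first; alphabetical document ID breaks ties.
    merged = sorted(scores, key=lambda doc: (-scores[doc], doc))
    return merged if top_k is None else merged[:top_k]


if __name__ == "__main__":
    # Hypothetical top-3 lists from three different recommendation algorithms.
    algo_a = ["d1", "d2", "d3"]
    algo_b = ["d2", "d1", "d4"]
    algo_c = ["d2", "d3", "d1"]
    print(borda_aggregate([algo_a, algo_b, algo_c]))
    # → ['d2', 'd1', 'd3', 'd4']
```

Because every list contributes points by position, a document ranked fairly high by several algorithms (d1, with 6 points) outranks one that appears in only a single list (d4, with 1 point).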

Figs. 1–8 (images not available)

Acknowledgements

This research was supported in part by a Natural Sciences and Engineering Research Council (NSERC) Engage Grant and an NSERC Strategic Partnership Project Grant. The authors wish to recognize the contributions of Bahar Ghadiri Bashardoost and Yuyang Liu.

Author information

Corresponding author

Correspondence to Ehsan Noei.

About this article

Cite this article

Noei, E., Hayat, T., Perrie, J. et al. A qualitative study of large-scale recommendation algorithms for biomedical knowledge bases. Int J Digit Libr 22, 197–215 (2021). https://doi.org/10.1007/s00799-021-00300-3

  • DOI: https://doi.org/10.1007/s00799-021-00300-3
