Abstract
The rate at which new research documents are published poses challenges for researchers, who increasingly need access to relevant documents to conduct their research. Searching across a variety of databases and browsing millions of documents to find semantically relevant material is time-consuming. Recent work has focused on recommendation algorithms that suggest relevant documents based on researchers' current interests. In this paper, we describe the implementation of seven commonly used recommendation algorithms and three aggregation algorithms. We evaluate the recommendation algorithms on a large-scale biomedical knowledge base to identify the relative strengths and weaknesses of each. We analyze each algorithm's recommendations based on assessments by 14 biomedical researchers. Our results provide insights into how well recommendation algorithms meet the needs of modern-day biomedical researchers.
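The abstract mentions three aggregation algorithms without describing them. As a rough illustration of what aggregating recommender outputs can look like, the following sketch merges ranked lists from several recommenders using the Borda count, one classic rank-aggregation method. It is not necessarily one of the three methods evaluated in the paper. The function name `borda_aggregate`, the document IDs, and the rule that an item absent from a list earns no points from that list are illustrative assumptions.

```python
from collections import defaultdict


def borda_aggregate(rankings: list[list[str]]) -> list[str]:
    """Combine several ranked recommendation lists into one consensus ranking.

    Hypothetical helper for illustration. With n distinct candidates across all
    lists, the item ranked first in a list earns n - 1 points, the second n - 2,
    and so on. Assumption: an item missing from a list earns 0 points from it.
    """
    # The candidate pool is the union of every recommender's output.
    candidates = {doc for ranking in rankings for doc in ranking}
    n = len(candidates)

    # Accumulate positional (Borda) points for each document.
    scores: dict[str, int] = defaultdict(int)
    for ranking in rankings:
        for position, doc in enumerate(ranking):
            scores[doc] += n - 1 - position

    # Highest total first; ties broken alphabetically so the output is deterministic.
    return sorted(candidates, key=lambda doc: (-scores[doc], doc))


if __name__ == "__main__":
    # Three hypothetical recommenders, each returning its top-3 documents.
    outputs = [["d1", "d2", "d3"], ["d2", "d1", "d4"], ["d2", "d3", "d1"]]
    print(borda_aggregate(outputs))
```

With these hypothetical outputs there are four candidates, so positions earn 3, 2 and 1 points: `d2` totals 8, `d1` 6, `d3` 3 and `d4` 1, giving the consensus order `["d2", "d1", "d3", "d4"]`. Positional schemes like this are cheap to compute, which makes them a common baseline for comparison with more expensive aggregation methods.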
Acknowledgements
This research was supported in part by a Natural Sciences and Engineering Research Council (NSERC) Engage Grant and an NSERC Strategic Partnership Project Grant. The authors wish to recognize contributions of Bahar Ghadiri Bashardoost and Yuyang Liu.
Cite this article
Noei, E., Hayat, T., Perrie, J. et al. A qualitative study of large-scale recommendation algorithms for biomedical knowledge bases. Int J Digit Libr 22, 197–215 (2021). https://doi.org/10.1007/s00799-021-00300-3
DOI: https://doi.org/10.1007/s00799-021-00300-3