Hidden features identification for designing an efficient research article recommendation system

Chaudhuri, Arpita; Sinhababu, Nilanjan; Sarma, Monalisa; Samanta, Debasis

doi:10.1007/s00799-021-00301-2


Published: 30 April 2021

International Journal on Digital Libraries, Volume 22, pages 233–249 (2021)

Debasis Samanta ORCID: orcid.org/0000-0002-6104-3771


Abstract

The digital repository of research articles is growing rapidly, and searching for the right paper has become a tedious task for researchers. Research paper recommendation systems are advocated to help researchers in this context. In designing such a system, representing articles properly depends on two essential tasks: feature identification and feature extraction. Existing approaches mainly consider direct features, which are readily available from research articles. However, certain features that are not readily available from a paper may still greatly influence the performance of a recommendation system. This paper proposes four indirect features to represent a research article: keyword diversification, text complexity, citation analysis over time, and scientific quality measurement. Keyword diversification measures the uniqueness of a paper's keywords, which helps introduce variety into recommendations. Text complexity measurement helps recommend papers that match the user's level of understanding. Citation analysis over time determines the relevance of a paper. Scientific quality measurement quantifies the scientific value of a paper. This paper discusses formal definitions of the proposed indirect features, schemes to extract the feature values from a given research article, and metrics to measure them quantitatively. To substantiate the efficacy of the proposed features, a number of experiments have been carried out. The experimental results reveal that the proposed indirect features define a research article more uniquely than the direct features do. Given a research paper, extracting its feature vector is computationally fast, which makes it feasible to filter a large corpus of papers in real time. More significantly, the indirect features can be matched with the features of a user's profile, thus satisfying an important criterion in collaborative filtering.
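The abstract names the four indirect features but does not state their formulas. As a rough illustration only, the Python sketch below computes one hypothetical proxy per feature and stacks them into a feature vector: the mean inverse document frequency (IDF) of a paper's keywords for keyword diversification, the Flesch-Kincaid grade level for text complexity, the least-squares slope of yearly citation counts for citation analysis over time, and an h-index over citation counts for scientific quality. These proxies, all function names, and the input formats are assumptions made for this example, not the authors' definitions.

```python
import math
import re
from typing import List, Sequence


def keyword_diversification(keywords: Sequence[str], corpus: Sequence[Sequence[str]]) -> float:
    """Hypothetical proxy: mean smoothed IDF of a paper's keywords.

    `corpus` is a list of keyword lists, one per paper. Keywords found in few
    papers get a high IDF, so a paper with rarer keywords scores higher.
    """
    n = len(corpus)
    doc_sets = [{k.lower() for k in doc} for doc in corpus]
    unique = {k.lower() for k in keywords}
    if not unique:
        return 0.0
    idfs = []
    for kw in unique:
        df = sum(1 for s in doc_sets if kw in s)  # number of papers containing kw
        idfs.append(math.log((1 + n) / (1 + df)))  # smoothed IDF, never divides by zero
    return sum(idfs) / len(idfs)


def _syllables(word: str) -> int:
    # Crude heuristic: count runs of vowels, with at least one syllable per word.
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))


def text_complexity(text: str) -> float:
    """Hypothetical proxy: Flesch-Kincaid grade level (higher = harder to read)."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z]+", text)
    if not sentences or not words:
        return 0.0
    syllables = sum(_syllables(w) for w in words)
    return 0.39 * len(words) / len(sentences) + 11.8 * syllables / len(words) - 15.59


def citation_trend(citations_per_year: Sequence[int]) -> float:
    """Hypothetical proxy: least-squares slope of yearly citation counts.

    A positive slope means the paper is gaining citations over time.
    """
    n = len(citations_per_year)
    if n < 2:
        return 0.0
    xs = range(n)
    mean_x = (n - 1) / 2
    mean_y = sum(citations_per_year) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, citations_per_year))
    var = sum((x - mean_x) ** 2 for x in xs)
    return cov / var


def h_index(citation_counts: Sequence[int]) -> int:
    """Largest h such that h items have at least h citations each."""
    ranked = sorted(citation_counts, reverse=True)
    return sum(1 for i, c in enumerate(ranked, start=1) if c >= i)


def feature_vector(keywords: Sequence[str], corpus: Sequence[Sequence[str]], text: str,
                   citations_per_year: Sequence[int],
                   author_citation_counts: Sequence[int]) -> List[float]:
    """Stack the four hypothetical proxies into one vector for a paper."""
    return [
        keyword_diversification(keywords, corpus),
        text_complexity(text),
        citation_trend(citations_per_year),
        float(h_index(author_citation_counts)),
    ]


if __name__ == "__main__":
    corpus = [["recommender", "citation"], ["recommender", "readability"], ["graph"]]
    print(feature_vector(["recommender", "readability"], corpus,
                         "Papers are hard to find. Recommenders help researchers.",
                         [1, 3, 5, 7], [10, 8, 5, 4, 3]))
```

Because the four values live on very different scales, some normalization (for example, min-max scaling over the corpus) would likely be needed before comparing paper vectors with each other or with a user-profile vector.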




Author information

Authors and Affiliations

Indian Institute of Technology Kharagpur, Kharagpur, India
Arpita Chaudhuri, Nilanjan Sinhababu, Monalisa Sarma & Debasis Samanta


Corresponding author

Correspondence to Debasis Samanta.



About this article

Cite this article

Chaudhuri, A., Sinhababu, N., Sarma, M. et al. Hidden features identification for designing an efficient research article recommendation system. Int J Digit Libr 22, 233–249 (2021). https://doi.org/10.1007/s00799-021-00301-2


Received: 13 June 2020
Revised: 03 March 2021
Accepted: 02 April 2021
Published: 30 April 2021
Issue Date: June 2021
DOI: https://doi.org/10.1007/s00799-021-00301-2
