
Hidden features identification for designing an efficient research article recommendation system

International Journal on Digital Libraries

Abstract

The digital repository of research articles is growing rapidly, and searching for the right paper has become a tedious task for researchers. A research paper recommendation system is advocated to help researchers in this context. In designing such a system, the proper representation of articles is key; more specifically, feature identification and feature extraction are two essential tasks. Existing approaches mainly consider direct features, which are readily available from research articles. However, certain features that are not readily available from a paper may greatly influence the performance of a recommendation system. This paper proposes four indirect features to represent a research article: keyword diversification, text complexity, citation analysis over time, and scientific quality measurement. Keyword diversification measures the uniqueness of a paper's keywords, which helps introduce variety into recommendations. Text complexity measurement helps recommend papers that match the user's level of understanding. Citation analysis over time determines the relevance of a paper. Scientific quality measurement assesses the scientific value of a paper. This paper discusses formal definitions of the proposed indirect features, schemes to extract the feature values from a given research article, and metrics to measure them quantitatively. To substantiate the efficacy of the proposed features, a number of experiments have been carried out. The experimental results reveal that the proposed indirect features define a research article more uniquely than the direct features do. Given a research paper, extracting its feature vector is computationally fast, which makes it feasible to filter a large corpus of papers in real time. More significantly, the indirect features can be matched with the features of a user's profile, thus satisfying an important criterion in collaborative filtering.
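The abstract names the four indirect features but does not give their formulas. As a purely illustrative sketch, the Python below computes one hypothetical stand-in per feature: the mean smoothed inverse document frequency (IDF) of a paper's keywords for keyword diversification, the Flesch Reading Ease score for text complexity, the share of recent citations for citation analysis over time, and average citations per year since publication for scientific quality. These proxies, the `Paper` record, and every function name are assumptions chosen for illustration; they are not the authors' definitions or metrics.

```python
import math
import re
from dataclasses import dataclass


@dataclass
class Paper:
    """Hypothetical minimal paper record; field names are illustrative only."""
    text: str                          # abstract or body text
    keywords: list[str]                # author-supplied keywords
    year: int                          # publication year
    citations_by_year: dict[int, int]  # year -> citations received in that year


def keyword_diversification(paper: Paper, corpus: list[Paper]) -> float:
    """Assumed proxy: mean smoothed IDF of the paper's keywords over the corpus.

    Rare keywords get a high IDF, so a higher score means more unique keywords.
    """
    n = len(corpus)
    if not paper.keywords or n == 0:
        return 0.0
    scores = []
    for kw in paper.keywords:
        kw = kw.lower()
        # Document frequency: how many corpus papers share this keyword.
        df = sum(1 for p in corpus if kw in {k.lower() for k in p.keywords})
        scores.append(math.log((1 + n) / (1 + df)) + 1.0)
    return sum(scores) / len(scores)


def count_syllables(word: str) -> int:
    """Crude heuristic: count vowel groups (at least one per word)."""
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))


def text_complexity(text: str) -> float:
    """Assumed proxy: Flesch Reading Ease (higher means easier to read)."""
    sentences = max(1, len(re.findall(r"[.!?]+", text)))
    words = re.findall(r"[A-Za-z]+", text)
    if not words:
        return 0.0
    syllables = sum(count_syllables(w) for w in words)
    return (206.835
            - 1.015 * (len(words) / sentences)
            - 84.6 * (syllables / len(words)))


def citation_recency(paper: Paper, current_year: int, window: int = 3) -> float:
    """Assumed proxy: share of all citations received in the last `window` years."""
    total = sum(paper.citations_by_year.values())
    if total == 0:
        return 0.0
    recent = sum(c for y, c in paper.citations_by_year.items()
                 if y > current_year - window)
    return recent / total


def quality_proxy(paper: Paper, current_year: int) -> float:
    """Assumed proxy: average citations per year since publication."""
    age = max(1, current_year - paper.year + 1)
    return sum(paper.citations_by_year.values()) / age


def feature_vector(paper: Paper, corpus: list[Paper], current_year: int) -> list[float]:
    """Four-element indirect-feature vector built from the proxies above."""
    return [
        keyword_diversification(paper, corpus),
        text_complexity(paper.text),
        citation_recency(paper, current_year),
        quality_proxy(paper, current_year),
    ]


if __name__ == "__main__":
    corpus = [
        Paper("We propose a method. It works well.",
              ["recommendation", "features"], 2018, {2018: 3, 2019: 2, 2020: 5}),
        Paper("Deep networks are studied.",
              ["deep learning", "recommendation"], 2019, {2020: 1}),
    ]
    print(feature_vector(corpus[0], corpus, current_year=2021))
```

Each call returns a fixed-length numeric vector that a filtering or ranking step could compare against a user profile. Note that a higher Flesch score means easier text, so matching a user's understandability level would mean comparing the score against a preferred range for that user rather than simply maximizing it.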

Notes

  1. https://www.aparavi.com/data-growth-statistics-blow-your-mind/

  2. https://towardsdatascience.com/hot-topics-in-ai-research-4367bdd93564mlk9

  3. Data sources: https://www.scopus.com/search/form.uri?display=basic

Author information

Corresponding author

Correspondence to Debasis Samanta.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Chaudhuri, A., Sinhababu, N., Sarma, M. et al. Hidden features identification for designing an efficient research article recommendation system. Int J Digit Libr 22, 233–249 (2021). https://doi.org/10.1007/s00799-021-00301-2

  • DOI: https://doi.org/10.1007/s00799-021-00301-2
