An artificial intelligence-based framework for data-driven categorization of computer scientists: a case study of world’s Top 10 computing departments


Abstract

The total number of published articles and the resulting citations are generally acknowledged as suitable criteria for evaluating a scientist. However, ranking scientists is challenging because the value of their scientific work is, at times, not directly reflected in these measures. Multiple other elements therefore need to be examined in combination to better evaluate an individual's scientific worth. This work presents a learning-based technique, i.e., an Artificial Intelligence (AI)-based solution for categorizing scientists using multifaceted criteria. In this context, a novel ranking metric is proposed that is grounded on authorship, experience, publication count, total citations, i10-index, and h-index. To assess the proposed framework's performance, a dataset is collected covering the world's top ten computing departments and ten domestic ones, yielding data on 1,000 computer scientists. The dataset is preprocessed, after which three feature-selection techniques, Mutual Information (MI), Chi-Square (X2), and the Fisher test (F-Test), are employed to rank the features in the data. To validate the collected data, the framework also includes three clustering techniques, namely k-medoids, k-means, and spectral clustering, to identify the optimum number of heterogeneous groups. Three cluster validity indices are used to evaluate the clustering outcomes: the Calinski-Harabasz Index (CHI), the Davies-Bouldin Index (DBI), and the Silhouette Coefficient (SC). Once the optimum clusters are obtained, five classification procedures, Artificial Neural Network (ANN), k-Nearest Neighbor (k-NN), Decision Tree (DT), Gaussian Naive Bayes (GNB), and Linear Regression Classifier (LRC), are used to predict the category of a previously unseen scientist. Among all classifiers, the ANN achieves the highest average accuracy, 94.44%, in predicting the category of an unknown/new scientist. The current proposal is also compared with closely related past works. The proposed framework offers the possibility to independently classify scientists based on AI techniques.
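The pipeline summarized above can be illustrated with a minimal sketch in Python using scikit-learn. This is not the authors' implementation: the synthetic feature matrix, the use of k-means as a stand-in for the three clustering techniques (k-medoids is not part of scikit-learn), the ordering of the clustering and feature-ranking steps, and the ANN hyperparameters are all assumptions made purely for demonstration.

```python
# Minimal, illustrative sketch of the described pipeline (not the authors' code).
# The synthetic feature matrix and all hyperparameters below are assumptions.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.feature_selection import mutual_info_classif, chi2, f_classif
from sklearn.metrics import (calinski_harabasz_score, davies_bouldin_score,
                             silhouette_score)
from sklearn.model_selection import cross_val_score
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)
# Hypothetical feature matrix: authorship, experience, publication count,
# total citations, i10-index, and h-index for 1,000 scientists.
X = np.abs(rng.normal(size=(1000, 6)))

# Identify the optimum number of heterogeneous groups; CHI, DBI, and SC
# score each candidate cluster count k.
best_k, best_labels, best_sc = None, None, -1.0
for k in range(2, 7):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    chi = calinski_harabasz_score(X, labels)
    dbi = davies_bouldin_score(X, labels)
    sc = silhouette_score(X, labels)
    print(f"k={k}: CHI={chi:.1f}, DBI={dbi:.3f}, SC={sc:.3f}")
    if sc > best_sc:
        best_k, best_labels, best_sc = k, labels, sc

# Rank the six features with MI, Chi-Square, and the F-test. These scores are
# supervised, so the cluster labels serve as the target in this sketch.
mi_scores = mutual_info_classif(X, best_labels, random_state=0)
chi2_scores, _ = chi2(X, best_labels)   # Chi-Square requires non-negative features
f_scores, _ = f_classif(X, best_labels)

# Train an ANN (one of the five classifiers) to predict the category of a
# previously unseen scientist and report its cross-validated accuracy.
ann = MLPClassifier(hidden_layer_sizes=(32,), max_iter=1000, random_state=0)
accuracy = cross_val_score(ann, X, best_labels, cv=5).mean()
print(f"Best k = {best_k}, ANN cross-validated accuracy = {accuracy:.3f}")
```

In this sketch the Silhouette Coefficient alone selects the final cluster count; in practice the three validity indices would be weighed jointly, as the framework describes.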




Acknowledgements

The authors are indebted to the editor and anonymous reviewers for their helpful comments and suggestions. The authors wish to thank the GIK Institute for providing research facilities. This work was sponsored by the GIK Institute graduate research fund under the PSS scheme.

Funding

This work was sponsored by the GIK Institute graduate research fund under the PSS scheme (Grant number CS1917).

Author information

Corresponding author

Correspondence to Zahid Halim.

Ethics declarations

Conflict of interest

All authors declare that they have no conflict of interest.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Ali, N., Halim, Z. & Hussain, S.F. An artificial intelligence-based framework for data-driven categorization of computer scientists: a case study of world’s Top 10 computing departments. Scientometrics 128, 1513–1545 (2023). https://doi.org/10.1007/s11192-022-04627-9

