Abstract
The total number of published articles and the citations they attract are generally accepted criteria for evaluating scientists. However, ranking scientists remains challenging because the value of their scientific work is not always directly reflected by these measures. Multiple other elements therefore need to be examined in combination to better evaluate an individual's scientific worth. This work presents a learning-based technique, i.e., an Artificial Intelligence (AI)-based solution, for categorizing scientists using multifaceted criteria. In this context, a novel ranking metric is proposed, grounded on authorship, experience, publication count, total citations, i10-index, and h-index. To assess the proposed framework's performance, a dataset of 1000 computer scientists is collected from the world's top ten computing departments and ten domestic ones. The dataset is preprocessed, and three feature-selection techniques, namely Mutual Information (MI), Chi-Square (X2), and the Fisher test (F-test), are employed to rank the features in the data. To validate the collected data, the framework also applies three clustering techniques, namely k-medoids, k-means, and spectral clustering, to identify the optimum number of heterogeneous groups. Three cluster validity indices are used to evaluate the clustering outcomes: the Calinski-Harabasz Index (CHI), the Davies-Bouldin Index (DBI), and the Silhouette Coefficient (SC). Once the optimum clusters are obtained, five classification procedures, including an Artificial Neural Network (ANN), k-Nearest Neighbor (k-NN), Decision Tree (DT), Gaussian Naive Bayes (GNB), and a Linear Regression Classifier (LRC), are used to predict the category of a previously unseen scientist. Among all classifiers, the ANN achieves the highest average accuracy of 94.44% in predicting the category of an unknown/new scientist.
The current proposal is also compared with closely related past work. The proposed framework offers the possibility of independently classifying scientists using AI techniques.
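The cluster-then-classify pipeline described above can be illustrated with a minimal scikit-learn sketch on synthetic data. This is an illustrative assumption, not the authors' implementation: the synthetic features stand in for the six profile attributes, k-means represents one of the three clustering methods, the silhouette-based choice of k and all model settings are hypothetical.

```python
# Sketch of the abstract's pipeline: cluster scientists into groups,
# rank features, then train a classifier to label new scientists.
# Synthetic data; settings are illustrative, not the paper's.
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.feature_selection import mutual_info_classif, chi2, f_classif
from sklearn.cluster import KMeans
from sklearn.metrics import (calinski_harabasz_score, davies_bouldin_score,
                             silhouette_score, accuracy_score)
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import train_test_split

# Stand-in for six scientist features (authorship, experience,
# publications, citations, i10-index, h-index).
X, _ = make_blobs(n_samples=300, centers=3, n_features=6,
                  center_box=(0.0, 10.0), random_state=0)
X = np.abs(X)  # chi-square feature scoring needs non-negative values

# Step 1: find the optimum number of heterogeneous groups, here by
# maximizing the Silhouette Coefficient over candidate k values.
best_k, best_sc = 2, -1.0
for k in range(2, 6):
    cand = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    sc = silhouette_score(X, cand)
    if sc > best_sc:
        best_k, best_sc = k, sc
labels = KMeans(n_clusters=best_k, n_init=10, random_state=0).fit_predict(X)
print("k =", best_k,
      "CHI =", round(calinski_harabasz_score(X, labels), 1),
      "DBI =", round(davies_bouldin_score(X, labels), 3))

# Step 2: rank features against the group labels (MI, Chi-Square, F-test).
mi = mutual_info_classif(X, labels, random_state=0)
chi_scores, _ = chi2(X, labels)
f_scores, _ = f_classif(X, labels)
print("MI feature ranking (best first):", np.argsort(mi)[::-1])

# Step 3: train a classifier to predict the category of a new scientist.
Xtr, Xte, ytr, yte = train_test_split(X, labels, random_state=0)
clf = MLPClassifier(hidden_layer_sizes=(16,), max_iter=2000,
                    random_state=0).fit(Xtr, ytr)
print("ANN accuracy:", accuracy_score(yte, clf.predict(Xte)))
```

On real profile data the cluster labels would serve as the ground-truth categories, and the three validity indices would be compared jointly rather than relying on the silhouette score alone.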
Acknowledgements
The authors are indebted to the editor and anonymous reviewers for their helpful comments and suggestions, and wish to thank the GIK Institute for providing research facilities. This work was sponsored by the GIK Institute graduate research fund under the PSS scheme.
Funding
This work was sponsored by the GIK Institute graduate research fund under the PSS scheme (Grant number CS1917).
Ethics declarations
Conflict of interest
All authors declare that they have no conflict of interest.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Ali, N., Halim, Z. & Hussain, S.F. An artificial intelligence-based framework for data-driven categorization of computer scientists: a case study of world’s Top 10 computing departments. Scientometrics 128, 1513–1545 (2023). https://doi.org/10.1007/s11192-022-04627-9