Abstract
Expanding a set of known domain experts with new individuals who share similar expertise is a problem with various applications, such as adding new members to a conference program committee or finding new referees to review funding proposals. In this work, we focus on applications of the problem in the academic world and introduce VeTo+, a novel approach that tackles it by exploiting scholarly knowledge graphs. VeTo+ expands a given set of experts by identifying scholars whose publishing habits are similar to theirs. Our experiments show that VeTo+ is more accurate than previous approaches at recommending expansions to a given set of academic experts.











Notes
Following the definitions in Sect. 2, for the case of the APV metapath, the corresponding vectors have length equal to the number of distinct venues in the dataset. Of course, since these vectors are very sparse, in practice sparse vector representations can be used to reduce the memory footprint.
For TPDL and JCDL, we used the following venues to calculate the focused APV-based similarities: TPDL (and its predecessor ECDL), JCDL, and IJDL. For SIGMOD and VLDB: SIGMOD, VLDB, EDBT, ICDE, and TODS.
The last one may be larger than the others; however, it is easy to take this into consideration.
In this work, VeTo and VeTo+ were configured by selecting the same parameter value for all experiments performed on the same dataset; the value selected was the one that yields the best \(F_1\) results. This experimental design differs from the one used in our previous work [32], where the best configuration of VeTo was selected for each of the respective setups (e.g. \(k=100\) and \(k=200\) were used for the \(F_1\) and the MRR experiment on the SIGMOD dataset, respectively). More details on the configuration of VeTo+ can be found in Sect. 5.2.1.
Note that, since the DSKG dataset does not contain weighted edges between papers and topics, we assigned weights based on the number of topics connected to each paper: if a paper is connected with \(n\) topics, the weight assigned to each of these edges is \(1/n\).
We have also conducted experiments using WG, the alternative graph-based approach proposed in the same paper. However, similarly to the results in [2], WG performed worse in all cases, and its results were omitted from the experimental section for presentation reasons.
Our analysis was based on the venue catalogues determined in Sect. 3.2.2.
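Two of the notes above lend themselves to a short illustration: the sparse representation of APV vectors and the \(1/n\) weighting of paper-topic edges. The following is a minimal Python sketch, not the authors' implementation. All function names and the toy data are hypothetical, and cosine similarity is used only as an illustrative way to compare two sparse vectors.

```python
from collections import Counter
from math import sqrt


def apv_vector(author_papers, paper_venue):
    """Sparse APV vector: maps each venue to the number of the author's
    papers published there. Venues with a zero count are simply absent,
    so memory grows with the author's venues, not with all venues."""
    return Counter(paper_venue[p] for p in author_papers if p in paper_venue)


def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as dicts.
    (Illustrative choice of measure; an assumption of this sketch.)"""
    if not u or not v:
        return 0.0
    if len(u) > len(v):
        u, v = v, u  # iterate over the smaller vector
    dot = sum(w * v.get(key, 0.0) for key, w in u.items())
    norm_u = sqrt(sum(w * w for w in u.values()))
    norm_v = sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v)


def uniform_topic_weights(paper_topics):
    """Give each of a paper's n topic edges the weight 1/n."""
    return {
        (paper, topic): 1.0 / len(topics)
        for paper, topics in paper_topics.items()
        for topic in topics
    }


# Hypothetical toy data, for illustration only.
paper_venue = {"p1": "SIGMOD", "p2": "VLDB", "p3": "SIGMOD"}
alice = apv_vector(["p1", "p3"], paper_venue)  # publishes only at SIGMOD
bob = apv_vector(["p2"], paper_venue)          # publishes only at VLDB
weights = uniform_topic_weights({"p1": ["t1", "t2"], "p2": ["t3"]})
```

In this toy setting the two authors share no venue, so their similarity is zero, while each of the two topic edges of p1 receives weight 0.5.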
References
Balog, K., Azzopardi, L., de Rijke, M.: Formal models for expert finding in enterprise corpora. In: SIGIR 2006: Proceedings of the 29th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, Seattle, Washington, USA, August 6–11, 2006, pp. 43–50 (2006)
Balog, K., de Rijke, M.: Finding similar experts. In: Proceedings of the 30th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR’07, pp. 821–822. Association for Computing Machinery, New York, NY, USA (2007). https://doi.org/10.1145/1277741.1277926
Beitzel, S.M., Jensen, E.C., Chowdhury, A., Grossman, D., Frieder, O., Goharian, N.: Fusion of effective retrieval strategies in the same information retrieval system. J. Am. Soc. Inf. Sci. Technol. 55(10), 859–868 (2004)
Cao, Y., Liu, J., Bao, S., Li, H.: Research on expert search at enterprise track of TREC 2005. In: Proceedings of the Fourteenth Text Retrieval Conference, TREC 2005, Gaithersburg, Maryland, USA, November 15–18, 2005 (2005). http://trec.nist.gov/pubs/trec14/papers/microsoft-asia.ent.pdf
Chen, H.H., Gou, L., Zhang, X., Giles, C.L.: Collabseer: A search engine for collaboration discovery. In: Proceedings of the 11th Annual International ACM/IEEE Joint Conference on Digital Libraries, JCDL’11, pp. 231–240. Association for Computing Machinery, New York, NY, USA (2011). https://doi.org/10.1145/1998076.1998121
Cormack, G.V., Clarke, C.L., Buettcher, S.: Reciprocal rank fusion outperforms condorcet and individual rank learning methods. In: Proceedings of the 32nd International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 758–759 (2009)
Craswell, N., Hawking, D., Vercoustre, A.M., Wilkins, P.: P@noptic expert: Searching for experts not just for documents. In: Ausweb Poster Proceedings, Queensland, Australia, vol. 15, p. 17 (2001)
Davenport, T.H., Prusak, L.: Working knowledge: how organizations manage what they know. Ubiquity 2000, 6 (2000)
Fang, H., Zhai, C.: Probabilistic models for expert finding. In: Advances in Information Retrieval, 29th European Conference on IR Research, ECIR 2007, Rome, Italy, April 2–5, 2007, Proceedings, pp. 418–430 (2007)
Fox, E.A., Shaw, J.A.: Combination of multiple searches. NIST special publication SP 243 (1994)
Gollapalli, S.D., Mitra, P., Giles, C.L.: Similar researcher search in academic environments. In: Proceedings of the 12th ACM/IEEE-CS Joint Conference on Digital Libraries, JCDL’12, pp. 167–170. Association for Computing Machinery, New York, NY, USA (2012). https://doi.org/10.1145/2232817.2232849
Gollapalli, S.D., Mitra, P., Giles, C.L.: Ranking experts using author-document-topic graphs. In: 13th ACM/IEEE-CS Joint Conference on Digital Libraries, JCDL’13, Indianapolis, IN, USA, July 22–26, 2013, pp. 87–96 (2013). https://doi.org/10.1145/2467696.2467707
Gonçalves, R., Dorneles, C.F.: Automated expertise retrieval: a taxonomy-based survey and open issues. ACM Comput. Surv. 52(5), 96:1–96:30 (2019)
Jaradeh, M.Y., Oelen, A., Farfar, K.E., Prinz, M., D’Souza, J., Kismihók, G., Stocker, M., Auer, S.: Open research knowledge graph: Next generation infrastructure for semantic scholarly knowledge. In: Proceedings of the 10th International Conference on Knowledge Capture, K-CAP 2019, Marina Del Rey, CA, USA, November 19–21, 2019, pp. 243–246 (2019). https://doi.org/10.1145/3360901.3364435
Kirkpatrick, S., Gelatt, C.D., Vecchi, M.P.: Optimization by simulated annealing. Science 220(4598), 671–680 (1983)
Lillis, D., Toolan, F., Collier, R., Dunnion, J.: Extending probabilistic data fusion using sliding windows. In: European Conference on Information Retrieval, pp. 358–369. Springer (2008)
Lillis, D., Zhang, L., Toolan, F., Collier, R.W., Leonard, D., Dunnion, J.: Estimating probabilities for effective data fusion. In: Proceedings of the 33rd International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 347–354 (2010)
Lu, J., Chen, J., Zhang, C.: Helsinki Multi-Model Data Repository (2016). http://udbms.cs.helsinki.fi/?dataset
Manghi, P., Atzori, C., Bardi, A., Schirrwagen, J., Dimitropoulos, H., La Bruzzo, S., Foufoulas, I., Löhden, A., Bäcker, A., Mannocci, A., et al.: Openaire research graph dump (2019)
Manghi, P., Bardi, A., Atzori, C., Baglioni, M., Manola, N., Schirrwagen, J., Principe, P.: The openaire research graph data model (2019). https://doi.org/10.5281/zenodo.2643199
Petkova, D., Croft, W.B.: Hierarchical language models for expert finding in enterprise corpora. Int. J. Artif. Intell. Tools 17(1), 5–18 (2008)
Salatino, A.A., Osborne, F., Thanapalasingam, T., Motta, E.: The cso classifier: ontology-driven detection of research topics in scholarly articles. In: International Conference on Theory and Practice of Digital Libraries, pp. 296–311. Springer (2019)
Salatino, A.A., Thanapalasingam, T., Mannocci, A., Osborne, F., Motta, E.: The computer science ontology: a large-scale taxonomy of research areas. In: International Semantic Web Conference, pp. 187–205. Springer (2018)
Serdyukov, P., Hiemstra, D.: Modeling documents as mixtures of persons for expert finding. In: Advances in Information Retrieval, 30th European Conference on IR Research, ECIR 2008, Glasgow, UK, March 30–April 3, 2008. Proceedings, pp. 309–320 (2008). https://doi.org/10.1007/978-3-540-78646-7_29
Sfyris, G.A., Fragkos, N., Doulkeridis, C.: Profile-based selection of expert groups. In: International Conference on Theory and Practice of Digital Libraries, pp. 81–93. Springer (2016)
Shi, C., Kong, X., Huang, Y., Philip, S.Y., Wu, B.: Hetesim: a general framework for relevance measure in heterogeneous networks. IEEE Trans. Knowl. Data Eng. 26(10), 2479–2492 (2014)
Shi, C., Li, Y., Philip, S.Y., Wu, B.: Constrained-meta-path-based ranking in heterogeneous information network. Knowl. Inf. Syst. 49(2), 719–747 (2016)
Shi, C., Li, Y., Zhang, J., Sun, Y., Yu, P.S.: A survey of heterogeneous information network analysis. IEEE Trans. Knowl. Data Eng. 29(1), 17–37 (2017). https://doi.org/10.1109/TKDE.2016.2598561
Sun, Y., Han, J., Yan, X., Yu, P.S., Wu, T.: Pathsim: meta path-based top-k similarity search in heterogeneous information networks. Proc. VLDB Endow. 4(11), 992–1003 (2011)
Tang, J., Zhang, J., Yao, L., Li, J., Zhang, L., Su, Z.: ArnetMiner: extraction and mining of academic social networks. In: Proceedings of the 14th ACM SIGKDD, pp. 990–998. ACM (2008)
Tsallis, C., Stariolo, D.A.: Generalized simulated annealing. Phys. A: Stat. Mech. Appl. 233(1), 395–406 (1996). https://doi.org/10.1016/S0378-4371(96)00271-3
Vergoulis, T., Chatzopoulos, S., Dalamagas, T., Tryfonopoulos, C.: Veto: Expert set expansion in academia. In: International Conference on Theory and Practice of Digital Libraries, pp. 48–61. Springer (2020)
Xiong, Y., Zhu, Y., Yu, P.S.: Top-k similarity join in heterogeneous information networks. IEEE Trans. Knowl. Data Eng. 27(6), 1710–1723 (2015)
Yang, K., Kuo, T., Lee, H., Ho, J.: A reviewer recommendation system based on collaborative intelligence. In: 2009 IEEE/WIC/ACM International Conference on Web Intelligence, WI 2009, Milan, Italy, 15–18 September 2009, Main Conference Proceedings, pp. 564–567 (2009). https://doi.org/10.1109/WI-IAT.2009.94
Acknowledgements
This work was partially funded by the EU H2020 project SmartDataLake (825041). We also acknowledge support of this work by the project “Moving from Big Data Management to Data Science” (MIS 5002437/3) which is implemented under the Action “Re-inforcement of the Research and Innovation Infrastructure”, funded by the Operational Programme “Competitiveness, Entrepreneurship and Innovation” (NSRF 2014-2020) and co-financed by Greece and the European Union (European Regional Development Fund). Icons in Fig. 1 were collected from www.flaticon.com and were made by Freepik.
Appendix: Detailed configurations
In this section, we present the exact parameter configurations of the rank aggregation algorithms that were found to perform best for each dataset (Table 5).
Cite this article
Chatzopoulos, S., Vergoulis, T., Dalamagas, T. et al. VeTo+: improved expert set expansion in academia. Int J Digit Libr 23, 57–75 (2022). https://doi.org/10.1007/s00799-021-00318-7
DOI: https://doi.org/10.1007/s00799-021-00318-7