VeTo+: improved expert set expansion in academia

International Journal on Digital Libraries

Abstract

Expanding a set of known domain experts with new individuals who share similar expertise is a problem with various applications, such as adding new members to a conference program committee or finding new referees to review funding proposals. In this work, we focus on applications of the problem in academia and introduce VeTo+, a novel approach that addresses it effectively by exploiting scholarly knowledge graphs. VeTo+ expands a given set of experts by identifying scholars whose publishing habits are similar to theirs. Our experiments show that VeTo+ outperforms previous approaches, in terms of accuracy, at recommending expansions to a given set of academic experts.

Notes

  1. Following the definitions in Sect. 2, for the case of the APV metapath, the corresponding vectors have length equal to the number of distinct venues in the dataset. Of course, since these vectors are very sparse, in practice sparse vector representations can be used to reduce the memory footprint.

  2. For TPDL and JCDL, we used the following venues to calculate the focused APV-based similarities: TPDL (and its predecessor ECDL), JCDL, and IJDL. For SIGMOD and VLDB: SIGMOD, VLDB, EDBT, ICDE, and TODS.

  3. The last one may be larger than the others; however, it is easy to take this into consideration.

  4. https://doi.org/10.5281/zenodo.3739315.

  5. In this work, VeTo and VeTo+ were configured by selecting the same parameter value for all experiments performed on the same dataset; the value selected was the one that yields the best \(F_1\) results. This experimental design differs from the one used in our previous work [32], where the best configuration of VeTo was selected for each of the respective setups (e.g. \(k=100\) and \(k=200\) were used for the \(F_1\) and the MRR experiment on the SIGMOD dataset, respectively). More details on the configuration of VeTo+ can be found in Sect. 5.2.1.

  6. Note that, since the DSKG dataset does not contain weighted edges between papers and topics, we assigned weights based on the number of topics connected to each paper: assuming that a paper is connected with n topics, the weight assigned to each of its edges is equal to 1/n.

  7. We have also conducted experiments using WG, the alternative graph-based approach proposed in the same paper. However, similarly to the results in [2], WG performed worse in all cases and its results were omitted from the experimental section for presentation reasons.

  8. https://github.com/smartdatalake/HMiner.

  9. https://github.com/schatzopoulos/HeySim.

  10. Our analysis was based on the venue catalogues determined in Sect. 3.2.2.

  11. https://trec.nist.gov/.
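Notes 1 and 6 describe two small data-representation choices that can be made concrete with a short sketch. The code below is an illustration under stated assumptions, not the authors' implementation: the toy `APV_PATHS` triples, the function names, and the use of cosine similarity to compare vectors are all hypothetical choices made for the example. It stores each author's APV vector sparsely (only venues with a non-zero count), and it assigns weight 1/n to each paper-topic edge of a paper with n topics.

```python
from collections import Counter
from math import sqrt

# Hypothetical toy data: (author, paper, venue) triples standing in for
# instances of the APV metapath. Not taken from the paper's datasets.
APV_PATHS = [
    ("alice", "p1", "SIGMOD"),
    ("alice", "p2", "VLDB"),
    ("alice", "p3", "SIGMOD"),
    ("bob", "p4", "SIGMOD"),
    ("bob", "p5", "TPDL"),
]


def sparse_apv_vectors(paths):
    """Map each author to a sparse vector {venue: path count}.

    Venues the author never published in are implicit zeros, so memory
    grows with the non-zero entries rather than with the number of venues.
    """
    vectors = {}
    for author, _paper, venue in paths:
        vectors.setdefault(author, Counter())[venue] += 1
    return vectors


def cosine(u, v):
    """Cosine similarity of two sparse dict vectors (an assumed measure)."""
    dot = sum(w * v[k] for k, w in u.items() if k in v)
    norm_u = sqrt(sum(w * w for w in u.values()))
    norm_v = sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def uniform_topic_weights(paper_topics):
    """Give each paper-topic edge weight 1/n, n being the paper's topic count."""
    return {
        (paper, topic): 1.0 / len(topics)
        for paper, topics in paper_topics.items()
        for topic in topics
    }
```

For instance, `sparse_apv_vectors(APV_PATHS)["alice"]` holds only two entries (SIGMOD and VLDB) regardless of how many venues the dataset contains, and `uniform_topic_weights({"p1": ["a", "b", "c", "d"]})` assigns 0.25 to each of the four edges of `p1`.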

References

  1. Balog, K., Azzopardi, L., de Rijke, M.: Formal models for expert finding in enterprise corpora. In: SIGIR 2006: Proceedings of the 29th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, Seattle, Washington, USA, August 6-11, 2006, pp. 43–50 (2006)

  2. Balog, K., de Rijke, M.: Finding similar experts. In: Proceedings of the 30th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR’07, pp. 821–822. Association for Computing Machinery, New York, NY, USA (2007). https://doi.org/10.1145/1277741.1277926

  3. Beitzel, S.M., Jensen, E.C., Chowdhury, A., Grossman, D., Frieder, O., Goharian, N.: Fusion of effective retrieval strategies in the same information retrieval system. J. Am. Soc. Inf. Sci. Technol. 55(10), 859–868 (2004)

  4. Cao, Y., Liu, J., Bao, S., Li, H.: Research on expert search at enterprise track of TREC 2005. In: Proceedings of the Fourteenth Text Retrieval Conference, TREC 2005, Gaithersburg, Maryland, USA, November 15–18, 2005 (2005). http://trec.nist.gov/pubs/trec14/papers/microsoft-asia.ent.pdf

  5. Chen, H.H., Gou, L., Zhang, X., Giles, C.L.: CollabSeer: A search engine for collaboration discovery. In: Proceedings of the 11th Annual International ACM/IEEE Joint Conference on Digital Libraries, JCDL’11, pp. 231–240. Association for Computing Machinery, New York, NY, USA (2011). https://doi.org/10.1145/1998076.1998121

  6. Cormack, G.V., Clarke, C.L., Buettcher, S.: Reciprocal rank fusion outperforms condorcet and individual rank learning methods. In: Proceedings of the 32nd International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 758–759 (2009)

  7. Craswell, N., Hawking, D., Vercoustre, A.M., Wilkins, P.: P@noptic expert: Searching for experts not just for documents. In: Ausweb Poster Proceedings, Queensland, Australia, vol. 15, p. 17 (2001)

  8. Davenport, T.H., Prusak, L.: Working knowledge: how organizations manage what they know. Ubiquity 2000, 6 (2000)

  9. Fang, H., Zhai, C.: Probabilistic models for expert finding. In: Advances in Information Retrieval, 29th European Conference on IR Research, ECIR 2007, Rome, Italy, April 2–5, 2007, Proceedings, pp. 418–430 (2007)

  10. Fox, E.A., Shaw, J.A.: Combination of multiple searches. NIST Special Publication SP 243 (1994)

  11. Gollapalli, S.D., Mitra, P., Giles, C.L.: Similar researcher search in academic environments. In: Proceedings of the 12th ACM/IEEE-CS Joint Conference on Digital Libraries, JCDL’12, pp. 167–170. Association for Computing Machinery, New York, NY, USA (2012). https://doi.org/10.1145/2232817.2232849

  12. Gollapalli, S.D., Mitra, P., Giles, C.L.: Ranking experts using author-document-topic graphs. In: 13th ACM/IEEE-CS Joint Conference on Digital Libraries, JCDL’13, Indianapolis, IN, USA, July 22–26, 2013, pp. 87–96 (2013). https://doi.org/10.1145/2467696.2467707

  13. Gonçalves, R., Dorneles, C.F.: Automated expertise retrieval: a taxonomy-based survey and open issues. ACM Comput. Surv. 52(5), 96:1–96:30 (2019)

  14. Jaradeh, M.Y., Oelen, A., Farfar, K.E., Prinz, M., D’Souza, J., Kismihók, G., Stocker, M., Auer, S.: Open research knowledge graph: Next generation infrastructure for semantic scholarly knowledge. In: Proceedings of the 10th International Conference on Knowledge Capture, K-CAP 2019, Marina Del Rey, CA, USA, November 19–21, 2019, pp. 243–246 (2019). https://doi.org/10.1145/3360901.3364435

  15. Kirkpatrick, S., Gelatt, C.D., Vecchi, M.P.: Optimization by simulated annealing. Science 220(4598), 671–680 (1983)

  16. Lillis, D., Toolan, F., Collier, R., Dunnion, J.: Extending probabilistic data fusion using sliding windows. In: European Conference on Information Retrieval, pp. 358–369. Springer (2008)

  17. Lillis, D., Zhang, L., Toolan, F., Collier, R.W., Leonard, D., Dunnion, J.: Estimating probabilities for effective data fusion. In: Proceedings of the 33rd International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 347–354 (2010)

  18. Lu, J., Chen, J., Zhang, C.: Helsinki Multi-Model Data Repository (2016). http://udbms.cs.helsinki.fi/?dataset

  19. Manghi, P., Atzori, C., Bardi, A., Schirrwagen, J., Dimitropoulos, H., La Bruzzo, S., Foufoulas, I., Löhden, A., Bäcker, A., Mannocci, A., et al.: OpenAIRE research graph dump (2019)

  20. Manghi, P., Bardi, A., Atzori, C., Baglioni, M., Manola, N., Schirrwagen, J., Principe, P.: The OpenAIRE research graph data model (2019). https://doi.org/10.5281/zenodo.2643199

  21. Petkova, D., Croft, W.B.: Hierarchical language models for expert finding in enterprise corpora. Int. J. Artif. Intell. Tools 17(1), 5–18 (2008)

  22. Salatino, A.A., Osborne, F., Thanapalasingam, T., Motta, E.: The CSO classifier: ontology-driven detection of research topics in scholarly articles. In: International Conference on Theory and Practice of Digital Libraries, pp. 296–311. Springer (2019)

  23. Salatino, A.A., Thanapalasingam, T., Mannocci, A., Osborne, F., Motta, E.: The computer science ontology: a large-scale taxonomy of research areas. In: International Semantic Web Conference, pp. 187–205. Springer (2018)

  24. Serdyukov, P., Hiemstra, D.: Modeling documents as mixtures of persons for expert finding. In: Advances in Information Retrieval, 30th European Conference on IR Research, ECIR 2008, Glasgow, UK, March 30–April 3, 2008, Proceedings, pp. 309–320 (2008). https://doi.org/10.1007/978-3-540-78646-7_29

  25. Sfyris, G.A., Fragkos, N., Doulkeridis, C.: Profile-based selection of expert groups. In: International Conference on Theory and Practice of Digital Libraries, pp. 81–93. Springer (2016)

  26. Shi, C., Kong, X., Huang, Y., Yu, P.S., Wu, B.: HeteSim: a general framework for relevance measure in heterogeneous networks. IEEE Trans. Knowl. Data Eng. 26(10), 2479–2492 (2014)

  27. Shi, C., Li, Y., Yu, P.S., Wu, B.: Constrained-meta-path-based ranking in heterogeneous information network. Knowl. Inf. Syst. 49(2), 719–747 (2016)

  28. Shi, C., Li, Y., Zhang, J., Sun, Y., Yu, P.S.: A survey of heterogeneous information network analysis. IEEE Trans. Knowl. Data Eng. 29(1), 17–37 (2017). https://doi.org/10.1109/TKDE.2016.2598561

  29. Sun, Y., Han, J., Yan, X., Yu, P.S., Wu, T.: PathSim: meta path-based top-k similarity search in heterogeneous information networks. Proc. VLDB Endow. 4(11), 992–1003 (2011)

  30. Tang, J., Zhang, J., Yao, L., Li, J., Zhang, L., Su, Z.: ArnetMiner: extraction and mining of academic social networks. In: Proceedings of the 14th ACM SIGKDD, pp. 990–998. ACM (2008)

  31. Tsallis, C., Stariolo, D.A.: Generalized simulated annealing. Phys. A: Stat. Mech. Appl. 233(1), 395–406 (1996). https://doi.org/10.1016/S0378-4371(96)00271-3

  32. Vergoulis, T., Chatzopoulos, S., Dalamagas, T., Tryfonopoulos, C.: VeTo: Expert set expansion in academia. In: International Conference on Theory and Practice of Digital Libraries, pp. 48–61. Springer (2020)

  33. Xiong, Y., Zhu, Y., Yu, P.S.: Top-k similarity join in heterogeneous information networks. IEEE Trans. Knowl. Data Eng. 27(6), 1710–1723 (2015)

  34. Yang, K., Kuo, T., Lee, H., Ho, J.: A reviewer recommendation system based on collaborative intelligence. In: 2009 IEEE/WIC/ACM International Conference on Web Intelligence, WI 2009, Milan, Italy, 15–18 September 2009, Main Conference Proceedings, pp. 564–567 (2009). https://doi.org/10.1109/WI-IAT.2009.94

Acknowledgements

This work was partially funded by the EU H2020 project SmartDataLake (825041). We also acknowledge support of this work by the project “Moving from Big Data Management to Data Science” (MIS 5002437/3) which is implemented under the Action “Re-inforcement of the Research and Innovation Infrastructure”, funded by the Operational Programme “Competitiveness, Entrepreneurship and Innovation” (NSRF 2014-2020) and co-financed by Greece and the European Union (European Regional Development Fund). Icons in Fig. 1 were collected from www.flaticon.com and were made by Freepik.

Author information

Corresponding author

Correspondence to Serafeim Chatzopoulos.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix: Detailed configurations

In this section, we present the exact parameter configurations of the rank aggregation algorithms that were found to perform best for each dataset (Table 5).

Cite this article

Chatzopoulos, S., Vergoulis, T., Dalamagas, T. et al. VeTo+: improved expert set expansion in academia. Int J Digit Libr 23, 57–75 (2022). https://doi.org/10.1007/s00799-021-00318-7

  • DOI: https://doi.org/10.1007/s00799-021-00318-7
