VeTo+: improved expert set expansion in academia

Chatzopoulos, Serafeim; Vergoulis, Thanasis; Dalamagas, Theodore; Tryfonopoulos, Christos

doi:10.1007/s00799-021-00318-7

Published: 15 November 2021

International Journal on Digital Libraries, Volume 23, pages 57–75 (2022)

Serafeim Chatzopoulos ORCID: orcid.org/0000-0003-1714-5225²,
Thanasis Vergoulis¹,
Theodore Dalamagas¹ &
…
Christos Tryfonopoulos²

Abstract

Expanding a set of known domain experts with new individuals who share similar expertise is a problem with various applications, such as adding new members to a conference program committee or finding new referees to review funding proposals. In this work, we focus on applications of the problem in the academic world and introduce VeTo+, a novel approach that deals with it effectively by exploiting scholarly knowledge graphs. VeTo+ expands a given set of experts by identifying scholars whose publishing habits are similar to theirs. Our experiments show that VeTo+ outperforms, in terms of accuracy, previous approaches for recommending expansions to a given set of academic experts.

Notes

Following the definitions in Sect. 2, for the case of the APV metapath, the corresponding vectors have length equal to the number of distinct venues in the dataset. Of course, since these vectors are very sparse, in practice sparse vector representations can be used to reduce the memory footprint.
For TPDL and JCDL, we used the following venues to calculate the focused APV-based similarities: TPDL (and its predecessor ECDL), JCDL, and IJDL. For SIGMOD and VLDB: SIGMOD, VLDB, EDBT, ICDE, and TODS.
The last one may be larger than the others; however, it is easy to take this into consideration.
https://doi.org/10.5281/zenodo.3739315.
In this work, VeTo and VeTo+ were configured by selecting the same parameter value for all experiments performed on the same dataset; the value selected was the one that yields the best \(F_1\) results. This experimental design differs from the one used in our previous work [32], where the best configuration of VeTo was selected for each of the respective setups (e.g. \(k=100\) and \(k=200\) were used for the \(F_1\) and the MRR experiment for the SIGMOD dataset, respectively). More details on the configuration of VeTo+ can be found in Sect. 5.2.1.
Note that, since the DSKG dataset does not contain weighted edges between papers and topics, we assigned weights in correspondence to the number of topics connected to each paper, i.e. assuming that a paper is connected with n topics, the weight assigned to each edge is equal to 1/n.
We have also conducted experiments using WG, the alternative graph-based approach proposed in the same paper. However, similarly to the results in [2], WG performed worse in all cases, and its results were omitted from the experimental section for presentation reasons.
https://github.com/smartdatalake/HMiner.
https://github.com/schatzopoulos/HeySim.
Our analysis was based on the venue catalogues determined in Sect. 3.2.2.
https://trec.nist.gov/.
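
Two of the notes above describe concrete data representations: the sparse venue vectors derived from the APV metapath, and the 1/n edge weights assigned between papers and topics. The following is a minimal sketch of both, not the authors' implementation. The function names, the dictionary-based inputs, and the use of cosine similarity to compare two vectors are assumptions made for illustration only.

```python
from collections import Counter
from math import sqrt


def venue_vector(author_papers, paper_venue):
    """Sparse APV-style vector: venue -> number of the author's papers there.

    Only venues the author has actually published in are stored, so the
    memory footprint does not depend on the total number of venues.
    """
    return Counter(paper_venue[p] for p in author_papers)


def topic_vector(author_papers, paper_topics):
    """Sparse topic vector using the 1/n weighting described in the notes.

    A paper connected to n topics contributes 1/n to each of them.
    """
    vec = {}
    for paper in author_papers:
        topics = paper_topics[paper]
        for topic in topics:
            vec[topic] = vec.get(topic, 0.0) + 1.0 / len(topics)
    return vec


def cosine(u, v):
    """Cosine similarity of two sparse vectors (assumed measure, see lead-in)."""
    dot = sum(w * v[key] for key, w in u.items() if key in v)
    norm_u = sqrt(sum(w * w for w in u.values()))
    norm_v = sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)
```

Dictionaries keyed by venue or topic keep only the non-zero entries, which is one simple way to obtain the sparse representation the first note refers to.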

References

Balog, K., Azzopardi, L., de Rijke, M.: Formal models for expert finding in enterprise corpora. In: SIGIR 2006: Proceedings of the 29th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, Seattle, Washington, USA, August 6-11, 2006, pp. 43–50 (2006)
Balog, K., de Rijke, M.: Finding similar experts. In: Proceedings of the 30th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR’07, pp. 821–822. Association for Computing Machinery, New York, NY, USA (2007). https://doi.org/10.1145/1277741.1277926
Beitzel, S.M., Jensen, E.C., Chowdhury, A., Grossman, D., Frieder, O., Goharian, N.: Fusion of effective retrieval strategies in the same information retrieval system. J. Am. Soc. Inf. Sci. Technol. 55(10), 859–868 (2004)
Cao, Y., Liu, J., Bao, S., Li, H.: Research on expert search at enterprise track of TREC 2005. In: Proceedings of the Fourteenth Text Retrieval Conference, TREC 2005, Gaithersburg, Maryland, USA, November 15–18, 2005 (2005). http://trec.nist.gov/pubs/trec14/papers/microsoft-asia.ent.pdf
Chen, H.H., Gou, L., Zhang, X., Giles, C.L.: Collabseer: A search engine for collaboration discovery. In: Proceedings of the 11th Annual International ACM/IEEE Joint Conference on Digital Libraries, JCDL’11, pp. 231–240. Association for Computing Machinery, New York, NY, USA (2011). https://doi.org/10.1145/1998076.1998121
Cormack, G.V., Clarke, C.L., Buettcher, S.: Reciprocal rank fusion outperforms condorcet and individual rank learning methods. In: Proceedings of the 32nd International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 758–759 (2009)
Craswell, N., Hawking, D., Vercoustre, A.M., Wilkins, P.: P@noptic expert: Searching for experts not just for documents. In: Ausweb Poster Proceedings, Queensland, Australia, vol. 15, p. 17 (2001)
Davenport, T.H., Prusak, L.: Working knowledge: how organizations manage what they know. Ubiquity 2000, 6 (2000)
Fang, H., Zhai, C.: Probabilistic models for expert finding. In: Advances in Information Retrieval, 29th European Conference on IR Research, ECIR 2007, Rome, Italy, April 2–5, 2007, Proceedings, pp. 418–430 (2007)
Fox, E.A., Shaw, J.A.: Combination of multiple searches. NIST special publication SP 243 (1994)
Gollapalli, S.D., Mitra, P., Giles, C.L.: Similar researcher search in academic environments. In: Proceedings of the 12th ACM/IEEE-CS Joint Conference on Digital Libraries, JCDL’12, pp. 167–170. Association for Computing Machinery, New York, NY, USA (2012). https://doi.org/10.1145/2232817.2232849
Gollapalli, S.D., Mitra, P., Giles, C.L.: Ranking experts using author-document-topic graphs. In: 13th ACM/IEEE-CS Joint Conference on Digital Libraries, JCDL’13, Indianapolis, IN, USA, July 22–26, 2013, pp. 87–96 (2013). https://doi.org/10.1145/2467696.2467707
Gonçalves, R., Dorneles, C.F.: Automated expertise retrieval: a taxonomy-based survey and open issues. ACM Comput. Surv. 52(5), 96:1–96:30 (2019)
Jaradeh, M.Y., Oelen, A., Farfar, K.E., Prinz, M., D’Souza, J., Kismihók, G., Stocker, M., Auer, S.: Open research knowledge graph: Next generation infrastructure for semantic scholarly knowledge. In: Proceedings of the 10th International Conference on Knowledge Capture, K-CAP 2019, Marina Del Rey, CA, USA, November 19–21, 2019, pp. 243–246 (2019). https://doi.org/10.1145/3360901.3364435
Kirkpatrick, S., Gelatt, C.D., Vecchi, M.P.: Optimization by simulated annealing. Science 220(4598), 671–680 (1983)
Lillis, D., Toolan, F., Collier, R., Dunnion, J.: Extending probabilistic data fusion using sliding windows. In: European Conference on Information Retrieval, pp. 358–369. Springer (2008)
Lillis, D., Zhang, L., Toolan, F., Collier, R.W., Leonard, D., Dunnion, J.: Estimating probabilities for effective data fusion. In: Proceedings of the 33rd International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 347–354 (2010)
Lu, J., Chen, J., Zhang, C.: Helsinki Multi-Model Data Repository (2016). http://udbms.cs.helsinki.fi/?dataset
Manghi, P., Atzori, C., Bardi, A., Schirrwagen, J., Dimitropoulos, H., La Bruzzo, S., Foufoulas, I., Löhden, A., Bäcker, A., Mannocci, A., et al.: OpenAIRE research graph dump (2019)
Manghi, P., Bardi, A., Atzori, C., Baglioni, M., Manola, N., Schirrwagen, J., Principe, P.: The openaire research graph data model (2019). https://doi.org/10.5281/zenodo.2643199
Petkova, D., Croft, W.B.: Hierarchical language models for expert finding in enterprise corpora. Int. J. Artif. Intell. Tools 17(1), 5–18 (2008)
Salatino, A.A., Osborne, F., Thanapalasingam, T., Motta, E.: The cso classifier: ontology-driven detection of research topics in scholarly articles. In: International Conference on Theory and Practice of Digital Libraries, pp. 296–311. Springer (2019)
Salatino, A.A., Thanapalasingam, T., Mannocci, A., Osborne, F., Motta, E.: The computer science ontology: a large-scale taxonomy of research areas. In: International Semantic Web Conference, pp. 187–205. Springer (2018)
Serdyukov, P., Hiemstra, D.: Modeling documents as mixtures of persons for expert finding. In: Advances in Information Retrieval, 30th European Conference on IR Research, ECIR 2008, Glasgow, UK, March 30–April 3, 2008. Proceedings, pp. 309–320 (2008). https://doi.org/10.1007/978-3-540-78646-7_29
Sfyris, G.A., Fragkos, N., Doulkeridis, C.: Profile-based selection of expert groups. In: International Conference on Theory and Practice of Digital Libraries, pp. 81–93. Springer (2016)
Shi, C., Kong, X., Huang, Y., Philip, S.Y., Wu, B.: Hetesim: a general framework for relevance measure in heterogeneous networks. IEEE Trans. Knowl. Data Eng. 26(10), 2479–2492 (2014)
Shi, C., Li, Y., Philip, S.Y., Wu, B.: Constrained-meta-path-based ranking in heterogeneous information network. Knowl. Inf. Syst. 49(2), 719–747 (2016)
Shi, C., Li, Y., Zhang, J., Sun, Y., Yu, P.S.: A survey of heterogeneous information network analysis. IEEE Trans. Knowl. Data Eng. 29(1), 17–37 (2017). https://doi.org/10.1109/TKDE.2016.2598561
Sun, Y., Han, J., Yan, X., Yu, P.S., Wu, T.: Pathsim: meta path-based top-k similarity search in heterogeneous information networks. Proc. VLDB Endow. 4(11), 992–1003 (2011)
Tang, J., Zhang, J., Yao, L., Li, J., Zhang, L., Su, Z.: ArnetMiner: extraction and mining of academic social networks. In: Proceedings of the 14th ACM SIGKDD, pp. 990–998. ACM (2008)
Tsallis, C., Stariolo, D.A.: Generalized simulated annealing. Phys. A: Stat. Mech. Appl. 233(1), 395–406 (1996). https://doi.org/10.1016/S0378-4371(96)00271-3
Vergoulis, T., Chatzopoulos, S., Dalamagas, T., Tryfonopoulos, C.: Veto: Expert set expansion in academia. In: International Conference on Theory and Practice of Digital Libraries, pp. 48–61. Springer (2020)
Xiong, Y., Zhu, Y., Yu, P.S.: Top-k similarity join in heterogeneous information networks. IEEE Trans. Knowl. Data Eng. 27(6), 1710–1723 (2015)
Yang, K., Kuo, T., Lee, H., Ho, J.: A reviewer recommendation system based on collaborative intelligence. In: 2009 IEEE/WIC/ACM International Conference on Web Intelligence, WI 2009, Milan, Italy, 15–18 September 2009, Main Conference Proceedings, pp. 564–567 (2009). https://doi.org/10.1109/WI-IAT.2009.94

Acknowledgements

This work was partially funded by the EU H2020 project SmartDataLake (825041). We also acknowledge support of this work by the project “Moving from Big Data Management to Data Science” (MIS 5002437/3) which is implemented under the Action “Re-inforcement of the Research and Innovation Infrastructure”, funded by the Operational Programme “Competitiveness, Entrepreneurship and Innovation” (NSRF 2014-2020) and co-financed by Greece and the European Union (European Regional Development Fund). Icons in Fig. 1 were collected from www.flaticon.com and were made by Freepik.

Author information

Authors and Affiliations

IMSI, “Athena” Research Center, 15125, Athens, Greece
Thanasis Vergoulis & Theodore Dalamagas
Department of Informatics & Telecommunications, University of the Peloponnese, 22100, Tripoli, Greece
Serafeim Chatzopoulos & Christos Tryfonopoulos

Corresponding author

Correspondence to Serafeim Chatzopoulos.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix: Detailed configurations

In this section, we present the exact parameter configurations of the rank aggregation algorithms that were found to perform best for each dataset (Table 5).
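
This page does not say which rank aggregation algorithms were configured. As a generic illustration of what such an algorithm and its parameter look like, here is a minimal sketch of reciprocal rank fusion, the method of Cormack et al. in the reference list. Whether VeTo+ uses this particular method is an assumption; the function name and the default constant k = 60 are likewise illustrative and are not taken from Table 5.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists into one.

    Each item scores sum(1 / (k + rank)) over the lists it appears in,
    with ranks starting at 1. Items missing from a list get no score
    from that list. Ties are broken by item name for determinism.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, item in enumerate(ranking, start=1):
            scores[item] += 1.0 / (k + rank)
    return sorted(scores, key=lambda item: (-scores[item], item))
```

Here `k` plays the role of the kind of tunable parameter that a per-dataset configuration would fix: larger values flatten the difference between top and lower ranks.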

About this article

Cite this article

Chatzopoulos, S., Vergoulis, T., Dalamagas, T. et al. VeTo+: improved expert set expansion in academia. Int J Digit Libr 23, 57–75 (2022). https://doi.org/10.1007/s00799-021-00318-7

Received: 04 February 2021
Revised: 18 October 2021
Accepted: 21 October 2021
Published: 15 November 2021
Issue Date: March 2022
DOI: https://doi.org/10.1007/s00799-021-00318-7
