research-article

Translations Diversification for Expert Finding: A Novel Clustering-based Approach

Published: 29 May 2019

Abstract

Expert finding is the task of retrieving and ranking people who are knowledgeable about the subject of a user's query. It is a well-studied problem that has attracted the attention of many researchers. The central challenge in expert finding is determining the similarity between the query words and the documents authored by candidate experts, and this is complicated by one of the most important problems in the Information Retrieval (IR) community: the vocabulary gap between queries and documents. In this study, a translation model based on clustering words in two spaces, a query space and a co-occurrence space, is proposed to overcome this problem. First, semantically close words are clustered in the query space, and then each cluster in this space is clustered again in the co-occurrence space. The representatives of the clusters formed in the co-occurrence space are taken as a diverse subset of their parent cluster. In this way, the query translations are expected to be diversified in the query space. Next, a probabilistic model based on the degree to which each word belongs to a cluster and on the similarity of that cluster to the query in the query space is used to address the vocabulary gap. Finally, the translations corresponding to each query are used in conjunction with a combination model for expert finding. Experiments on a Stack Overflow dataset show the effectiveness of the proposed method for expert finding.
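The abstract outlines the pipeline but not its exact formulas. The Python sketch below shows one plausible reading of it, under assumptions that are ours and not the paper's: words are given as toy 2-D vectors in a query space and a co-occurrence space; both clustering levels use plain k-means (the abstract does not name the algorithm); each co-occurrence sub-cluster is represented by the member nearest its centroid; the "belonging degree" is the fuzzy c-means membership with fuzzifier m = 2, computed against the query-space centroids; cluster-to-query similarity is the cosine between the cluster centroid and the query vector, clipped at zero; so p(w | q) is taken proportional to sum_c mu(w, c) * sim(c, q). The combination model is assumed to be a weighted CombSUM (sum-of-scores fusion). All function names and data are hypothetical.

```python
import math

def sq(u, v):
    """Squared Euclidean distance between two vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))

def cosine(u, v):
    """Cosine similarity; 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def kmeans(points, k, iters=20):
    """Plain k-means with deterministic farthest-first initialisation.
    Returns (member_indices, centroid) pairs for the non-empty clusters."""
    centers = [list(points[0])]
    while len(centers) < min(k, len(points)):
        centers.append(list(max(points, key=lambda p: min(sq(p, c) for c in centers))))
    for _ in range(iters):
        groups = [[] for _ in centers]
        for i, p in enumerate(points):
            groups[min(range(len(centers)), key=lambda j: sq(p, centers[j]))].append(i)
        new = [[sum(points[i][d] for i in g) / len(g) for d in range(len(points[0]))]
               if g else centers[j] for j, g in enumerate(groups)]
        if new == centers:  # converged: assignments no longer move the centroids
            break
        centers = new
    return [(g, c) for g, c in zip(groups, centers) if g]

def fcm_membership(x, centroids, eps=1e-12):
    """Fuzzy c-means membership of x to each centroid with fuzzifier m = 2:
    u_c = (1 / d_c^2) / sum_j (1 / d_j^2)."""
    inv = [1.0 / (sq(x, c) + eps) for c in centroids]
    total = sum(inv)
    return [v / total for v in inv]

def translations(query, qspace, cooc, k1=2, k2=2):
    """Hypothetical: diversified translations of `query`, p(w | query) summing to 1."""
    words = sorted(qspace)
    top = kmeans([qspace[w] for w in words], k1)          # level 1: query space
    centroids = [c for _, c in top]
    reps = []
    for members, _ in top:                                 # level 2: co-occurrence space
        sub = [words[i] for i in members]
        for g, c in kmeans([cooc[w] for w in sub], k2):
            # representative = member nearest the sub-cluster centroid;
            # near-ties (distance rounded to 9 places) are broken alphabetically
            reps.append(min((sub[i] for i in g), key=lambda w: (round(sq(cooc[w], c), 9), w)))
    qv = qspace[query]
    sims = [max(cosine(c, qv), 0.0) for c in centroids]   # sim(cluster, query), clipped at 0
    raw = {w: sum(u * s for u, s in zip(fcm_membership(qspace[w], centroids), sims))
           for w in reps}
    z = sum(raw.values())
    return {w: p / z for w, p in raw.items()} if z else {}

def rank_experts(trans, expert_word_scores):
    """Assumed weighted CombSUM fusion: score(e) = sum_w p(w | q) * s(e, w)."""
    scores = {e: sum(p * ws.get(w, 0.0) for w, p in trans.items())
              for e, ws in expert_word_scores.items()}
    return sorted(scores, key=scores.get, reverse=True)

# Hypothetical toy data: 2-D vectors in each space and per-word expert scores.
qspace = {"python": (1.0, 0.0), "numpy": (0.95, 0.05), "pandas": (0.9, 0.1),
          "django": (0.8, 0.2), "java": (0.0, 1.0), "spring": (0.1, 0.9)}
cooc = {"python": (0.4, 0.6), "numpy": (1.0, 0.0), "pandas": (0.9, 0.1),
        "django": (0.0, 1.0), "java": (1.0, 0.0), "spring": (0.0, 1.0)}
experts = {"alice": {"numpy": 0.9, "pandas": 0.8},
           "bob": {"java": 0.9, "spring": 0.7},
           "carol": {"django": 0.6}}

trans = translations("python", qspace, cooc)
ranking = rank_experts(trans, experts)
print(trans)
print(ranking)
```

On this toy vocabulary, each query-space cluster contributes one translation per co-occurrence sub-cluster (django and numpy from the Python cluster, java and spring from the Java cluster), so a near-duplicate such as pandas is dropped in favour of a word from a different usage context, and most of the probability mass goes to the two translations whose parent cluster lies close to the query.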




    Published In

    ACM Transactions on Knowledge Discovery from Data  Volume 13, Issue 3
    June 2019
    261 pages
    ISSN:1556-4681
    EISSN:1556-472X
    DOI:10.1145/3331063

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 29 May 2019
    Accepted: 01 March 2019
    Revised: 01 November 2018
    Received: 01 January 2018
    Published in TKDD Volume 13, Issue 3


    Author Tags

    1. Expert finding
    2. Stack Overflow
    3. data clustering
    4. translation diversification
    5. translation model

    Qualifiers

    • Research-article
    • Research
    • Refereed


    Bibliometrics & Citations

    Article Metrics

    • Downloads (last 12 months): 5
    • Downloads (last 6 weeks): 0
    Reflects downloads up to 20 Jan 2025

    Cited By

    • (2024) Harnessing the Power of Metadata for Enhanced Question Retrieval in Community Question Answering. IEEE Access 12, 65768-65779. DOI: 10.1109/ACCESS.2024.3395449. Online publication date: 2024.
    • (2024) EPAN-SERec: Expertise preference-aware networks for software expert recommendations with knowledge graph. Expert Systems with Applications 244, 122985. DOI: 10.1016/j.eswa.2023.122985. Online publication date: Jun-2024.
    • (2024) Improving the clarity of questions in Community Question Answering networks. Journal of Intelligent Information Systems. DOI: 10.1007/s10844-024-00847-y. Online publication date: 2-May-2024.
    • (2024) Towards Robust Expert Finding in Community Question Answering Platforms. Advances in Information Retrieval, 152-168. DOI: 10.1007/978-3-031-56069-9_12. Online publication date: 24-Mar-2024.
    • (2023) RPT: Toward Transferable Model on Heterogeneous Researcher Data via Pre-Training. IEEE Transactions on Big Data 9, 1, 186-199. DOI: 10.1109/TBDATA.2022.3152386. Online publication date: 1-Feb-2023.
    • (2022) High-quality domain expert finding method in CQA based on multi-granularity semantic analysis and interest drift. Information Sciences 596, 395-413. DOI: 10.1016/j.ins.2022.02.039. Online publication date: Jun-2022.
    • (2022) Attention-based skill translation models for expert finding. Expert Systems with Applications, 116433. DOI: 10.1016/j.eswa.2021.116433. Online publication date: Jan-2022.
    • (2021) How much do I Stand Out in Communities Q&A? An Analysis of User Interactions based on Graph Embedding. Proceedings of the XVII Brazilian Symposium on Information Systems, 1-8. DOI: 10.1145/3466933.3466966. Online publication date: 7-Jun-2021.
    • (2021) User Embedding for Expert Finding in Community Question Answering. ACM Transactions on Knowledge Discovery from Data 15, 4, 1-16. DOI: 10.1145/3441302. Online publication date: 26-Mar-2021.
    • (2021) A Topic Based Method to Classify the Question Clarity in CQA Networks. 2021 12th International Conference on Information and Knowledge Technology (IKT), 96-101. DOI: 10.1109/IKT54664.2021.9685163. Online publication date: 14-Dec-2021.
