DOI: 10.1145/3275219.3275226
research-article

Profiling Developer Expertise across Software Communities with Heterogeneous Information Network Analysis

Published: 16 September 2018

Abstract

Knowing developer expertise is critical for effective task allocation. However, it is challenging to accurately profile the expertise of developers over the Internet, because their activities are often dispersed across different online communities. Existing work either considers only a single community or simply sums up the expertise measured in individual communities. The former suffers from low accuracy due to incomplete data, while the latter unrealistically assumes that developer expertise in different communities is completely independent and unrelated. To overcome these limitations, we propose a new approach to profiling developer expertise across software communities through heterogeneous information network (HIN) analysis. First, an HIN is built by analyzing developer activities in various communities, where nodes represent objects such as developers and skills, and edges represent the relations among objects. Second, because random walk with restart (RWR) is known for its ability to capture the global structure of a whole network, we apply RWR over the HIN to estimate the proximity between developer nodes and skill nodes, which essentially reflects developer expertise. Based on the data of 72,645 users common to GitHub and Stack Overflow, we conducted an empirical study and evaluated developer expertise using the proposed approach. To evaluate the effectiveness of our approach, we use the obtained expertise to estimate the competency of developers in answering questions posted on Stack Overflow. The experimental results show that our approach outperforms existing methods.
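The abstract does not give implementation details, so the following is only a minimal sketch of random walk with restart on a toy network, not the authors' implementation. The node names, the edge list, the treatment of every relation as an undirected and unweighted edge, and the restart probability of 0.15 are all assumptions made for illustration.

```python
from collections import defaultdict


def random_walk_with_restart(edges, start, restart_prob=0.15,
                             tol=1e-10, max_iter=1000):
    """Approximate the RWR visiting probability of every node.

    At each step the walker jumps back to `start` with probability
    `restart_prob`; otherwise it moves to a uniformly chosen neighbor.
    """
    # Treat every relation as an undirected, unweighted edge (a simplification).
    neighbors = defaultdict(set)
    for u, v in edges:
        neighbors[u].add(v)
        neighbors[v].add(u)
    if start not in neighbors:
        raise ValueError(f"unknown start node: {start!r}")

    # All probability mass begins on the start node.
    scores = {node: 0.0 for node in neighbors}
    scores[start] = 1.0
    for _ in range(max_iter):
        updated = {node: 0.0 for node in neighbors}
        for node, mass in scores.items():
            # Spread the non-restarting mass evenly over the neighbors.
            share = (1.0 - restart_prob) * mass / len(neighbors[node])
            for nxt in neighbors[node]:
                updated[nxt] += share
        updated[start] += restart_prob  # restarting mass returns to the start
        delta = sum(abs(updated[n] - scores[n]) for n in neighbors)
        scores = updated
        if delta < tol:
            break
    return scores


# Hypothetical toy network mixing GitHub-style and Stack Overflow-style nodes.
edges = [
    ("dev:alice", "repo:webapp"), ("repo:webapp", "skill:javascript"),
    ("dev:alice", "answer:a1"), ("answer:a1", "skill:javascript"),
    ("dev:bob", "repo:ml-lib"), ("repo:ml-lib", "skill:python"),
    ("dev:bob", "answer:a2"), ("answer:a2", "skill:javascript"),
]

alice_scores = random_walk_with_restart(edges, "dev:alice")
for node in sorted(alice_scores):
    if node.startswith("skill:"):
        # A higher score means the skill node is closer to the developer.
        print(node, round(alice_scores[node], 4))
```

Starting the walk at a developer node and reading off the scores of the skill nodes gives one proximity value per skill. In this toy network both of Alice's activities lead to the JavaScript node, so it scores higher for her than the Python node, which she can reach only through Bob.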

Published In

Internetware '18: Proceedings of the 10th Asia-Pacific Symposium on Internetware
September 2018
167 pages
ISBN:9781450365901
DOI:10.1145/3275219

In-Cooperation

  • Institute of Software, Chinese Academy of Sciences
  • CCF: China Computer Federation

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. Developer expertise
  2. heterogeneous information network
  3. random walk with restart

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

Internetware '18

Acceptance Rates

Internetware '18 Paper Acceptance Rate 20 of 26 submissions, 77%;
Overall Acceptance Rate 55 of 111 submissions, 50%
