Abstract
Crowdsourcing Software Development (CSD) has emerged as a new software development paradigm, and Topcoder is now the largest competition-based CSD platform. Many organizations use Topcoder to outsource their software tasks to crowd developers in the form of open challenges. To facilitate timely completion of crowdsourced tasks, it is important to find the right developers, i.e., those most likely to win a challenge. Recently, many developer recommendation methods for CSD platforms have been proposed. However, these methods often make unrealistic assumptions about developer status or application scenarios; for example, they consider only skillful developers, or only developers already registered for the challenges. In this paper, we propose a meta-learning based policy model, which first filters out developers who are unlikely to participate in or submit to a given challenge, and then recommends the top-k developers with the highest probability of winning the challenge. We collected Topcoder data from 2009 to 2018 to evaluate the proposed approach. The results show that our approach can successfully identify developers for posted challenges regardless of the developers' current registration status. In particular, our approach works well in recommending new winners. The accuracy of the top-5 recommendation ranges from 30.1% to 91.1%, which significantly outperforms the results achieved by related work.
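The two-stage policy described above can be sketched as a filter-then-rank pipeline. This is a minimal illustration, not the paper's implementation: the predictor functions and the 0.5 cutoff are placeholder assumptions standing in for the trained participation, submission, and winning classifiers.

```python
def recommend_top_k(developers, predict_register, predict_submit, predict_win, k=5):
    """Two-stage policy: filter out developers unlikely to register for or
    submit to the challenge, then rank the rest by predicted win probability."""
    candidates = [d for d in developers
                  if predict_register(d) >= 0.5 and predict_submit(d) >= 0.5]
    ranked = sorted(candidates, key=predict_win, reverse=True)
    return ranked[:k]

# Stub predictors keyed by a hypothetical developer id (illustrative values only;
# in the paper these probabilities would come from learned models).
register_p = {"d1": 0.9, "d2": 0.2, "d3": 0.8, "d4": 0.7}
submit_p   = {"d1": 0.8, "d2": 0.9, "d3": 0.4, "d4": 0.6}
win_p      = {"d1": 0.3, "d2": 0.9, "d3": 0.7, "d4": 0.6}

top = recommend_top_k(
    ["d1", "d2", "d3", "d4"],
    predict_register=register_p.get,
    predict_submit=submit_p.get,
    predict_win=win_p.get,
    k=2,
)
print(top)  # -> ['d4', 'd1']: d2 fails the registration filter, d3 the submission filter
```

Note that d2, despite having the highest win score, is filtered out before ranking, which is the point of the policy: win probability only matters for developers who would plausibly participate at all.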
Acknowledgements
This work was supported in part by the National Key Research and Development Program of China under Grant No. 2016YFB1000804, and in part by the National Natural Science Foundation of China under Grants No. 61702024, 61828201, and 61421003.
Communicated by: Kelly Blincoe
Cite this article
Zhang, Z., Sun, H. & Zhang, H. Developer recommendation for Topcoder through a meta-learning based policy model. Empir Software Eng 25, 859–889 (2020). https://doi.org/10.1007/s10664-019-09755-0