Correlation-based software search by leveraging software term database

  • Research Article
  • Published in: Frontiers of Computer Science

Abstract

Internet-scale open source software (OSS) production in various communities generates abundant reusable resources for software developers. However, finding the desired, mature software among a large number of candidates with keyword queries is a significant challenge, especially for novices, because current search services often fail to understand the semantics of user queries. In this paper, we construct a software term database (STDB) by analyzing tagging data in Stack Overflow and propose a correlation-based software search (CBSS) approach that performs correlation retrieval based on the term relevance obtained from the STDB. In addition, we design a novel ranking method to optimize the initial retrieval result. We explore four research questions, one in each of four experiments, to evaluate the effectiveness of the STDB and investigate the performance of CBSS. The experimental results show that the proposed CBSS can effectively respond to keyword-based software searches and significantly outperforms existing search services at finding mature software.
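
The abstract does not say how term relevance is computed from the tagging data. As a rough illustration of the general idea only, the sketch below assumes a simple co-occurrence measure: the relevance of term b to term a is estimated as the fraction of posts tagged a that are also tagged b. The function names, the measure, and the toy tag sets are all hypothetical and are not taken from the paper.

```python
from collections import Counter
from itertools import combinations


def build_term_relevance(tagged_posts):
    """Return {(a, b): relevance}, where relevance estimates P(b | a)
    from tag co-occurrence. This measure is an assumption for
    illustration, not the paper's definition."""
    tag_counts = Counter()
    pair_counts = Counter()
    for tags in tagged_posts:
        unique = sorted(set(tags))  # ignore duplicate tags within a post
        tag_counts.update(unique)
        for a, b in combinations(unique, 2):
            # Count the pair in both directions so either term can be looked up.
            pair_counts[(a, b)] += 1
            pair_counts[(b, a)] += 1
    return {
        (a, b): count / tag_counts[a]
        for (a, b), count in pair_counts.items()
    }


def related_terms(relevance, term, top_k=3):
    """Rank terms that co-occur with `term` by descending relevance,
    breaking ties alphabetically."""
    scored = [(b, score) for (a, b), score in relevance.items() if a == term]
    scored.sort(key=lambda item: (-item[1], item[0]))
    return scored[:top_k]


# Toy tag sets standing in for Stack Overflow question tags (made up).
TOY_POSTS = [
    ["java", "orm", "hibernate"],
    ["java", "hibernate"],
    ["python", "orm", "sqlalchemy"],
    ["java", "orm", "hibernate", "jpa"],
]

relevance = build_term_relevance(TOY_POSTS)
print(related_terms(relevance, "orm"))
```

Under this assumption, a keyword such as "orm" could be expanded with its highest-scoring related terms before retrieval. The ranking step mentioned in the abstract is not sketched here because the abstract gives no detail about it.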

Acknowledgements

The research was supported by the National Natural Science Foundation of China (Grant Nos. 61432020, 61303064, 61472430, 61502512) and the National Grand R&D Plan (2016YFB1000805).

Author information

Corresponding author

Correspondence to Gang Yin.

Additional information

Zhixing Li received his BS degree in computer science from Chongqing University, China in 2015. He is now an MS candidate in computer science at National University of Defense Technology, China. His research interests include open source software engineering, data mining, and knowledge discovery in open source software.

Gang Yin received his PhD degree in computer science from National University of Defense Technology (NUDT), China in 2006. He is now an associate professor at NUDT. He has worked on several major research projects, including national 973 and 863 projects. He has published more than 60 research papers in international conferences and journals. His current research interests include distributed computing, information security, software engineering, and machine learning.

Tao Wang received his BS and MS degrees in computer science from National University of Defense Technology (NUDT), China in 2007 and 2010, respectively. He is now a PhD candidate in computer science at NUDT. His research interests include open source software engineering, machine learning, data mining, and knowledge discovery in open source software.

Yang Zhang received his BS and MS degrees in computer science from National University of Defense Technology (NUDT), China in 2013 and 2015, respectively. He is now a PhD candidate in computer science at NUDT. His research interests include open source software engineering, data mining, and social coding networks.

Yue Yu received his PhD degree in computer science from National University of Defense Technology (NUDT), China in 2016. He is now an associate professor at NUDT. He visited UC Davis with the support of a CSC scholarship. His research findings have been published in MSR, FSE, IST, ICSME, APSEC, and SEKE. His current research interests are in software engineering, including mining software repositories and analyzing social coding networks.

Huaimin Wang received his PhD degree in computer science from National University of Defense Technology (NUDT), China in 1992. He is now a professor and chief engineer in the Department of Educational Affairs, NUDT. He has been named a “Chang Jiang Scholars Program” professor and a Distinguished Young Scholar, among other honors. He has published more than 100 research papers in peer-reviewed international conferences and journals. His current research interests include middleware, software agents, and trustworthy computing.

About this article

Cite this article

Li, Z., Yin, G., Wang, T. et al. Correlation-based software search by leveraging software term database. Front. Comput. Sci. 12, 923–938 (2018). https://doi.org/10.1007/s11704-017-6573-z

  • DOI: https://doi.org/10.1007/s11704-017-6573-z
