Automatically detecting feature requests from development emails by leveraging semantic sequence mining

  • Original Article
Requirements Engineering

Abstract

Mailing lists are widely used as an important channel for communication between developers and stakeholders. They consist of emails posted for various purposes, such as reporting problems, seeking help with usage, managing projects, and discussing new features. Because of the large volume of new emails arriving every day, some valuable emails that describe new features may be overlooked by developers. Identifying these feature requests among development emails manually is a labor-intensive and challenging task. In this paper, we propose an automated solution to discover feature requests from development emails by leveraging semantic sequence patterns. First, we tag the sentences in emails using 81 fuzzy rules proposed in our previous study. Then we represent the semantic sequence, together with the contextual information of an email, in a 2-gram model. After applying sequence pattern mining techniques, we generate 10 semantic sequence patterns from 317 tagged emails that are randomly sampled from the Ubuntu community. We also conduct an empirical evaluation of their capability to discover feature requests from massive emails in Ubuntu and four other open source communities. The results show that our approach can effectively identify feature requests from these emails. Compared to existing baselines, our approach achieves better performance in terms of precision, recall, F1-score, AUC, and positive, with an average precision and recall for discovering feature requests from emails of 76% and 86%, respectively.
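To make the pipeline described above more concrete, the following is a minimal sketch of the 2-gram representation and a simple support-based pattern filter. It is an illustration under stated assumptions, not the authors' implementation: the tag names (`intent`, `benefit`, and so on), the boundary markers, the function names, and the `min_support` threshold are all hypothetical, and plain per-email counting of contiguous 2-grams stands in for the sequence pattern mining technique used in the paper.

```python
from __future__ import annotations

from collections import Counter
from typing import Iterable

# Hypothetical boundary markers that give each email's tag sequence some context.
START, END = "<s>", "</s>"


def bigrams(tags: list[str]) -> list[tuple[str, str]]:
    """Return the contiguous 2-grams of a tag sequence padded with boundary markers."""
    padded = [START, *tags, END]
    return list(zip(padded, padded[1:]))


def frequent_bigrams(
    emails: Iterable[list[str]], min_support: float
) -> dict[tuple[str, str], float]:
    """Keep the 2-grams that appear in at least `min_support` (a fraction) of the emails."""
    emails = list(emails)
    counts: Counter[tuple[str, str]] = Counter()
    for tags in emails:
        # Count each 2-gram at most once per email, so support is per-email.
        counts.update(set(bigrams(tags)))
    total = len(emails)
    return {gram: n / total for gram, n in counts.items() if n / total >= min_support}


# Hypothetical sentence-level tags, one list per email (not the paper's tag set).
tagged_emails = [
    ["intent", "benefit"],
    ["drawback", "intent", "benefit"],
    ["explanation", "intent"],
    ["trivia"],
]

patterns = frequent_bigrams(tagged_emails, min_support=0.5)
for gram, support in sorted(patterns.items()):
    print(gram, support)
```

In this toy input, only the 2-grams shared by at least half of the emails survive the filter. A real miner would also consider longer and non-contiguous sequences, which this sketch deliberately leaves out.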

Notes

  1. http://39.104.76.212:8082.

  2. https://www.ifi.uzh.ch/dam/jcr:00000000-14e5-028d-ffff-ffffaffc5e6c/ReplicationpackageDECA.zip.

  3. https://www.ifi.uzh.ch/dam/jcr:00000000-5b34-b3d9-0000-00004910bd8d/DECA_API.zip.

Acknowledgements

Our deepest gratitude goes to the anonymous reviewers for their careful work and thoughtful suggestions that have helped improve this manuscript substantially. We also would like to thank Michael Shoga for constructive criticism of this manuscript. This work is supported by the National Key Research and Development Program of China under Grant No. 2018YFB1403400, Youth Innovation Promotion Association CAS, and the National Science Foundation of China under Grant Nos. 61802374, 61432001, 61602450, and 62002348.

This material is also based upon work supported by the U.S. Department of Defense through the Systems Engineering Research Center (SERC), and the National Science Foundation Grant CMMI-1408909, Developing a Constructive Logic-Based Theory of Value-Based Systems Engineering.

Author information

Corresponding author

Correspondence to Qing Wang.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Shi, L., Chen, C., Wang, Q. et al. Automatically detecting feature requests from development emails by leveraging semantic sequence mining. Requirements Eng 26, 255–271 (2021). https://doi.org/10.1007/s00766-020-00344-y

  • DOI: https://doi.org/10.1007/s00766-020-00344-y
