Abstract
Modern software engineering (SE) is characterized by the use of several models that define the states a software product goes through, from its initial conception to its end, including development, setup, and maintenance, among others. Each phase produces a set of deliverables that follow different documentation standards, but in many cases natural language text is a key element of such documents. This work surveys the state of the art in the application of text mining techniques to architectural software design. It starts from the role of text documents during the development phases, specifically the kinds of text documents that can subsequently be exploited to assist architects in the complex task of designing software. It then details the intelligent text analysis techniques used in software engineering tasks across the software life cycle, in order to analyze works focused on automatically bridging the gap between requirements and software architectures.
Cite this article
Casamayor, A., Godoy, D. & Campo, M. Mining textual requirements to assist architectural software design: a state of the art review. Artif Intell Rev 38, 173–191 (2012). https://doi.org/10.1007/s10462-011-9237-7
DOI: https://doi.org/10.1007/s10462-011-9237-7