Mining textual requirements to assist architectural software design: a state of the art review

Artificial Intelligence Review

Abstract

Modern Software Engineering (SE) relies on several models that describe the states a software product goes through, from its initial conception to its end, including development, setup and maintenance, among others. Each phase produces a set of deliverables that follow different documentation standards, but in many cases natural language text is a key element of these documents. This work surveys the state of the art in applying text mining techniques to architectural software design. It starts from the role of text documents during the development phases, specifically the kinds of text documents that can subsequently be exploited to assist architects in the complex task of designing software. It then details the intelligent text analysis techniques used in software engineering tasks across the software life-cycle, in order to analyze works focused on automatically bridging the gap between requirements and software architectures.


Author information

Corresponding author

Correspondence to Agustin Casamayor.

About this article

Cite this article

Casamayor, A., Godoy, D. & Campo, M. Mining textual requirements to assist architectural software design: a state of the art review. Artif Intell Rev 38, 173–191 (2012). https://doi.org/10.1007/s10462-011-9237-7

  • DOI: https://doi.org/10.1007/s10462-011-9237-7
