Towards semi-automated assignment of software change requests

https://doi.org/10.1016/j.jss.2016.01.038

Highlights

  • We present a configurable approach to assign Change Requests to software developers.

  • It supports the contextual information needed in dynamic environments.

  • The approach relies on a Rule-Based Expert System and machine learning techniques.

  • It improves accuracy by up to 46.5% over other approaches.

Abstract

Change Requests (CRs) are key elements of software maintenance and evolution. Finding the appropriate developer for a CR is crucial for obtaining the lowest economically feasible fixing time. Nevertheless, assigning CRs is a labor-intensive and time-consuming task. In this paper, we report on a questionnaire-based survey with practitioners to understand the characteristics of CR assignment, and on a semi-automated approach for CR assignment that combines rule-based and machine learning techniques. In accordance with the results of the survey, the proposed approach emphasizes the use of contextual information, which is essential to effective assignments, and puts the development team in control of the assignment rules to make adoption easier. The assignment rules can be either extracted from the assignment history or created from scratch. An empirical validation was performed through an offline experiment with CRs from a large software project. The results show that the approach is up to 46.5% more accurate than other approaches that rely solely on machine learning techniques. This indicates that a rule-based approach is a viable and simple method to leverage CR assignments.

Introduction

Change Requests (CRs) are software artifacts that describe defects to be fixed or enhancements to be implemented in a software system (Cavalcanti et al., 2013a). CRs are managed with the support of CR repository software, such as Bugzilla (Bugzilla, 2013) and Mantis (Mantis Bug Tracker, 2013). These repositories play a fundamental role in the software maintenance process, being a common place for communication and coordination among different stakeholders (Bertram et al., 2010). Indeed, the CR artifact is the primary unit of work in many software development projects (Anvik and Murphy, 2007).

The task of assigning a CR, also known as CR triage, consists of selecting the most suitable software developer to handle a given CR. Generally, such a developer is the one who has enough expertise to handle the issues reported in the CR (Aljarah et al., 2011). In addition, the assignment decision must take into account the developer’s workload and availability, as well as the CR priority, in order to obtain the lowest economically feasible time to fix (Di Lucca, Di Penta, Gradara, 2002; Hosseini, Nguyen, Godfrey, 2012; Cavalcanti, Neto, Machado, de Almeida, de Lemos Meira, 2013). Thus, this task requires considerable knowledge of the project and good communication skills to negotiate with the involved stakeholders (Cavalcanti et al., 2013c).

Assigning CRs to developers is both labor-intensive and time-consuming, as it is usually performed manually (Anvik, Hiew, Murphy, 2006; Jeong, Kim, Zimmermann, 2009). Depending on the software project, the number of new CRs can vary from dozens to hundreds in a single day (Cavalcanti et al., 2013a). As a consequence, the more CRs are opened, the more complex the problem becomes.

Several automated approaches have been proposed to overcome the problem of CR assignment by using machine learning techniques. Some of these approaches are based on the hypothesis that the most suitable developer for a new CR is the one who has already solved similar CRs in the past (Di Lucca, Di Penta, Gradara, 2002; Cubranic, Murphy, 2004; Anvik, Hiew, Murphy, 2006; Ahsan, Ferzund, Wotawa, 2009b; Jeong, Kim, Zimmermann, 2009; Lin, Shu, Yang, Hu, Wang, 2009; Rahman, Ruhe, Zimmermann, 2009). Other approaches consider that an appropriate developer can be found by looking at past CRs and data from version control systems (Canfora, Cerulo, 2006; Ahsan, Ferzund, Wotawa, 2009a; Matter, Kuhn, Nierstrasz, 2009; Kagdi, Gethers, Poshyvanyk, Hammad, 2012) or source code (Linares-Vásquez et al., 2012). In general, these approaches use machine learning techniques to automatically suggest a list of appropriate developers for a new incoming CR.
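The hypothesis that the right developer is the one who fixed similar CRs in the past can be illustrated with a minimal sketch. It is not the method of any of the cited works: it assumes plain term-frequency cosine similarity instead of a trained classifier, and the function names, the `history` format, and the sample CRs are all hypothetical.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two term-frequency Counters."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def suggest_developers(new_cr, history, top_n=3):
    """Rank developers by their most similar previously fixed CR.

    `history` is a list of (cr_text, developer) pairs (hypothetical format).
    Developers with no textual overlap at all are left out.
    """
    query = Counter(tokenize(new_cr))
    best = defaultdict(float)
    for text, developer in history:
        score = cosine(query, Counter(tokenize(text)))
        best[developer] = max(best[developer], score)
    # Highest score first; ties broken alphabetically for determinism.
    ranked = sorted(best.items(), key=lambda kv: (-kv[1], kv[0]))
    return [dev for dev, score in ranked[:top_n] if score > 0]


# Made-up assignment history, for illustration only.
history = [
    ("Crash when saving file with unicode name", "alice"),
    ("Login page rejects valid password", "bob"),
    ("Save dialog freezes on large file", "alice"),
    ("Password reset email never arrives", "bob"),
]
suggestions = suggest_developers("Application crash while saving a file", history)
```

The output is a ranked list rather than a single name, mirroring the way these approaches suggest candidates and leave the final choice to a person.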

Despite the number of proposals, there is no empirical evidence about their applicability to real-world environments. To the best of our knowledge, most practitioners are still assigning CRs manually. Current approaches have not been adopted because of two main problems, as follows (Cavalcanti et al., 2014):

  • They were designed to be autonomous, so software analysts have no control over the approach; that is, they cannot modify its behavior. Without such control, in turn, the approach cannot be properly calibrated. As a consequence, if its performance is not satisfactory, it is simply discarded.

  • These approaches lack contextual information necessary to assign CRs properly. Software development companies might be highly dynamic, in terms of involved staff, e.g., developers move from project to project; developers can be hired/fired during project development; or they can even take a vacation or a day off. This dynamic influences the assignment of CRs. Thus, contextual information impacts the performance of automated approaches.

In this paper, we present a configurable approach to CR assignment that enables software analysts to control its behavior and provides a means to support the contextual information necessary to perform effective assignments in dynamic environments. The approach relies on a Rule-Based Expert System (RBES) and machine learning techniques.
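To make the combination concrete, the following is a minimal sketch of how team-controlled rules, contextual information, and a learned recommender could fit together. It is an assumption-laden illustration, not the architecture of the proposed approach: the `Rule`, `Context`, and `assign` names, the rule format, and the stand-in recommender are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Iterable, List, Optional, Set, Tuple


@dataclass
class ChangeRequest:
    summary: str
    component: str
    priority: str = "normal"


@dataclass
class Rule:
    """Hypothetical rule: if `condition` holds for a CR, propose `developer`."""
    name: str
    condition: Callable[[ChangeRequest], bool]
    developer: str


@dataclass
class Context:
    """Hypothetical contextual facts maintained by the team."""
    unavailable: Set[str] = field(default_factory=set)  # e.g. on vacation


def assign(
    cr: ChangeRequest,
    rules: List[Rule],
    context: Context,
    fallback: Callable[[ChangeRequest], Iterable[str]],
) -> Tuple[Optional[str], str]:
    """Return (developer, source of the decision).

    Rules are tried in order; a rule only fires if its developer is
    available. If no rule fires, the first available developer from the
    fallback recommender is used.
    """
    for rule in rules:
        if rule.condition(cr) and rule.developer not in context.unavailable:
            return rule.developer, f"rule:{rule.name}"
    for developer in fallback(cr):
        if developer not in context.unavailable:
            return developer, "fallback"
    return None, "unassigned"


def ranked_by_model(cr: ChangeRequest) -> List[str]:
    """Stand-in for a machine learning recommender (fixed ranking)."""
    return ["alice", "bob"]


rules = [
    Rule("ui-to-alice", lambda cr: cr.component == "ui", "alice"),
    Rule("critical-to-carol", lambda cr: cr.priority == "critical", "carol"),
]
cr = ChangeRequest("Button misaligned", "ui")
# Alice is away, so her rule does not fire and the fallback skips her too.
result = assign(cr, rules, Context(unavailable={"alice"}), ranked_by_model)
```

In this sketch, analysts change behavior by editing `rules` or updating `Context`, without retraining anything; returning the decision source alongside the developer keeps each assignment explainable.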

The main ideas for this work come from our three previous publications (Cavalcanti, da Mota Silveira Neto, Machado, Vale, de Almeida, de Lemos Meira, 2013; Cavalcanti, Neto, Machado, de Almeida, de Lemos Meira, 2013; Cavalcanti, Machado, da Mota Silveira Neto, de Almeida, de Lemos Meira, 2014). In Cavalcanti et al. (2014), our approach was introduced, but in less detail. Here we add more information about the approach, such as its architecture, implementation, and machine learning techniques. From Cavalcanti et al. (2013c), which is a survey with software developers, we selected the specific results that helped us to propose the semi-automated solution. We then used results from Cavalcanti et al. (2013b), which is an extensive mapping study on CR repository issues, to elaborate the related work specific to the topic of assigning CRs.

Besides bringing these works together, we also provide an extended experimental study of the proposed approach. In the experiment, which compared our approach against another solution based solely on a machine learning algorithm, we observed that ours improved the accuracy of assignments by up to 46.5%.

The remainder of this paper is organized as follows: Section 2 provides some background on CR management; in Section 3 we present the questionnaire-based survey; Section 4 presents the proposed approach to semi-automate the assignment of CRs; Section 5 describes the empirical validation performed to evaluate the proposed approach; Section 6 describes related work; and Section 7 concludes this work.

Section snippets

Change request management

A CR is a software artifact that describes a defect to be fixed, an adaptive or perfective change, or a new functionality to be implemented in a software system (Cavalcanti et al., 2013a). CRs are managed with the support of specific software systems, which we simply refer to as CR repositories. Examples of such repositories are Bugzilla (Bugzilla, 2013), Mantis (Mantis Bug Tracker, 2013), RedMine (Redmine, 2013), and Trac (The Trac Project, 2013). The CR repositories play a fundamental role in

Understanding change request assignment

Although many automated approaches for CR assignment have been proposed, there is a lack of research investigating the characteristics of the activity itself. This kind of investigation would help drive more effective solutions, since the specific aspects of the task can be understood and properly handled. In this sense, this section presents a questionnaire-based survey with the objective of understanding the impact of CR assignment on software development (Cavalcanti et al.,

Semi-automated approach to change request assignment

According to the results presented in the previous section, we can conclude that CR assignment takes place in complex and highly dynamic environments. In these environments, analysts using an automated approach to assign CRs would need to intervene in the approach to change its behavior so that it reflects new facts of the environment.

For instance, if a developer leaves a project, then the CRs that would have been assigned to that developer should be routed to some other developer, also respecting

Empirical validation

In order to validate the proposed approach, we compare it with other approaches proposed in the literature. Machine learning-based approaches have proved to be the best choice to assign CRs, especially when applying the SVM algorithm (Cavalcanti et al., 2013b). Therefore, this experiment compared the proposed approach with a pure machine learning approach that uses the SVM algorithm.

The experiment was performed as an off-line experiment, in which we used

Related work

Most of the research found in the literature addresses the problem of CR assignment by using machine learning models and techniques with the objective of providing automated solutions. When applying machine learning models to the CR assignment problem, the content of an incoming CR is used to query the database of the CRs already fixed. Then, a list of potential developers to be assigned is retrieved from the CRs in the query results and suggested to the analyst. The analyst, in turn, selects

Conclusion and future work

CRs are key elements to software maintenance; however, assigning CRs to developers is expensive. In order to overcome this, researchers have proposed semi-automated approaches for CR assignment. Although they represent advances to the area, to the best of our knowledge they have not been adopted in practice. This is mainly because these approaches lack controlling mechanisms, for changing their behavior, and they do not consider contextual information that may influence CR assignment, such as

References (53)

  • Ahsan S.N. et al.

    Automatic classification of software change request using multi-label machine learning methods

    Proceedings of the 2009 33rd Annual IEEE Software Engineering Workshop (SEW’2009)

    (2009)
  • Ahsan S.N. et al.

    Automatic software bug triage system (BTS) based on latent semantic indexing and support vector machine

    Proceedings of the 2009 Fourth International Conference on Software Engineering Advances (ICSEA’09)

    (2009)
  • Aljarah I. et al.

    Selecting discriminating terms for bug assignment: a formal analysis

    Proceedings of the 7th International Conference on Predictive Models in Software Engineering

    (2011)
  • Anvik J. et al.

    Who should fix this bug?

    Proceedings of the 28th International Conference on Software Engineering (ICSE’2006)

    (2006)
  • Anvik J. et al.

    Determining implementation expertise from bug reports

    Proceedings of the Fourth International Workshop on Mining Software Repositories (MSR’2007)

    (2007)
  • Anvik J. et al.

    Reducing the effort of bug report triage: recommenders for development-oriented decisions

    ACM Trans. Softw. Eng. Methodol.

    (2011)
  • Apache, 2013....
  • Baeza-Yates R.A. et al.

    Modern Information Retrieval

    (1999)
  • Basili V. et al.

    Experimentation in software engineering

    IEEE Trans. Softw. Eng.

    (1986)
  • Bertram D. et al.

    Communication, collaboration, and bugs: the social nature of issue tracking in small, collocated teams

    Proceedings of the 2010 ACM Conference on Computer Supported Cooperative Work (CSCW’2010)

    (2010)
  • Bettenburg N. et al.

    Duplicate bug reports considered harmful... Really?

    Proceedings of the 24th IEEE International Conference on Software Maintenance (ICSM’2008)

    (2008)
  • Bhattacharya P. et al.

    Automated, highly-accurate, bug assignment using machine learning and tossing graphs

    J. Syst. Softw.

    (2012)
  • Bird C. et al.

    Fair and balanced? Bias in bug-fix datasets

    Proceedings of the ACM SIGSOFT Symposium on the Foundations of Software Engineering (FSE’2009)

    (2009)
  • Bugzilla, 2013. URL:...
  • Byström K. et al.

    Task complexity affects information seeking and use

    Inf. Process. Manag.

    (1995)
  • Caglayan B. et al.

    Issue ownership activity in two large software projects

    ACM SIGSOFT Softw. Eng. Notes

    (2012)
  • Canfora G. et al.

    A taxonomy of information retrieval models and tools

    Comput. Inf. Technol.

    (2004)
  • Canfora G. et al.

    Supporting change request assignment in open source development

    Proceedings of the ACM Symposium on Applied Computing (SAC’2006)

    (2006)
  • Casavant T.L. et al.

    A taxonomy of scheduling in general-purpose distributed computing systems

    IEEE Trans. Softw. Eng.

    (1988)
  • Cavalcanti Y.C. et al.

    Combining rule-based and information retrieval techniques to assign software change requests

    Proceedings of the 29th IEEE/ACM International Conference on Automated Software Engineering (ASE’2014)

    (2014)
  • Cavalcanti Y.C. et al.

    The bug report duplication problem: an exploratory study

    Softw. Qual. J.

    (2013)
  • Cavalcanti Y.C. et al.

    Challenges and opportunities for software change request repositories: a systematic mapping study

    J. Softw.: Evolut. Process

    (2013)
  • Cavalcanti Y.C. et al.

    Towards understanding software change request assignment: a survey with practitioners

    Proceedings of the 17th International Conference on Evaluation and Assessment in Software Engineering (EASE’2013)

    (2013)
  • Chen L. et al.

    An approach to improving bug assignment with bug tossing graphs and bug similarities

    J. Softw.

    (2011)
  • Crowston K. et al.

    Coordination practices within FLOSS development teams: the bug fixing process

    Proceedings of the 1st International Workshop on Computer Supported Activity Coordination (CSAC’2004)

    (2004)
  • Cubranic D. et al.

    Automatic bug triage using text categorization

    Proceedings of the 16th International Conference on Software Engineering & Knowledge Engineering (SEKE’2004)

    (2004)

    Yguaratã Cerqueira Cavalcanti received his Ph.D. degree in Computer Science from the Federal University of Pernambuco. He is experienced in Software Engineering, working mainly with development of reusable software and maintenance and evolution of software. In his research, he has applied techniques for mining software repositories in order to improve the practice of Software Engineering. He is a researcher in the Reuse in Software Engineering (RiSE) group and the National Institute for Software Engineering in Brazil (INES). Currently he is working as a software engineer in the Brazilian Federal Service for Data Processing (SERPRO), where he tries to align research and practice.

    Ivan do Carmo Machado is a post-doctoral associate at the Computer Science Department at the Federal University of Bahia, Brazil. He received a Ph.D. in Computer Science from the Federal University of Bahia in 2014. His research interests include empirical and evidence-based software engineering, software product lines, software testing, and software architecture. He is a member of the ACM and the Brazilian Computer Society.

    Paulo Anselmo da Mota Silveira Neto has a Bachelor of Computer Science degree from the Catholic University of Pernambuco (UNICAP), a Specialist degree in Software Engineering from the University of Pernambuco (UPE), and a Master of Science degree in Computer Science (Software Engineering) from the Federal University of Pernambuco (UFPE). He is currently a Ph.D. candidate in Computer Science at the Federal University of Pernambuco and a member of the Reuse in Software Engineering (RiSE) Group, which has conducted research on Software Product Lines (SPL) Testing, SPL Architecture Evaluation, Test Selection Techniques, and Regression Testing. He also participates in important research projects in the Software Engineering area, such as the National Institute of Science and Technology for Software Engineering (I.N.E.S.).

    Eduardo Santana de Almeida is an assistant professor at the Federal University of Bahia and head of the Reuse in Software Engineering (RiSE) Labs. He has more than 200 papers published in the main conferences and journals related to Software Engineering and has chaired several national and international conferences and workshops. His research areas include methods, processes, tools, and metrics to develop reusable software.
