Comparing manual and automated feature location in conceptual models: A controlled experiment

https://doi.org/10.1016/j.infsof.2020.106337

Abstract

Context

Maintenance activities cannot be completed without locating the set of software artifacts that realize a particular feature of a software system. Manual Feature Location (FL) is widely used in industry, but it becomes challenging (time-consuming and error-prone) in large software repositories. To reduce manual effort, automated FL techniques have been proposed. Research efforts in FL tend to compare automated FL techniques against one another, ignoring manual FL. Moreover, existing research focuses on code, neglecting other artifacts such as models.

Objective

This paper aims to compare manual FL against automated FL in models in order to answer important questions about the performance, productivity, and satisfaction of both treatments.

Method

We ran an experiment comparing manual and automated FL with 18 subjects (5 experts and 13 non-experts) in the domain of our industrial partner, BSH, which has manufactured induction hobs for more than 15 years. We measured performance (recall, precision, and F-measure), productivity (ratio between F-measure and spent time), and satisfaction (perceived ease of use, perceived usefulness, and intention to use) for both treatments, and performed statistical tests to assess whether the observed differences are significant.

Results

Regarding performance, manual FL significantly outperforms automated FL in precision and F-measure (by up to 27.79% and 19.05%, respectively), whereas automated FL significantly outperforms manual FL in recall (by up to 32.18%). Regarding productivity, manual FL achieves 3.43%/min, significantly outperforming automated FL. Finally, there are no significant differences in satisfaction between the two treatments.

Conclusions

The findings of our work can be leveraged to advance research to improve the results of manual and automated FL techniques. For instance, automated FL in industry faces issues such as low discrimination capacity. In addition, the obtained satisfaction results have implications for the usage and possible combination of manual, automated, and guided FL techniques.

Introduction

Feature Location (FL) has been recognized as one of the most common activities undertaken by software developers [1]. During maintenance activities, developers need to identify where and how a feature (i.e., a particular functionality) is realized in software artifacts in order to fix bugs, introduce new features, and adapt or enhance existing ones.

FL is considered an important support activity during the development, management, and maintenance of software, since it supports a number of software tasks such as feature coverage, software reuse, program comprehension, and impact analysis. These tasks are considered good practice by major software standards such as CMMI and ISO 15504 [2], and can be critical to the success of a project [3], since they lead to increased maintainability and reliability of complex software systems [4] and decrease the expected defect rate in developed software [5].

Furthermore, as reflected in a recent survey [6], FL is gaining momentum in the research community since it helps initiate Software Product Lines (SPLs) from existing software systems. SPLs enable the systematic reuse of variants to tailor different products. Savings of $584 million in development costs, a 2x-4x reduction in time to market, and a reduction in maintenance costs of around 60% are among the documented real-world examples of the benefits of SPLs [7]. Hence, there is a need to adopt SPLs in companies that deal with complex software systems in domains such as automotive, cyber-physical systems, and robotics [8]. To adopt an SPL, the located features are used to formalize the commonalities and variabilities across the product family; this formalization can be done with feature modeling [9]. In spite of the utility of FL, manual FL is a challenging activity in complex and large repositories of software artifacts that have been developed over several years by different developers [2], [10], [11]. In this context, FL activities become time-consuming and error-prone [12], [13], [14], [15].

In order to reduce the effort of developers during manual FL, researchers have presented several techniques that provide automated assistance to locate features. A compendium of the most well-known techniques can be found in the survey by Julia Rubin and Marsha Chechik [16]. In the survey, techniques are classified as static or dynamic (depending on whether they involve program execution information) and sub-classified as plain or guided (depending on whether they produce an output fully automatically or semi-automatically with user guidance). Most of the techniques focus on FL in source code and rely on Information Retrieval (IR) techniques to locate the features [1], [17], [18]. There are many IR techniques, but most research efforts report better results when applying Latent Semantic Indexing (LSI) [1], [19], [20].
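To make the LSI idea concrete, the following is a minimal sketch (not the paper's implementation) of IR-based feature location: artifact texts and a feature query are embedded in a low-rank latent space via a truncated SVD of the term-document matrix, and artifacts are ranked by cosine similarity to the query. All texts are hypothetical examples.

```python
# Minimal LSI-based feature location sketch. The artifact texts and the
# feature query below are hypothetical; a real setting would use text
# extracted from model elements or source code.
import numpy as np

artifacts = [
    "inverter controls power level of induction coil",
    "user interface displays timer and power settings",
    "temperature sensor monitors cookware heat",
    "power manager distributes energy across inductors",
]
query = "power level of the induction coil"  # feature description

# Build a term-by-document count matrix (rows: terms, cols: artifacts + query).
docs = artifacts + [query]
vocab = sorted({w for d in docs for w in d.split()})
A = np.array([[d.split().count(w) for d in docs] for w in vocab], dtype=float)

# LSI: a truncated SVD keeps only the k strongest latent concepts.
U, s, Vt = np.linalg.svd(A, full_matrices=False)
k = 2
doc_vecs = (np.diag(s[:k]) @ Vt[:k]).T  # each row: one document in latent space

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

# Rank artifacts by similarity to the query vector (the last document).
q = doc_vecs[-1]
scores = [cosine(doc_vecs[i], q) for i in range(len(artifacts))]
ranking = sorted(zip(scores, artifacts), reverse=True)
for score, text in ranking:
    print(f"{score: .3f}  {text}")
```

A plain TF-IDF ranking would work similarly; the SVD projection is what lets LSI match terms that co-occur with the query terms even when they do not appear in the query itself.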

Despite the importance of FL and the existence of techniques for automated assistance, research efforts tend to make comparisons between automated techniques, without comparing them against manual FL. In addition, automated techniques are focused on code, neglecting other software artifacts such as models (which have been shown to increase efficiency and effectiveness in software development [21]). Thus, several important questions remain unanswered with regard to the differences in performance, productivity, and satisfaction when the manual and automated FL treatments are used to locate features in models.

To answer these questions, we conducted an experiment to compare manual FL against automated FL in models. Specifically, we recruited 18 subjects (5 experts and 13 non-experts) in the domain of our industrial partner, BSH, which has manufactured induction hobs (under the Siemens and Bosch brands, among others) for more than 15 years. For the manual FL treatment, the subjects manually located the model elements that realize a set of features, using the name of each feature and the models as the search space. For the automated FL treatment, we used an algorithm that leverages LSI to obtain the model elements that realize a feature description provided by the subjects.

The experiment was conducted in terms of performance (recall, precision, and F-measure), productivity (ratio between F-measure and spent time), and satisfaction (perceived ease of use, perceived usefulness, and intention to use). Manual FL obtains average values of 44.42% recall, 42.36% precision, 41.49% F-measure, 3.43%/min productivity, 3.42 Perceived Ease of Use (PEOU), 3.47 Perceived Usefulness (PU), and 3.22 Intention to Use (ITU). Automated FL obtains average values of 76.60% recall, 14.57% precision, 22.44% F-measure, 1.21%/min productivity, 3.42 PEOU, 3.56 PU, and 3.33 ITU. After the experiment, we analyzed the results of the manual and automated FL treatments by means of a statistical analysis to find out whether significant differences exist between both. The analysis determines that the differences in performance and productivity are statistically significant, while the differences in satisfaction are not.
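The performance and productivity measures above follow standard definitions and can be sketched as follows; the element sets and timing in this snippet are hypothetical, used only to illustrate the computation.

```python
# Sketch of the measures used in the experiment: recall, precision,
# F-measure, and productivity (F-measure over spent time).
# The sets and the time value below are hypothetical examples.
ground_truth = {"e1", "e2", "e3", "e4"}   # model elements that realize the feature
located = {"e2", "e3", "e5"}              # elements returned by one treatment
minutes_spent = 10.0

tp = len(located & ground_truth)
recall = tp / len(ground_truth)            # fraction of relevant elements found
precision = tp / len(located)              # fraction of located elements that are relevant
f_measure = 2 * precision * recall / (precision + recall)
productivity = 100 * f_measure / minutes_spent  # %/min, as reported in the paper

print(f"recall={recall:.2%} precision={precision:.2%} "
      f"f_measure={f_measure:.2%} productivity={productivity:.2f}%/min")
```

With these definitions, the trade-off reported above is easy to see: returning many candidate elements (as automated FL does) raises recall but lowers precision, and the F-measure penalizes the imbalance.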

The results of our work suggest that (1) neither domain experts nor domain non-experts find the perfect solutions for the features, (2) manual FL outperforms automated FL, and (3) satisfaction results are very similar for both manual and automated FL. These issues and their causes have a number of implications that can be leveraged either to improve the results of manual and automated FL (for instance, by pairing software engineers or by designing complementary artifacts for automated approaches), or to design further experiments that tackle the novel research questions that arise from this work. Overall, the contributions of the paper can be summarized as follows:

  • We propose an experiment for comparing manual and automated FL in models.

  • We show that neither domain experts nor domain non-experts find the perfect solutions when locating features.

  • We show that automated FL yields worse results than those obtained manually. This is a novel finding, since research efforts have so far compared assistance tools against other assistance tools in order to improve their results, instead of comparing them against humans.

  • Our analysis suggests how to advance research on FL to lead to an improvement of the results for assistance tools. In addition, the findings of our work present implications for the usage and possible combination of manual, automated, and guided FL techniques.

The rest of the paper is structured as follows: Section 2 provides the necessary background in FL in models, manual FL, and automated FL. Section 3 describes the design of the experiment. Section 4 presents the results and their statistical analysis. Section 5 discusses the outcomes of our work. Section 6 deals with the threats to the validity of our work. Section 7 reviews the related work. Finally, Section 8 concludes the paper.

Section snippets

Feature location in models

Feature Location (FL) is one of the most important and common activities performed by developers during software maintenance and evolution [22]. FL is the process of finding the set of software artifacts that realize a specific feature. FL can be performed either manually or in an automated fashion. Manual FL is a common practice, but it can become error-prone and time-consuming [8], [10], [12], [13], [14], [15], [23], so automated FL has received much attention during recent years [6], [16],

Objective

The experiment for comparing manual FL with automated FL in models was designed following the guidelines of Wohlin et al. [35]. The goal of our experiment was to analyze FL in models, for the purpose of filling the gap in empirical evaluation on this topic, with respect to the different FL treatments, from the viewpoint of both experts and non-experts in a domain, in the context of software development for induction hobs.

The measures used in our research to achieve the determined goal are

Results

The findings of our work for each of the research questions under study can be summarized as follows:

  • RQ1: While automated FL obtains better recall values than manual FL, manual FL outperforms automated FL overall. Differences in the results are statistically significant.

  • RQ2: Manual FL obtains better productivity results than automated FL. Again, differences in the results are statistically significant.

  • RQ3: Automated FL obtains generally better results than manual FL regarding satisfaction.

Discussion

Table 3 provides an overview of the main discussion issues that arise from the results of our work. The rest of the section provides more details on each of the points outlined in the table.

The results of our work suggest that neither domain experts nor domain non-experts find the perfect solutions for the features. In addition, the performance results indicate that while automated FL outperforms manual FL by up to 32.18% in recall, manual FL outperforms automated FL by up to 27.79% in

Threats to validity

To describe the threats to the validity of our work, we use the classification of [54], which distinguishes four aspects of validity: construct validity, internal validity, external validity, and reliability.

Construct validity reflects the extent to which the operational measures that are studied represent what the researchers have in mind and what is investigated based on the research questions. There are six threats of this kind: author bias, task design, mono-method bias, hypothesis guessing,

Related work

Some works research how developers locate features. The work presented in [59] reports an exploratory study of FL, consisting of three experiments with six FL exercises. The study evaluates the quality of FL and the impact of explicit FL knowledge, also proposing a conceptual framework for understanding FL processes. In [60], the authors present an exploratory case study on identifying and manually locating features in Marlin, a variant-rich open-source embedded firmware. Another work [20]

Conclusions

Feature Location (FL) is one of the most frequent activities in software development, particularly during maintenance activities. However, in industrial environments, software artifacts are developed over long periods of time by different software engineers, resulting in complex and large repositories, and thus FL becomes a challenging, time-consuming activity that does not guarantee good results. To tackle this issue, researchers have proposed automated Information Retrieval techniques such as

CRediT authorship contribution statement

Francisca Pérez: Conceptualization, Methodology, Formal analysis, Investigation, Resources, Data curation, Writing - original draft, Writing - review & editing, Supervision. Jorge Echeverría: Methodology, Formal analysis, Investigation, Resources, Data curation, Visualization. Raúl Lapeña: Software, Validation, Data curation, Writing - original draft, Writing - review & editing. Carlos Cetina: Conceptualization, Investigation, Resources, Writing - review & editing, Supervision, Project

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgements

This work has been partially supported by the Ministry of Economy and Competitiveness (MINECO) through the Spanish National R+D+i Plan and ERDF funds under the Project ALPS (RTI2018-096411-B-I00).

References (68)

  • A. Ghazarian

    A research agenda for software reliability

    IEEE Reliab. Soc. 2009 Ann. Technol. Rep.

    (2010)
  • P. Rempel et al.

    Preventing defects: the impact of requirements traceability completeness on software quality

    IEEE Trans. Softw. Eng.

    (2017)
  • W.K.G. Assunção et al.

    Reengineering legacy applications into software product lines: a systematic mapping

    Empir. Softw. Eng.

    (2017)
  • Key Benefits: Why Product Line Engineering?, 2020,...
  • T. Berger et al.

    The state of adoption and the challenges of systematic variability management in industry

    Empir. Softw. Eng.

    (2019)
  • K.C. Kang et al.

    Feature-Oriented Domain Analysis (FODA) feasibility study

    Technical Report

    (1990)
  • J. Krüger et al.

    Features and how to find them: A survey of manual feature location

    Software Engineering for Variability Intensive Systems - Foundations and Applications

    (2019)
  • Y. Zhang et al.

    Ontological approach for the semantic recovery of traceability links between software artefacts

    IET Softw.

    (2008)
  • C. et al.

    Map - mining architectures for product line evaluations

    Proceedings Working IEEE/IFIP Conference on Software Architecture

    (2001)
  • P. Grünbacher et al.

    Model-based customization and deployment of eclipse-based tools: industrial experiences

    2009 IEEE/ACM International Conference on Automated Software Engineering

    (2009)
  • J.H. Hayes et al.

    Advancing candidate link generation for requirements tracing: the study of methods

    IEEE Trans. Softw. Eng.

    (2006)
  • A.J. Ko et al.

    An exploratory study of how developers seek, relate, and collect relevant information during software maintenance tasks

    IEEE Trans. Softw. Eng.

    (2006)
  • J. Rubin et al.

    A survey of feature location techniques

    Domain Engineering

    (2013)
  • A. Marcus et al.

    Recovering documentation-to-source-code traceability links using latent semantic indexing

    Proceedings of the 25th International Conference on Software Engineering

    (2003)
  • W. Zhao et al.

    SNIAFL: towards a static non-interactive approach to feature location

    Proceedings. 26th International Conference on Software Engineering

    (2004)
  • M. Revelle et al.

    Using data fusion and web mining to support feature location in software

    IEEE 18th International Conference on Program Comprehension (ICPC)

    (2010)
  • D. Liu et al.

    Feature location via information retrieval based filtering of a single scenario execution trace

    Proceedings of the Twenty-second IEEE/ACM International Conference on Automated Software Engineering

    (2007)
  • M. Brambilla et al.

    Model-driven software engineering in practice

    Synthesis Lect. Softw. Eng.

    (2012)
  • B. Dit et al.

    Feature location in source code: a taxonomy and survey

    J. Softw.

    (2013)
  • Extractive software product line adoption catalog, 2020,...
  • J. Font et al.

    Feature location in models through a genetic algorithm driven by information retrieval techniques

    Proceedings of the ACM/IEEE 19th International Conference on Model Driven Engineering Languages and Systems

    (2016)
  • J. Martinez et al.

    Automating the extraction of model-based software product lines from model variants (t)

    2015 30th IEEE/ACM International Conference on Automated Software Engineering (ASE)

    (2015)
  • A. van Deursen et al.

    Domain-specific languages: an annotated bibliography

    SIGPLAN Not.

    (2000)
  • S. Winkler et al.

    A survey of traceability in requirements engineering and model-driven development

    Softw. Syst. Model. (SoSyM)

    (2010)