
1 Introduction

The problem of identifying and selecting the right software components from a range of choices to satisfy a set of requirements has received considerable attention in the field of component-based software engineering during the last two decades [4, 10].

The software architecture of a given system may be identified in two ways: (1) Component Identification [7] and (2) Component Selection [9]. Component Identification aims to partition the functionalities of a given system into non-intersecting logical components that serve as starting points for designing the architecture. Component Selection methods aim to find suitable components in a repository that satisfy a set of requirements under various constraints or criteria (e.g. cost, number of components used). This paper focuses on the component selection process, the goal being to find existing components that match the software requirements.

After the deployment phase, the maintenance phase requires more attention: software architects need assistance in deciding on the frequent changes to the software system, whether adding new requirements or removing existing ones. The other perspective on component configurations therefore concerns updating, adding or removing one or more requirements of an already constructed system. This is the reconfiguration problem [15], which transforms the structural view of a component system and changes the system's functionality [13].

The major contribution of this paper is the combination of these two perspectives: the multilevel configuration [12] of the component selection problem combined with dynamically changing requirements, i.e. updating, adding or removing requirements (or components) of an already constructed system [13]. A further contribution is the consideration of a non-functional requirement, the cost of a component, and therefore the cost of the entire solution. The configuration problem considers the multilevel view with an additional cost objective. The reconfiguration problem considers the following dynamics: the system requirements change over time and the component repository varies over time.

The paper is organized as follows. Section 2 describes the configuration and reconfiguration problems and presents the component model used. The optimisation process for the dynamic multilevel component selection problem is described in Sect. 3. In Sect. 4 we validate our approach on a real-world case study, with experiments covering two dynamics: requirements that change over time and a component repository that varies over time. Section 5 reviews the state of the art regarding the component selection problem and analyzes the differences from our approach. We conclude the paper and discuss future work in Sect. 6.

2 Background: Configuration/Reconfiguration Problem and Component Model

To provide a context for discussing the dynamic multilevel component selection process, we first describe the configuration and reconfiguration problems and then the assumptions about components and their composition, i.e. the component model used.

2.1 Component Systems, Configurations and Reconfigurations

A component is an independent software package that provides functionality via defined interfaces. An interface may be an export interface, through which a component provides functionality to other components, or an import interface, through which a component obtains services from other components.

A configuration [15] of a component system is the structural relationship between its components, given by the layout of components and connectors. A reconfiguration modifies the structure of a component system by adding, deleting or replacing components and/or connectors.

2.2 Component Model

Figure 1 gives a graphical representation of our view of components. There are two types of components: a simple component is specified by its inports (the set of input variables/parameters), its outports (the set of output variables/parameters) and a function (the computation function of the component); a compound component is a group of connected components in which the output of one component is used as input by another component of the group.

Fig. 1. Components graphical representation and components assembly construction reasoning

In Fig. 1 the compound component is drawn as a filled box. We also show the inside of the compound component: its constituent components and the interactions between them. For details about the component model please refer to [12].
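A minimal Python sketch of this component model is given below. It assumes that interfaces are identified by integers (as in the case study) and that the internal connections of a compound component can be derived by matching the outports of one constituent against the inports of another; the class and field names are illustrative and are not taken from [12].

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple, Union


@dataclass(frozen=True)
class SimpleComponent:
    """Specified by its inports, outports and computation function."""
    cid: int
    inports: frozenset    # required (import) interfaces
    outports: frozenset   # provided (export) interfaces
    function: Optional[Callable] = None  # computation function (not used in this sketch)


@dataclass
class CompoundComponent:
    """A group of connected components; parts may themselves be compound."""
    cid: int
    inports: frozenset
    outports: frozenset
    parts: List[Union["SimpleComponent", "CompoundComponent"]] = field(default_factory=list)

    def connections(self) -> List[Tuple[int, int, int]]:
        """Return (producer id, consumer id, interface) for every outport of one
        part that is used as an inport by another part of the group."""
        links = []
        for producer in self.parts:
            for consumer in self.parts:
                if producer is consumer:
                    continue
                for iface in sorted(producer.outports & consumer.inports):
                    links.append((producer.cid, consumer.cid, iface))
        return links


# Hypothetical example: component 1 turns interface 1 into 5, component 2 turns 5 into 3.
a = SimpleComponent(1, frozenset({1}), frozenset({5}))
b = SimpleComponent(2, frozenset({5}), frozenset({3}))
c = CompoundComponent(10, frozenset({1}), frozenset({3}), [a, b])
print(c.connections())  # → [(1, 2, 5)]
```

Because a compound component's parts may be compound as well, the same structure directly encodes the multilevel view used in Sect. 3.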

3 Dynamic Multilevel Component Selection Optimisation Process

To present our optimisation approach, we first give an overview.

Our approach starts from a set of components (the repository) available for selection and the specification of the final system (its inputs and outputs). The optimisation process begins with the Dynamic Multilevel Component Selection Problem Formulation (see Fig. 2), which transforms the final system specification into a set of required interfaces (and a set of provided interfaces). The second step constructs the multilevel configurations by applying the evolutionary optimisation algorithm (the fourth step in Fig. 2) at each time step produced by the Dynamic Changing Requirements or Dynamic Changing Components step. The evolutionary optimisation algorithm is thus applied at each time step (i.e. as long as requirements or components are still changing) and for each compound component on each level. At each level, the solution with the best fitness value is selected. The fifth step presents the results.

Fig. 2. Dynamic multilevel component selection optimisation process
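The control flow described above can be sketched as the following skeleton. It is hypothetical: `time_steps_for` and `optimise` are placeholders standing for the dynamic-changes step and the evolutionary algorithm, a candidate solution is assumed to be a `(fitness, components, compound_ids)` triple, lower fitness is assumed to be better, and the solution kept for each compound component is assumed to be the best one of its last time step.

```python
from typing import Callable, Dict, Hashable, List, Sequence, Tuple

# A candidate solution: (fitness, selected component ids, ids of the compound ones)
Solution = Tuple[float, Tuple[int, ...], Tuple[int, ...]]


def configure_system(
    top: Hashable,
    time_steps_for: Callable[[Hashable], Sequence[Hashable]],
    optimise: Callable[[Hashable], List[Solution]],
) -> List[Dict[Hashable, Solution]]:
    """Configure the system level by level.

    For every compound component still to be built, the optimiser is run once per
    time step (at least one step is assumed); the best solution of the last step
    is kept and its compound components are scheduled for the next level."""
    levels: List[Dict[Hashable, Solution]] = []
    frontier: List[Hashable] = [top]
    while frontier:
        level: Dict[Hashable, Solution] = {}
        next_frontier: List[Hashable] = []
        for problem in frontier:
            best = None
            for variant in time_steps_for(problem):   # one variant per time step
                best = min(optimise(variant), key=lambda s: s[0])
            level[problem] = best
            next_frontier.extend(best[2])             # compound parts -> next level
        levels.append(level)
        frontier = next_frontier
    return levels


# Toy optimiser: a lookup table holding the best solutions reported in Sect. 4.2
# (not computed here), with a single time step per problem.
toy = {
    "system": [(5.40, (60, 14, 11, 12, 38, 53), (14,))],
    14: [(3.55, (67, 71, 82, 79), (71,))],
    71: [(4.15, (91, 97, 88, 104), ())],
}
levels = configure_system("system", lambda p: [p], lambda p: toy[p])
print(len(levels))  # → 3
```

The loop stops when a level contains no compound component, mirroring the way the case study terminates at Level 3.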

3.1 Dynamic Multilevel Component Selection Problem Formulation

A formal definition of the configuration problem [12] (seen as a compound component) is as follows. Let SR be the set of final system requirements (the provided functionalities of the final compound component), \(SR=\{r_{1}, r_{2}, \ldots, r_{n}\}\), and let SC be the set of components (the repository) available for selection, \(SC=\{c_{1}, c_{2}, \ldots, c_{m}\}\). Each component \(c_{i}\) can satisfy a subset of the requirements from SR (its provided functionalities), denoted \(SP_{c_{i}}=\{p_{i_{1}}, p_{i_{2}}, \ldots, p_{i_{k}}\}\), and has a set of requirements of its own, denoted \(SR_{c_{i}}=\{r_{i_{1}}, r_{i_{2}}, \ldots, r_{i_{h}}\}\). The goal is to find a set of components Sol such that every requirement \(r_{j}\) (\(j=\overline{1,n}\)) from SR can be assigned a component \(c_{i}\) from Sol with \(r_{j} \in SP_{c_{i}}\) (\(i=\overline{1,m}\)), while minimizing the number of components used and the total cost of the assembly. All the requirements of the selected components must be satisfied by the components in the solution. If a selected component is a compound component, its internal structure is also provided, so that all the levels of the system are constructed.
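The feasibility conditions of this definition can be checked with a short sketch. It assumes each component is described by its provided set \(SP_{c_{i}}\) and required set \(SR_{c_{i}}\), and, as an assumption based on Sect. 4.1 (where the system input data become required interfaces), that a component's requirement also counts as satisfied when it is one of the system inputs. The function name and the data are hypothetical.

```python
from typing import Dict, FrozenSet, Iterable, Tuple


def check_solution(
    sr: FrozenSet[int],                    # SR: final system requirements
    inputs: FrozenSet[int],                # interfaces supplied by the system input data
    provides: Dict[int, FrozenSet[int]],   # SP_{c_i} for each component id
    requires: Dict[int, FrozenSet[int]],   # SR_{c_i} for each component id
    cost: Dict[int, float],
    sol: Iterable[int],
) -> Tuple[bool, int, float]:
    """Return (feasible, number of components, total cost) for a candidate Sol."""
    sol = frozenset(sol)
    offered = frozenset().union(*(provides[c] for c in sol))
    needed = frozenset().union(*(requires[c] for c in sol))
    # Every r_j in SR is provided, and every requirement of a selected component is met.
    feasible = sr <= offered and needed <= (offered | inputs)
    return feasible, len(sol), sum(cost[c] for c in sol)


# Hypothetical repository: component 101 provides 3, 102 provides 4 but needs 5, 103 provides 5.
provides = {101: frozenset({3}), 102: frozenset({4}), 103: frozenset({5})}
requires = {101: frozenset({1}), 102: frozenset({5}), 103: frozenset({2})}
cost = {101: 2, 102: 3, 103: 1}
SR, INPUTS = frozenset({3, 4}), frozenset({1, 2})
print(check_solution(SR, INPUTS, provides, requires, cost, {101, 102, 103}))  # → (True, 3, 6)
print(check_solution(SR, INPUTS, provides, requires, cost, {101, 102}))       # → (False, 2, 5)
```

The second candidate fails because component 102 introduces requirement 5, which no selected component provides.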

The reconfiguration problem [15] is defined similarly to the configuration problem, but takes into account dynamic changes of either the requirements or the components. Regarding the reconfiguration problem [13], the dynamics of the component selection problem can be viewed in two ways:

  1. The system requirements change over time. The operations allowed in this dynamic situation are:

     (a) new requirements are introduced, in addition to the existing ones;

     (b) some of the requirements are removed;

     (c) a combination of the two above: some of the requirements are removed while new ones are added.

  2. The repository containing the components varies over time. A set of possible components is initially considered. The allowed operations are similar to the ones above, i.e. adding new components, removing existing ones, or a combination of adding and removing components. At each time step some new components may become available, with a cost lower or higher than that of the existing ones, or with more or fewer provided interfaces (required by the system under development).
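Both kinds of dynamics can be sketched as a sequence of (added, removed) changes applied to a set of items, where an item is either a requirement or a component id. This is an illustrative sketch: the function name is hypothetical, the check that an item is not added and removed in the same step reflects the constraint stated in Sect. 4.2, and the ids in the usage example are made up.

```python
from typing import FrozenSet, Iterable, List, Tuple

Change = Tuple[FrozenSet[int], FrozenSet[int]]  # (added, removed) at one time step


def evolve(initial: Iterable[int], changes: Iterable[Change]) -> List[FrozenSet[int]]:
    """Return the item set at every time step, T=1 being the initial set."""
    current = frozenset(initial)
    history = [current]
    for added, removed in changes:
        if added & removed:
            raise ValueError("an item cannot be added and removed in the same step")
        if not removed <= current:
            raise ValueError("cannot remove items that are not present")
        current = (current - removed) | added
        history.append(current)
    return history


# Hypothetical requirement ids following the pattern of Sect. 4.2.
steps = evolve({3, 4}, [
    (frozenset({6}), frozenset()),      # T=2: add one requirement
    (frozenset({9}), frozenset({4})),   # T=3: remove one, add one
    (frozenset({10}), frozenset()),     # T=4: add one
])
print(sorted(steps[-1]))  # → [3, 6, 9, 10]
```

The same function applies unchanged to the repository case by passing component ids instead of requirement ids.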

3.2 Multilevel Configurations

The second step of the dynamic multilevel component selection process consists of constructing the multilevel configurations of the system. Components may themselves be compositions of components, which gives rise to the idea of composition levels. In other words, in a hierarchical system, a subsystem of higher-level components can be the infrastructure of a single component at a lower level [12].

3.3 Evolutionary Optimisation

The approach presented in this paper uses principles of evolutionary computation and multiobjective optimization [5]. First, the problem is formulated as a multiobjective optimization problem with five objectives. The weights of the objectives in the fitness function are: \(30\,\%\) number of distinct components used, \(30\,\%\) number of new requirements, \(5\,\%\) number of provided interfaces, \(5\,\%\) number of initial requirements that are not in the solution, and \(30\,\%\) cost. We selected these weights according to their impact on finding the final solution. There are several ways to deal with a multiobjective optimization problem; in this paper the Pareto dominance principle is used.
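As an illustration, the weights can be combined into a scalar fitness as a weighted sum of the raw objective values. The formula is our assumption (it is not spelled out above), but it is consistent with the fitness values reported in Sect. 4: for instance, 6 provided interfaces, 6 components and cost 11, with no outstanding requirements, give \(0.30 \cdot 6 + 0.05 \cdot 6 + 0.30 \cdot 11 = 5.40\). The names `WEIGHTS`, `Objectives` and `weighted_fitness` are hypothetical.

```python
from dataclasses import dataclass

# Weights from Sect. 3.3 (they sum to 1.0); lower fitness is better.
WEIGHTS = {
    "distinct_components": 0.30,
    "new_requirements": 0.30,              # new requirements still to be satisfied
    "provided_interfaces": 0.05,
    "missing_initial_requirements": 0.05,  # initial requirements not in the solution
    "cost": 0.30,
}


@dataclass
class Objectives:
    distinct_components: int
    new_requirements: int
    provided_interfaces: int
    missing_initial_requirements: int
    cost: float


def weighted_fitness(o: Objectives) -> float:
    """Weighted sum of the five objectives (an assumed aggregation)."""
    return sum(w * getattr(o, name) for name, w in WEIGHTS.items())


best_level1 = Objectives(6, 0, 6, 0, 11)   # Level 1, time step 4, in Sect. 4.2
print(round(weighted_fitness(best_level1), 2))  # → 5.4
```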

Solution Representation. The solution representation is the one used in [13]. A solution (chromosome) is represented as a 5-tuple \((lstProv,\ lstComp,\ lstInitReq,\ lstNewReq,\ cost)\) containing: the list of provided interfaces (lstProv); the list of components (lstComp); the list of initial requirements (lstInitReq); the list of new requirements (lstNewReq); and the cost (the sum of the costs of the components in the chromosome). The \(i\)-th element of lstComp is the component satisfying the \(i\)-th provided interface in lstProv. An example follows.

A valid chromosome may be structured as follows:

\(Crom_{0}=\ ((3,\ 4),\ (12,\ 24),\ (1,\ 2),\ (5,\ 7,\ 8,\ 11,\ 33,\ 30),\ (67)).\) This chromosome does not represent a solution; it is only an initialized chromosome to which no genetic operator has been applied. The provided interfaces \((3,\ 4)\) are offered by the components \((12,\ 24)\). The set of initial requirements is \((1,\ 2)\). Using a component means that its requirements must be provided as well: component 12 requires the new requirements \((5,\ 7,\ 8,\ 11)\) and component 24 requires the new requirements \((33,\ 30)\).
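The 5-tuple can be mirrored by a small data structure. The sketch below, with hypothetical class and method names, encodes \(Crom_{0}\) and shows how the positional correspondence between lstProv and lstComp is read.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Chromosome:
    lst_prov: List[int]       # provided interfaces
    lst_comp: List[int]       # lst_comp[i] satisfies lst_prov[i]
    lst_init_req: List[int]   # initial requirements (system inputs)
    lst_new_req: List[int]    # requirements introduced by the selected components
    cost: float               # sum of the costs of the components in the chromosome

    def assignment(self) -> Dict[int, int]:
        """Map each provided interface to the component that satisfies it."""
        if len(self.lst_prov) != len(self.lst_comp):
            raise ValueError("lst_prov and lst_comp must have the same length")
        return dict(zip(self.lst_prov, self.lst_comp))

    def distinct_components(self) -> int:
        """Number of distinct components (one component may satisfy several interfaces)."""
        return len(set(self.lst_comp))


crom0 = Chromosome([3, 4], [12, 24], [1, 2], [5, 7, 8, 11, 33, 30], 67)
print(crom0.assignment())  # → {3: 12, 4: 24}
```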

Genetic Operator. Since this paper uses the same genetic algorithm as [13], the mutation operator is computed in the same way. Two types of mutation can be applied to a chromosome, depending on its “status”: either the chromosome still has new requirements to satisfy, or it has no new requirements left to satisfy. See [13] for details.

3.4 Evaluation

When comparing two algorithms [6], the best fitness values obtained by the searches are an obvious indicator of how well the optimisation process performed. Inferential statistics may be applied to discern whether one set of experiments is significantly different in some aspect from another. Usually we wish to be able to claim that we have evidence suggesting that Algorithm A (the Genetic Algorithm) is better than Algorithm B (Random Search). The Wilcoxon signed ranks test [3] is used to answer the following question: do two samples represent two different populations? It is a nonparametric procedure employed in hypothesis testing situations involving a design with two paired samples. It is a pairwise test that aims to detect significant differences between the two samples, that is, between the behavior of two algorithms. The best fitness value (over the entire population) was used to compare the two algorithms.

The Wilcoxon signed ranks test has two hypotheses:

  1. Null hypothesis \(H_{0}\): the median difference is zero.

  2. Research hypothesis \(H_{1}\): the median difference is not zero, at significance level \(\alpha =0.05\).

Steps of the Wilcoxon signed ranks test: compute \(W_{-}\) and \(W_{+}\), the sums of the ranks of the negative and of the positive differences; check that \(W_{-}+W_{+} = n(n+1)/2\), where n is the number of non-zero paired differences; select the test statistic (for the two-tailed test, the test statistic is the smaller of \(W_{-}\) and \(W_{+}\)); finally, determine whether the observed test statistic \(W_{t}\) supports \(H_{0}\) or \(H_{1}\), i.e. determine a critical value \(W_{c}\) such that if the observed value \(W_{t}\) is less than or equal to \(W_{c}\), we reject \(H_{0}\) in favor of \(H_{1}\).
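These steps can be sketched in plain Python. The function names are hypothetical; zero differences are discarded and tied absolute differences receive average ranks, which is the usual convention. The critical value \(W_{c}\) is not computed here: it has to be looked up in a Wilcoxon table for the given n and \(\alpha\) (SciPy's `scipy.stats.wilcoxon`, which reports a p-value instead, is an alternative).

```python
import math
from typing import List, Sequence, Tuple


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks of `values`; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def wilcoxon_signed_ranks(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float, float, int]:
    """Return (W-, W+, W_t, n) for paired samples, e.g. best fitness of GA and RS per run."""
    diffs = [a - b for a, b in zip(x, y) if a != b]   # drop zero differences
    n = len(diffs)
    ranks = average_ranks([abs(d) for d in diffs])
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)
    assert math.isclose(w_minus + w_plus, n * (n + 1) / 2)   # sanity check from the text
    return w_minus, w_plus, min(w_minus, w_plus), n           # two-tailed statistic W_t


def reject_h0(w_t: float, w_c: float) -> bool:
    """Reject H0 in favour of H1 when the observed statistic is <= the critical value."""
    return w_t <= w_c


print(wilcoxon_signed_ranks([5, 7, 9, 4, 8], [3, 8, 6, 4, 2]))  # → (1.0, 9.0, 1.0, 4)
```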

Due to the stochastic nature of optimisation algorithms, searches must be repeated several times to mitigate the effect of random variation. How many runs are needed when analyzing and comparing algorithms? In many fields of science (e.g. medicine and behavioural science), a common rule of thumb [1] is to use at least \(n=30\) observations. Accordingly, our evaluation uses 30 executions of each algorithm.

Our Research Question: How and why do search-based algorithms (in our case a Genetic Algorithm and a Random Search Algorithm) provide different results for the Dynamic Multilevel Component Selection Problem?

4 Reservation System Case Study

To better illustrate the component selection optimisation approach proposed in this paper, we develop a real-world case study: building a Reservation System. The system allows different types of customers to book several types of items (hotels, cars, etc.), which entails different types of offers. A possible first-level architecture of the system (created by a software architect) is the following. Four modules defining the business logic of the system are identified: Offer Module (provides transactions for making a reservation and getting a notification), LoyaltyPrg Module (responsible for the loyalty program for returning clients), ReservationType Module (manages the different types of booking offers) and Customer Module (provides information about customers). Two of these modules, LoyaltyPrg Module and Offer Module, are described at level 1 as compound components, which are further decomposed at the next levels, whereas ReservationType Module and Customer Module are simple components and remain unchanged across the decomposition. The components and the structure of one solution can be found in [14].

4.1 Component Selection Problem Formulation

Given two input data (customerData, calendarData), two output data to be computed (doneReservation, requestConfirmation) and a set of 126 available components, the goal is to find a subset of the given components such that all the requirements are satisfied under the optimisation criteria specified above. The set of requirements \(SR=\{r_{3}, r_{4}\}\) (viewed as provided interfaces \(\{p_{3}, p_{4}\}\)) and the set of components \(SC=\{c_{0}, c_{1}, c_{2}, \ldots, c_{126}\}\) are given. The input data of the final system, transformed into required interfaces, form the set \(\{r_{1}, r_{2}\}\).

Remark. Due to lack of space the component repository is not described in this paper, but it can be found in [14]. Many components provide the same functionality with different required interfaces. The components in the repository have been numbered to simplify their management and their use by the algorithm.

4.2 Experimental Studies - Case 1: Dynamic Changing Requirements

We consider two types of dynamics and, consequently, two experiments, one for each: the requirements of the problem change over time, and the components available at a given time step change.

The algorithm was run 100 times, and the number of nondominated solutions and the number of distinct nondominated solutions were recorded for all situations. The cost and the number of distinct components used in a solution were also logged, as were the best, worst and average fitness values for all situations.

To analyze the behavior of the algorithm, we performed a few tests whose role is to determine whether the number of iterations and the population size play a role in finding the Pareto solutions. For each time step we report the number of nondominated solutions in the final population and the number of distinct solutions among them (some solutions appear in multiple copies, so in the end we count only the distinct ones). For both quantities we report the average over the 100 runs.
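The two quantities reported per time step can be computed from a final population as sketched below. The sketch assumes that every objective is minimised and that two individuals are copies when their selected component sets coincide; both assumptions and all names are ours, and the example uses only two objectives for brevity.

```python
from typing import FrozenSet, List, Sequence, Tuple

Individual = Tuple[Tuple[float, ...], FrozenSet[int]]  # (objective vector, selected components)


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """Pareto dominance for minimisation: a is no worse everywhere and better somewhere."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def nondominated_counts(population: List[Individual]) -> Tuple[int, int]:
    """Return (number of nondominated individuals, number of distinct ones)."""
    front = [ind for ind in population
             if not any(dominates(other[0], ind[0]) for other in population)]
    distinct = {components for _, components in front}
    return len(front), len(distinct)


# Hypothetical population: two copies of one solution, one trade-off, one dominated.
population = [
    ((6, 11), frozenset({60, 14, 11})),
    ((6, 11), frozenset({60, 14, 11})),
    ((5, 12), frozenset({1, 2})),
    ((7, 13), frozenset({3, 4})),
]
print(nondominated_counts(population))  # → (3, 2)
```

Averaging these two counts over repeated runs gives the per-time-step figures described above.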

The final system requirements change from one step to another. There are a few possible scenarios: adding new requirements to those of the previous step (the stakeholder needs new requirements to be added; for example, in the considered case study the stakeholder requests a requirement related to special offers for certain holidays); removing some of the requirements of the previous step (the development team finds out that some requirements were not correctly specified at first or are not needed; for example, in the considered case study the developers may have considered individual reservation as a distinct requirement, whereas it can be reduced to group reservation as a particular case); and a combination of the two, i.e. adding new requirements and removing some of those of the previous step (in this situation we always ensure that the added requirements are different from the removed ones). We do not treat each of these situations separately, because we did not observe a particular behavior for any of them: the complexity appears to be the same regardless of the kind of dynamics involved. Four time steps are built using artificially generated data, with the following dynamics: T=1 (the initial requirements), T=2 (one new requirement added), T=3 (one requirement removed and one new requirement added), T=4 (one new requirement added).

Remark. Note that we have multiple time steps only for the first level of the final system. The next levels only construct the compound components from the first level, and modifications at these levels (either removing or adding requirements) would result in constructing components that are not compatible with the previous levels. Also, the component repository is unchanged for the entire duration of the algorithm and across all time steps.

Performed Tests. Several remarks can be derived from the experiments. We are interested in finding as many nondominated solutions as possible but, on the other hand, we also look for diversity, i.e. we wish to have a large number of distinct solutions among the nondominated ones.

Multilevel Configurations. At this point we have obtained the final system, but it still contains compound components, so these must be constructed as well by applying the same algorithm with different requirements and input data. The best solution obtained at Level 1 (time step 4) has the fitness value 5.40 (6 provided interfaces, 6 components, and cost 11, \(L1=\{60,\ 14c,\ 11,\ 12,\ 38,\ 53\}\), where the suffix c marks a compound component). This “best” solution belongs to the set of final nondominated solutions of Level 1. It contains a compound component, id 14, which forms the second level of the final system. The solution with the best fitness value at Level 2, 3.55, has 4 distinct components, 5 provided interfaces and cost 7: \(L2=\{67,\ 71c,\ 82,\ 79\}\). It contains a compound component, id 71, which forms the third level of the final system. The solution with the best fitness value at Level 3, 4.15, has 4 distinct components, 5 provided interfaces and cost 9, \(L3=\{91,\ 97,\ 88,\ 104\}\), and contains no compound component.

Wilcoxon Statistical Test. Section 3.4 describes the Wilcoxon statistical test that we use to compare our Genetic Algorithm with the Random Search Algorithm. Table 1 reports the test results for Case 1 - Dynamic Changing Requirements. The test shows statistically significant evidence at \(\alpha =0.05\) that the median is positive, i.e. the null hypothesis \(H_{0}\) is rejected in favor of \(H_{1}\) for all levels and all time steps.

Table 1. Wilcoxon statistical test - changing requirements experiment.

4.3 Experimental Studies - Case 2: Dynamic Changing Components

In this case, the repository containing the components changes over time. This modification of the available components may be seen as an update of the COTS market, with new components becoming available and others being withdrawn from the market.

Five different time steps are built using artificially generated data and the dynamics at each of these steps are: T=1 (The initial components), T=2 (Add two new components), T=3 (Remove one component), T=4 (Remove one component and add one new component), and T=5 (Add three new components).

Note that the requirements are unchanged for the entire duration of the algorithm and across all time steps. However, unlike the changing requirements case, in the changing components case there are multiple time steps at every level. This makes sense because we only change the component repository and not the requirements of the compound components from the previous levels.

Performed Tests. To analyze the behavior of the algorithm, we performed a few tests whose role is to determine whether the number of iterations and the population size play a role in finding the Pareto solutions. Next, we discuss the solutions obtained at Level 1 (only for the last time step, due to page limitations) and the influence of changing the available components on these solutions. The best solution obtained at Level 1 (time step 5) has the fitness value 4.45 (5 provided interfaces, 5 components, and cost 9, \(L1=\{64,\ 65,\ 53,\ 57,\ 67c\}\)). The new components added to the repository at this time step improved the final solution from the structural perspective: a solution with fewer components (and provided interfaces) than those obtained at the previous time steps was discovered.

Table 2. Wilcoxon statistical test - changing components experiment.

Multilevel Configurations. At this point we have obtained the final system, but it still contains compound components, so these must be constructed as well by applying the same algorithm with different requirements and input data. The solution obtained at Level 1, time step 5, contains a compound component, id 67, which forms the second level of the final system. For the second level there are three time steps, with the following modifications: T = 1 (no modification of the component repository), T = 2 (two new components added), and T = 3 (three new components added and one old component removed). The best solution obtained at Level 2 (time step 3) has the fitness value 4.4 (4 provided interfaces, 3 components, and cost 11, \(L2=\{75,\ 85,\ 79c\}\)). The new components added to the repository at this time step improved the final solution from the structural perspective but not from the cost perspective. For example, the solution \(L2=\{84,\ 83\}\) has fewer components and provided interfaces but a higher cost, 14 (its fitness value is 4.95). The solution obtained at Level 2, time step 3, contains a compound component, id 79, which forms the third level of the final system. For the third level there are four time steps, with the following modifications: T = 1 (no modification of the component repository), T = 2 (two new components added), T = 3 (two old components removed), and T = 4 (two new components added and one old component removed). The best solution obtained at Level 3 (time step 4) has the fitness value 4.75 (5 provided interfaces, 4 components, and cost 11, \(L3=\{103,\ 96,\ 90,\ 87\}\)). Again, the new components added at this time step improved the final solution from the structural perspective but not from the cost perspective. For example, the newly added component 104 has cost 7, so the solution containing it has a fitness value of 5.0, with 4 provided interfaces and 3 components but cost 14. The solution obtained at Level 3, time step 4, contains no compound components, so no further execution of the algorithm is needed: all the compound components have been configured.

Wilcoxon Statistical Test. Section 3.4 describes the Wilcoxon statistical test that we use to compare our Genetic Algorithm with the Random Search Algorithm. Table 2 reports the test results for Case 2 - Dynamic Changing Components. The test shows statistically significant evidence at \(\alpha =0.05\) that the median is positive, i.e. the null hypothesis \(H_{0}\) is rejected in favor of \(H_{1}\) for all levels and all time steps.

5 Related Work Analysis

This section presents the state of the art regarding the component selection problem and analyzes the differences from our approach. Component selection is traditionally performed in an architecture-centric manner. Among existing component selection methods, our approach aims to achieve goals similar to those of [4]. None of these approaches considered the multilevel structure of a component-based system; our previous research studied this problem in [12]. Various genetic algorithm representations were proposed in [8, 10]; their authors proposed an optimization model of software component selection for CBSS development. We argue that our model differs in that component interactions are computed automatically from the required and provided component interface specifications. Also, regarding the function ratings, our approach automatically discovers the constituent components of each module of the final system. In [11] a hybrid approach for multi-attribute QoS optimization of component-based software systems was proposed. Compared with this approach, ours aims at similar goals, is capable of obtaining multiple solutions in a single run, and can be scaled to any number of components and requirements. Another perspective concerns updating the requirements (or components) of an already constructed system [15]; our previous research on this perspective was presented in [13]. Our current approach considers dynamic modifications of the requirements, investigating different ways of modifying them by adding new ones or deleting existing ones. A similar approach considering the evolution of software architecture was proposed in [2]; it suggests the best actions to take according to a set of new requirements. Like that approach, ours also discovers the optimal solution minimizing the final cost when new requirements are needed, but it additionally considers the case in which the component repository changes over time.

6 Conclusion

The current work investigated the potential of evolutionary algorithms for a particular multiobjective dynamic problem: the multilevel component selection problem. Two types of dynamics were considered: the requirements of the system change over time, and the components available in the repository change over time. The Wilcoxon statistical test was used to compare our Genetic Algorithm approach with a Random Search Algorithm: we have statistically significant evidence at \(\alpha = 0.05\) that the median is positive, i.e. our approach obtains better results.

With respect to the state of the art, the following major aspects characterize the novelty of the approach presented in this paper: this is among the first papers to support the evolution of a software architecture using an optimization model that automatically constructs the entire architecture of a multilevel system; the component interactions are computed automatically from the required and provided component interface specifications; and, finally, our approach can facilitate the work of a maintainer.