
1 Introduction

The increase in the number and complexity of information systems and software applications has motivated the development of methods for increasing reuse. Software Product Line Engineering (SPLE) promotes systematic reuse among different, yet similar, systems by creating reusable artifacts, commonly referred to as core assets or domain artifacts, and guiding their use in particular systems [7, 12]. SPLE has been shown to have the potential to decrease time-to-market and increase product quality, yet it requires a high up-front investment in the development of core assets [7]. Hence, SPLE techniques are commonly adopted in a bottom-up fashion, namely after several similar variants have been created using ad-hoc reuse techniques [6]. In these cases, the decision whether to set up a Software Product Line (SPL) from the individual systems and adopt SPLE techniques is non-trivial and requires an in-depth assessment. Such an assessment should be based on (i) analyzing the similarity and variability of the existing products, and (ii) applying suitable metrics that reflect the effort of creating core assets and generating particular (product) artifacts from these core assets.

While many studies deal with different aspects of similarity analysis of software artifacts, e.g., [1] for variability analysis and [2, 3, 17] for clone detection, only a few studies address ‘product line-ability’ decisions. Berger et al. [4] coined the term ‘product line-ability’ to refer to the ability of a set of products to form a product line and suggested how to measure it in practice [5]. Several metrics have been proposed for evaluating product line architectures [8, 9, 19]. However, all of these approaches assume a rather high similarity of representations, mainly implementations or architecture models, of the different systems. This makes the reuse of similar components quite straightforward: they can be reused as-is. In reality, however, things are not so black and white; the answer may be degree-based: while some components can be reused as they are, without any adaptations or with small adaptations, others may be quite different in terms of low-level implementations. This is especially relevant in cases where systems were developed by different teams, for different purposes, and do not necessarily share similar representations. This leads to the need for more robust reusability analysis methods, which take into account the intensions (as specified in behaviors) of artifacts, rather than solely their implementations, and allow for a more refined evaluation of the reuse effort, reflecting the possibilities to adopt specific reuse practices.

In our previous works we presented an approach based on behavioral similarity of object-oriented software artifacts [13,14,15]. This approach views such artifacts as things exhibiting behaviors, i.e., transformations between states due to some external events. The approach further supports associating different reuse mechanisms based on the characteristics of related similarity mappings, which intuitively correspond to different degrees of reuse efforts. In this paper we propose a framework for the identification of “similarly behaving” artifacts and analyzing their potential reuse in the context of product lines. The framework is based on three polymorphism-inspired mechanisms: parametric, subtyping, and overloading. It further provides metrics for calculating behavior similarity and a method for analyzing the product line-ability of a set of products, aiming to support developers in making reuse decisions.

The rest of the paper is structured as follows. Section 2 presents related work on product line-ability metrics. Section 3 provides an overview of the suggested approach and introduces similarity relations between a set of systems. Section 4 presents the product line-ability framework that is based on the similarity relations, while Sect. 5 is devoted to implementation aspects and preliminary results. Finally, Sect. 6 summarizes and refers to future directions.

2 Previous Work

Few works propose metrics for evaluating the ability of a set of systems or products to form a product line. We briefly overview some of the most relevant ones below.

In [19], metrics are suggested to assess similarity, variability, reusability, and complexity of product line architectures. These metrics are applicable only on architectures specified in the vADL specification language and thus have a limited scope.

In [4], a more general framework is proposed for deriving a set of metrics for evaluating similar systems, which can be used to estimate the ability of a set of products to form a product line. The framework assumes the availability of a graph-based representation of all products, reflecting dependencies between products’ components which can be logical or communicative. In [5], these metrics are applied for a particular case study from the automobile industry for evaluating existing steering systems and estimating the benefit of creating a software product line. In [18], a graph-based representation is also used, incorporating weight values of assets into the metrics. This approach is based on the assumption that similarity of components means at least syntactic similarity, i.e., similar components have identical interfaces. As explained in [4], “syntactical signature identity is at least necessary but not sufficient. Therefore, semantic signature identity for two components must additionally be ensured which can for example be evaluated automatically by using the component’s test suites in an entangled manner which have to ensure path coverage at least”.

Two metrics proposed by Berger et al. [4, 5] are of special interest for assessing product line-ability in general and for introducing our work in particular: size of commonality – evaluating the similarity of the existing products (and consequently the size of potential core assets), and product-related reusability – reflecting the effort of creating particular products from the core assets. Table 1 describes these metrics. In all cases, the set of products is denoted by p1, …, pn, and the required and optional components of a product pi, namely, the components inherently necessary to fulfill the product’s basic functionality and those which add further functionality, are denoted by Cpi,r and Cpi,o, respectively.

Table 1. Product line-ability metrics taken from [4]

It should be noted that the metrics suggested by Berger et al. [4, 5] assume the availability of explicit representation of the required and optional components of each product. Moreover, they take a low-level syntactic view, assuming identity of interfaces of the considered components. Finally, they only consider reuse as is, avoiding other types of reuse that require adaptation of artifacts.

We aim to lift the above strong assumptions, taking a more high-level, behavioral approach. In other words, we are interested also in situations where components may have semantically similar interfaces, and exhibit similar (not necessarily identical) or even different behaviors. We also aim to distinguish between different types of reusability, according to the effort they require for artifact adaptation. For the sake of completeness, we briefly reproduce definitions of behavioral similarity from our previous publications [13,14,15] in Sect. 3. Sections 4 and 5 present the new contribution, namely the behavioral product line-ability framework and its application.

3 Behavioral Similarity of Products

For calculating similarity between components (systems or parts of them), we consider their behaviors, each of which is viewed as a transformation between stable states due to some external triggers.

Definition 1 (behavior).

A behavior is represented by a triple (S1, <e>, S*), where:

  • S1 is the initial state of the component before the behavior occurs,

  • <e> is a sequence of external events triggering the component’s behavior,

  • S* is the final state of the component after the behavior occurs.

For instance, the behavior of a sorting component can be characterized by the input – unsorted list of items (S1), the sorting event (<e>), and the output – sorted list (S*).

Behavior can be further represented via two types of descriptors: shallow and deep.

Definition 2 (shallow descriptor).

The shallow behavior descriptor, commonly known as interface, is a pair (op, params) where:

  • op = (bname, btype), bname is the behavior name and btype is the behavior’s output (returned type, in the form of a finite or infinite set of possible values);

  • params = {(pname, ptype)} denoting the behavior’s parameters and types (all ptypes have the form of sets of possible values).

Definition 3 (deep descriptor).

The deep behavior descriptor is a pair (att_used, att_modified), where:

  • att_used = {(aname, atype)}, where aname, atype are respectively the name and type (possible values) of an attribute involved in the behavior;

  • att_modified = {(aname, atype)}, where aname, atype are respectively the name and type (possible values) of an attribute being modified in the behavior.

The aforementioned descriptors enable representation of behavior as a triple (S1, <e>, S*) where S1 = deep.att_used ∪ shallow.params (namely, all attributes and parameters being used by the behavior before being modified by it), and S* = deep.att_modified ∪ shallow.op (i.e., all attributes modified by the behavior and the output). We assume that <e> is derived from the semantics of the behavior name.
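To make the descriptor definitions concrete, here is a minimal Python sketch of Definitions 2 and 3 and of the derivation of the behavior triple described above. The class and attribute names are ours, not part of the paper's formalism, and the quick sort values are illustrative rather than copied from Table 2:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ShallowDescriptor:
    """Interface view (Definition 2): op = (bname, btype) plus parameters."""
    bname: str          # behavior name
    btype: str          # behavior's output (returned type)
    params: frozenset   # set of (pname, ptype) pairs

@dataclass(frozen=True)
class DeepDescriptor:
    """Transformation view (Definition 3): attributes read and written."""
    att_used: frozenset      # set of (aname, atype) pairs read by the behavior
    att_modified: frozenset  # set of (aname, atype) pairs written by the behavior

def behavior_triple(shallow, deep):
    """Derive (S1, <e>, S*) from the two descriptors, as described in the text."""
    s1 = deep.att_used | shallow.params                       # S1 = att_used U params
    e = (shallow.bname,)                                      # <e> from the name's semantics
    s_star = deep.att_modified | {(shallow.bname, shallow.btype)}  # S* = att_modified U op
    return s1, e, s_star

# Illustrative quick sort descriptors ("array" is an assumed attribute name).
shallow = ShallowDescriptor("sort", "void", frozenset({("inputArray", "int[]")}))
deep = DeepDescriptor(frozenset({("array", "int[]")}), frozenset({("array", "int[]")}))
s1, e, s_star = behavior_triple(shallow, deep)
```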

Table 2 exemplifies these representations for two sorting algorithms – merge and quick sort. While the implementations of these algorithms are quite different, their shallow and deep descriptors are quite similar. Yet, we need a formal way to compare them, as the names and types of attributes may be different.

Table 2. Examples of shallow and deep behavior descriptors

Next, we define relations based on behavioral similarity. For this we assume the availability of a similarity mapping Sim between constituents of the shallow and deep descriptors. The similarity mapping can be based on syntactic considerations (e.g., identity). In this case, the parameters of the two implementations of the sorting algorithms, input and inputArray, will not be considered similar. Alternatively, the similarity mapping can be based on semantic considerations using semantic nets or statistical techniques to measure the distances among words and terms [11]. In this case, the names used in the code need to be meaningful. Another option is using type or schematic similarity, potentially ignoring the semantic roles or essence of the compared elements [10]. This is mainly relevant for comparing systems from different domains.

The choice of a particular similarity mapping Sim between the descriptor elements of the compared behaviors b1 and b2 induces several relation types, depending on the following properties of Sim:

  • Sim is either covered, i.e., total and onto, or not covered.

  • Sim is either single mapped (i.e., every element of the descriptors of b1 has a matching single element of the descriptors of b2), or multi-mapped (there are elements which have more than one match).

Combinations of the above properties of similarity mapping correspond to different similarity (reuse-related) relations which may hold between shallow and deep behavior descriptors.

Definition 4 (similarity relation).

Given a similarity mapping Sim, the similarity relation between two behaviors b1 and b2 can be:

  • Use: if Sim is covered and single-mapped (intuitively corresponding to the highest degree of similarity between behaviors).

  • Refinement: if Sim is multi-mapped (intuitively corresponding to splitting a variable of one behavior in the other).

  • Extension: if Sim is not covered (intuitively corresponding to adding a variable in a behavior).

These similarity relations are summarized in Table 3. Note that both refinement and extension can hold at the same time; we refer to this situation as refined extension. Two behaviors may also not be related by a similarity relation at all. The two sorting algorithms depicted in Table 2 can be considered Extension (EXT), as Merge Sort handles (reads and writes) an extra attribute – tempMergeArray of type int[].
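Definition 4 can be sketched as a small classifier over a similarity mapping, here represented as a set of matched element pairs. This representation of Sim is our own assumption, and the attribute name "array" is illustrative; input, inputArray, and tempMergeArray are taken from the text:

```python
def similarity_relation(sim, elems1, elems2):
    """Classify the relation induced by a similarity mapping Sim (Definition 4).

    sim: set of (x, y) pairs between descriptor elements of b1 and b2;
    elems1, elems2: the full descriptor element sets of the two behaviors.
    """
    if not sim:
        return "NONE"  # no similarity relation at all
    mapped1 = {x for x, _ in sim}
    mapped2 = {y for _, y in sim}
    # covered: Sim is total (on elems1) and onto (elems2)
    covered = mapped1 == set(elems1) and mapped2 == set(elems2)
    # multi-mapped: some element participates in more than one pair
    multi = len(sim) > len(mapped1) or len(sim) > len(mapped2)
    if multi and not covered:
        return "REF-EXT"  # refined extension
    if multi:
        return "REF"      # refinement: splitting a variable
    if not covered:
        return "EXT"      # extension: an added variable
    return "USE"

# The merge/quick example from the text: tempMergeArray has no counterpart,
# so the mapping is not covered and the relation is Extension.
quick = {("array", "int[]"), ("inputArray", "int[]")}
merge = {("array", "int[]"), ("input", "int[]"), ("tempMergeArray", "int[]")}
sim = {(("array", "int[]"), ("array", "int[]")),
       (("inputArray", "int[]"), ("input", "int[]"))}
assert similarity_relation(sim, quick, merge) == "EXT"
```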

Table 3. Similarity relations

It should be noted that if we take Sim as identity, USE corresponds to the use-as-is notion of reusability considered in [4, 5]. This notion is considerably extended in our framework by (i) allowing similarities other than identity, (ii) considering two different representations of operation: shallow (interface) and deep (transformation), and (iii) considering other variants of similarity relations besides USE (a 1–1 single-mapped mapping), including REF, EXT, and REF-EXT.

Concerning point (ii) above, the similarity relations defined in Table 3 can indeed hold for both shallow and deep descriptors, leading to 25 possibilities ({USE, REF, EXT, REF-EXT, NONE} for both shallow and deep descriptors). In this paper, we focus on polymorphism-inspired mechanisms, which are characterized by the cases when the USE relation holds between shallow descriptors. This intuitively corresponds to the case when the interfaces of the behaviors are similar (but not necessarily identical, as is usually the case in polymorphism; for this reason we term these relations polymorphism-inspired). As listed in Table 4, the differences in the similarity of the deep descriptors distinguish between three types of potential reuse: parametric (similar transformations), subtyping (refined or extended transformations), and overloading (different transformations). Intuitively, these types require different amounts of reuse effort, with parametric requiring the least effort and overloading the most (recall that overloading implies different deep behaviors, namely, different transformations on attributes).

Table 4. Characteristics of polymorphism-inspired mechanisms
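The mapping described above from shallow/deep similarity relations to reuse types is a simple lookup; a sketch in Python, with relation labels following Definition 4 and Table 3:

```python
def reuse_type(shallow_rel, deep_rel):
    """Polymorphism-inspired reuse type (Table 4).

    Applies only when the USE relation holds between the shallow descriptors;
    the deep relation then selects the reuse type.
    """
    if shallow_rel != "USE":
        return None  # outside the polymorphism-inspired cases
    if deep_rel == "USE":
        return "parametric"   # similar transformations
    if deep_rel in ("REF", "EXT", "REF-EXT"):
        return "subtyping"    # refined or extended transformations
    return "overloading"      # different transformations (deep NONE)
```

Under this reading, a component that extends another's deep behavior while keeping a similar interface (shallow USE, deep EXT) classifies as subtyping.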

Next we introduce a method and metrics for deciding on product line-ability based on the different types of behavioral similarity relations.

4 Behavioral Product Line-Ability Framework

In what follows we assume a set of products p1, …, pN (N ≥ 2), where each pi is a set of components, namely, pi = {Ci1, …, Ciki}. A component can be a product or its part, e.g., a module or a class. Product line-ability of such a set of products has two directions:

  • bottom-up, i.e., constructing a core asset out of the products’ components

  • top-down, i.e., creating products out of the core asset.

The simplest case is when all products share a set of identical, mandatory components. Then, in the bottom-up direction, all these components can be taken to form a core asset. In the top-down direction, these components will be used (as-is) in a new product. However, components may not be identical, but still similar. Then the construction of a core asset (bottom-up direction), while requiring some adaptation effort, may still be worthwhile. The creation of new products from the core asset may also require adaptation (top-down direction). In this case, product line-ability analysis is intended to provide a way of assessing the required adaptation efforts in both directions. In addition, some products may be less related than the others, and excluding them from the analysis may result in better product line-ability of the remaining products.

To address these challenges, we propose a behavioral product line-ability framework, which is based on the notion of behavior similarity presented in Sect. 3. The framework provides a method for product line-ability analysis, whose inputs are sets of products and the output is a graph highlighting the potential product line-ability of the input product sets. The method is composed of four main sequential steps:

  1. Behavior similarity calculation – see Sect. 3 for details.

  2. Similarity degree measurement.

  3. Product-related variability degree measurement for each product in the set.

  4. If there are products in the set whose product-related variability degree is too high (the specific threshold is an analyst’s decision), these products can be excluded and the method returns to step 2 for re-measuring the similarity and product-related variability degrees.

Figure 1 depicts the method’s steps, as well as its inputs and analyst involvement (upper part) and outputs (lower part). Next, we elaborate on the data structure used in the method – called similarity graph, as well as on the calculations of similarity and product-related variability degrees.

Fig. 1.

The suggested method for product line-ability analysis

4.1 Similarity Graphs

The method is based on a graph data structure – called similarity graph – which is a colored undirected graph whose vertices represent components of a set of products (each vertex is colored, where different colors represent different products), and the edges represent reuse types of parametric, subtyping and overloading, defined in Sect. 3.

Definition 5 (similarity graph).

Given a set of products p1, …, pN (N ≥ 2), a similarity graph is an undirected graph of the form G = (V, E), where V is a set of pairs (Cij, li) (Cij is the j-th component of product pi and li is the color associated with pi) and E is a set of triplets (Cij, Ckl, t) which associate components Cij and Ckl with a reuse type t (t ∈ {parametric, subtyping, overloading}).

Returning to our sorting example, assume that we have three products: P1 with sorting algorithm classes Merge 1 and Quick 1, P2 with Merge 2 and Quick 2, and P3 with Merge 3, Quick 3 and Optimized Quick 3 (the optimization extends the behavior of quick sort in order to improve performance). Figure 2 exemplifies a possible similarity graph with seven (class) components. According to this graph, the three quick sort implementations have very similar behaviors, and are therefore connected by parametric edges. As noted, Optimized Quick 3 extends the typical quick sort behavior by writing to additional attributes to increase performance. Thus, the edges between Optimized Quick 3 and the three quick sort algorithms are of type subtyping. Merge 3 behaves differently from Merge 1 and the other sorting algorithms, as it sorts objects rather than simple types (such as integers and strings).

Fig. 2.

Example of a similarity graph
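Definition 5 and the Fig. 2 example can be sketched as plain Python data. Only the edges explicitly described in the text are listed, so this is a partial reconstruction of the figure, not the figure itself:

```python
# Vertices of the similarity graph: component -> color (product), per Definition 5.
vertices = {
    "Merge 1": "P1", "Quick 1": "P1",
    "Merge 2": "P2", "Quick 2": "P2",
    "Merge 3": "P3", "Quick 3": "P3", "Optimized Quick 3": "P3",
}

# Typed edges (Cij, Ckl, t); only those spelled out in the text are included.
edges = {
    # the three quick sort implementations behave very similarly
    ("Quick 1", "Quick 2", "parametric"),
    ("Quick 1", "Quick 3", "parametric"),
    ("Quick 2", "Quick 3", "parametric"),
    # Optimized Quick 3 extends the typical quick sort behavior
    ("Optimized Quick 3", "Quick 1", "subtyping"),
    ("Optimized Quick 3", "Quick 2", "subtyping"),
    ("Optimized Quick 3", "Quick 3", "subtyping"),
}

def colors(component_set):
    """Colors (products) appearing in a vertex subset, as needed for the m-color checks."""
    return {vertices[c] for c in component_set}
```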

4.2 Similarity Degree Measurement

First, we need to assess the degree of similarity of the given set of products: a low degree of similarity may not justify transforming the set of products into a product line. This metric relates to the bottom-up direction, and requires analyzing subgraphs of the corresponding similarity graph. Each “good enough” subgraph (i.e., a subgraph with high similarity between its constituent components) will correspond to a potential core asset should the given set of products be transformed into a product line. Hence, each such subgraph should include vertices colored in different colors (i.e., belonging to different products). Yet, not all colors must appear in the subgraph, indicating the existence of optional components.

Intuitively, the most similar subgraphs are cliques in which all edges are parametric (restricting Sim to identity mapping corresponds to the case of SoC – Size of Commonality – metric from Table 1). This means that all components are similar to each other and thus potentially can be easily transformed into a core asset. We call such subgraphs m-colored parametric asset, where m is the minimal number of products (colors) that need to appear in the asset (m ≥ 2).

Definition 6 (m-colored parametric asset).

An m-colored parametric asset (m ≥ 2) in a similarity graph G = (V, E) is a subgraph G′ = (V′, E′), where (i) V′ ⊆ V, E′ ⊆ E, (ii) at least m colors appear in V′, and (iii) for each pair of distinct vertices v1, v2 ∈ V′, (v1, v2, parametric) ∈ E′.

For instance, G1 in Fig. 2 is an example of a 3-colored parametric asset. G2 is not such an asset, but our framework also allows analyzing subgraphs which are not (“full”) parametric assets, yet may yield useful core assets after some adaptation. In these cases, some non-parametric or missing edges may be allowed, indicating different variants, and some of the colors may not appear as well (m < N), identifying optional elements. G2 is indeed such an example, connecting three algorithms of two different types (merge and quick) in two products, all of which handle sorting of integers.

Fig. 3.

A snapshot from VarMeR

To support more flexibility of similarity types, we define m-color behavioral similarity degree with respect to how “close” a given similarity subgraph is to being an m-colored parametric asset. Formally expressed:

Definition 7 (m-color behavioral similarity degree).

Let G = (V, E) be a similarity graph and G′ = (V′, E′) a subgraph of it with at least m colors (representing different products, m ≥ 2). Let \( P_{G'} \), \( S_{G'} \), \( O_{G'} \) be the numbers of parametric, subtyping and overloading edges in G′, respectively, and let k = |V′| be the number of vertices in G′. The m-color behavioral similarity degree of G′ is a triplet (PS, SS, OS), where: \( \mathrm{PS} = \frac{2P_{G'}}{k(k-1)} \), \( \mathrm{SS} = \frac{2S_{G'}}{k(k-1)} \), \( \mathrm{OS} = \frac{2O_{G'}}{k(k-1)} \).

We further define a total order relation on the behavioral similarity degree as follows:

Definition 8 (total order relation).

Let (x1, x2, x3), (y1, y2, y3) be two behavioral similarity degrees. The total order relation < is defined as (x1, x2, x3) < (y1, y2, y3) iff x1 < y1, or x1 = y1 and x2 < y2, or x1 = y1 and x2 = y2 and x3 < y3.

As an example, consider G1 in the similarity graph of Fig. 2. The behavioral similarity degree is (1, 0, 0), indicating a 3-colored parametric asset. For G2 the behavioral similarity degree is lower, (0.33, 0, 0.67), indicating a “less parametric” and “more overloading” 2-colored asset.
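Definition 7, and the ordering of Definition 8, can be checked with a few lines of Python. The exact composition of G2's edges is our assumption, chosen to reproduce the degrees reported in the text:

```python
from fractions import Fraction

def similarity_degree(num_vertices, typed_edges):
    """m-color behavioral similarity degree (PS, SS, OS) of a subgraph
    with num_vertices vertices and edges given as (v1, v2, type) triplets."""
    k = num_vertices
    def share(t):
        # 2 * (#edges of type t) / (k * (k - 1)), per Definition 7
        return Fraction(2 * sum(1 for *_, et in typed_edges if et == t),
                        k * (k - 1))
    return (share("parametric"), share("subtyping"), share("overloading"))

# G1: the three quick sort implementations, pairwise parametric.
g1 = [("Quick 1", "Quick 2", "parametric"),
      ("Quick 1", "Quick 3", "parametric"),
      ("Quick 2", "Quick 3", "parametric")]
# G2 (assumed edges): two merge sorts related parametrically, each related
# to a quick sort by overloading, giving PS = 1/3, OS = 2/3 as in the text.
g2 = [("Merge 1", "Merge 2", "parametric"),
      ("Merge 1", "Quick 1", "overloading"),
      ("Merge 2", "Quick 1", "overloading")]

deg_g1 = similarity_degree(3, g1)  # (1, 0, 0)
deg_g2 = similarity_degree(3, g2)  # (1/3, 0, 2/3)
# Definition 8's total order is exactly Python's lexicographic tuple order:
assert deg_g2 < deg_g1
```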

Using the order relation defined above, the subgraphs of the similarity graph can be ordered, so that subgraphs appearing first in the list potentially require less effort to be transformed into core assets. We refer to this list as potential m-color core assets.

Definition 9 (potential m-color core assets).

Let G be a similarity graph. The potential m-color core assets (m ≥ 2) are a list {Gi} of all subgraphs of G that include at least m colors, such that if Gi precedes Gj then the behavioral similarity degree of Gi is greater than that of Gj (using the < relation defined above).

4.3 Product-Related Variability Degree Measurement

A complementary way of analyzing product line-ability is measuring the differences between each product and the potential m-color core assets, as captured by the m-color behavioral similarity degree. This addresses the top-down direction, i.e., how much effort it will take to derive a specific product pi after transforming the set of products into a product line. Intuitively, greater coverage of vertices of the form Cij by potential m-color core assets indicates higher product line-ability. Note that there is a tradeoff between covering more components (vertices) and minimizing the number of missing and/or subtyping/overloading edges.

Definition 10 (m-color product-related variability degree).

Let G = (V, E) be a similarity graph including product p = {Cj}j=1..k (i.e., the Cj are vertices of G), and let {Gi} be a set of m-colored subgraphs of G (selected from the potential m-color core assets of G). The m-color product-related variability degree is a quadruplet (PV, SV, OV, PSV), where:

$$ \mathrm{PV} = \frac{|\{ C_j \in p \mid \exists\, G_i = (V_i, E_i) \text{ and } C' \in V_i \text{ s.t. } C_j \in V_i \text{ and } (C_j, C', \mathit{parametric}) \in E_i \}|}{k} $$

is the parametric variability, namely the percentage of components in p that require parametric adaptation (note that k is the number of components in p).

$$ \mathrm{SV} = \frac{|\{ C_j \in p \mid \exists\, G_i = (V_i, E_i) \text{ and } C' \in V_i \text{ s.t. } C_j \in V_i \text{ and } (C_j, C', \mathit{subtyping}) \in E_i \}|}{k} $$

is the subtyping variability, namely the percentage of components in p that require subtyping adaptation.

$$ \mathrm{OV} = \frac{|\{ C_j \in p \mid \exists\, G_i = (V_i, E_i) \text{ and } C' \in V_i \text{ s.t. } C_j \in V_i \text{ and } (C_j, C', \mathit{overloading}) \in E_i \}|}{k} $$

is the overloading variability, namely the percentage of components in p that require overloading adaptation.

$$ \mathrm{PSV} = \frac{|\{ C_j \in p \mid \neg\exists\, G_i = (V_i, E_i) \text{ s.t. } C_j \in V_i \}|}{k} $$

is the product-specific variability, namely the percentage of components in p that require addition (are not developed based on existing core assets).

As an example consider the similarity graph in Fig. 2. The 2-color product-related variability degree with respect to {G1, G2} defined above is:

  • For the first (white) product P1 = {Merge 1, Quick 1}: (1, 0, 1, 0)

  • For the second (light grey) product P2 = {Merge 2, Quick 2}: (1, 0, 0.5, 0)

  • For the third (dark grey) product P3 = {Merge 3, Quick 3, Optimized Quick 3}: (0.33, 0, 0, 0.67)
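Definition 10 can be sketched as follows, under our own data representation (a list of (vertices, edges) asset pairs); the example numbers below are synthetic, not the ones derived from Fig. 2:

```python
def variability_degree(product, core_assets):
    """m-color product-related variability degree (PV, SV, OV, PSV) of
    product (a set of components) w.r.t. core_assets, a list of
    (vertices, edges) pairs with edges of the form (c1, c2, type)."""
    k = len(product)

    def share(reuse_type):
        # fraction of components covered by some asset via an edge of this type
        hit = {c for verts, edges in core_assets for c in product
               if c in verts and any(t == reuse_type and c in (a, b)
                                     for a, b, t in edges)}
        return len(hit) / k

    in_some_asset = {c for verts, _ in core_assets for c in product if c in verts}
    psv = (k - len(in_some_asset)) / k  # components not based on any core asset
    return (share("parametric"), share("subtyping"), share("overloading"), psv)

# Synthetic selection: a single parametric asset over three quick sorts.
assets = [({"Quick 1", "Quick 2", "Quick 3"},
           [("Quick 1", "Quick 2", "parametric"),
            ("Quick 1", "Quick 3", "parametric"),
            ("Quick 2", "Quick 3", "parametric")])]
p = {"Merge 2", "Quick 2"}
assert variability_degree(p, assets) == (0.5, 0.0, 0.0, 0.5)
```

Here Quick 2 is covered parametrically by the asset, while Merge 2 appears in no asset, so half the product is product-specific.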

The analyst may decide to exclude from the analysis the products which vary the most (such as P3 in our example), after which the bottom-up step of constructing the core assets can be repeated, thereby improving the product line-ability of the remaining products.

5 Implementation Aspects and Preliminary Results

The method defined above has been integrated into VarMeR – Variability Mechanisms Recommender [16]. The inputs of VarMeR are object-oriented code artifacts (in Java) that belong to different products. The outputs include (colored) similarity graphs that highlight the similarity and variability in the behaviors (public operations) of the classes of these products. The edges in these graphs represent different reuse types (parametric, subtyping, and overloading) between classes. Figure 3 depicts a snapshot from VarMeR for implementations of three types of sorting algorithms (Bubble, Quick, and Merge) in four products. To simplify the discussion in the sequel, we annotated the vertices with labels of the form Pi-B, Pi-Q, Pi-M, where Pi indicates the product id and B, Q, M the sorting type. The percentages appearing on the edges are of similar (deep) behaviors of the corresponding classes, as reflected by the public operations of these classes. In other words, classes that have more similar operations of a certain type of relation (USE, REF/EXT/REF-EXT, NONE) will result in higher similarity percentages of the corresponding reuse type (parametric, subtyping, overloading, respectively). The tool supports suppressing low percentages, indicating low behavioral similarity, by defining a minimal threshold for each type of edge (see the upper bars in Fig. 3). Scalability support of VarMeR, namely, analyzing similarity and variability at different granularity levels (including product, package, and class), is out of the scope of this paper and is discussed in [16]. On top of the similarity graphs, the analyst can view the different metrics defined in Sect. 4.

We explored the results of our method in the context of product line-ability decisions for a set of four projects consisting of implementations of different sorting algorithms. All projects were downloaded from GitHub and developed by different teams (thus a low level of syntactic similarity was expected). The four projects involved three sorting algorithms (B-bubble, M-merge, and Q-quick); some projects included multiple classes for each sorting type. Figure 3 presents the VarMeR outcome for this case.

Due to space limitations, we only exemplify how the tool can be used for supporting decision making in the bottom-up dimension. Table 5 presents two potential core assets for m = 3, namely it uses 3-color behavioral similarity degrees as an indication of product line-ability. It can be seen that the first core asset highlights a highly reusable (close to a parametric clique) artifact for bubble sort implemented in different projects. The second core asset requires more effort, as some of the artifacts are related via subtyping relations, requiring refinement and/or extension of their deep behaviors. Note that an asset consisting of both is better than the second core asset but worse than the first (similarity degree of (0.33, 0.38, 0)). This means that we do not necessarily desire core assets of maximal size, but of minimal effort, as predicted by the different similarity degrees.

Table 5. Potential core assets and their similarity degrees as predictors for product line-ability

Let us assume that after carefully analyzing the specific effort required for transforming the different artifacts into core assets (based on the similarity graph, but also after inspecting the code), the analyst selects the two assets shown in Table 5. With respect to this selection, the tool can be used for supporting decision making in the top-down dimension, based on product-related variability degrees. While none of the products is fully covered by this selection, P2 is completely unrelated (its product-related variability degree is (0, 0, 0, 1), meaning that all its classes are specific to this product). P1, P3, and P4, on the other hand, result in higher product-related variability degrees of (0.33, 0.33, 0, 0.66), (0.5, 0.17, 0, 0.5) and (0.33, 0.66, 0, 0), respectively. This implies that it is better to discard P2 from the product line, while P1, P3, and P4 can be used to extract core assets, after performing the needed adaptations to the original artifacts.

6 Summary and Future Directions

Deciding whether to turn existing systems or products into a product line – referred to as product line-ability – is an important practical problem which requires measuring and analyzing similarities and differences among systems. The main challenge here is that detecting similarities in the context of systems developed by different teams and for different purposes is notoriously hard as these software artifacts may differ in their implementation, while still behaving similarly.

In this paper, we propose a framework for the identification of “similarly behaving” artifacts and calculation of their degree of similarity and variability. This framework provides in addition a method for analyzing the product line-ability of a given set of products both in bottom-up and top-down directions. We describe how the proposed framework has been integrated into the VarMeR tool to support product line-ability decisions on Java artifacts. The paper also demonstrates how such decisions can be taken using the framework and its method. While VarMeR is a tool oriented towards analysis of Java artifacts, the presented approach is general, and we plan to extend it to other types of artifacts such as various types of models and textual requirements.

A mandatory future research direction is further validation of the proposed framework by investigation of its potential to support reuse decisions of developers. We intend to empirically evaluate the outcomes of the tool-supported method in the context of academic software engineering courses and professional workshops. Another direction is extending the framework to support more types of reuse relations, including analogy and aggregation, but also cases in which the shallow behaviors are different. Finally, other product line-ability metrics for the bottom-up and top-down dimensions can be explored, drawing further inspiration from metrics proposed in [4].