Analyzing existing software for software reuse

https://doi.org/10.1016/S0164-1212(97)10004-8

Abstract

This paper describes an automated method to support analysis of existing software at the start of a systematic reuse initiative. By determining a baseline level of informal reuse, this method can be used to help identify promising domains for initial implementation of systematic reuse and provide information that may affect calculation of return on investment in reuse. It can also be used in conjunction with domain analysis to help identify ideas for reusable components. The method draws on techniques developed for detecting plagiarized student programs. The paper presents results of three case studies that used a prototype tool called SoftKin to test this approach on commercial application software. The case studies indicate that the results from plagiarism detection transfer nicely to the task of analyzing existing software for reuse.

Introduction

In recent years there have been quite a few reports of success with software reuse initiatives. These results indicate that reuse can reduce software costs and improve quality at the same time. In spite of that attractive combination of benefits, reuse in most organizations is limited to the informal reuse that programmers have always pursued on their own. Relatively few organizations have made any investment in systematic reuse. Systematic reuse extends the benefits of informal reuse by making targeted investments to foster reuse in particular domains and processes. Systematic reuse is not easy to achieve, however, so this investment is risky. Even so, there is a great deal of interest in systematic reuse because the potential benefits are quite large (Frakes and Isoda, 1994).

For an organization contemplating an investment in systematic reuse, one key question is how to use existing software in that effort. Intuitively, it would seem that organizations should make use of their existing software. After all, existing software embodies a great deal of information about an organization's problem domains. This information should be very valuable in creating reusable assets. In addition, existing software comprises critical systems which organizations can change only gradually. Reuse that builds on the current environment is consistent with the need for gradual transition to a reuse strategy. Finally, starting with the existing software may help create a feeling of ownership and lessen possible resistance to reuse.

In practice, applying existing software has not been a focus of systematic reuse. Many reports of reuse efforts make little or no mention of existing software. This is unfortunate given the advantages of using existing software as described above. However, it is not surprising given the generally uneven quality of existing software and its overwhelming volume. Jones documents the quality problem, estimating that 50% of existing information systems software is of low quality (Jones, 1994). In spite of these obstacles, there are several ways in which existing software might contribute to a software reuse effort.

First, existing software may be a source of reusable assets. It makes sense to try to reuse what you already have, and this approach is a natural extension of informal reuse. Several researchers mention using existing software in this way. Lim mentions existing software as one source of reusable components (Lim, 1994). Caldiera and Basili built the CARE system around the notion of creating a parts collection from existing software (Caldiera and Basili, 1991). On the other hand, an earlier effort to extract parts from existing software led to a conclusion that parts must be designed for reuse (Lenz et al., 1987).

These results seem to conflict, but on closer examination they can be reconciled. Lenz et al. tried to extract reusable parts from existing software without making any changes to the parts. This proved quite difficult for operating system software, the target domain for this project. (In part this was due to the use of global data in the form of operating system control blocks.) Both of the other reports mention reengineering or packaging the existing software to make it suitable for reuse. In short, existing software may be a source of reusable assets, but usually not without additional work. In addition, the potential for reuse may vary by domain. Even so, it appears that we can think of existing software as providing ideas for reusable assets, at least, and perhaps also providing material that we can reengineer for reuse.

Second, existing software can help identify good starting points for introducing systematic reuse. A large organization will need to introduce systematic reuse in phases. It is important to pick application domains that have the best probability of success for these initial efforts (Prieto-Diaz, 1991). It is reasonable to assume that systems with higher levels of informal reuse would offer higher probability of success, other things being equal. This may be because the domain offers more opportunity for reuse. Or it may be that the project team is more inclined to pursue reuse. In either case, the higher level of informal reuse should improve the chances of success in introducing systematic reuse. So for this second purpose, the goal in analyzing existing software is to provide a baseline measure of the level of informal reuse.

Third, a measure of informal reuse may also be relevant in calculating return on investment (ROI) in systematic reuse. Some approaches to calculating ROI exclude informal reuse from the calculation (Poulin and Caruso, 1993). The rationale for this exclusion is that informal reuse should happen without investment and so should not count as part of ROI. Even so, there are several interesting questions about the relationship between informal reuse and ROI. One is whether the level of informal reuse could serve as a predictor of ROI. This would be useful in justifying an investment in systematic reuse.

Another question is how informal and systematic reuse relate over time in a domain. In particular, as systematic reuse grows, does informal reuse increase or decrease? For example, informal and systematic reuse may be substitute goods in the economic sense. In that case an increase in systematic reuse would be partially offset by a decrease in informal reuse. If there is any general relationship like this, the change in informal reuse should be considered in calculating the ROI for systematic reuse.

While there are several ways that we might apply existing software to systematic reuse, it is important to adopt an effective strategy. A key issue is that any effort based on existing software may become prohibitively labor intensive due to the volume of material. Most organizations have a very large portfolio of existing software. This volume implies that any comprehensive analysis of existing software must be substantially automated.

In addition, it seems likely that most existing software has little reuse potential. Caldiera and Basili suggest a target of selecting 5–10% of existing software for further analysis (Caldiera and Basili, 1991). Griss and Wosser report that Hewlett-Packard has found small reuse libraries most effective (Griss and Wosser, 1995). In short, it seems that our strategy should be oriented toward recovering a small percentage of valuable assets from a much larger portfolio. In this sense our problem is similar to classic information retrieval where the goal is to select a small set of documents in a large collection based on relevance to a particular query.

Putting these two aspects of our problem together, we would expect that successful strategies for applying existing software in a reuse initiative would be:

  • Automated: We need to process a lot of material to find a small amount that may be useful. We cannot use labor intensive techniques to deal with that volume of material. Instead, we must use approaches that are as automated as possible.

  • Partial: The marginal cost of recovering useful material increases as more of it is recovered. At some point the cost of recovering additional material will exceed the value of what is recovered. To be cost-effective, the goal must be to skim the cream, rather than to recover every fragment of reusable software.

We should mention two final issues to establish the frame of reference for this work. The first relates to the definition of reuse. As mentioned above, the need to calculate ROI encourages a particular definition of reuse that may exclude reuse with modification (“white box” reuse). In addition, copying a routine within a program is generally not counted as reuse since it is a bad programming practice. We agree with these considerations in the context of systematic reuse. However, these types of activities are a reality in informal reuse, and they are of interest to us in this work. Cases of reuse with modification may be opportunities to reengineer for reuse without modification. Cases of duplicate routines within a program should be eliminated. In both instances there would be value to an organization in locating these cases as part of starting a systematic reuse effort.

Finally, we should note that our approach is probably best suited to reuse at the level of individual routines or modules. It would have much less value for large grain, architected reuse of systems. However, substantial experience and investment are prerequisites for architected reuse (Griss and Wosser, 1995), and our approach is very relevant in the initial stages of reuse investment. We should also note that there have been many reports of successful reuse at the level of individual modules, including some indications that the best candidate reusable materials may be in the smaller modules of a given system (Selby, 1989).

Section snippets

General approach

The prior section outlines the potential value of analyzing existing software for systematic reuse and some strategy considerations. In this section we consider alternative approaches to the analysis. We organize this discussion around the general concepts of form and function. Software form includes physical characteristics like data and control structures. Most software metrics measure some aspect of software form. Software function includes logical characteristics related to the purpose of

Related work

We can divide the related work into three groups. The first group consists of studies that looked at the relationship between form and function. If there were such a relationship, perhaps we could use form to predict function. But there is no known relationship between the two. Several studies have looked for some type of form-function relationship, but in each case no relationship was found (Belady and Lehman, 1979; DeMarco and Lister, 1989).

A second group of studies explored ways to apply software

Measurement approach

As outlined above, there are substantial changes in context from student programs to a commercial software portfolio, so we cannot be sure how effective the plagiarism detection techniques will be in searching for informal reuse. Given this uncertainty, we decided to try several of the plagiarism detection measures in this study to see whether the results paralleled the results in plagiarism detection. Therefore, the study includes analysis by several single-valued software metrics and a structure

Creating a structure profile

This section presents a structure profile for the source code shown in Fig. 1. The result of each step in constructing the profile is shown in Fig. 2. The steps for creating the structure profile are as follows.

1. Build an initial structure list from the source code. The initial list represents each executable program statement with one of the symbols shown in Fig. 3. The bounds of iteration and selection constructs are shown with square or curly braces. Alternatives within selections (e.g.,

Calculating similarity

The measurements and profile defined in the prior sections provide ways to describe each item in our existing software. In this section we discuss how we compare the values for two items to determine their similarity. For the single-valued metrics, we calculate similarity between two items as the percent difference in the metric values of each item. Thus, our similarity metric is a number between zero and one, with smaller numbers indicating greater similarity. For example, if modules A and B
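The percent-difference comparison for single-valued metrics can be sketched as follows. This is a minimal illustration, not the paper's definition: the snippet does not give the exact formula, so normalizing by the larger of the two values is an assumption, chosen because it yields a number between zero and one in which smaller means more similar. The function name `percent_difference` is hypothetical.

```python
def percent_difference(a: float, b: float) -> float:
    """Relative difference between two metric values, in the range [0, 1].

    Assumption: the difference is normalized by the larger value. The source
    only states that the result lies between zero and one and that smaller
    numbers indicate greater similarity.
    """
    if a < 0 or b < 0:
        raise ValueError("metric values are assumed to be non-negative")
    largest = max(a, b)
    if largest == 0:
        # Two modules that both measure zero are treated as identical.
        return 0.0
    return abs(a - b) / largest
```

Under this assumed normalization, two modules with metric values of 100 and 80 would score 0.2, and two modules with equal values would score 0.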

Case study evaluation

To test the similarity measures described in the previous sections we created a prototype tool called SoftKin. SoftKin consists of a data collector and analyzer. The collector processes existing software and calculates the metrics and profile for each module. The analyzer computes similarity measures for each module pair in the collection. It then creates a set of ranked lists of module pairs, one list for each similarity measure, i.e. a list ranked by NCSS similarity, a list ranked by
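The analyzer step described above, scoring every module pair and producing one ranked list per measure, can be sketched as follows. This is not SoftKin's implementation. The module names, the metric values, and the `branches` metric are invented for illustration, and the percent-difference formula (normalizing by the larger value) is an assumption.

```python
from itertools import combinations


def percent_difference(a: float, b: float) -> float:
    """Assumed similarity score: |a - b| / max(a, b); smaller is more similar."""
    largest = max(a, b)
    return 0.0 if largest == 0 else abs(a - b) / largest


def rank_pairs(modules: dict, metric: str) -> list:
    """Score every unordered module pair on one metric, most similar first."""
    scored = []
    for name_a, name_b in combinations(sorted(modules), 2):
        score = percent_difference(modules[name_a][metric],
                                   modules[name_b][metric])
        scored.append((score, name_a, name_b))
    return sorted(scored)


def ranked_lists(modules: dict, metrics: list) -> dict:
    """Build one ranked list of module pairs per similarity measure."""
    return {metric: rank_pairs(modules, metric) for metric in metrics}


# Hypothetical metric values for three modules (illustrative data only).
example_modules = {
    "billing": {"ncss": 100, "branches": 12},
    "invoice": {"ncss": 99, "branches": 12},
    "report": {"ncss": 80, "branches": 30},
}

if __name__ == "__main__":
    for metric, pairs in ranked_lists(example_modules, ["ncss", "branches"]).items():
        print(metric, pairs[0])
```

Comparing all pairs grows quadratically with the number of modules, so a collection of 360 modules yields 64,620 pairs per measure. That is manageable for an automated tool but far too many to review by hand without ranking.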

Case study results

We applied SoftKin in a set of three case studies. Each case analyzed a selection of existing software from a large organization. Fig. 5 summarizes selected aspects of each case study. We can see from the figure that the I/S groups in each case study differ substantially in organization and operating methods.

The software selection for the three case studies included over 156,000 lines of code in 360 modules. These modules constitute eight separate application systems. Most of these applications

Conclusions

1. SoftKin as a production tool. The case study results indicate that the Structure Profile approach would be a useful tool for analysis of informal reuse. SoftKin does not provide a fully automated way to locate informal reuse, but it does provide an easy way to locate a substantial portion of reuse instances in existing software. In addition, an examination of the SoftKin ranking of reuse instances confirms that the SoftKin ranking reflects the amount of modification associated with the

References (17)

  • Whale, G., 1990. Software metrics and plagiarism detection. Journal of Systems and Software.
  • Belady, L.A., Lehman, M.M., 1979. The characteristics of large systems. In: Wegner, P. (Ed.), Research Directions in...
  • Caldiera, G., Basili, V.R., 1991. Identifying and Qualifying Reusable Software Components. IEEE Computer, 24 (2) pp....
  • DeMarco, T., Lister, T., 1989. Software development: state of the art vs. state of the practice. Proceedings of the...
  • Frakes, W., Isoda, S., 1994. Success Factors for Systematic Reuse. IEEE Software, pp....
  • Griss, M., Wosser, M., 1995. Making Reuse Work at Hewlett-Packard. IEEE Software, pp....
  • Halstead, M., 1977. Elements of Software Science. North-Holland, New...
  • Jones, C., 1994. Assessment and Control of Software Risks. Yourdon Press, Englewood Cliffs,...
There are more references available in the full text version of this article.

Greg Hislop is on the faculty of the College of Information Science and Technology at Drexel University. He coordinates the College's Software Engineering curriculum. His interests include the creation and evolution of software, the application of software technology in large organizations, and software engineering education. He holds a B.A. in Economics from Georgetown University, an M.Sc. in Computing Science from Queen's University, and a Ph.D. in Information Studies from Drexel University. Dr. Hislop has nearly 20 years of industrial experience in software engineering and systems management. He is a member of ACM and IEEE.

Tel.: +1 215 895 2179; fax: +1 215 895 2494; e-mail: [email protected].
