How to write comments suitable for automatic software indexing1

https://doi.org/10.1016/S0164-1212(98)00004-1

Abstract

This paper proposes a strategy for writing the comments (that is, the natural-language text) embedded in the source code of software components. The novelty of the strategy is that it proceeds step by step, starting from given specifications. At the end of the writing process we have, besides the comments themselves, a set of indices forming a short description of the software component (its profile), which is much easier to manipulate than the full text. Storing the profiles in a database builds a software catalog. The availability of software catalogs is strategic for locating reusable components that match specific requirements.

Section snippets

Motivation

The traditional purpose of comments is to make code easier to understand; however, a recent trend, originating from research in the field of software reuse (see, for instance, Shafer et al., 1994; Systematic reuse, 1994; Mili et al., 1995; Software reuse, 1995), attempts to use comments in the software indexing process as well, with the final aim of building a software catalog.

Previous efforts for building software catalogs can be roughly classified into three basic groups

A free-text automatic indexing scheme

The indexing scheme we refer to here is the LA-based one proposed by Maarek et al. (1991), where an LA (lexical affinity) is the co-occurrence of a pair of (inflectional roots of) words, say (w1,w2), within a generic sentence of a document. Specifically, the authors consider as meaning-bearing only those LAs involving open-class words (namely, nouns, verbs, adjectives, and adverbs). The LA extraction algorithm they adopt (Fig. 3) takes advantage of the empirical finding that 98% of all LAs relate to words
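The definition of an LA given above can be illustrated with a minimal Python sketch. It is not the algorithm of Fig. 3: the names extract_las, CLOSED_CLASS and max_distance are hypothetical, the small stop list is only a stand-in for a real open-class (part-of-speech) filter, lowercased tokens stand in for inflectional roots, and the optional distance limit is a free parameter whose value is not taken from the paper.

```python
import re
from collections import Counter
from itertools import combinations

# Assumption: a tiny closed-class stop list stands in for a real
# part-of-speech filter that would keep only open-class words.
CLOSED_CLASS = {
    "a", "an", "the", "of", "in", "on", "to", "and", "or", "is",
    "are", "it", "that", "this", "for", "with", "by", "from",
}


def split_sentences(text):
    """Naive sentence splitter on ., ! and ? (an assumption of this sketch)."""
    return [s for s in re.split(r"[.!?]+", text) if s.strip()]


def extract_las(text, max_distance=None):
    """Count co-occurring word pairs (w1, w2) within each sentence.

    Lowercased tokens are used in place of inflectional roots (no stemming).
    If max_distance is given, a pair is kept only when the two words are at
    most that many token positions apart in the original sentence.
    """
    counts = Counter()
    for sentence in split_sentences(text):
        tokens = re.findall(r"[a-z]+", sentence.lower())
        # Keep the original position of every retained word.
        kept = [(i, w) for i, w in enumerate(tokens) if w not in CLOSED_CLASS]
        for (i, w1), (j, w2) in combinations(kept, 2):
            if w1 == w2:
                continue
            if max_distance is not None and j - i > max_distance:
                continue
            # Store the pair in a canonical (sorted) order.
            counts[tuple(sorted((w1, w2)))] += 1
    return counts


if __name__ == "__main__":
    sample = "The command copies files. The command removes files from a directory."
    for pair, n in extract_las(sample).most_common(3):
        print(pair, n)
```

The resulting counter can be read as a crude profile of the text: pairs that recur across sentences are candidates for indices, while the stop list keeps closed-class words out of every pair.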

The natural-language documentation investigated

As mentioned in Section 1, the natural-language documentation used to carry out the experiment reported in the next section concerns two different categories of text files selected ad hoc:

  1. The text files of 20 Unix commands, as available on line through the command man. These text files are structured in terms of 8 items (Fig. 4).

  2. The text files reproducing the typeset sections given as natural-language documentation of 20 IMSL routines being

An experiment of automatic indexing

The aim of the experiment reported below, which concerns both categories of text files mentioned in Section 3, is twofold, namely to give evidence:

  1. of the already mentioned impossibility of making predictions about the meaningfulness of the results arising from the application of automatic free-text indexing schemes to a short text;

  2. that the appropriateness of the derived

A new scenario

Since the quality of a text cannot be expressed simply by the values of parameters capturing its lexical structure, we propose a novel way of writing comments that gives direct control over the final result of the indexing process. This strategy keeps the two objectives of the comments distinct (Section 1); moreover, it relies on an automatic tool to get information about the appropriateness of the profile extractable from them at any given moment.

The idea behind the

Conclusions

Besides making code easier to understand, comments should also be suitable for indexing the software (a basic step in the construction of a software catalog, which is essential for speeding up the process of locating reusable software components). In this paper we have defended the thesis that, in order to achieve both objectives, comments, like programs, have to be written according to a given discipline that fixes the comments' specifications as well as the procedure to meet them. In this way, the

Acknowledgements

We are grateful to two anonymous referees whose comments deeply influenced the presentation of our work.

References (14)

  • Girardi, M.R., et al., 1995. Using English to retrieve software. J. Systems Software.
  • Di Felice, P., 1993. Reusability of mathematical software: a contribution. IEEE Trans. Software Eng.
  • Di Felice, P., Fonzi, G., 1995. On automatic software indexing. Technical Rep. No.47–95, Univ. of...
  • Frakes, W.B., Nejmeh, B.A., 1987. Software reuse through information retrieval. In: Proceedings of the Twentieth Annual...
  • Frakes, W.B., et al., 1990. Proteus: A reuse library system that supports multiple representation methods. ACM SIGIR Forum.
  • Frakes, W.B., et al., 1994. An empirical study of representation methods for reusable software components. IEEE Trans. Software Eng.
  • Maarek, Y., et al., 1991. An information retrieval approach for automatically constructing software libraries. IEEE Trans. Software Eng.
There are more references available in the full text version of this article.

Cited by (4)

  • FNDS: A dialogue-based system for accessing digested financial news

    2005, Journal of Systems and Software
    Citation Excerpt :

    We envision that the techniques underlying the design of FNDS will ultimately be applied to knowledge grid research (Berman, 2001; Zhuge, 2004). Assuming that heterogeneous sources of information are described using natural language-like metadata (similar to that proposed in (Di Felice and Fonzi, 1998)), it may be possible to automatically integrate and digest information/knowledge by applying various information extraction techniques (with the aid of an ontology of the relevant domain). A representation may thus be generated for specializing the digested knowledge, which can facilitate the retrieval of the knowledge (Zhuge and Liu, 2004).

  • Improved method for the indexing of software

    1999, Information and Software Technology
  • Reuse-conducive development environments

    2005, Automated Software Engineering
  • Promoting reuse with active reuse repository systems

    2000, Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

Paolino Di Felice is an associate professor of computer science at the Department of Electrical Engineering of the University of L'Aquila, Italy.

He has published articles in the areas of programming methodologies, visual programming, scientific software, relational databases, and object-oriented data modeling. His current research interests concern software reuse, spatial relations, and approximate spatial reasoning.

He is an affiliate member of the IEEE Computer Society and the Association for Computing Machinery. He is a founder member of the ACM special interest group on applied computing (SIGAPP) and a program committee member of the annual symposium of the SIGAPP.

Goffredo Fonzi received the Dr. Ing. degree in Electronic Engineering from the University of L'Aquila, Italy, in 1995. His main research interests concern software reuse. He is an affiliate member of the IEEE Computer Society.

1 Work supported by the M.U.R.S.T.
