Abstract
This paper presents an approach to recover time-variant information from software repositories. It is widely accepted that software evolves due to factors such as defect removal, market opportunity, or the addition of new features. Details of software evolution are stored in software repositories, which often contain the change history. However, there is a lack of approaches, technologies, and methods to efficiently extract and represent time-dependent information. Disciplines such as signal and image processing or speech recognition adopt frequency-domain representations to mitigate differences between signals evolving in time. Inspired by time-frequency duality, this paper proposes the use of Linear Predictive Coding (LPC) and cepstrum coefficients to model time-varying software artifact histories. LPC and cepstrum coefficients yield very compact representations that can be computed with linear complexity. These representations can be used to highlight components and artifacts that evolved in the same way or with very similar evolution patterns. To assess the proposed approach, we applied LPC and cepstral analysis to 211 Linux kernel releases (i.e., from 1.0 to 1.3.100) to identify files with very similar size histories. The approach, the preliminary results, and the lessons learned are presented in this paper.
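The idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: it assumes the textbook autocorrelation method with the Levinson-Durbin recursion for LPC, the standard LPC-to-cepstrum recursion, and a plain Euclidean distance between cepstral vectors. The function names, the model order, the number of cepstral coefficients, and the sample size histories are all hypothetical choices made for illustration.

```python
import math


def autocorrelation(x, max_lag):
    """Autocorrelation r[0..max_lag] of a sequence (no windowing, no mean removal)."""
    n = len(x)
    return [sum(x[t] * x[t + lag] for t in range(n - lag))
            for lag in range(max_lag + 1)]


def lpc(x, order):
    """LPC coefficients a[1..order] with x[t] ~ sum_k a[k] * x[t-k].

    Uses the Levinson-Durbin recursion on the autocorrelation sequence.
    """
    r = autocorrelation(x, order)
    a = [0.0] * (order + 1)  # a[0] is unused
    err = r[0]               # prediction error energy
    for i in range(1, order + 1):
        if err <= 0:         # degenerate input (e.g. all zeros): stop early
            break
        k = (r[i] - sum(a[j] * r[i - j] for j in range(1, i))) / err
        new_a = a[:]
        new_a[i] = k
        for j in range(1, i):
            new_a[j] = a[j] - k * a[i - j]
        a = new_a
        err *= 1.0 - k * k
    return a[1:]


def lpc_to_cepstrum(a, n_ceps):
    """Cepstral coefficients c[1..n_ceps] of the all-pole model defined by a."""
    p = len(a)
    c = []
    for n in range(1, n_ceps + 1):
        acc = a[n - 1] if n <= p else 0.0
        for k in range(max(1, n - p), n):
            acc += (k / n) * c[k - 1] * a[n - k - 1]
        c.append(acc)
    return c


def cepstral_distance(x, y, order=2, n_ceps=4):
    """Euclidean distance between the LPC cepstra of two histories."""
    cx = lpc_to_cepstrum(lpc(x, order), n_ceps)
    cy = lpc_to_cepstrum(lpc(y, order), n_ceps)
    return math.sqrt(sum((u - v) ** 2 for u, v in zip(cx, cy)))


# Hypothetical size histories (e.g. lines of code per release) for three files.
file_a = [100, 120, 150, 150, 180, 240, 260, 300]
file_b = [2 * v for v in file_a]                    # same shape, twice the size
file_c = [300, 300, 310, 305, 300, 302, 301, 300]   # almost flat history

print(cepstral_distance(file_a, file_b))
print(cepstral_distance(file_a, file_c))
```

Each history is reduced to a handful of coefficients regardless of its length, and for a fixed model order the cost grows linearly with the number of releases. Because the coefficients depend only on ratios of autocorrelation values, a history and a scaled copy of it map to the same representation, which is one way files that "evolved in the same way" can end up close together under this sketch.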
Linear predictive coding and cepstrum coefficients for mining time variant information from software repositories
MSR '05: Proceedings of the 2005 international workshop on Mining software repositories