DOI: 10.1145/2804360.2804368

Estimating product evolution graph using Kolmogorov complexity

Published: 30 August 2015

Abstract

This paper proposes a method for estimating a product evolution graph based on Kolmogorov complexity. The method, EEGL, applies lossless compression to the source code of products and presumes a derivation relationship between two products when the increase in information from one to the other is small. An evaluation experiment confirms that EEGL and an existing method, PRET, tend to produce different errors when estimating evolution graphs.
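The idea of a small "increase of information" can be illustrated with an off-the-shelf compressor. The following is a minimal sketch, not the EEGL implementation: it assumes zlib as the lossless compressor, treats each product as a single byte string, and approximates the information that one product adds to another as C(base + derived) - C(base), where C is the compressed size. The function names `compressed_size`, `information_increase`, and `likely_parent` are hypothetical and chosen for illustration only.

```python
import zlib
from typing import Mapping


def compressed_size(data: bytes) -> int:
    """Size in bytes of the zlib-compressed data (a computable stand-in for Kolmogorov complexity)."""
    return len(zlib.compress(data, 9))


def information_increase(base: bytes, derived: bytes) -> int:
    """Approximate the information `derived` adds on top of `base`.

    Assumption: C(base + derived) - C(base) is used as a rough proxy for
    the conditional complexity of `derived` given `base`.
    """
    return compressed_size(base + derived) - compressed_size(base)


def likely_parent(candidates: Mapping[str, bytes], target: bytes) -> str:
    """Return the candidate name whose content leaves the least new information in `target`."""
    return min(
        candidates,
        key=lambda name: information_increase(candidates[name], target),
    )
```

Under these assumptions, a product that was derived from another with only small changes should show a much smaller increase against its true predecessor than against an unrelated product. zlib only finds repeats within a 32 KiB window, so a sketch like this is only meaningful for small inputs; a real tool would need a compressor suited to whole source trees.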


Published In

IWPSE 2015: Proceedings of the 14th International Workshop on Principles of Software Evolution
August 2015
78 pages
ISBN: 9781450338165
DOI: 10.1145/2804360

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. Kolmogorov complexity
  2. Software evolution
  3. estimation
  4. evolution graph
  5. lossless compression

Qualifiers

  • Research-article

Conference

ESEC/FSE'15