Cluster: A Fast Tool to Identify Groups of Similar Programs

  • Conference paper
Computing and Combinatorics (COCOON 2002)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 2387)

Abstract

cluster is a tool that partitions a large pool of C programs into groups according to structural similarity. It computes an alignment score for each program against a mosaic built from randomly selected, fixed-size code fragments drawn from the pool. The scores are then grouped so that, within a group, the distance between any two adjacent scores is at most a threshold value. cluster is effective at identifying tight clusters of similar programs and can distribute its workload over a network of workstations to achieve very fast running times. cluster is also highly configurable: the user can adjust its alignment scoring scheme and clustering threshold, and can obtain visual alignments of programs suspected to be similar.
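The abstract outlines a three-stage pipeline: build a mosaic of random fixed-size fragments, score each program against it, then group the scores by a threshold. The following is a minimal Python sketch of that pipeline, not the authors' implementation. It rests on several assumptions the abstract does not state: programs are compared as token sequences with identifiers and integer literals normalized (the hypothetical `tokenize`), fragment size `k` is counted in tokens, the alignment score is a Smith-Waterman-style local alignment score with illustrative `MATCH`/`MISMATCH`/`GAP` values, and grouping is single linkage over sorted one-dimensional scores. All function and parameter names are hypothetical.

```python
import random
import re
from typing import Dict, List, Optional, Sequence, Tuple

# Illustrative scoring values (assumption): the scoring scheme is described
# as user-configurable, but these particular numbers are made up.
MATCH, MISMATCH, GAP = 2, -1, -1

TOKEN_RE = re.compile(r"[A-Za-z_]\w*|\d+|\S")
C_KEYWORDS = {"int", "char", "void", "if", "else", "for", "while", "return"}


def tokenize(src: str) -> List[str]:
    """Hypothetical normalizer: keep keywords and punctuation, map other
    identifiers to 'ID' and integer literals to 'NUM' so that renaming
    variables does not change the token sequence."""
    out: List[str] = []
    for tok in TOKEN_RE.findall(src):
        if tok in C_KEYWORDS or not (tok[0].isalnum() or tok[0] == "_"):
            out.append(tok)
        elif tok[0].isdigit():
            out.append("NUM")
        else:
            out.append("ID")
    return out


def local_alignment_score(a: Sequence[str], b: Sequence[str]) -> int:
    """Smith-Waterman local alignment score with a linear gap penalty,
    keeping only the previous DP row (O(len(b)) memory)."""
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (MATCH if a[i - 1] == b[j - 1] else MISMATCH)
            cur[j] = max(0, diag, prev[j] + GAP, cur[j - 1] + GAP)
            best = max(best, cur[j])
        prev = cur
    return best


def build_mosaic(programs: List[List[str]], k: int, n_fragments: int,
                 rng: random.Random) -> List[str]:
    """Concatenate n_fragments randomly chosen windows of k tokens."""
    eligible = [p for p in programs if len(p) >= k]
    if not eligible:
        raise ValueError("no program has at least k tokens")
    mosaic: List[str] = []
    for _ in range(n_fragments):
        p = rng.choice(eligible)
        start = rng.randrange(len(p) - k + 1)
        mosaic.extend(p[start:start + k])
    return mosaic


def group_scores(scores: List[Tuple[str, int]], threshold: int) -> List[List[str]]:
    """Sort by score and open a new group whenever the gap to the previous
    score exceeds threshold (single linkage in one dimension)."""
    groups: List[List[str]] = []
    prev_score: Optional[int] = None
    for name, score in sorted(scores, key=lambda item: item[1]):
        if prev_score is None or score - prev_score > threshold:
            groups.append([])
        groups[-1].append(name)
        prev_score = score
    return groups


def cluster(sources: Dict[str, str], k: int, n_fragments: int,
            threshold: int, seed: int = 0) -> List[List[str]]:
    """Score every program against one shared random mosaic, then group."""
    tokens = {name: tokenize(src) for name, src in sources.items()}
    mosaic = build_mosaic(list(tokens.values()), k, n_fragments,
                          random.Random(seed))
    scores = [(name, local_alignment_score(toks, mosaic))
              for name, toks in tokens.items()]
    return group_scores(scores, threshold)


if __name__ == "__main__":
    programs = {
        "p1": "int f(int a) { return a + 1; }",
        "p2": "int g(int b) { return b + 1; }",  # same structure as p1
        "p3": "for (i = 0; i < 10; i++) s += i;",
    }
    print(cluster(programs, k=3, n_fragments=4, threshold=0))
```

In the example, `p1` and `p2` always land in the same group because they tokenize identically and therefore receive identical scores. Grouping by gaps between *adjacent* sorted scores means a group can chain: with a threshold of 3, scores 0, 3 and 6 form one group even though the extremes differ by 6. Because each program's score against the mosaic is computed independently, the scoring loop parallelizes naturally, which fits the abstract's point about distributing work across workstations; this sketch itself runs serially.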

Copyright information

© 2002 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Carter, C., Tran, N. (2002). Cluster: A Fast Tool to Identify Groups of Similar Programs. In: Ibarra, O.H., Zhang, L. (eds) Computing and Combinatorics. COCOON 2002. Lecture Notes in Computer Science, vol 2387. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45655-4_22

  • DOI: https://doi.org/10.1007/3-540-45655-4_22

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-43996-7

  • Online ISBN: 978-3-540-45655-1

  • eBook Packages: Springer Book Archive
