An Efficient Similarity Measure for Clustering of Categorical Sequences

  • Conference paper
AI 2006: Advances in Artificial Intelligence (AI 2006)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 4304)

Abstract

In this paper, we propose an efficient similarity measure as a pre-processing method for clustering categorical and sequential attributes. The similarity measure is based on a new dynamic programming algorithm that computes a sequence comparison score from the gap penalty matrix; the measure itself is obtained by normalizing this score. We evaluate the proposed measure through clustering experiments, since clustering is an unsupervised learning method whose results are strongly influenced by the similarity measure between clusters. The experiments use tcpdump data from the DARPA 1999 Intrusion Detection Evaluation Data Sets. These transmission data are composed of sequential packet data in a network. Finally, the results of the comparison experiments are discussed.
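To make the idea of a normalized, dynamic-programming sequence comparison score concrete, the following is a minimal sketch under stated assumptions. It is not the algorithm proposed in the paper, whose details are not given in the abstract. It uses a standard global alignment (Needleman-Wunsch style) with a linear gap penalty, and the function names, the scoring values (`match`, `mismatch`, `gap`), and the normalization by the longer sequence length are all illustrative choices.

```python
from typing import Hashable, Sequence


def alignment_score(
    a: Sequence[Hashable],
    b: Sequence[Hashable],
    match: float = 1.0,
    mismatch: float = -1.0,
    gap: float = -1.0,
) -> float:
    """Global alignment score of two categorical sequences.

    Assumption: a plain linear gap penalty, standing in for the
    gap penalty matrix mentioned in the abstract.
    """
    # prev[j] holds the best score for aligning a[:i-1] with b[:j].
    prev = [j * gap for j in range(len(b) + 1)]
    for i, x in enumerate(a, start=1):
        curr = [i * gap]  # aligning a[:i] with an empty prefix of b
        for j, y in enumerate(b, start=1):
            diag = prev[j - 1] + (match if x == y else mismatch)
            up = prev[j] + gap        # gap inserted in b
            left = curr[j - 1] + gap  # gap inserted in a
            curr.append(max(diag, up, left))
        prev = curr
    return prev[-1]


def normalized_similarity(
    a: Sequence[Hashable],
    b: Sequence[Hashable],
    match: float = 1.0,
    mismatch: float = -1.0,
    gap: float = -1.0,
) -> float:
    """Map the alignment score into [0, 1].

    Assumption: negative scores are clipped to 0 and the result is
    divided by the best score the longer sequence could achieve.
    """
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0  # two empty sequences are treated as identical
    score = alignment_score(a, b, match, mismatch, gap)
    return max(0.0, score) / (match * longest)


if __name__ == "__main__":
    # Hypothetical packet-flag sequences, used only for illustration.
    s1 = ["SYN", "SYN-ACK", "ACK", "PSH", "FIN"]
    s2 = ["SYN", "SYN-ACK", "ACK", "FIN"]
    print(normalized_similarity(s1, s2))
```

A value in [0, 1] of this kind can be fed directly to a clustering algorithm as a pairwise similarity, or converted to a distance as one minus the similarity. How the paper's own measure scores gaps and normalizes is not specified here, so the parameters above should be read as placeholders.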

This research was supported by the MIC (Ministry of Information and Communication), Korea, under the ITRC (Information Technology Research Center) support program supervised by the IITA (Institute of Information Technology Assessment) (IITA-2006-C1090-0603-0027).

Copyright information

© 2006 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Noh, SK., Kim, YM., Kim, D., Noh, BN. (2006). An Efficient Similarity Measure for Clustering of Categorical Sequences. In: Sattar, A., Kang, Bh. (eds) AI 2006: Advances in Artificial Intelligence. AI 2006. Lecture Notes in Computer Science, vol 4304. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11941439_41

  • DOI: https://doi.org/10.1007/11941439_41

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-49787-5

  • Online ISBN: 978-3-540-49788-2

  • eBook Packages: Computer Science, Computer Science (R0)
