An Efficient Similarity Measure for Clustering of Categorical Sequences

Noh, Sang-Kyun; Kim, Yong-Min; Kim, DongKook; Noh, Bong-Nam

doi:10.1007/11941439_41

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 4304)

Included in the following conference series:

Australasian Joint Conference on Artificial Intelligence

Abstract

In this paper, we propose an efficient similarity measure as a pre-processing method for clustering data with categorical and sequential attributes. The similarity measure is based on a new dynamic programming algorithm that computes a sequence comparison score from a gap penalty matrix; the measure itself is obtained by normalizing this score. We evaluate the proposed similarity measure through clustering experiments, since clustering is an unsupervised learning method whose results are greatly influenced by the similarity measure between clusters. The experiments use Tcpdump data from the DARPA 1999 Intrusion Detection Evaluation Data Sets, which are composed of sequential packet data in a network. Finally, the results of the comparison experiments are discussed.
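
The abstract does not give the algorithm itself, so the following is only a minimal sketch of the general idea: a dynamic programming alignment score over two categorical sequences, normalized into a similarity in [0, 1]. It uses a standard global alignment recurrence with a single linear gap penalty, not the paper's gap penalty matrix. The function names, the scoring parameters (`match`, `mismatch`, `gap`), the normalization by the longer sequence length, and the packet-type labels in the usage note are all assumptions made for illustration.

```python
from typing import Hashable, Sequence


def alignment_score(
    a: Sequence[Hashable],
    b: Sequence[Hashable],
    match: float = 1.0,
    mismatch: float = -1.0,
    gap: float = -1.0,
) -> float:
    """Global alignment score of two categorical sequences.

    Standard dynamic programming recurrence with a linear gap penalty
    (an assumption; the paper derives its score from a gap penalty matrix).
    """
    n, m = len(a), len(b)
    # prev[j] holds the best score for aligning a[:i-1] with b[:j].
    prev = [j * gap for j in range(m + 1)]
    for i in range(1, n + 1):
        curr = [i * gap] + [0.0] * m
        for j in range(1, m + 1):
            substitution = match if a[i - 1] == b[j - 1] else mismatch
            curr[j] = max(
                prev[j - 1] + substitution,  # align a[i-1] with b[j-1]
                prev[j] + gap,               # gap in b
                curr[j - 1] + gap,           # gap in a
            )
        prev = curr
    return prev[m]


def normalized_similarity(
    a: Sequence[Hashable],
    b: Sequence[Hashable],
    match: float = 1.0,
    mismatch: float = -1.0,
    gap: float = -1.0,
) -> float:
    """Map the alignment score to [0, 1].

    Assumed normalization: clip negative scores to 0 and divide by the
    score a perfect match of the longer sequence would receive.
    """
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0  # two empty sequences are treated as identical
    score = alignment_score(a, b, match, mismatch, gap)
    return max(0.0, score) / (match * longest)
```

With these assumed parameters, two identical sequences score 1.0, and `normalized_similarity(["SYN", "ACK", "PSH", "FIN"], ["SYN", "ACK", "FIN"])` returns 0.5, because three matches and one gap give a raw score of 2 against a maximum of 4. A pairwise matrix of such values (or of one minus them, as distances) is the kind of input a clustering algorithm would consume.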

This research was supported by the MIC (Ministry of Information and Communication), Korea, under the ITRC (Information Technology Research Center) support program supervised by the IITA (Institute of Information Technology Assessment) (IITA-2006-C1090-0603-0027).

Author information

Authors and Affiliations

Interdisciplinary Program of Information Security, Chonnam National University, Korea
Sang-Kyun Noh
Dept. of Electronic Commerce, Chonnam National University, Korea
Yong-Min Kim
Div. of Electronics Computer Engineering, Chonnam National University, Korea
DongKook Kim & Bong-Nam Noh

Editor information

Editors and Affiliations

DisPRR, National ICT Australia Ltd, QLD, Australia
Abdul Sattar
School of Computing, University of Tasmania, Sandy Bay, 7005, Tasmania, Australia
Byeong-ho Kang

About this paper

Cite this paper

Noh, SK., Kim, YM., Kim, D., Noh, BN. (2006). An Efficient Similarity Measure for Clustering of Categorical Sequences. In: Sattar, A., Kang, Bh. (eds) AI 2006: Advances in Artificial Intelligence. AI 2006. Lecture Notes in Computer Science, vol 4304. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11941439_41

DOI: https://doi.org/10.1007/11941439_41
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-49787-5
Online ISBN: 978-3-540-49788-2
eBook Packages: Computer Science, Computer Science (R0)
