Equi-Clustream: a framework for clustering time evolving mixed data

Sangam, Ravi Sankar; Om, Hari

doi:10.1007/s11634-018-0316-3

Regular Article
Published: 26 February 2018

Volume 12, pages 973–995, (2018)
Advances in Data Analysis and Classification

Ravi Sankar Sangam¹ &
Hari Om²

Abstract

In a data stream environment, most conventional clustering algorithms are not efficient enough, because data arrive in large volumes and evolve over time. Clustering time-evolving metric data and clustering time-evolving categorical data have each been well explored in recent years, but clustering time-evolving data of mixed type remains challenging because metric and categorical attributes differ in structure. In this paper, we devise a generalized framework, termed Equi-Clustream, to dynamically cluster mixed-type time-evolving data. It comprises three algorithms: a Hybrid Drifting Concept Detection Algorithm, which detects concept drift between the current and the previous sliding window; a Hybrid Data Labeling Algorithm, which assigns an appropriate cluster label to each data vector of the current non-drifting window based on the clustering result of the previous sliding window; and a visualization algorithm, which analyses the relationship between the clusters at different timestamps and visualizes their evolving trends. The efficacy of the proposed framework is shown by experiments on synthetic and real-world datasets.
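
The labeling and drift-detection steps named in the abstract can be illustrated with a minimal sketch. It is not the authors' algorithm, whose details are not given on this page. As a stand-in, it assumes a k-prototypes-style distance (squared Euclidean distance on the metric attributes plus a weighted mismatch count on the categorical ones) and an outlier-ratio rule for drift. The names `Prototype`, `mixed_distance`, `label_window`, and `is_drifting`, and the parameters `gamma`, `radius`, and `max_outlier_ratio`, are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple

# A mixed-type record: (metric attributes, categorical attributes).
Record = Tuple[Sequence[float], Sequence[str]]


@dataclass
class Prototype:
    """Summary of one cluster from the previous window (assumed representation)."""
    numeric: Sequence[float]      # e.g. per-attribute mean
    categorical: Sequence[str]    # e.g. per-attribute mode


def mixed_distance(rec: Record, proto: Prototype, gamma: float = 1.0) -> float:
    """Squared Euclidean distance plus gamma times the categorical mismatches."""
    num, cat = rec
    d_num = sum((x - c) ** 2 for x, c in zip(num, proto.numeric))
    d_cat = sum(a != b for a, b in zip(cat, proto.categorical))
    return d_num + gamma * d_cat


def label_window(window: Sequence[Record], prototypes: Sequence[Prototype],
                 radius: float, gamma: float = 1.0) -> List[int]:
    """Give each record the index of its nearest prototype, or -1 (outlier)
    if even the nearest prototype is farther away than `radius`."""
    labels = []
    for rec in window:
        dists = [mixed_distance(rec, p, gamma) for p in prototypes]
        best = min(range(len(dists)), key=dists.__getitem__)
        labels.append(best if dists[best] <= radius else -1)
    return labels


def is_drifting(labels: Sequence[int], max_outlier_ratio: float = 0.3) -> bool:
    """Flag the window as drifting when too many records fit no previous cluster."""
    if not labels:
        return False
    outliers = sum(1 for label in labels if label == -1)
    return outliers / len(labels) > max_outlier_ratio
```

In this sketch, a window that is not flagged keeps the labels produced by `label_window`, while a flagged window would be re-clustered from scratch before the next window is processed. The re-clustering step and the visualization of cluster evolution are left out.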

Author information

Authors and Affiliations

Department of Computer Science and Engineering, National Institute of Technology, Tadepalligudem, Andhra Pradesh, 534101, India
Ravi Sankar Sangam
Department of Computer Science and Engineering, Indian Institute of Technology (Indian School of Mines), Dhanbad, Jharkhand, 826004, India
Hari Om

Corresponding author

Correspondence to Ravi Sankar Sangam.

About this article

Cite this article

Sangam, R.S., Om, H. Equi-Clustream: a framework for clustering time evolving mixed data. Adv Data Anal Classif 12, 973–995 (2018). https://doi.org/10.1007/s11634-018-0316-3

Received: 24 January 2015
Revised: 15 November 2017
Accepted: 17 February 2018
Published: 26 February 2018
Issue Date: December 2018
DOI: https://doi.org/10.1007/s11634-018-0316-3
