Equi-Clustream: a framework for clustering time evolving mixed data

  • Regular Article
Advances in Data Analysis and Classification

Abstract

In a data stream environment, most conventional clustering algorithms are not efficient enough, because large volumes of data arrive continuously and the data points evolve over time. Clustering time-evolving metric data and clustering time-evolving categorical data have each been well explored in recent years, but clustering mixed-type time-evolving data remains challenging because metric and categorical attributes differ in structure. In this paper, we devise a generalized framework, termed Equi-Clustream, to dynamically cluster mixed-type time-evolving data. It comprises three algorithms: a Hybrid Drifting Concept Detection Algorithm that detects concept drift between the current sliding window and the previous sliding window; a Hybrid Data Labeling Algorithm that assigns an appropriate cluster label to each data vector of the current non-drifting window based on the clustering result of the previous sliding window; and a visualization algorithm that analyses the relationship between the clusters at different timestamps and visualizes their evolving trends. The efficacy of the proposed framework is shown by experiments on synthetic and real-world datasets.
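
The abstract does not spell out how drift is detected or how labels are assigned, so the following is only a minimal sketch of the general window-to-window idea, not the authors' algorithms. It assumes a k-prototypes-style mixed distance (squared Euclidean on numeric attributes plus a weighted count of categorical mismatches), labels each vector of the current window with the nearest prototype from the previous window, and flags drift when the share of unlabeled vectors exceeds a threshold. The function names and the parameters `gamma`, `radius`, and `outlier_threshold` are hypothetical.

```python
from collections import Counter


def mixed_distance(x, proto, num_idx, cat_idx, gamma=1.0):
    """Squared Euclidean distance on numeric attributes plus gamma times
    the number of categorical mismatches (an assumed, k-prototypes-style measure)."""
    num = sum((x[i] - proto[i]) ** 2 for i in num_idx)
    cat = sum(1 for i in cat_idx if x[i] != proto[i])
    return num + gamma * cat


def prototype(points, num_idx, cat_idx):
    """Summarize a cluster: mean of each numeric attribute, mode of each categorical one."""
    proto = list(points[0])
    for i in num_idx:
        proto[i] = sum(p[i] for p in points) / len(points)
    for i in cat_idx:
        proto[i] = Counter(p[i] for p in points).most_common(1)[0][0]
    return tuple(proto)


def label_window(window, prototypes, num_idx, cat_idx, radius):
    """Label each vector with the index of its nearest previous-window prototype,
    or None when even the nearest prototype is farther than radius."""
    labels = []
    for x in window:
        dists = [mixed_distance(x, p, num_idx, cat_idx) for p in prototypes]
        best = min(range(len(dists)), key=dists.__getitem__)
        labels.append(best if dists[best] <= radius else None)
    return labels


def is_drifting(labels, outlier_threshold=0.3):
    """Flag drift when the fraction of unlabeled vectors exceeds the threshold (assumed rule)."""
    if not labels:
        return False
    outliers = sum(1 for label in labels if label is None)
    return outliers / len(labels) > outlier_threshold


# Toy data: attribute 0 is numeric, attribute 1 is categorical.
num_idx, cat_idx = [0], [1]
prev_clusters = [[(1.0, "a"), (1.2, "a")], [(5.0, "b"), (5.2, "b")]]
protos = [prototype(c, num_idx, cat_idx) for c in prev_clusters]

stable_labels = label_window([(1.1, "a"), (5.1, "b")], protos, num_idx, cat_idx, radius=1.0)
shifted_labels = label_window(
    [(9.0, "c"), (9.5, "c"), (1.1, "a")], protos, num_idx, cat_idx, radius=1.0
)
print(stable_labels, is_drifting(stable_labels))
print(shifted_labels, is_drifting(shifted_labels))
```

In this sketch a non-drifting window simply inherits the previous window's cluster labels, while a drifting window would be re-clustered from scratch; how the paper actually handles either case is not stated in the abstract.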



Author information

Corresponding author

Correspondence to Ravi Sankar Sangam.

About this article

Cite this article

Sangam, R.S., Om, H. Equi-Clustream: a framework for clustering time evolving mixed data. Adv Data Anal Classif 12, 973–995 (2018). https://doi.org/10.1007/s11634-018-0316-3

  • DOI: https://doi.org/10.1007/s11634-018-0316-3
