Abstract
Due to the evolving nature of temporal data, clusters often exhibit complex dynamic patterns such as birth and death. In particular, a cluster can branch into multiple clusters simultaneously. Intuitively, clusters can therefore evolve as evolutionary trees over time. However, existing models are incapable of recovering such tree-like evolutionary traces from temporal data. To this end, we propose an Evolving Chinese Restaurant Process (ECRP), which is essentially a temporal non-parametric clustering model. ECRP incorporates the dynamics of the number of clusters, the cluster parameters, and cluster popularity, and it allows each cluster to have multiple branches over time. We design an online learning framework based on Gibbs sampling to infer the evolutionary traces of clusters over time. In experiments, we validate that ECRP can capture tree-like evolutionary traces of clusters from real-world data sets and achieves better clustering results than state-of-the-art methods.
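For readers unfamiliar with the base process, the following is a minimal Python sketch of the standard, static Chinese restaurant process that ECRP builds on. In that process, each new item joins an existing cluster with probability proportional to the cluster's current size, or opens a new cluster with probability proportional to a concentration parameter `alpha`. This sketch does not implement ECRP's temporal dynamics, branching, or Gibbs-sampling inference, and the function names `crp_predictive` and `crp_partition` are hypothetical, chosen only for illustration.

```python
import random
from typing import List, Sequence


def crp_predictive(counts: Sequence[int], alpha: float) -> List[float]:
    """Standard (static) CRP predictive probabilities, not ECRP.

    Entry k is the probability that the next customer joins existing table k,
    n_k / (n + alpha); the last entry is the probability of a new table,
    alpha / (n + alpha), where n is the total number of seated customers.
    """
    if alpha <= 0:
        raise ValueError("alpha must be positive")
    n = sum(counts)
    denom = n + alpha
    return [c / denom for c in counts] + [alpha / denom]


def crp_partition(n: int, alpha: float, seed: int = 0) -> List[int]:
    """Seat n customers one at a time and return each customer's table index."""
    rng = random.Random(seed)
    counts: List[int] = []       # number of customers currently at each table
    assignments: List[int] = []  # table index chosen by each customer, in order
    for _ in range(n):
        probs = crp_predictive(counts, alpha)
        # Index < len(counts) means an existing table; == len(counts) means a new one.
        k = rng.choices(range(len(probs)), weights=probs, k=1)[0]
        if k == len(counts):
            counts.append(1)
        else:
            counts[k] += 1
        assignments.append(k)
    return assignments


if __name__ == "__main__":
    seating = crp_partition(n=20, alpha=1.0, seed=42)
    print("tables opened:", len(set(seating)))
```

Because new tables always receive the next unused index, the returned labels are contiguous integers starting at 0. Larger values of `alpha` tend to open more tables for the same number of customers.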
Copyright information
© 2015 Springer International Publishing Switzerland
About this paper
Cite this paper
Wang, P., Zhou, C., Zhang, P., Feng, W., Guo, L., Fang, B. (2015). Evolving Chinese Restaurant Processes for Modeling Evolutionary Traces in Temporal Data. In: Cao, T., Lim, EP., Zhou, ZH., Ho, TB., Cheung, D., Motoda, H. (eds) Advances in Knowledge Discovery and Data Mining. PAKDD 2015. Lecture Notes in Computer Science, vol 9078. Springer, Cham. https://doi.org/10.1007/978-3-319-18032-8_7
DOI: https://doi.org/10.1007/978-3-319-18032-8_7
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-18031-1
Online ISBN: 978-3-319-18032-8
eBook Packages: Computer Science, Computer Science (R0)