Abstract
In this paper, we introduce a novel interactive framework that handles both instance-level and temporal smoothness constraints for clustering large temporal data. It consists of a constrained clustering algorithm, called CVQE+, which optimizes the clustering quality, the constraint violation, and the historical cost between consecutive data snapshots. At the center of our framework is a simple yet effective active learning technique, named Border, which iteratively selects the most informative pairs of objects to query users about and updates the clustering with the new constraints. Those constraints are then propagated inside each data snapshot and between snapshots via two schemes, called constraint inheritance and constraint propagation, to further enhance the results. Experiments show clustering results that are better than or comparable to those of state-of-the-art techniques, as well as high scalability on large datasets.
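To make the three-part objective more concrete, the following is a minimal illustrative sketch of a cost that adds up a clustering-quality term, a constraint-violation term, and a temporal (historical) term. It is not the CVQE+ formulation from the paper: the function name `combined_cost`, the penalty definitions, the weight `alpha`, the fixed `cl_penalty`, and the assumption that cluster centers of consecutive snapshots are matched by index are all hypothetical choices made only for illustration.

```python
from typing import List, Sequence, Tuple

Point = Sequence[float]


def sq_dist(a: Point, b: Point) -> float:
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def combined_cost(
    points: List[Point],
    labels: List[int],
    centers: List[Point],
    prev_centers: List[Point],
    must_link: List[Tuple[int, int]],
    cannot_link: List[Tuple[int, int]],
    alpha: float = 1.0,       # hypothetical weight of the historical term
    cl_penalty: float = 1.0,  # hypothetical fixed cost per violated cannot-link
) -> float:
    # Quality term: squared distance of every point to its assigned center.
    quality = sum(sq_dist(p, centers[l]) for p, l in zip(points, labels))

    # Violation term (illustrative): a violated must-link pays the squared
    # distance between the two centers involved; a violated cannot-link
    # pays a fixed penalty.
    violation = 0.0
    for i, j in must_link:
        if labels[i] != labels[j]:
            violation += sq_dist(centers[labels[i]], centers[labels[j]])
    for i, j in cannot_link:
        if labels[i] == labels[j]:
            violation += cl_penalty

    # Historical term (illustrative): drift of each center relative to the
    # center with the same index in the previous snapshot.
    history = sum(sq_dist(c, pc) for c, pc in zip(centers, prev_centers))

    return quality + violation + alpha * history
```

Under these assumptions, a clustering that keeps points close to their centers, respects the user-provided pairwise constraints, and stays close to the previous snapshot's centers receives a low cost; larger values of `alpha` favor temporal smoothness over fit to the current snapshot.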
Acknowledgment
This work is supported by the CDP Life Project.
Copyright information
© 2018 Springer International Publishing AG, part of Springer Nature
About this paper
Cite this paper
Mai, S.T., Amer-Yahia, S., Chouakria, A.D., Nguyen, K.T., Nguyen, AD. (2018). Scalable Active Constrained Clustering for Temporal Data. In: Pei, J., Manolopoulos, Y., Sadiq, S., Li, J. (eds) Database Systems for Advanced Applications. DASFAA 2018. Lecture Notes in Computer Science, vol 10827. Springer, Cham. https://doi.org/10.1007/978-3-319-91452-7_37
DOI: https://doi.org/10.1007/978-3-319-91452-7_37
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-91451-0
Online ISBN: 978-3-319-91452-7
eBook Packages: Computer Science (R0)