Constrained data clustering by depth control and progressive constraint relaxation

Dai, Bi-Ru; Lin, Cheng-Ru; Chen, Ming-Syan

doi:10.1007/s00778-005-0164-6

Regular Paper
Published: 11 January 2007

The VLDB Journal, Volume 16, pages 201–217 (2007)

Bi-Ru Dai¹, Cheng-Ru Lin¹ & Ming-Syan Chen¹

Abstract

Constraint-based mining has recently attracted considerable research attention as a way to bring domain knowledge and application-dependent parameters into data mining systems. In this paper, the attributes used to model the constraints are called constraint attributes, and the attributes involved in the objective function to be optimized are called optimization attributes. The constrained clustering considered here optimizes the objective function over the optimization attributes subject to the condition that the imposed constraint is satisfied. Specifically, we address the problem of constrained clustering with numerical constraints, in which the constraint attribute values of any two data items in the same cluster must be within the corresponding constraint range. Conventional clustering algorithms cannot handle this numerical constrained clustering problem, so we devise several effective and efficient algorithms to solve it. Because of the intrinsic nature of numerical constrained clustering, the process of attaining the clustering is order dependent, which in many cases degrades the clustering results. To remedy this drawback and improve the overall quality of the clustering results, we devise a progressive constraint relaxation technique: by using a smaller (tighter) constraint range in earlier merge iterations, we leave more room to relax the constraint and seek better solutions in subsequent iterations. Experiments show that the progressive constraint relaxation technique improves not only the execution efficiency but also the clustering quality.
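
The idea in the abstract can be illustrated with a small sketch. It is not the algorithm from the paper, whose details are not given on this page; it is a plain centroid-based agglomerative merge written under several assumptions: each item has one numeric optimization attribute and one numeric constraint attribute, "within the constraint range" is read as the largest pairwise difference of constraint values in a cluster not exceeding the range, and relaxation is modeled as a fixed schedule of fractions of the final range. The names `constrained_merge`, `spread`, and `schedule` are hypothetical.

```python
from itertools import combinations


def spread(cluster, con):
    """Largest pairwise difference of constraint values in a cluster."""
    values = [con[i] for i in cluster]
    return max(values) - min(values)


def centroid(cluster, opt):
    """Mean of the optimization attribute over a cluster."""
    return sum(opt[i] for i in cluster) / len(cluster)


def constrained_merge(opt, con, k, final_range, schedule=(0.5, 0.75, 1.0)):
    """Greedy merging down to k clusters under a range constraint.

    Assumption: the allowed range starts tight and is relaxed in stages,
    given by `schedule` as fractions of `final_range`.
    """
    clusters = [[i] for i in range(len(opt))]
    for fraction in schedule:
        limit = final_range * fraction
        while len(clusters) > k:
            best = None
            for a, b in combinations(range(len(clusters)), 2):
                # Skip merges that would violate the current range.
                if spread(clusters[a] + clusters[b], con) > limit:
                    continue
                gap = abs(centroid(clusters[a], opt) - centroid(clusters[b], opt))
                if best is None or gap < best[0]:
                    best = (gap, a, b)
            if best is None:
                break  # nothing feasible at this range; relax and retry
            _, a, b = best
            clusters[a] = clusters[a] + clusters[b]
            del clusters[b]
    return clusters


# Toy data: item 2 is close to items 0 and 1 in the optimization
# attribute but far from every other item in the constraint attribute.
opt = [0.0, 1.0, 2.0, 50.0, 52.0]
con = [0, 1, 10, 0, 1]
print(constrained_merge(opt, con, k=3, final_range=2))
```

In this toy data the constraint attribute keeps item 2 apart from items 0 and 1 even though they are neighbors in the optimization attribute. Starting with a tight range means early merges are only made between items whose constraint values are close, which is the intuition the abstract gives for leaving room to relax later.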

Author information

Authors and Affiliations

Department of Electrical Engineering, National Taiwan University, Taipei, Taiwan, ROC
Bi-Ru Dai, Cheng-Ru Lin & Ming-Syan Chen

Corresponding author

Correspondence to Bi-Ru Dai.

About this article

Cite this article

Dai, BR., Lin, CR. & Chen, MS. Constrained data clustering by depth control and progressive constraint relaxation. The VLDB Journal 16, 201–217 (2007). https://doi.org/10.1007/s00778-005-0164-6

Received: 31 July 2004
Accepted: 06 May 2005
Published: 11 January 2007
Issue Date: April 2007
DOI: https://doi.org/10.1007/s00778-005-0164-6