A new efficient algorithm based on DC programming and DCA for clustering

An, Le Thi Hoai; Belghiti, M. Tayeb; Tao, Pham Dinh

doi:10.1007/s10898-006-9066-4

A new efficient algorithm based on DC programming and DCA for clustering

Published: 09 August 2006

Volume 37, pages 593–608, (2007)
Cite this article

Journal of Global Optimization Aims and scope Submit manuscript

Le Thi Hoai An¹,
M. Tayeb Belghiti² &
Pham Dinh Tao³

512 Accesses
41 Citations
Explore all metrics

Abstract

In this paper, a version of K-median problem, one of the most popular and best studied clustering measures, is discussed. The model using squared Euclidean distances terms to which the K-means algorithm has been successfully applied is considered. A fast and robust algorithm based on DC (Difference of Convex functions) programming and DC Algorithms (DCA) is investigated. Preliminary numerical solutions on real-world databases show the efficiency and the superiority of the appropriate DCA with respect to the standard K-means algorithm.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Log in via an institution

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

References

Alon N., Spencer J.H. (1991): The Probabilistic Method. Wiley, New York, NY
Google Scholar
Arora, S., Kannan, R.: Learning mixtures of arbitrary Gaussians. In: Proceedings of the 33rd Annual ACM Symposium on Theory of Computing, July 6–8, pp. 247–257. Heraklion, Crete, Greece (2001)
Al-Sultan K. (1995): A Tabu search approach to the clustering problem. Pattern Recogn. 28(9): 443–1453
Article Google Scholar
Bradley, P.S., Mangasarian, O.L., Street, W.N.: Clustering via concave minimization, Technical Report 96-03, Computer Sciences Department, University of Wisconsin, Madison, Wisconsin May 1996. Advances In: Mozer, M.C., Jordan, M.I., Petsche, T. (eds.) Neural Information processing Systems 9, pp. 368–374. MIT Press, Cambridge, MA Available by ftp://ftp.cs.wisc.edu/math-prog/tech-trports/96-03.ps.Z.
Bradley, B.S., Mangasarian, O.L.: Feature selection via concave minimization and support vector machines. In: Shavlik, J. (eds.) Machine Learning Proceedings of the Fifteenth International Conferences(ICML’98), pp. 82–90. San Francisco, CA 1998, Morgan Kaufmann.
Charikar, M., Guha, S.: Improved combinatorial algorithms for facility location and k-median problems. In: Proceedings of the 40th Annual IEEE Symposium on Foundations of Computer Science, 17–18 October, pp. 378–388. New York, NY, USA (1999)
Charikar, M., Guha, S., Tardos, E., Shmoys, D.B.: A constant-factor approximation algorithm for the k-median problem. In: Proceedings of the 31st Annual ACM Symposium on Theory of Computing pp. 1–10 (1999)
De Leeuw J. (1997): Applications of convex analysis to multidimensional scaling, Recent developments. In: Barra J.R., et al. (eds). Statistics. North-Holland Publishing company, Amsterdam, pp. 133–145
Google Scholar
De Leeuw J. (1988): Convergence of the majorization method for multidimensional scaling. J. Classi. 5, 163–180
Article Google Scholar
Dhilon I.S., Korgan J., Nicholas C. (2003): Feature Selection and Document Clustering. In: Berry M.W. (eds). A Comprehensive Survey of Text Mining. Springer-Verlag, Berlin, pp. 73–100
Google Scholar
Duda R.O., Hart P.E. (1972): Pattern Classification and Scene Analysis. Wiley, New York
Google Scholar
Feder, T., Greene, D.: Optimal algorithms for approximate clustering. In: Proceedings of the 20th Annual ACM Symposium on Theory of Computing (STOC) May 2–4, Chicago, Illinois, USA (1988)
Fisher D. (1988): Knowledge acquisition via incremental conceptual clusterin. Mach. Learning, 2, 139–172
Google Scholar
Fukunaga K. (1990): Statistical Pattern Recognition. Academic Press, NY
Google Scholar
Hiriart Urruty J.B., Lemarechal C. (1993): Convex Analysis and Minimization Algorithms. Springer Verlag, Berlin, Heidelberg
Google Scholar
Jain A.K., Dubes R.C. (1988): Algorithms for Clustering Data. Prentice-Hall Inc, Englewood Cliffs, NJ
Google Scholar
Hartigan J.A. (1975): Clustering Algorithms. Wiley, New York
Google Scholar
Le Thi Hoai An: Contribution à l’optimisation non convexe et l’optimisation globale: Théorie, Algoritmes et Applications. Habilitation à Diriger des Recherches, Université de Rouen, Juin (1997)
Le Thi Hoai An, Pham Dinh Tao (1997): Solving a class of linearly constrained indefinite quadratic problems by DC algorithms. J. Global Optim. 11(3): 253–285
Article Google Scholar
Le Thi Hoai An, Pham Dinh Tao: DC programming approach for large-scale molecular optimization via the general distance geometry problem. Nonconvex Optimization and Its Applications, Special Issue “Optimization in Computational Chemistry and Molecular Biology: Local and Global Approaches”, pp. 301–339. Kluwer Academic Publishers, Dordrecht (2000) (This special issue contains refereed invited papers submitted at the conference on optimization in computational chemistry and molecular biology: local and global approaches held at Princeton University, 7–9 May 1999)
Le Thi Hoai An, Pham Dinh Tao (2001): DC programming approach for solving the multidimensional scaling problem. Nonconvex optimizations and its applications: special issue “From local to global optimization”, pp. 231–276. Kluwer Academic Publishers, Dordrecht
Google Scholar
Le Thi Hoai An, Pham Dinh Tao: DC programming: theory, algorithms and applications. The state of the art. In Proceedings of The First International Workshop on Global Constrained Optimization and Constraint Satisfaction (Cocos’ 02), 28 p. Valbonne-Sophia Antipolis, France, 2–4 October (2002)
Le Thi Hoai An, Pham Dinh Tao (2003): Large Scale Molecular Optimization from distances matrices by a DC optimization approach. SIAM J. Optim. 14(1): 77–116
Article Google Scholar
Le Thi Hoai An, Pham Dinh Tao (2005): The DC (difference of convex functions) programming and DCA revisited with DC models of real world nonconvex optimization problems. Ann. Oper. Res. 133, 23–46
Article Google Scholar
Liu, Y., Shen, X., Doss, H.: Multicategory ψ–learning and support vector machine (29 p.). Conference on Machine Learning, Statistics and Discovery, June 22–26, Department of Statistics, the Ohio State University (2003)
Mangasarian O.L. (1997): Mathematical programming in data mining. Data Mining and Knowl. discov. 1, 183–201
Article Google Scholar
MacQueen J.B. (1967): Some Methods for classification and analysis of multivariate observations. In: Proceedings of 5th Berkeley Symposium on Mathematical Statistics and Probability”, Berkeley, University of California Press,1, 281–297
Meyerson A., O’Callaghan L., Plotkin S. (2004): A k-Median algorithm with running time independent of data size. Machine Learn. 56, 61–87
Article Google Scholar
Neumann, J., Schnörr, C., Steidl, G.: SVM-based feature selection by direct objective minimisation, Pattern Recognition. In: Proceeding of 26th DAGM Symposium, vol. 3175, pp. 212–219, LNCS (2004)
Pham Dinh Tao. (1984): Convergence of subgradient method for computing the bound-norm of matrices, Linear Algebra Appl. 62, 163–182
Article Google Scholar
Pham Dinh Tao. (1984): Algorithmes de calcul d’une forme quadratique sur la boule unité de la norme du maximum. Numer. Math. 45, 377–440
Article Google Scholar
Pham Dinh Tao, Le Thi Hoai An.(1997): Convex analysis approach to d.c. programming: Theory, Algorithms and Applications. Acta Mathematica Vietnamica, (dedicated to Professor Hoang Tuy on the occasion of his 70th birthday), 22(1): 289–355
Google Scholar
Pham Dinh Tao, Le Thi Hoai An. (1998): DC optimization algorithms for solving the trust region subproblem. SIAM J. Optim. 8, 476–505
Article Google Scholar
Rao M.R. (1971): Cluster analysis and mathematical programming. J. Amer. Stat. Associ., 66, 622–626
Article Google Scholar
Rockafellar R.T. (1970): Convex Analysis. Princeton University, Princeton
Google Scholar
Selim S.Z., Ismail M.A. (1984): K-means-Type algorithms: a generalized convergence theorem and characterization of local optimilaty. (Book Series) IEEE Transactions on Pattern Analysis and Machine Intelligence, PAMI-6, 81–87
Article Google Scholar
Schüle T., Schnörr C., Weber S., Hornegger J. (2005): Discrete tomography by convex-concave regularization and d.c. programming. Discr. Appl. Math. 151, 229–243
Article Google Scholar
Weber S., Schüle T., Schnörr C. (2005): Prior learning and convex-concave regularization of binary tomography. Electr. Notes in Discr. Math. 20, 313–327
Article Google Scholar
Weber S., Schnörr C., Schüle T., Hornegger J., (2005): Binary Tomography by Iterating Linear Programs, Klette, R., Kozera, R., Noakes, L., and Weickert, J., (eds.) Computational Imaging and Vision – Geometric Properties from Incomplete Data, Kluwer Academic Press, Dordrecht
Wong, T., Katz, R., McCanne, S.: A preference clustering protocol for large-Scale Multicast Applications, Proceedings of the First International COST264 Workshop on Networked Group Communication, November 17–20, LNCS, pp. 1–18. Pisa, Italy (1999)
Wolberg W.H., Street W.N., Mangasarian O.L. (1995): Image analysis and machine learning applied to breast cancer diagnosis and prognosis. Anal. Quant. Cytol. Histol. 17(2): 77–87
Google Scholar
Wolberg W.H., Street W.N., Heisey D.M., Mangasarian O.L. (1995):Computerized breast cancer diagnosis and prognosis from fine-needle aspirates. Arch. Surg. 130, 511–516
Google Scholar
Wolberg W.H., Street W.N., Heisey D.M., Mangasarian O.L. (1995): Computer-derived nuclear features distinguish malignant from benign breast cytology. Hum. Patholo., 26, 792–796
Article Google Scholar
Yuille A.L., Rangarajan A. (2002): The Concave-Convex Procedure (CCCP). Avances in Neural Information Processing Systems 14, pp. 1033–1040. MIT Press, Cambridge, MA
Google Scholar

Download references

Author information

Authors and Affiliations

Laboratory of Theoretical and Applied Computer Science, UFR MIM, University of Paul, Verlaine - Metz, Ile du Saulcy, 57045, Metz, France
Le Thi Hoai An
Laboratory of Modelling, Optimization and Operations Research, National Institute for Applied Sciences - Rouen, BP 08, Place Emile Blondel, 76131, Mont Saint Aignan Cedex, France
M. Tayeb Belghiti
Laboratory of Modelling, Optimization and Operations Research, National Institute for Applied Sciences - Rouen, BP 08, Place Emile Blondel, 76131, Mont Saint Aignan Cedex, France
Pham Dinh Tao

Authors

Le Thi Hoai An
View author publications
You can also search for this author in PubMed Google Scholar
M. Tayeb Belghiti
View author publications
You can also search for this author in PubMed Google Scholar
Pham Dinh Tao
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Le Thi Hoai An.

Rights and permissions

Reprints and permissions

About this article

Cite this article

An, L.T.H., Belghiti, M.T. & Tao, P.D. A new efficient algorithm based on DC programming and DCA for clustering. J Glob Optim 37, 593–608 (2007). https://doi.org/10.1007/s10898-006-9066-4

Download citation

Received: 29 June 2006
Accepted: 01 July 2006
Published: 09 August 2006
Issue Date: April 2007
DOI: https://doi.org/10.1007/s10898-006-9066-4

Keywords

AMS subject classifications

Access this article

Log in via an institution

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

A new efficient algorithm based on DC programming and DCA for clustering

Abstract

Access this article

Similar content being viewed by others

A Computational Study of the DC Minimization Global Optimality Conditions Applied to K-Means Clustering

K-Means Clustering via a Nonconvex Optimization Approach

On a Nonconvex Distance-Based Clustering Problem

References

Author information

Authors and Affiliations

Corresponding author

Rights and permissions

About this article

Cite this article

Keywords

AMS subject classifications

Navigation

A new efficient algorithm based on DC programming and DCA for clustering

Abstract

Access this article

Similar content being viewed by others

A Computational Study of the DC Minimization Global Optimality Conditions Applied to K-Means Clustering

K-Means Clustering via a Nonconvex Optimization Approach

On a Nonconvex Distance-Based Clustering Problem

References

Author information

Authors and Affiliations

Corresponding author

Rights and permissions

About this article

Cite this article

Share this article

Keywords

AMS subject classifications

Search

Navigation