Abstract
Data clustering is an important theoretical topic and a powerful tool for many applications. Its main objective is to partition a given data set into clusters such that data within the same cluster are “more” similar to each other with respect to certain measures. In this paper, we study the pairwise data clustering problem with pairwise similarity/dissimilarity measures that need not satisfy the triangle inequality. Using a criterion called the minimum normalized cut, we model the pairwise data clustering problem as a graph partition problem. The graph partition problem based on minimizing the normalized cut is known to be NP-hard. We present a polynomial-time ((4 + o(1)) ln n)-approximation algorithm for the minimum normalized cut problem. We also give a more efficient algorithm for this problem at the cost of a slightly worse approximation ratio. Further, our scheme yields a polynomial-time ((2 + o(1)) ln n)-approximation algorithm for computing sparsest cuts in edge-weighted and vertex-weighted undirected graphs, improving the previously best known approximation ratio by a constant factor.
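To make the two objectives concrete, here is a minimal Python sketch that evaluates them on a small graph. It assumes Shi and Malik's form of the normalized cut, Ncut(S) = cut(S, T)/vol(S) + cut(S, T)/vol(T) with T = V \ S, and a vertex-weighted sparsity cut(S, T)/(c(S)·c(T)); the paper's exact definitions may differ. The graph format and all function names are hypothetical. The exhaustive search is exponential in n and serves only as a reference on tiny inputs; it is not the paper's approximation algorithm, which the abstract does not describe.

```python
from itertools import combinations

# Hypothetical input format: undirected graph as {(u, v): weight}, each edge listed once.

def vertices(g):
    """Sorted list of all vertices that appear in some edge."""
    return sorted({x for edge in g for x in edge})

def cut_weight(g, S):
    """Total weight of edges with exactly one endpoint in S."""
    return sum(w for (u, v), w in g.items() if (u in S) != (v in S))

def volume(g, S):
    """Sum of weighted degrees over S: an edge inside S counts twice, a cut edge once."""
    return sum(w * ((u in S) + (v in S)) for (u, v), w in g.items())

def normalized_cut(g, S):
    """Normalized cut of the bipartition (S, V \\ S), assuming Shi and Malik's definition."""
    T = set(vertices(g)) - S
    c = cut_weight(g, S)
    return c / volume(g, S) + c / volume(g, T)

def sparsity(g, S, c_weight):
    """Cut weight divided by the product of vertex weights on each side (assumed form)."""
    T = set(vertices(g)) - S
    cs = sum(c_weight[x] for x in S)
    ct = sum(c_weight[x] for x in T)
    return cut_weight(g, S) / (cs * ct)

def brute_force_min(g, objective):
    """Exhaustively minimize objective over all nonempty proper subsets S (exponential time)."""
    V = vertices(g)
    first, rest = V[0], V[1:]
    best = None
    # Fixing V[0] in S lists each bipartition once; k < len(rest) keeps S != V.
    for k in range(len(rest)):
        for extra in combinations(rest, k):
            S = frozenset((first,) + extra)
            val = objective(g, S)
            if best is None or val < best[0]:
                best = (val, S)
    return best

# Example: two unit-weight triangles joined by one light edge (2, 3).
g = {(0, 1): 1.0, (1, 2): 1.0, (0, 2): 1.0,
     (3, 4): 1.0, (4, 5): 1.0, (3, 5): 1.0,
     (2, 3): 0.1}
unit = {x: 1.0 for x in vertices(g)}

ncut_val, ncut_set = brute_force_min(g, normalized_cut)
sp_val, sp_set = brute_force_min(g, lambda h, S: sparsity(h, S, unit))
print(sorted(ncut_set), round(ncut_val, 4))  # [0, 1, 2] 0.0328
print(sorted(sp_set), round(sp_val, 4))      # [0, 1, 2] 0.0111
```

On this example both criteria separate the two triangles, since any other bipartition must cut at least two unit-weight edges.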
This research was supported in part by the 21st Century Research and Technology Fund from the State of Indiana.
The work of this author was supported in part by the Computing and Information Technology Center, and by the Faculty Research Council, University of Texas-Pan American, Edinburg, Texas, USA.
The work of this author was supported in part by the National Science Foundation under Grants CCR-9623585 and CCR-9988468.
Copyright information
© 2003 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Wu, X., Chen, D.Z., Mason, J.J., Schmid, S.R. (2003). Pairwise Data Clustering and Applications. In: Warnow, T., Zhu, B. (eds) Computing and Combinatorics. COCOON 2003. Lecture Notes in Computer Science, vol 2697. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45071-8_46
DOI: https://doi.org/10.1007/3-540-45071-8_46
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-40534-4
Online ISBN: 978-3-540-45071-9
eBook Packages: Springer Book Archive