Pairwise Data Clustering and Applications

Wu, Xiaodong; Chen, Danny Z.; Mason, James J.; Schmid, Steven R.

doi:10.1007/3-540-45071-8_46

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 2697)

Included in the following conference series:

International Computing and Combinatorics Conference

Abstract

Data clustering is an important theoretical topic and a sharp tool for various applications. Its main objective is to partition a given data set into clusters such that the data within the same cluster are “more” similar to each other with respect to certain measures. In this paper, we study the pairwise data clustering problem with pairwise similarity/dissimilarity measures that need not satisfy the triangle inequality. Using a criterion called the minimum normalized cut, we model the pairwise data clustering problem as a graph partition problem. The graph partition problem based on minimizing the normalized cut is known to be NP-hard. We present a polynomial-time ((4 + o(1)) ln n)-approximation algorithm for the minimum normalized cut problem. We also give a more efficient algorithm for this problem by sacrificing the approximation ratio slightly. Further, our scheme yields a polynomial-time ((2 + o(1)) ln n)-approximation algorithm for computing the sparsest cuts in edge-weighted and vertex-weighted undirected graphs, improving the previously best known approximation ratio by a constant factor.
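
To make the normalized cut criterion concrete, the following minimal sketch evaluates it on a tiny similarity graph and finds the minimizing bipartition by exhaustive search. It assumes the widely used Shi and Malik formulation, Ncut(A, B) = cut(A, B)/assoc(A, V) + cut(A, B)/assoc(B, V); the abstract does not state the exact definition used in the paper, so this may differ from it. The function names and the example weights are hypothetical, and the exponential-time search only illustrates the objective; it is not the approximation algorithm described above.

```python
from itertools import combinations


def normalized_cut(weights, part_a):
    """Normalized cut of the bipartition (part_a, rest), Shi-Malik form (assumed)."""
    n = len(weights)
    a = set(part_a)
    b = set(range(n)) - a
    # Total weight of edges crossing the partition.
    cut = sum(weights[i][j] for i in a for j in b)
    # Total weight from each side to all vertices of the graph.
    assoc_a = sum(weights[i][j] for i in a for j in range(n))
    assoc_b = sum(weights[i][j] for i in b for j in range(n))
    if assoc_a == 0 or assoc_b == 0:
        return float("inf")
    return cut / assoc_a + cut / assoc_b


def brute_force_min_ncut(weights):
    """Try every bipartition; feasible only for very small graphs."""
    n = len(weights)
    best_value, best_part = float("inf"), None
    for size in range(1, n // 2 + 1):
        for part in combinations(range(n), size):
            value = normalized_cut(weights, part)
            if value < best_value:
                best_value, best_part = value, part
    return best_value, best_part


# Hypothetical symmetric similarity matrix: {0, 1} and {2, 3} are tightly
# coupled pairs, joined only by one weak edge between vertices 0 and 2.
weights = [
    [0, 5, 1, 0],
    [5, 0, 0, 0],
    [1, 0, 0, 5],
    [0, 0, 5, 0],
]
value, part = brute_force_min_ncut(weights)
print(part, value)
```

On this example the search separates the two tightly coupled pairs, since cutting only the weak edge keeps the cut small relative to the total weight attached to each side.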

This research was supported in part by the 21st Century Research and Technology Fund from the State of Indiana.

The work of this author was supported in part by the Computing and Information Technology Center, and by the Faculty Research Council, University of Texas-Pan American, Edinburg, Texas, USA.

The work of this author was supported in part by the National Science Foundation under Grants CCR-9623585 and CCR-9988468.

Author information

Authors and Affiliations

Department of Computer Science, University of Texas-Pan American, Edinburg, TX, 78539, USA
Xiaodong Wu
Department of Computer Science and Engineering, University of Notre Dame, Notre Dame, IN, 46556, USA
Danny Z. Chen
Department of Aerospace and Mechanical Engineering, University of Notre Dame, Notre Dame, IN, 46556, USA
James J. Mason & Steven R. Schmid

Editor information

Editors and Affiliations

Department of Computer Science, University of Texas at Austin, One University Station, C0500, Austin, TX, 78712, USA
Tandy Warnow
Department of Computer Science, Montana State University, EPS 357, Bozeman, MT, 59717, USA
Binhai Zhu

Cite this paper

Wu, X., Chen, D.Z., Mason, J.J., Schmid, S.R. (2003). Pairwise Data Clustering and Applications. In: Warnow, T., Zhu, B. (eds) Computing and Combinatorics. COCOON 2003. Lecture Notes in Computer Science, vol 2697. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45071-8_46

DOI: https://doi.org/10.1007/3-540-45071-8_46
Published: 24 June 2003
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-40534-4
Online ISBN: 978-3-540-45071-9
eBook Packages: Springer Book Archive
