Clustering with Internal Connectedness

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 6552)

Abstract

In this paper we study the problem of clustering entities that are described by two types of data: attribute data and relationship data. While attribute data describe the inherent characteristics of the entities, relationship data represent associations among them. Attribute data can be mapped to Euclidean space, whereas that is not always possible for relationship data. The relationship data are described by a graph whose vertices are the entities and whose edges denote relationships between the pairs of entities they connect. We study clustering problems under the model where the relationship data are constrained by ‘internal connectedness,’ which requires that any two entities in a cluster be connected by an internal path, that is, a path that passes only through entities of the same cluster. Specifically, we study the k-median and k-means clustering problems under this model. We show that these problems are Ω(log n)-hard to approximate and give O(log n)-approximation algorithms for specific cases of these problems.
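
To make the internal connectedness constraint concrete, the following is a minimal sketch, not the algorithm of the paper. It assumes the relationship graph is given as an adjacency dictionary and the attribute data as points in Euclidean space; the function names, the toy instance, and the choice of arbitrary points as centers are all hypothetical and serve only as illustration. The first function checks whether a cluster induces a connected subgraph, and the second evaluates the k-median cost (sum of distances) or the k-means cost (sum of squared distances) of a given clustering.

```python
from collections import deque
from math import dist


def is_internally_connected(cluster, adjacency):
    """Return True if every pair of entities in `cluster` is joined by a
    path that uses only entities of `cluster` (induced subgraph is connected)."""
    members = set(cluster)
    if not members:
        return True
    start = next(iter(members))
    seen = {start}
    queue = deque([start])
    while queue:
        u = queue.popleft()
        for v in adjacency.get(u, ()):
            # Follow only edges that stay inside the cluster.
            if v in members and v not in seen:
                seen.add(v)
                queue.append(v)
    return seen == members


def clustering_cost(clusters, centers, points, squared=False):
    """Sum of distances to the assigned center (k-median objective), or of
    squared distances when `squared` is True (k-means objective)."""
    total = 0.0
    for cluster, center in zip(clusters, centers):
        for entity in cluster:
            d = dist(points[entity], center)
            total += d * d if squared else d
    return total


# Hypothetical toy instance: four entities on a line, related as a path a-b-c-d.
points = {"a": (0.0, 0.0), "b": (1.0, 0.0), "c": (2.0, 0.0), "d": (3.0, 0.0)}
adjacency = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}

clusters = [["a", "b"], ["c", "d"]]
centers = [(0.5, 0.0), (2.5, 0.0)]  # assumed centers, chosen by hand

feasible = all(is_internally_connected(c, adjacency) for c in clusters)
k_median_cost = clustering_cost(clusters, centers, points)
k_means_cost = clustering_cost(clusters, centers, points, squared=True)
```

In this toy instance the cluster {a, c} would violate the constraint, because the only path from a to c passes through b, which lies outside the cluster.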

Copyright information

© 2011 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Gupta, N., Pancholi, A., Sabharwal, Y. (2011). Clustering with Internal Connectedness. In: Katoh, N., Kumar, A. (eds) WALCOM: Algorithms and Computation. WALCOM 2011. Lecture Notes in Computer Science, vol 6552. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-19094-0_17

  • DOI: https://doi.org/10.1007/978-3-642-19094-0_17

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-19093-3

  • Online ISBN: 978-3-642-19094-0

  • eBook Packages: Computer Science, Computer Science (R0)