Abstract
In this paper we study the problem of clustering entities that are described by two types of data: attribute data and relationship data. While attribute data describe the inherent characteristics of the entities, relationship data represent associations among them. Attribute data can be mapped to Euclidean space, whereas that is not always possible for relationship data. The relationship data are described by a graph whose vertices are the entities and whose edges denote a relationship between the pair of entities they connect. We study clustering problems under the model where the relationship data is constrained by ‘internal connectedness,’ which requires that any two entities in a cluster are connected by an internal path, that is, a path via entities only from the same cluster. We study the k-median and k-means clustering problems under this model. We show that these problems are hard to approximate within a factor of Ω(log n) and give O(log n)-approximation algorithms for specific cases of these problems.
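The internal connectedness constraint and the two objectives can be made concrete with a small sketch. This is only an illustration of the definitions in the abstract, not the approximation algorithm of the paper. The function names, the adjacency-dictionary representation of the relationship graph, and the choice to let centers be arbitrary points in Euclidean space are all assumptions made for the example.

```python
from collections import deque
from math import dist


def is_internally_connected(cluster, adjacency):
    """Check that every pair of entities in `cluster` is joined by a path
    that visits only entities of `cluster` (breadth-first search)."""
    members = set(cluster)
    if not members:
        return True
    start = next(iter(members))
    seen = {start}
    queue = deque([start])
    while queue:
        u = queue.popleft()
        for v in adjacency.get(u, ()):
            # Only step to neighbours inside the same cluster.
            if v in members and v not in seen:
                seen.add(v)
                queue.append(v)
    return seen == members


def clustering_cost(clusters, centers, points, squared=False):
    """Sum of distances (k-median) or squared distances (k-means) from each
    entity to the center of its cluster. Centers are assumed to be arbitrary
    points in Euclidean space (an assumption of this sketch)."""
    total = 0.0
    for cluster, center in zip(clusters, centers):
        for entity in cluster:
            d = dist(points[entity], center)
            total += d * d if squared else d
    return total


# Hypothetical toy instance: four entities on a line, with a relationship
# graph whose edges are a-b, c-d and a-d.
points = {"a": (0.0, 0.0), "b": (1.0, 0.0), "c": (2.0, 0.0), "d": (3.0, 0.0)}
adjacency = {"a": ["b", "d"], "b": ["a"], "c": ["d"], "d": ["c", "a"]}

clusters = [{"a", "b"}, {"c", "d"}]
centers = [(0.5, 0.0), (2.5, 0.0)]

feasible = all(is_internally_connected(c, adjacency) for c in clusters)
k_median_cost = clustering_cost(clusters, centers, points)
k_means_cost = clustering_cost(clusters, centers, points, squared=True)
```

In this toy instance the clustering {a, b}, {c, d} is feasible, whereas a cluster such as {a, c} would not be: a and c have no path between them that stays inside the cluster, even though they are joined through d in the full graph.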
Copyright information
© 2011 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Gupta, N., Pancholi, A., Sabharwal, Y. (2011). Clustering with Internal Connectedness. In: Katoh, N., Kumar, A. (eds) WALCOM: Algorithms and Computation. WALCOM 2011. Lecture Notes in Computer Science, vol 6552. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-19094-0_17
DOI: https://doi.org/10.1007/978-3-642-19094-0_17
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-19093-3
Online ISBN: 978-3-642-19094-0
eBook Packages: Computer Science (R0)