Abstract
In this paper, we present our research on similarity search and clustering problems. Similarity search defines a distance between each data point and a given query point Q, and aims to select the data points closest to Q both efficiently and effectively. Clustering algorithms partition data points into groups such that points in the same group are highly similar and points in different groups are dissimilar. We explore the meaning of clusters from a new perspective and propose an approach that reshapes clusters based on K nearest neighbor search results. The reshaped clusters can help improve the performance of subsequent K nearest neighbor searches.
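To make the K nearest neighbor search described in the abstract concrete, the following is a minimal brute-force sketch. It is not the method proposed in the paper: the function name `knn`, the choice of Euclidean distance, and the sample points are all assumptions made for illustration only.

```python
import heapq
import math


def knn(points, q, k):
    """Return the k points closest to query q, nearest first.

    Brute force: computes the Euclidean distance from q to every point.
    Euclidean distance is an assumed choice; other distance functions
    could be substituted in the key below.
    """
    return heapq.nsmallest(k, points, key=lambda p: math.dist(p, q))


# Hypothetical two-dimensional data set and query point Q.
points = [(0.0, 0.0), (1.0, 1.0), (5.0, 5.0), (0.5, 0.0)]
query = (0.0, 0.1)
nearest = knn(points, query, 2)
print(nearest)
```

A brute-force scan like this examines every data point for every query. Grouping points into clusters is one common way to limit a search to the groups most likely to contain the neighbors of Q, which is why the shape of the clusters can affect search performance.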
© 2012 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Shi, Y., Graham, B. (2012). An Approach to Reshaping Clusters for Nearest Neighbor Search. In: Yin, H., Costa, J.A.F., Barreto, G. (eds) Intelligent Data Engineering and Automated Learning - IDEAL 2012. IDEAL 2012. Lecture Notes in Computer Science, vol 7435. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-32639-4_8
DOI: https://doi.org/10.1007/978-3-642-32639-4_8
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-32638-7
Online ISBN: 978-3-642-32639-4
eBook Packages: Computer Science (R0)