k Nearest Neighbor Using Ensemble Clustering

AbedAllah, Loai; Shimshoni, Ilan

doi:10.1007/978-3-642-32584-7_22

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 7448)

Included in the following conference series:

International Conference on Data Warehousing and Knowledge Discovery

Abstract

The performance of the k Nearest Neighbor (kNN) algorithm depends critically on being given a good metric over the input space. One of its main drawbacks is that kNN uses only the geometric distance to measure similarity and dissimilarity between objects, without using any statistical regularities in the data that could help convey the inter-class distance. We found that objects belonging to the same cluster usually share some common traits even though their geometric distance might be large. We therefore define a metric based on clustering. Since no single clustering algorithm and parameter setting is optimal, several clustering runs are performed, yielding an ensemble of clustering (EC) results. The distance between two points is defined as the number of runs in which they were not clustered together. This distance is then used within the framework of the kNN algorithm (kNN-EC). Moreover, objects that were clustered together in every run are defined as members of an equivalence class, so the algorithm runs on equivalence classes instead of single objects. In our experiments the number of equivalence classes is usually one tenth to one fourth of the number of objects. This equivalence class representation is in effect a smart data reduction technique that can have a wide range of applications. It is complementary to other data reduction methods such as feature selection and to dimensionality reduction methods such as PCA. We compared kNN-EC to the original kNN on standard datasets from different fields, and on segmenting a real color image into foreground and background. Our experiments show that kNN-EC performs better than or comparably to the original kNN on the standard datasets and is superior for the color image segmentation.
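The distance and equivalence-class definitions in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes the clustering runs have already been performed and are supplied as an integer array `labelings` of shape (runs, points), and that query points were clustered together with the labeled points. The function names `ec_distance`, `equivalence_classes`, and `knn_ec_predict` are hypothetical.

```python
import numpy as np
from collections import Counter

def ec_distance(labelings):
    """Count, for every pair of points, the runs in which they were NOT
    clustered together. labelings has shape (n_runs, n_points)."""
    labelings = np.asarray(labelings)
    # differ[r, i, j] is True when run r put points i and j in different clusters
    differ = labelings[:, :, None] != labelings[:, None, :]
    return differ.sum(axis=0)

def equivalence_classes(labelings):
    """Assign one class id per point; points share an id only if their
    cluster labels agree in every run."""
    labelings = np.asarray(labelings)
    # Each column is a point's label signature across runs
    _, class_ids = np.unique(labelings.T, axis=0, return_inverse=True)
    return class_ids.ravel()

def knn_ec_predict(dist, train_idx, train_labels, query_idx, k=3):
    """Majority vote among the k labeled points closest to the query
    under the ensemble-clustering distance (assumed voting rule)."""
    order = np.argsort(dist[query_idx, train_idx], kind="stable")[:k]
    votes = Counter(train_labels[i] for i in order)
    return votes.most_common(1)[0][0]

# Toy ensemble: 3 clustering runs over 5 points (made-up labels)
labelings = [[0, 0, 1, 1, 1],
             [0, 0, 0, 1, 1],
             [0, 0, 1, 1, 2]]
dist = ec_distance(labelings)
classes = equivalence_classes(labelings)
```

In this toy ensemble, points 0 and 1 are clustered together in every run, so they fall into one equivalence class and their distance is 0. A full pipeline would then run kNN on one representative per class rather than on every object.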

Author information

Authors and Affiliations

Department of Mathematics, The College of Saknin, University of Haifa, Israel
Loai AbedAllah & Ilan Shimshoni
Department of Information Systems, University of Haifa, Israel
Loai AbedAllah & Ilan Shimshoni

Editor information

Editors and Affiliations

ICAR-CNR and University of Calabria, via P. Bucci 41C, 87036, Rende (CS), Italy
Alfredo Cuzzocrea
Hewlett Packard Labs, 1501 Page Mill Road, MS 1142, 94304, Palo Alto, CA, USA
Umeshwar Dayal

About this paper

Cite this paper

AbedAllah, L., Shimshoni, I. (2012). k Nearest Neighbor Using Ensemble Clustering. In: Cuzzocrea, A., Dayal, U. (eds) Data Warehousing and Knowledge Discovery. DaWaK 2012. Lecture Notes in Computer Science, vol 7448. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-32584-7_22

DOI: https://doi.org/10.1007/978-3-642-32584-7_22
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-32583-0
Online ISBN: 978-3-642-32584-7
eBook Packages: Computer Science, Computer Science (R0)