Abstract
This work presents PFCNN, a distributed method for computing a training-set consistent subset of very large data sets for the nearest neighbor decision rule. To cope with the communication overhead typical of distributed environments and to reduce memory requirements, several variants of the basic PFCNN method are introduced. Experiments on a class of synthetic datasets show that these methods can be profitably applied to enormous collections of data: they scale up well, are memory-efficient, and achieve noticeable data reduction together with good classification accuracy. To the best of our knowledge, this is the first distributed algorithm for computing a training-set consistent subset for the nearest neighbor rule.
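For readers unfamiliar with the underlying notion, a minimal sequential sketch of the condensed nearest neighbor idea (Hart, 1968) that consistent-subset methods such as PFCNN build on may be helpful. This is not the distributed PFCNN algorithm itself; the function name and the simple full-scan 1-NN search are illustrative choices. A subset S is training-set consistent when 1-NN classification over S correctly labels every point of the full training set:

```python
import numpy as np

def cnn_consistent_subset(X, y):
    """Hart-style condensed nearest neighbor (illustrative sketch).

    Returns indices of a training-set consistent subset S: classifying
    every row of X by its nearest neighbor within X[S] reproduces y.
    """
    n = len(X)
    S = [0]  # seed the subset with an arbitrary first sample
    changed = True
    while changed:
        changed = False
        for i in range(n):
            # 1-NN search over the current subset S (brute force)
            dists = np.linalg.norm(X[S] - X[i], axis=1)
            nearest = S[int(np.argmin(dists))]
            if y[nearest] != y[i]:
                # misclassified point: absorb it into the subset
                S.append(i)
                changed = True
    return sorted(S)
```

The loop repeats until a full pass adds no point, at which stage consistency holds by construction; on well-separated data the resulting subset is typically far smaller than the training set, which is the data-reduction effect the abstract refers to.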
Copyright information
© 2007 Springer-Verlag Berlin Heidelberg
Cite this paper
Angiulli, F., Folino, G. (2007). Efficient Distributed Data Condensation for Nearest Neighbor Classification. In: Kermarrec, AM., Bougé, L., Priol, T. (eds) Euro-Par 2007 Parallel Processing. Euro-Par 2007. Lecture Notes in Computer Science, vol 4641. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-74466-5_37
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-74465-8
Online ISBN: 978-3-540-74466-5
eBook Packages: Computer Science (R0)