
Improving Data Reduction by Merging Prototypes

  • Conference paper

Advances in Databases and Information Systems (ADBIS 2019)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 11695)


Abstract

A well-known and adaptable classifier is the k-Nearest Neighbor (kNN), which requires a training set of relatively small size in order to perform adequately. Training sets can be reduced in size by conventional data reduction techniques. Unfortunately, these techniques are inappropriate in streaming environments or on devices with limited resources. dRHC is a prototype generation algorithm that works in streaming environments by maintaining a condensed training set that can be updated by continuously arriving training data segments. Each prototype in dRHC carries a weight indicating the number of instances of the same class that it represents. dRHC2 improves on dRHC by maintaining a fixed-size condensing set: whenever the condensing set exceeds a predefined size, the least important prototypes are removed. In this paper, we exploit the idea that dRHC or dRHC2 prototypes can be merged whenever they are close enough and represent the same class. Hence, we propose two new prototype merging algorithms. The first performs a single pass over a newly updated condensing set and merges all prototype pairs of the same class under the condition that each prototype is the nearest neighbor of the other. The second performs repetitive merging passes until no prototypes remain to be merged. The proposed algorithms are tested on several datasets, and the experimental results reveal that the single-pass variation performs better for both dRHC and dRHC2, taking into account the trade-off between preprocessing cost, reduction rate and accuracy. In addition, merging appears to be more appropriate for the static version of the algorithm (dRHC), since it offers higher data reduction without sacrificing accuracy.
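To make the merging rule concrete, the sketch below (Python with NumPy) implements the two variants described in the abstract on a set of weighted prototypes: merge_pass performs a single pass that merges every same-class prototype pair in which each prototype is the nearest neighbor of the other, and merge_until_stable repeats such passes until no pair qualifies. This is only an illustrative sketch, not the authors' dRHC/dRHC2 code: the function names are hypothetical, and placing the merged prototype at the weighted mean of the pair (with the two weights summed) is an assumption about how the instance counts are combined.

```python
# Illustrative sketch (not the paper's implementation) of prototype merging:
# same-class prototype pairs are merged when each is the other's nearest
# neighbor. Prototypes carry instance-count weights; the weighted-mean
# placement of the merged prototype is an assumption.
import numpy as np


def merge_pass(points, labels, weights):
    """One pass over the condensing set: merge every same-class pair of
    prototypes that are mutual nearest neighbors. Returns the new set and
    the number of merges performed."""
    points = np.asarray(points, dtype=float)
    labels = np.asarray(labels)
    weights = np.asarray(weights, dtype=float)
    n = len(points)
    if n < 2:
        return points, labels, weights, 0

    # Pairwise squared Euclidean distances, with self-distances masked out.
    diffs = points[:, None, :] - points[None, :, :]
    dists = (diffs ** 2).sum(axis=-1)
    np.fill_diagonal(dists, np.inf)
    nn = dists.argmin(axis=1)          # nearest neighbor of each prototype

    processed = np.zeros(n, dtype=bool)
    out_points, out_labels, out_weights = [], [], []
    merges = 0

    for i in range(n):
        if processed[i]:
            continue
        j = nn[i]
        if not processed[j] and nn[j] == i and labels[i] == labels[j]:
            # Mutual nearest neighbors of the same class: replace the pair
            # with one prototype whose weight is the sum of the two weights.
            w = weights[i] + weights[j]
            merged = (weights[i] * points[i] + weights[j] * points[j]) / w
            out_points.append(merged)
            out_labels.append(labels[i])
            out_weights.append(w)
            processed[i] = processed[j] = True
            merges += 1
        else:
            # No qualifying partner in this pass; keep the prototype as is.
            out_points.append(points[i])
            out_labels.append(labels[i])
            out_weights.append(weights[i])
            processed[i] = True

    return (np.array(out_points), np.array(out_labels),
            np.array(out_weights), merges)


def merge_until_stable(points, labels, weights):
    """Repetitive variant: apply merging passes until no pair qualifies."""
    while True:
        points, labels, weights, merges = merge_pass(points, labels, weights)
        if merges == 0:
            return points, labels, weights
```

Summing the weights keeps the merged prototype representing all of its original instances, and the weighted-mean placement preserves the class-wise centroid of the pair, which is consistent with the abstract's observation that merging can increase the reduction rate without sacrificing accuracy.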


Notes

  1. http://sci2s.ugr.es/keel/datasets.php.


Acknowledgments

This research is funded by the University of Macedonia Research Committee as part of the “Principal Research 2019” funding program.

We thank Prof. Yannis Manolopoulos for his excellent remarks during ADBIS 2017 that led to the ideas presented in this paper.

Author information

Corresponding author

Correspondence to Pavlos Ponos.


Copyright information

© 2019 Springer Nature Switzerland AG

About this paper


Cite this paper

Ponos, P., Ougiaroglou, S., Evangelidis, G. (2019). Improving Data Reduction by Merging Prototypes. In: Welzer, T., Eder, J., Podgorelec, V., Kamišalić Latifić, A. (eds.) Advances in Databases and Information Systems. ADBIS 2019. Lecture Notes in Computer Science, vol. 11695. Springer, Cham. https://doi.org/10.1007/978-3-030-28730-6_2


  • DOI: https://doi.org/10.1007/978-3-030-28730-6_2

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-28729-0

  • Online ISBN: 978-3-030-28730-6

  • eBook Packages: Computer Science (R0)
