Abstract
Data Reduction techniques are commonly applied in instance-based classification tasks to reduce the amount of data to be processed. Prototype Selection (PS) and Prototype Generation (PG) are the most representative approaches. The two families differ in how they obtain the reduced set from the initial one: the former selects the most representative elements of the set, whereas the latter creates new data from it. Although PG is considered to delimit decision boundaries better, the operations it requires are not well defined for structural data such as strings, trees, or graphs. This work presents a case study that uses the common RandomC algorithm to map the initial structural data to a Dissimilarity Space (DS) representation, thereby allowing the use of PG methods. A comparative experiment on string data is carried out in which our proposal is compared against PS methods applied in the original space. Results show that PG combined with the RandomC mapping achieves very competitive performance, although the obtained accuracy seems to be bounded by the representativity of the DS method.
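The mapping described in the abstract can be illustrated with a minimal sketch. It assumes that RandomC draws a fixed number of random pivots from each class and that every string is then represented by its vector of edit distances to those pivots; the function names, the `per_class` parameter, and the toy data are hypothetical and are not taken from the paper. The class centroid at the end is only a stand-in showing that averaging becomes well defined in the DS, not one of the PG methods evaluated in this work.

```python
import random


def edit_distance(a, b):
    """Levenshtein distance between two sequences with unit costs."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,               # deletion
                            curr[j - 1] + 1,           # insertion
                            prev[j - 1] + (x != y)))   # substitution
        prev = curr
    return prev[-1]


def select_pivots_per_class(samples, labels, per_class, seed=0):
    """Assumed RandomC behaviour: pick `per_class` random pivots from each class."""
    rng = random.Random(seed)
    by_class = {}
    for s, y in zip(samples, labels):
        by_class.setdefault(y, []).append(s)
    pivots = []
    for y in sorted(by_class):
        members = by_class[y]
        pivots.extend(rng.sample(members, min(per_class, len(members))))
    return pivots


def to_dissimilarity_space(samples, pivots, dist=edit_distance):
    """Represent each sample as its vector of distances to the pivots."""
    return [[dist(s, p) for p in pivots] for s in samples]


def class_centroids(vectors, labels):
    """Toy generation step: one mean vector per class in the DS."""
    sums, counts = {}, {}
    for v, y in zip(vectors, labels):
        acc = sums.setdefault(y, [0.0] * len(v))
        for k, value in enumerate(v):
            acc[k] += value
        counts[y] = counts.get(y, 0) + 1
    return {y: [t / counts[y] for t in acc] for y, acc in sums.items()}


# Hypothetical toy data: short strings with two class labels.
samples = ["0012", "0013", "0112", "7654", "7655", "7644"]
labels = ["a", "a", "a", "b", "b", "b"]

pivots = select_pivots_per_class(samples, labels, per_class=2, seed=1)
vectors = to_dissimilarity_space(samples, pivots)
centroids = class_centroids(vectors, labels)
```

Each sample becomes a numeric vector with one coordinate per pivot, so operations such as averaging, which have no direct counterpart for strings, can be applied. The sketch also makes the dependence noted in the abstract visible: whatever is generated in the DS can only be as informative as the chosen pivots and distance allow.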
Acknowledgements
This work was partially supported by the Spanish Ministerio de Educación, Cultura y Deporte through an FPU fellowship (AP2012–0939), the Spanish Ministerio de Economía y Competitividad through Project TIMuL (No. TIN2013-48152-C2-1-R, supported by EU FEDER funds), the Consejería de Educación de la Comunidad Valenciana through project PROMETEO/2012/017, and the Vicerrectorado de Investigación, Desarrollo e Innovación de la Universidad de Alicante through the FPU program (UAFPU2014–5883).
Copyright information
© 2015 Springer International Publishing Switzerland
About this paper
Cite this paper
Calvo-Zaragoza, J., Valero-Mas, J.J., Rico-Juan, J.R. (2015). Prototype Generation on Structural Data Using Dissimilarity Space Representation: A Case of Study. In: Paredes, R., Cardoso, J., Pardo, X. (eds) Pattern Recognition and Image Analysis. IbPRIA 2015. Lecture Notes in Computer Science, vol 9117. Springer, Cham. https://doi.org/10.1007/978-3-319-19390-8_9
DOI: https://doi.org/10.1007/978-3-319-19390-8_9
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-19389-2
Online ISBN: 978-3-319-19390-8
eBook Packages: Computer Science (R0)