Abstract
Structured databases may include both numerical and non-numerical (categorical) attributes. Databases that include categorical attributes (CAs) are called “mixed” databases (MDs). Metric clustering algorithms are ineffectual when presented with MDs because such algorithms determine the similarity between objects by measuring the differences between them according to some predefined metric, and no such metric is directly defined over non-numerical values. Nevertheless, the information contained in the CAs of MDs is fundamental to understanding and identifying the patterns therein. A practical alternative is to encode the instances of the CAs numerically. To do this we must consider the fact that only a limited subset of codes will preserve the patterns in the MD. To identify such pattern-preserving codes (PPCs) we appeal to a statistical methodology. It is possible to statistically identify a set of PPCs by selectively sampling a bounded number of codes (corresponding to the different instances of the CAs) and requiring the method to set the size of the sample dynamically. Two issues have to be considered for this method to be defined in practice: (a) how to set the size of the sample and (b) how to define the adequacy of the codes. In this paper we discuss the method and present a case study in which its appropriateness is illustrated.
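The sampling idea can be illustrated with a minimal Python sketch. The abstract does not specify the adequacy measure or the stopping rule, so both are stand-ins chosen here for illustration only: adequacy is assumed to be the absolute Pearson correlation between the encoded categorical column and one numerical attribute, and sampling is assumed to stop once the standard error of the mean score falls below a tolerance. The names `pearson`, `sample_codes`, `tol`, `min_samples` and `max_samples` are hypothetical and do not come from the paper.

```python
import random
import statistics


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / (sxx * syy) ** 0.5


def sample_codes(cat_column, num_column, tol=0.01,
                 min_samples=30, max_samples=5000, seed=0):
    """Sample random numerical codes for the instances of one categorical attribute.

    ASSUMPTION: adequacy = |Pearson correlation| with one numerical attribute.
    ASSUMPTION: stop when the standard error of the mean score drops below `tol`.
    Neither choice is taken from the paper; both are placeholders.
    """
    rng = random.Random(seed)
    categories = sorted(set(cat_column))
    best_codes, best_score, scores = None, -1.0, []

    while len(scores) < max_samples:  # the number of sampled codes is bounded
        # One candidate encoding: a random code in [0, 1) per category instance.
        codes = {c: rng.random() for c in categories}
        encoded = [codes[c] for c in cat_column]
        score = abs(pearson(encoded, num_column))
        scores.append(score)

        if score > best_score:
            best_codes, best_score = codes, score

        # Sample size is set dynamically: stop once the mean score is stable.
        if len(scores) >= min_samples:
            sem = statistics.stdev(scores) / len(scores) ** 0.5
            if sem < tol:
                break

    return best_codes, best_score, len(scores)
```

Calling `sample_codes(cat_column, num_column)` with two equal-length lists returns the best encoding found, its score, and the number of candidate encodings that were drawn before the stopping rule fired. A real mixed database has several categorical and numerical attributes, so an adequacy measure over all of them would be needed in place of the single-attribute correlation assumed here.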
Copyright information
© 2018 Springer International Publishing AG, part of Springer Nature
About this paper
Cite this paper
Kuri-Morales, A. (2018). Pattern Discovery in Mixed Data Bases. In: Martínez-Trinidad, J., Carrasco-Ochoa, J., Olvera-López, J., Sarkar, S. (eds) Pattern Recognition. MCPR 2018. Lecture Notes in Computer Science, vol 10880. Springer, Cham. https://doi.org/10.1007/978-3-319-92198-3_18
DOI: https://doi.org/10.1007/978-3-319-92198-3_18
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-92197-6
Online ISBN: 978-3-319-92198-3
eBook Packages: Computer Science, Computer Science (R0)