Pattern Discovery in Mixed Data Bases

  • Conference paper
Pattern Recognition (MCPR 2018)

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 10880)

Abstract

Structured databases may include both numerical and non-numerical (categorical) attributes (CAs). Databases that include CAs are called “mixed” databases (MDs). Metric clustering algorithms are ineffective when presented with MDs because they determine the similarity between objects by measuring the differences between them according to some predefined metric, and no such metric is directly defined on categorical values. Nevertheless, the information contained in the CAs of MDs is fundamental to understanding and identifying the patterns therein. A practical alternative is to encode the instances of the CAs numerically. To do this, we must consider that only a limited subset of codes will preserve the patterns in the MD. To identify such pattern-preserving codes (PPCs), we appeal to a statistical methodology. It is possible to statistically identify a set of PPCs by selectively sampling a bounded number of codes (corresponding to the different instances of the CAs) and requiring the method to set the size of the sample dynamically. Two issues must be addressed for this method to be defined in practice: (a) how to set the size of the sample and (b) how to define the adequacy of the codes. In this paper we discuss the method and present a case study that illustrates its appropriateness.
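To make concrete why the choice of numeric codes matters, the following minimal sketch encodes one categorical attribute in two different ways and shows that the nearest neighbor of the same object changes under Euclidean distance. The toy rows, the two code assignments, and the helper names `encode` and `nearest` are assumptions introduced for illustration only; this is not the method proposed in the paper, which selects codes statistically.

```python
import math

# Toy mixed database: (numeric attribute, categorical attribute).
# The rows and both code assignments below are invented for illustration.
rows = [(0.10, "red"), (0.20, "green"), (0.90, "blue"), (0.15, "blue")]

def encode(rows, codes):
    """Replace each categorical instance by its numeric code."""
    return [(x, codes[c]) for x, c in rows]

def nearest(points, i):
    """Index of the point closest to points[i] under Euclidean distance."""
    others = [j for j in range(len(points)) if j != i]
    return min(others, key=lambda j: math.dist(points[i], points[j]))

# Two arbitrary assignments of codes to the same categorical instances.
codes_a = {"red": 0.0, "green": 0.5, "blue": 1.0}
codes_b = {"red": 0.0, "green": 1.0, "blue": 0.1}

nn_a = nearest(encode(rows, codes_a), 0)
nn_b = nearest(encode(rows, codes_b), 0)
print("nearest neighbor of row 0 under codes_a:", nn_a)
print("nearest neighbor of row 0 under codes_b:", nn_b)
```

Under the first assignment row 0 is closest to row 1, and under the second it is closest to row 3, so any metric clustering of the encoded data depends on the codes chosen. This is the sense in which only some codes can be expected to preserve the patterns in the data.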

Author information

Corresponding author

Correspondence to Angel Kuri-Morales.

Copyright information

© 2018 Springer International Publishing AG, part of Springer Nature

About this paper

Cite this paper

Kuri-Morales, A. (2018). Pattern Discovery in Mixed Data Bases. In: Martínez-Trinidad, J., Carrasco-Ochoa, J., Olvera-López, J., Sarkar, S. (eds) Pattern Recognition. MCPR 2018. Lecture Notes in Computer Science, vol 10880. Springer, Cham. https://doi.org/10.1007/978-3-319-92198-3_18

  • DOI: https://doi.org/10.1007/978-3-319-92198-3_18

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-92197-6

  • Online ISBN: 978-3-319-92198-3

  • eBook Packages: Computer Science, Computer Science (R0)
