Classification Using the Zipfian Kernel

Journal of Classification 32, 305–326 (2015)

Abstract

We propose using the Zipfian distribution as a kernel for the design of a nonparametric classifier, in contrast to the Gaussian distribution used in most kernel methods. We show that the Zipfian distribution takes into account the multifractal nature of data and gives a true picture of the scaling properties inherent in the data. We also show that this new view of data structure can lead to a simple classifier that can, for some tasks, outperform more complex systems.
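
The abstract does not spell out the classifier itself, so the following is only a plausible sketch of the idea: a nearest-neighbor rule in which the i-th closest training point receives a Zipfian weight proportional to 1/i^s, so that influence decays with distance rank rather than as a Gaussian of the distance. The function name, the Euclidean metric, and the exponent s are illustrative assumptions, not the authors' exact formulation.

```python
import numpy as np

def zipf_kernel_predict(X_train, y_train, X_query, s=1.0):
    """Hypothetical Zipfian-kernel classifier (illustrative sketch only).

    The i-th nearest training point (i = 1, 2, ...) contributes the
    weight 1 / i**s to its own class; the class with the largest total
    weight wins. Compare a Gaussian kernel, where the weight would
    instead be exp(-dist**2 / (2 * h**2)) for a bandwidth h.
    """
    X_train = np.asarray(X_train, dtype=float)
    y_train = np.asarray(y_train)
    classes = np.unique(y_train)
    preds = []
    for x in np.atleast_2d(np.asarray(X_query, dtype=float)):
        dist = np.linalg.norm(X_train - x, axis=1)   # Euclidean distance to every training point
        order = np.argsort(dist)                     # neighbor ranks: nearest first
        w = 1.0 / np.arange(1, len(order) + 1) ** s  # Zipfian weight of the i-th neighbor
        scores = [w[y_train[order] == c].sum() for c in classes]
        preds.append(classes[int(np.argmax(scores))])
    return np.array(preds)

# Toy usage: two well-separated Gaussian blobs in the plane.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 1.0, (20, 2)), rng.normal(5.0, 1.0, (20, 2))])
y = np.array([0] * 20 + [1] * 20)
print(zipf_kernel_predict(X, y, [[0.2, -0.1], [4.8, 5.1]]))  # expected: [0 1]
```

With s = 1 the weights follow the harmonic series, so the few nearest neighbors dominate while remote points still contribute a heavy tail; larger s pushes the rule toward plain 1-NN. Note that the Zipfian weights need no bandwidth parameter, which is the usual tuning burden of Gaussian kernel methods.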

Author information

Correspondence to Marcel Jiřina.

Additional information

This work was supported by the Technology Agency of the Czech Republic under ALFA programme project No. TA01010490 and by the Czech Technical University in Prague, Faculty of Information Technology, RVO: 68407700. We also thank the Institute of Computer Science of the Czech Academy of Sciences for its support in submitting the patent application for the classifier described here.

Cite this article

Jiřina, M., Jiřina, M. Classification Using the Zipfian Kernel. J Classif 32, 305–326 (2015). https://doi.org/10.1007/s00357-015-9174-2
