Exploratory Data Analysis through the Inspection of the Probability Density Function of the Number of Neighbors

Neme, Antonio; Nido, Antonio

doi:10.1007/978-3-642-41398-8_27


Part of the book series: Lecture Notes in Computer Science (LNISA, volume 8207)

Included in the following conference series:

International Symposium on Intelligent Data Analysis

Abstract

Exploratory data analysis is a fundamental stage in the mining of high-dimensional datasets. Several algorithms have been implemented to grasp a general idea of the geometry and patterns present in high-dimensional data. Here, we present a methodology based on the distance matrix of the input data. The algorithm is based on the number of points considered to be neighbors of each input vector. Neighborhood is defined in terms of a hypersphere of varying radius, and the probability density function of the number of neighbor vectors is computed from the distance matrix. We show that when the radius of the hypersphere is systematically increased, a detailed analysis of the probability density function of the number of neighbors reveals relevant aspects of the overall features that describe the high-dimensional data. The algorithm is tested on several datasets, and we show its pertinence as an exploratory data analysis tool.
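
The procedure outlined in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the Euclidean metric, the inclusive radius boundary, the exclusion of each point from its own neighborhood, and the function names `pairwise_distances`, `neighbor_count_distribution`, and `sweep_radii` are all assumptions made for illustration. Because the number of neighbors is an integer, the "probability density function" is estimated here as a normalized histogram over the counts 0 to n-1.

```python
import numpy as np


def pairwise_distances(X):
    """Euclidean distance matrix of the rows of X (the metric is an assumption)."""
    X = np.asarray(X, dtype=float)
    diff = X[:, None, :] - X[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


def neighbor_count_distribution(dist, radius):
    """Empirical distribution of the number of neighbors at one radius.

    Returns an array p of length n, where p[k] is the fraction of points
    that have exactly k other points inside the hypersphere of `radius`.
    """
    n = dist.shape[0]
    within = dist <= radius          # inclusive boundary (assumption)
    np.fill_diagonal(within, False)  # a point is not its own neighbor (assumption)
    counts = within.sum(axis=1)
    return np.bincount(counts, minlength=n) / n


def sweep_radii(X, radii):
    """Compute one distribution per radius and stack them as rows."""
    dist = pairwise_distances(X)     # the distance matrix is computed once
    return np.array([neighbor_count_distribution(dist, r) for r in radii])


# Toy data (hypothetical): two nearby points and one distant point.
X = [[0.0, 0.0], [0.0, 1.0], [5.0, 0.0]]
profiles = sweep_radii(X, radii=[0.5, 1.5, 10.0])
print(profiles)
```

Each row of `profiles` corresponds to one radius. In this toy case the mass moves from "no neighbors" at the smallest radius to "all other points are neighbors" at the largest, and the intermediate radius separates the isolated point from the pair. Inspecting how these rows change as the radius grows is the kind of analysis the abstract describes.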

Author information

Authors and Affiliations

Network Medicine Group, Institute of Molecular Medicine, Finland (FIMM), Tukholmankatu 8, Helsinki, Finland
Antonio Neme
Complex Systems Group, Universidad Autónoma de la Ciudad de México, San Lorenzo 290, México, D.F., México
Antonio Neme & Antonio Nido

Editor information

Editors and Affiliations

School of Information Systems, Computing and Mathematics, Brunel University, UB8 3PH, Uxbridge, Middlesex, UK
Allan Tucker & Stephen Swift
Faculty of Computer Science/IT, Ostfalia University of Applied Sciences, Am Exer 2, 38302, Wolfenbüttel, Germany
Frank Höppner
Faculty of Science, Department of Information and Computing Science, Buys Ballot Laboratory, Universiteit Utrecht, Princetonplein 5, 3584 CC, Utrecht, The Netherlands
Arno Siebes

Cite this paper

Neme, A., Nido, A. (2013). Exploratory Data Analysis through the Inspection of the Probability Density Function of the Number of Neighbors. In: Tucker, A., Höppner, F., Siebes, A., Swift, S. (eds) Advances in Intelligent Data Analysis XII. IDA 2013. Lecture Notes in Computer Science, vol 8207. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-41398-8_27

DOI: https://doi.org/10.1007/978-3-642-41398-8_27
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-41397-1
Online ISBN: 978-3-642-41398-8
eBook Packages: Computer Science, Computer Science (R0)
