Abstract
The discovery of structures hidden in high-dimensional data space is of great significance for understanding and further processing the data. Real-world datasets are often composed of multiple low-dimensional patterns, whose interlacement may impede our ability to understand the distribution rules of the data. Few existing methods focus on detecting and extracting the manifolds that represent distinct patterns. Inspired by the nonlinear dimensionality reduction method ISOmap, in this paper we present a novel approach called Multi-Manifold Partition to identify interlacing low-dimensional patterns. The algorithm has three steps: first, a neighborhood graph is built to capture the intrinsic topological structure of the input data; then, the dimensional uniformity of neighboring nodes is analyzed to discover segments of patterns; finally, segments that possibly belong to the same low-dimensional structure are combined to obtain a global representation of the distribution rules. Experiments on synthetic data as well as real problems are reported. The results show that this new approach to exploratory data analysis is effective and may enhance our understanding of the data distribution.
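The three-step pipeline outlined above can be sketched in code. The following is a minimal illustration, not the authors' actual algorithm: it assumes a k-nearest-neighbor graph for step one, local PCA as one common way to estimate the dimension at each node for step two (the paper's uniformity criterion may differ), and a simple union-find merge of neighboring nodes with matching local dimension for step three.

```python
import numpy as np

def knn_graph(X, k):
    """Step 1: indices of the k nearest neighbors of each point."""
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    return np.argsort(D, axis=1)[:, 1:k + 1]  # column 0 is the point itself

def local_dimension(X, nbrs, var_threshold=0.95):
    """Step 2: estimate each point's local dimension via PCA on its
    neighborhood (number of components explaining var_threshold variance)."""
    dims = np.empty(len(X), dtype=int)
    for i, idx in enumerate(nbrs):
        P = X[idx] - X[idx].mean(axis=0)
        s = np.linalg.svd(P, compute_uv=False) ** 2  # local covariance spectrum
        ratios = np.cumsum(s) / s.sum()
        dims[i] = np.searchsorted(ratios, var_threshold) + 1
    return dims

def segment(nbrs, dims):
    """Step 3: merge neighboring points with equal local dimension
    (union-find); returns a component label per point."""
    parent = list(range(len(dims)))
    def find(a):
        while parent[a] != a:
            parent[a] = parent[parent[a]]
            a = parent[a]
        return a
    for i in range(len(dims)):
        for j in nbrs[i]:
            if dims[i] == dims[j]:
                ra, rb = find(i), find(int(j))
                if ra != rb:
                    parent[rb] = ra
    return np.array([find(i) for i in range(len(dims))])
```

For example, points sampled along a line embedded in 3-D receive local dimension 1 and collapse into a single segment, while points on a planar patch receive local dimension 2; a mixture of the two would be split apart at step three.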
References
Bourlard, H., & Kamp, Y. (1988). Auto-association by multilayer perceptrons and singular value decomposition. Biological Cybernetics, 59, 291–294.
Bruske, J., & Sommer, G. (1998). An algorithm for intrinsic dimensionality estimation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 20(5), 572–575.
Camastra, F., & Vinciarelli, A. (2002). Estimating the intrinsic dimension of data with a fractal-based method. IEEE Transactions on Pattern Analysis and Machine Intelligence, 24(10), 1404–1407.
Chang, H. D., & Wang, J. F. (1994). A robust stroke extraction method for handwritten Chinese characters. International Journal of Pattern Recognition and Artificial Intelligence, 8(5), 1223–1239.
Cormen, T. H., Leiserson, C. E., Rivest, R. L., & Stein, C. (2001). Introduction to algorithms (2nd ed., pp. 595–601). Cambridge: MIT and McGraw-Hill.
Costa, J., & Hero, A. O. (2004). Geodesic entropic graphs for dimension and entropy estimation in manifold learning. IEEE Transactions on Signal Processing, 52(8), 2210–2221.
Cox, T., & Cox, M. (1994). Multidimensional scaling. London: Chapman & Hall.
Duda, R. O., Hart, P. E., & Stork, D. G. (2001). Pattern classification (2nd ed., pp. 517–556). New York: Wiley.
Fukunaga, K., & Olsen, D. R. (1971). An algorithm for finding intrinsic dimensionality of data. IEEE Transactions on Computers, 20, 176–183.
Hyvarinen, A., & Oja, E. (2000). Independent component analysis: Algorithms and applications. Neural Networks, 13, 411–430.
Jain, A. K., & Dubes, R. C. (1988). Algorithms for clustering data. Englewood Cliffs: Prentice Hall.
Jolliffe, I. T. (1986). Principal component analysis. New York: Springer-Verlag.
Kambhatla, N., & Leen, T. K. (1997). Dimension reduction by local principal component analysis. Neural Computation, 9(7), 1493–1516.
Kégl, B. (2003). Intrinsic dimension estimation using packing numbers. In Advances in neural information processing systems 15 (NIPS2002). Cambridge: MIT.
Kohonen, T. (2001). Self-organizing maps, third extended edition, Springer series in information sciences (Vol. 30). Berlin, Heidelberg, New York: Springer.
Kumar, V., Grama, A., Gupta, A., & Karypis, G. (1994). Introduction to parallel computing: Design and analysis of algorithms (pp. 257–297). Redwood City: Benjamin/Cummings.
Levina, E., & Bickel, P. J. (2005). Maximum likelihood estimation of intrinsic dimension. In L. K. Saul, Y. Weiss, & L. Bottou (Eds.), Advances in neural information processing systems 17 (NIPS2004). Cambridge: MIT.
Pettis, K., Bailey, T., Jain, A., & Dubes, R. (1979). An intrinsic dimensionality estimator from near-neighbor information. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1, 25–37.
Raginsky, M., & Lazebnik, S. (2006). Estimation of intrinsic dimensionality using high-rate vector quantization. In Y. Weiss, B. Schölkopf, & J. Platt (Eds.), Advances in neural information processing systems 18 (NIPS2005). Cambridge: MIT.
Roweis, S. T., & Saul, L. K. (2000). Nonlinear dimensionality reduction by locally linear embedding. Science, 290, 2323–2326.
Sklansky, J., & Wassel, G. N. (1981). Pattern classifiers and trainable machines (pp. 112–113). New York: Springer-Verlag.
Tenenbaum, J. B., de Silva, V., & Langford, J. C. (2000). A global geometric framework for nonlinear dimensionality reduction. Science, 290, 2319–2323.
Verveer, P. J., & Duin, R. P. W. (1995). An evaluation of intrinsic dimensionality estimators. IEEE Transactions on Pattern Analysis and Machine Intelligence, 17, 81–86.
Zeng, J., & Liu, Z. Q. (2006). Stroke segmentation of Chinese characters using Markov random fields. In Proceedings of 18th international conference on pattern recognition (ICPR’06), (pp. 868–871).
Cite this article
Ban, T., Zhang, C. & Abe, S. A new approach to discover interlacing data structures in high-dimensional space. J Intell Inf Syst 33, 3–22 (2009). https://doi.org/10.1007/s10844-008-0055-6