Abstract
Principal component analysis (PCA) is a statistical technique for identifying the dependency structure of multivariate stochastic observations, and it is frequently used in data mining applications. This paper considers PCA in the context of emerging network-based computing environments. It offers a technique for performing PCA on distributed, heterogeneous data sets with relatively small communication overhead. The technique is evaluated on several data sets, including one from a web mining application. This approach is likely to facilitate the development of distributed clustering, associative link analysis, and other heterogeneous data mining applications that frequently use PCA.
Copyright information
© 2000 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Kargupta, H., Huang, W., Sivakumar, K., Park, BH., Wang, S. (2000). Collective Principal Component Analysis from Distributed, Heterogeneous Data. In: Zighed, D.A., Komorowski, J., Żytkow, J. (eds) Principles of Data Mining and Knowledge Discovery. PKDD 2000. Lecture Notes in Computer Science, vol 1910. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45372-5_50
DOI: https://doi.org/10.1007/3-540-45372-5_50
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-41066-9
Online ISBN: 978-3-540-45372-7
eBook Packages: Springer Book Archive