Collective Principal Component Analysis from Distributed, Heterogeneous Data

Kargupta, Hillol; Huang, Weiyun; Sivakumar, Krishnamoorthy; Park, Byung-Hoon; Wang, Shuren

doi:10.1007/3-540-45372-5_50

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 1910)

Included in the following conference series:

European Conference on Principles of Data Mining and Knowledge Discovery

Abstract

Principal component analysis (PCA) is a statistical technique for identifying the dependency structure of multivariate stochastic observations, and it is frequently used in data mining applications. This paper considers PCA in the context of emerging network-based computing environments. It offers a technique for performing PCA on distributed, heterogeneous data sets with relatively small communication overhead. The technique is evaluated on several data sets, including one from a web mining application. This approach is likely to facilitate the development of distributed clustering, associative link analysis, and other heterogeneous data mining applications that frequently use PCA.
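To make the setting concrete, the following is a minimal illustrative sketch, not the algorithm from the chapter, whose details the abstract does not give. It assumes that "heterogeneous" means each site holds a different subset of columns (features) for the same rows. It also assumes one plausible way to limit communication: each site sends only a sample of rows projected onto a few local principal directions. The function names `pca` and `collective_pca_sketch`, the parameters `k_local`, `k_global` and `sample_idx`, and the synthetic data are all hypothetical.

```python
import numpy as np

def pca(X, k):
    """Return the top-k principal directions (as columns) and their
    eigenvalues, from the sample covariance matrix of X."""
    Xc = X - X.mean(axis=0)
    cov = Xc.T @ Xc / (len(X) - 1)
    vals, vecs = np.linalg.eigh(cov)        # eigenvalues in ascending order
    order = np.argsort(vals)[::-1][:k]      # indices of the k largest
    return vecs[:, order], vals[order]

def collective_pca_sketch(site_blocks, k_local, k_global, sample_idx):
    """Hypothetical sketch: each entry of site_blocks holds different
    columns of the same rows. Each site transmits a compressed sample;
    the central site rebuilds an approximation and runs PCA on it."""
    approx_blocks = []
    for X in site_blocks:
        mean = X.mean(axis=0)
        V, _ = pca(X, k_local)
        # Assumed message from the site: sampled rows projected onto
        # k_local local directions, plus V and the column means.
        scores = (X[sample_idx] - mean) @ V
        # Central site: approximate reconstruction of the sampled rows.
        approx_blocks.append(scores @ V.T + mean)
    Z = np.hstack(approx_blocks)            # approximate joint sample
    return pca(Z, k_global)

# Synthetic example: 4 features split across two sites, 50 sampled rows.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 1)) @ np.array([[3.0, 2.0, 1.0, 2.0]])
X = X + 0.1 * rng.normal(size=(200, 4))
blocks = [X[:, :2], X[:, 2:]]
sample = rng.choice(200, size=50, replace=False)
V_approx, vals_approx = collective_pca_sketch(blocks, 1, 2, sample)
```

In this sketch, communication shrinks along two axes: fewer rows (the sample) and fewer numbers per row (`k_local` scores instead of all local columns). When `k_local` equals the number of local columns and every row is sampled, the result coincides with centralized PCA on the joined data.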

Author information

Authors and Affiliations

School of Electrical Engineering and Computer Science, Washington State University Pullman, WA 99164-2752, USA
Hillol Kargupta, Weiyun Huang, Krishnamoorthy Sivakumar, Byung-Hoon Park & Shuren Wang

Editor information

Editors and Affiliations

Department of Computer and Information Science, Norwegian University of Science and Technology, O.S. Bragstads plass 2E, 7491, Trondheim, Norway
Jan Komorowski
Department of Computer Science, University of North Carolina, Charlotte, NC 28223, USA
Jan Żytkow
Laboratoire ERIC, Université Lyon 2, 5 avenue Pierre Mendès-France, 69676, Bron, France
Djamel A. Zighed

About this paper

Cite this paper

Kargupta, H., Huang, W., Sivakumar, K., Park, BH., Wang, S. (2000). Collective Principal Component Analysis from Distributed, Heterogeneous Data. In: Zighed, D.A., Komorowski, J., Żytkow, J. (eds) Principles of Data Mining and Knowledge Discovery. PKDD 2000. Lecture Notes in Computer Science, vol 1910. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45372-5_50

DOI: https://doi.org/10.1007/3-540-45372-5_50
Published: 18 July 2002
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-41066-9
Online ISBN: 978-3-540-45372-7
eBook Packages: Springer Book Archive
