Low-dimensional tracking of association structures in categorical data

Iodice D’Enza, Alfonso; Markos, Angelos

doi:10.1007/s11222-014-9470-4

Published: 22 April 2014

Volume 25, pages 1009–1022, (2015)

Statistics and Computing

Alfonso Iodice D’Enza¹ & Angelos Markos²

Abstract

In modern applications, such as text mining and signal processing, large amounts of categorical data are produced at a high rate and are characterized by association structures that change over time. Multiple correspondence analysis (MCA) is a well-established dimension reduction method for exploring the associations within a set of categorical variables. A critical step of the MCA algorithm is a singular value decomposition (SVD) or an eigenvalue decomposition (EVD) of a suitably transformed matrix. The high computational and memory requirements of ordinary SVD and EVD make their application impractical on massive or sequential data sets. Several enhanced SVD/EVD approaches have recently been introduced in an effort to overcome these issues. The aim of the present contribution is twofold: (1) to extend MCA to a split-apply-combine framework that leads to an exact and parallel MCA implementation; (2) to allow for incremental updates (downdates) of existing MCA solutions, which lead to an approximate yet highly accurate solution. For this purpose, two incremental EVD and SVD approaches with desirable properties are revised and embedded in the context of MCA.
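The split-apply-combine idea in point (1) can be illustrated with a minimal sketch. It is not the authors' implementation; it relies only on the standard fact that the Burt matrix Z'Z is a sum of per-chunk cross-products, so an EVD-based MCA can be assembled exactly from row chunks. The function names, the integer coding of categories, and the simulated data are assumptions made for illustration, and every category is assumed to occur at least once.

```python
import numpy as np

def indicator_matrix(codes, n_levels):
    """One-hot encode integer-coded categorical columns into an indicator matrix Z."""
    return np.hstack([np.eye(k)[codes[:, j]] for j, k in enumerate(n_levels)])

def burt_by_chunks(codes, n_levels, n_chunks):
    """Split rows into chunks, apply Z_i' Z_i to each chunk, combine by summation."""
    size = sum(n_levels)
    burt = np.zeros((size, size))
    for chunk in np.array_split(codes, n_chunks, axis=0):
        z = indicator_matrix(chunk, n_levels)
        burt += z.T @ z  # chunks are independent, so this step could run in parallel
    return burt

def principal_inertias(burt, n_vars):
    """Principal inertias of indicator-matrix MCA from an EVD based on the Burt matrix."""
    n = np.trace(burt) / n_vars                     # number of observations
    c = np.diag(burt) / (n * n_vars)                # column masses (assumed all > 0)
    centered = burt / (n * n_vars ** 2) - np.outer(c, c)
    standardized = centered / np.sqrt(np.outer(c, c))
    return np.linalg.eigvalsh(standardized)[::-1]   # sorted in descending order

# Simulated data (illustrative only): 200 observations, 3 categorical variables.
rng = np.random.default_rng(0)
n_levels = [3, 4, 2]
codes = np.column_stack([rng.integers(0, k, size=200) for k in n_levels])

burt = burt_by_chunks(codes, n_levels, n_chunks=5)
inertias = principal_inertias(burt, n_vars=len(n_levels))
```

Because only the small categories-by-categories matrix is accumulated, the full indicator matrix never has to be held in memory at once; the incremental updates of point (2) are a separate, approximate technique that this sketch does not cover.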

Acknowledgments

The authors are grateful to all the anonymous reviewers for their helpful comments and suggestions on an earlier version of this paper. Many thanks are due to Yoshio Takane, George Menexes and Hervé Abdi for their help and constructive feedback.

Author information

Authors and Affiliations

Department of Economics and Law, Università di Cassino e del Lazio Meridionale, Cassino, Italy
Alfonso Iodice D’Enza
Department of Primary Education, Democritus University of Thrace, Alexandroupoli, Greece
Angelos Markos

Corresponding author

Correspondence to Alfonso Iodice D’Enza.

About this article

Cite this article

Iodice D’Enza, A., Markos, A. Low-dimensional tracking of association structures in categorical data. Stat Comput 25, 1009–1022 (2015). https://doi.org/10.1007/s11222-014-9470-4

Received: 22 April 2013
Accepted: 31 March 2014
Published: 22 April 2014
Issue Date: September 2015
DOI: https://doi.org/10.1007/s11222-014-9470-4
