Descriptive matrix factorization for sustainability: Adopting the principle of opposites

Thurau, Christian; Kersting, Kristian; Wahabzada, Mirwaes; Bauckhage, Christian

doi:10.1007/s10618-011-0216-z

Published: 02 March 2011

Volume 24, pages 325–354, (2012)

Data Mining and Knowledge Discovery

Abstract

Climate change, the global energy footprint, and strategies for sustainable development have become topics of considerable political and public interest. The public debate is informed by an exponentially growing amount of data, and there are diverse partisan interests when it comes to interpretation. We therefore believe that data analysis methods are called for that provide results which are intuitively understandable even to non-experts. Moreover, such methods should be efficient so that non-expert users can perform their own analysis at low expense in order to understand the effects of different parameters and influential factors. In this paper, we discuss a new technique for factorizing data matrices that meets both these requirements. The basic idea is to represent a set of data by means of convex combinations of extreme data points. This often accommodates human cognition. In contrast to established factorization methods, the approach presented in this paper can also determine over-complete bases. At the same time, convex combinations allow for highly efficient matrix factorization. Based on techniques adopted from the field of distance geometry, we derive a linear time algorithm to determine suitable basis vectors for factorization. Using several environmental and developmental data sets as examples, we discuss the performance and characteristics of the proposed approach and validate that significant efficiency gains are obtainable without performance decreases compared to existing convexity-constrained approaches.
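
To make the basic idea concrete, the following is a minimal sketch of representing data as convex combinations of extreme data points. It is not the algorithm derived in the paper: the greedy farthest-point selection is a simple stand-in for the distance-geometry criterion, and the coefficients are fitted by ordinary projected gradient descent. All function names (`select_extremes`, `project_to_simplex`, `convex_coefficients`) and the toy triangle data are assumptions made for illustration.

```python
import numpy as np


def select_extremes(X, k):
    """Greedy farthest-point heuristic that returns k row indices of X.

    Illustrative stand-in only; NOT the selection rule from the paper.
    """
    # Start with the data point farthest from the mean.
    first = int(np.argmax(np.linalg.norm(X - X.mean(axis=0), axis=1)))
    chosen = [first]
    # Distance from every point to its nearest already-chosen point.
    dist = np.linalg.norm(X - X[first], axis=1)
    for _ in range(k - 1):
        nxt = int(np.argmax(dist))
        chosen.append(nxt)
        dist = np.minimum(dist, np.linalg.norm(X - X[nxt], axis=1))
    return chosen


def project_to_simplex(V):
    """Euclidean projection of each row of V onto the probability simplex."""
    n, k = V.shape
    u = np.sort(V, axis=1)[:, ::-1]            # rows sorted in descending order
    css = np.cumsum(u, axis=1) - 1.0
    cond = u - css / np.arange(1, k + 1) > 0   # True on a prefix of each row
    rho = cond.sum(axis=1)                     # size of the support
    theta = css[np.arange(n), rho - 1] / rho
    return np.maximum(V - theta[:, None], 0.0)


def convex_coefficients(X, W, steps=500):
    """Fit H minimizing ||X - H W||_F^2 with each row of H >= 0 and summing to 1."""
    n, k = X.shape[0], W.shape[0]
    H = np.full((n, k), 1.0 / k)
    G = W @ W.T
    XWt = X @ W.T
    lr = 1.0 / (np.linalg.norm(G, 2) + 1e-12)  # step size 1 / Lipschitz constant
    for _ in range(steps):
        H = project_to_simplex(H - lr * (H @ G - XWt))
    return H


# Toy data (assumed for illustration): points inside a triangle, vertices included.
rng = np.random.default_rng(0)
V = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
X = np.vstack([V, rng.dirichlet(np.ones(3), size=50) @ V])

idx = select_extremes(X, k=3)   # indices of the selected extreme points
W = X[idx]                      # basis vectors are actual data points
H = convex_coefficients(X, W)   # convex coefficients, one row per data point
print(sorted(idx), np.linalg.norm(X - H @ W))
```

Because the basis vectors in `W` are actual data points and each row of `H` is non-negative and sums to one, every data point is described as a mixture of a few extreme cases, which is the kind of description the abstract argues is intuitive for non-experts.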

Author information

Authors and Affiliations

Fraunhofer Institute for Intelligent Analysis and Information Systems IAIS, Sankt Augustin, Germany
Christian Thurau, Kristian Kersting, Mirwaes Wahabzada & Christian Bauckhage

Corresponding author

Correspondence to Christian Thurau.

Additional information

Responsible editor: Katharina Morik, Kanishka Bhaduri and Hillol Kargupta.

About this article

Cite this article

Thurau, C., Kersting, K., Wahabzada, M. et al. Descriptive matrix factorization for sustainability: Adopting the principle of opposites. Data Min Knowl Disc 24, 325–354 (2012). https://doi.org/10.1007/s10618-011-0216-z

Received: 05 June 2010
Accepted: 05 February 2011
Issue Date: March 2012
