Integer Matrix Approximation and Data Mining

Dong, Bo; Lin, Matthew M.; Park, Haesun

doi:10.1007/s10915-017-0531-7

Published: 08 September 2017

Volume 75, pages 198–224 (2018)

Journal of Scientific Computing

Bo Dong¹,
Matthew M. Lin² &
Haesun Park³

Abstract

Integer datasets appear frequently in science and engineering applications. To analyze these datasets, we consider an integer matrix approximation technique that preserves the characteristics of the original dataset. Because integers are discrete, to the best of our knowledge no previously proposed technique developed for real numbers can be applied successfully. In this study, we first conduct a thorough review of current algorithms for solving integer least squares problems, and then develop an alternative least squares method based on integer least squares estimation to obtain integer approximations of integer matrices. We discuss numerical applications to the approximation of randomly generated integer matrices, as well as studies of association rule mining, cluster analysis, and pattern extraction. Our computed results suggest that the proposed method can compute more accurate solutions for discrete datasets than other existing methods.
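
To make the idea in the abstract concrete, the following is a minimal sketch of approximating an integer matrix A by a product U V of two integer matrices, alternating between the two factors and solving an integer least squares problem for each. It is not the authors' algorithm: the exhaustive search over a small box [lo, hi], the function names (`ils_box`, `integer_als`) and the toy data are all assumptions chosen for illustration, and enumeration is only feasible for very small ranks and ranges.

```python
from itertools import product


def ils_box(B, y, lo, hi):
    """Exact box-constrained integer least squares by enumeration (illustrative).

    B is a list of rows. Returns the integer vector x with entries in
    [lo, hi] that minimizes ||y - B x||^2; the first minimizer found wins ties.
    """
    k = len(B[0])
    best_x, best_err = None, None
    for x in product(range(lo, hi + 1), repeat=k):
        err = sum((yi - sum(b * xj for b, xj in zip(row, x))) ** 2
                  for row, yi in zip(B, y))
        if best_err is None or err < best_err:
            best_x, best_err = list(x), err
    return best_x


def transpose(M):
    return [list(col) for col in zip(*M)]


def matmul(U, V):
    Vt = transpose(V)
    return [[sum(a * b for a, b in zip(row, col)) for col in Vt] for row in U]


def frob_sq(A, B):
    """Squared Frobenius norm of A - B."""
    return sum((a - b) ** 2 for ra, rb in zip(A, B) for a, b in zip(ra, rb))


def integer_als(A, U0, lo, hi, sweeps=5):
    """Alternate between the factors of A ~ U V, keeping both integer.

    Assumes the entries of U0 lie in [lo, hi]. Each half-step is solved
    exactly over the box, so the recorded error never increases.
    """
    U = [row[:] for row in U0]
    V = None
    history = []
    for _ in range(sweeps):
        # Fix U: each column a_j of A gives the problem min ||a_j - U v_j||^2.
        V = transpose([ils_box(U, col, lo, hi) for col in transpose(A)])
        # Fix V: each row a_i of A gives the problem min ||a_i^T - V^T u_i||^2.
        Vt = transpose(V)
        U = [ils_box(Vt, row, lo, hi) for row in A]
        history.append(frob_sq(A, matmul(U, V)))
    return U, V, history


# Toy data (hypothetical): a 4 x 3 integer matrix and a rank-2 starting guess.
A = [[1, 2, 0],
     [0, 1, 2],
     [1, 3, 2],
     [2, 4, 0]]
U0 = [[1, 0], [0, 1], [1, 0], [0, 1]]
U, V, history = integer_als(A, U0, lo=0, hi=2, sweeps=4)
print(history)  # squared Frobenius error after each sweep
```

Because each subproblem is solved exactly over the same box, the error sequence is non-increasing, but the iteration may stop at a local minimum that depends on the starting guess. Practical integer least squares solvers replace the enumeration with lattice reduction and search strategies such as sphere decoding.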

Author information

Authors and Affiliations

School of Mathematical Sciences, Dalian University of Technology, Dalian, 116024, Liaoning, China
Bo Dong
Department of Mathematics, National Cheng Kung University, Tainan, 701, Taiwan
Matthew M. Lin
School of Computational Science and Engineering, Georgia Institute of Technology, Atlanta, GA, 30332-0765, USA
Haesun Park

Corresponding author

Correspondence to Matthew M. Lin.

Additional information

Bo Dong's research was supported in part by the National Natural Science Foundation of China under Grant 11101067 and the Fundamental Research Funds for the Central Universities. Matthew M. Lin's research was supported in part by the National Center for Theoretical Sciences of Taiwan and by the Ministry of Science and Technology of Taiwan under Grants 104-2115-M-006-017-MY3 and 105-2634-E-002-001. Haesun Park's research was supported in part by the Defense Advanced Research Projects Agency (DARPA) XDATA program Grant FA8750-12-2-0309 and NSF Grants CCF-0808863, IIS-1242304, and IIS-1231742.

About this article

Cite this article

Dong, B., Lin, M.M. & Park, H. Integer Matrix Approximation and Data Mining. J Sci Comput 75, 198–224 (2018). https://doi.org/10.1007/s10915-017-0531-7

Received: 26 September 2016
Revised: 20 June 2017
Accepted: 02 August 2017
Published: 08 September 2017
Issue Date: April 2018
DOI: https://doi.org/10.1007/s10915-017-0531-7
