Recovering all generalized order-preserving submatrices: new exact formulations and algorithms

Trapp, Andrew C.; Li, Chao; Flaherty, Patrick

doi:10.1007/s10479-016-2173-9

Data Mining and Analytics
Published: 25 March 2016

Annals of Operations Research, Volume 263, pages 385–404 (2018)

Andrew C. Trapp¹,
Chao Li² &
Patrick Flaherty³

Abstract

Cluster analysis of gene expression data is a popular and successful way of elucidating underlying biological processes. Typically, cluster analysis methods seek to group genes that are differentially expressed across experimental conditions. However, real biological processes often involve only a subset of genes and are activated in only a subset of environmental or temporal conditions. To address this limitation, Ben-Dor et al. (J Comput Biol 10(3–4):373–384, 2003) developed an approach to identify order-preserving submatrices (OPSMs), in which the expression levels of the included genes induce the same linear ordering of experiments. In addition to gene expression analysis, OPSMs have applications in recommender systems and target marketing. While the problem of finding the largest OPSM is $\mathscr{NP}$-hard, there have been significant advances in both exact and approximate algorithms in recent years. Building upon these developments, we provide two exact mathematical programming formulations that generalize the OPSM formulation by allowing for the reverse linear ordering, known as the generalized OPSM pattern, or GOPSM. Our formulations incorporate a constraint that provides a margin of safety against detecting spurious GOPSMs. Finally, we provide two novel algorithms to recover, for any given level of significance, all GOPSMs from a given data matrix, by iteratively solving mathematical programming formulations to global optimality. We demonstrate the computational performance and accuracy of our algorithms on real gene expression data sets, showing the capability of our developments.
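To make the OPSM and GOPSM patterns concrete, the following is a minimal sketch of a membership check, not the formulations of the article. It assumes that a row supports the pattern when its values, read in a chosen column order, are strictly increasing, and that the generalized pattern additionally admits rows that are strictly decreasing in that same order. The names `supports_order` and `supporting_rows` and the toy matrix are hypothetical.

```python
from typing import List, Sequence


def supports_order(row: Sequence[float], column_order: Sequence[int],
                   allow_reverse: bool = True) -> bool:
    """Return True if the row follows the column order (or, optionally, its reverse)."""
    values = [row[j] for j in column_order]
    pairs = list(zip(values, values[1:]))
    increasing = all(a < b for a, b in pairs)
    decreasing = all(a > b for a, b in pairs)
    # Assumption: the generalized pattern also accepts the reverse ordering.
    return increasing or (allow_reverse and decreasing)


def supporting_rows(matrix: Sequence[Sequence[float]], column_order: Sequence[int],
                    allow_reverse: bool = True) -> List[int]:
    """Indices of the rows that support the given column order."""
    return [i for i, row in enumerate(matrix)
            if supports_order(row, column_order, allow_reverse)]


# Toy data: rows play the role of genes, columns the role of experiments.
matrix = [
    [1.0, 5.0, 3.0, 9.0],  # increasing along columns 0, 2, 1
    [9.0, 2.0, 4.0, 1.0],  # decreasing along columns 0, 2, 1
    [5.0, 1.0, 7.0, 3.0],  # neither
]
column_order = [0, 2, 1]

opsm_rows = supporting_rows(matrix, column_order, allow_reverse=False)
gopsm_rows = supporting_rows(matrix, column_order, allow_reverse=True)
```

On this toy matrix, only the first row supports the plain order-preserving pattern, while the first two rows support the generalized one. The check verifies a given candidate only; finding the largest such submatrix is the hard problem the abstract refers to.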

References

Ben-Dor, A., Chor, B., Karp, R., & Yakhini, Z. (2003). Discovering local structure in gene expression data: The order-preserving submatrix problem. Journal of Computational Biology, 10(3–4), 373–384.
Busygin, S., Prokopyev, O., & Pardalos, P. M. (2008). Biclustering in data mining. Computers & Operations Research, 35(9), 2964–2987.
Causton, H. C., Ren, B., Koh, S. S., Harbison, C. T., Kanin, E., Jennings, E. G., et al. (2001). Remodeling of yeast genome expression in response to environmental changes. Molecular Biology of the Cell, 12(2), 323–337.
Chui, C. K., Kao, B., Yip, K. Y., & Lee, S. D. (2008). Mining order-preserving submatrices from data with repeated measurements. In The 8th IEEE international conference on data mining (ICDM) (pp. 133–142). IEEE.
Cooper, S. J., Trinklein, N. D., Anton, E. D., Nguyen, L., & Myers, R. M. (2006). Comprehensive analysis of transcriptional promoter structure and function in 1% of the human genome. Genome Research, 16(1), 1–10.
Fang, Q., Ng, W., Feng, J., & Li, Y. (2012). Mining bucket order-preserving submatrices in gene expression data. IEEE Transactions on Knowledge and Data Engineering, 24(12), 2218–2231.
Fang, Q., Ng, W., Feng, J., & Li, Y. (2014). Mining order-preserving submatrices from probabilistic matrices. ACM Transactions on Database Systems, 39(1), 1–43.
Gao, B. J., Griffith, O. L., Ester, M., & Jones, S. J. (2006). Discovering significant OPSM subspace clusters in massive gene expression data. In Proceedings of the 12th ACM SIGKDD international conference on knowledge discovery and data mining (pp. 922–928). New York, NY, Philadelphia, PA: ACM.
Gao, B. J., Griffith, O. L., Ester, M., Xiong, H., Zhao, Q., & Jones, S. J. (2012). On the deep order-preserving submatrix problem: A best effort approach. IEEE Transactions on Knowledge and Data Engineering, 24(2), 309–325.
Griffith, O. L., Gao, B. J., Bilenky, M., Prychyna, Y., Ester, M., & Jones, S. J. (2009). KiWi: A scalable subspace clustering algorithm for gene expression analysis. In Proceedings of the 3rd international conference on bioinformatics and biomedical engineering (iCBBE) (pp. 1–9). IEEE.
Hochbaum, D. S., & Levin, A. (2013). Approximation algorithms for a minimization variant of the order-preserving submatrices and for biclustering problems. ACM Transactions on Algorithms, 9(2), 1–12.
Humrich, J., Gartner, T., & Garriga, G. C. (2011). A fixed parameter tractable integer program for finding the maximum order preserving submatrix. In The 11th international conference on data mining (ICDM) (pp. 1098–1103). IEEE.
IBM. (2015). IBM ILOG CPLEX 12.5.1 user’s manual. IBM ILOG CPLEX Division, Incline Village, NV.
King, J. Y., Ferrara, R., Tabibiazar, R., Spin, J. M., Chen, M. M., Kuchinsky, A., et al. (2005). Pathway analysis of coronary atherosclerosis. Physiological Genomics, 23(1), 103–118.
Madeira, S. C., & Oliveira, A. L. (2004). Biclustering algorithms for biological data analysis: A survey. IEEE Transactions on Computational Biology and Bioinformatics, 1(1), 24–45.
Spellman, P. T., Sherlock, G., Zhang, M. Q., Iyer, V. R., Anders, K., Eisen, M. B., et al. (1998). Comprehensive identification of cell cycle-regulated genes of the yeast Saccharomyces cerevisiae by microarray hybridization. Molecular Biology of the Cell, 9(12), 3273–3297.
Trapp, A. C., & Prokopyev, O. A. (2010). Solving the order-preserving submatrix problem via integer programming. INFORMS Journal on Computing, 22(3), 387–400.
Yip, K. Y., Kao, B., Zhu, X., Chui, C. K., Lee, S. D., & Cheung, D. W. (2013). Mining order-preserving submatrices from data with repeated measurements. IEEE Transactions on Knowledge and Data Engineering, 25(7), 1587–1600.
Zhang, M., Wang, W., & Liu, J. (2008). Mining approximate order preserving clusters in the presence of noise. In IEEE 24th international conference on data engineering (ICDE) (pp. 160–168). IEEE.

Author information

Authors and Affiliations

Foisie School of Business, Worcester Polytechnic Institute, 100 Institute Rd., Worcester, MA, 01609, USA
Andrew C. Trapp
Department of Computer Science, Worcester Polytechnic Institute, 100 Institute Rd., Worcester, MA, 01609, USA
Chao Li
Department of Mathematics and Statistics, University of Massachusetts, Amherst, MA, 01003, USA
Patrick Flaherty

Corresponding author

Correspondence to Andrew C. Trapp.

Appendix: Additional algorithmic and computational details

Tables 4 and 5 below identify, for a given column level $\gamma$, the corresponding minimum number of rows $\rho_{\gamma}^{\alpha}$ necessary for a GOPSM to meet the statistical significance threshold for level $\alpha$ as motivated in Ben-Dor et al. (2003). These values are computed via (13), and are used in the construction of constraints (14) and (15). We detail these right-hand side values for both the Cooper promoter (Table 4) and the Spellman yeast (Table 5) data sets.
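Equation (13) and the tables themselves are not reproduced in this preview, so the following is only a rough sketch of how such row thresholds could be obtained. It assumes a binomial tail bound in the style of Ben-Dor et al. (2003): the number of ordered column selections multiplied by the probability that at least $\rho$ of $n$ random rows support a fixed ordering of $\gamma$ columns. The adjustment for the generalized pattern (per-row probability $2/\gamma!$ and half as many distinct orderings) is an assumption of this sketch, not a statement of the article's formula, and the function names are hypothetical.

```python
from math import exp, factorial, lgamma, log, log1p, perm
from typing import Optional


def significance_bound(n_rows: int, n_cols: int, gamma: int, rho: int,
                       generalized: bool = True) -> float:
    """Assumed upper bound on the chance of a spurious pattern with gamma
    columns (gamma >= 2) and at least rho rows in a random n_rows x n_cols matrix."""
    # Assumption: a random row supports a fixed ordering with probability
    # 1/gamma!, doubled when the reverse ordering is also accepted.
    p = (2 if generalized else 1) / factorial(gamma)
    orderings = perm(n_cols, gamma)
    if generalized:
        orderings //= 2  # an ordering and its reverse describe the same pattern
    if p >= 1.0:
        return float(orderings)  # every row supports the pattern
    # Binomial tail P(X >= rho), accumulated in log space to avoid overflow.
    tail = 0.0
    for i in range(rho, n_rows + 1):
        log_term = (lgamma(n_rows + 1) - lgamma(i + 1) - lgamma(n_rows - i + 1)
                    + i * log(p) + (n_rows - i) * log1p(-p))
        tail += exp(log_term)
    return orderings * tail


def min_rows(n_rows: int, n_cols: int, gamma: int, alpha: float,
             generalized: bool = True) -> Optional[int]:
    """Smallest rho whose assumed bound is at most alpha, or None if none exists."""
    for rho in range(1, n_rows + 1):
        if significance_bound(n_rows, n_cols, gamma, rho, generalized) <= alpha:
            return rho
    return None


# Hypothetical dimensions, chosen only to exercise the functions.
threshold = min_rows(n_rows=10, n_cols=4, gamma=3, alpha=0.001, generalized=False)
```

Under these assumptions, calling `min_rows` once per column level $\gamma$ would yield one right-hand side value per level, which is the role the tabulated $\rho_{\gamma}^{\alpha}$ values play in constraints (14) and (15).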

About this article

Cite this article

Trapp, A.C., Li, C. & Flaherty, P. Recovering all generalized order-preserving submatrices: new exact formulations and algorithms. Ann Oper Res 263, 385–404 (2018). https://doi.org/10.1007/s10479-016-2173-9

Published: 25 March 2016
Issue Date: April 2018
DOI: https://doi.org/10.1007/s10479-016-2173-9
