Skip to main content
Log in

Finding checkerboard patterns via fractional 0–1 programming

  • Published:
Journal of Combinatorial Optimization Aims and scope Submit manuscript

Abstract

Biclustering is a data mining technique used to simultaneously partition the set of samples and the set of their attributes (features) into subsets (clusters). Samples and features clustered together are supposed to have a high relevance to each other. In this paper we provide a new mathematical programming formulation for unsupervised biclustering. The proposed model involves the solution of a fractional 0–1 programming problem. A linear-mixed 0–1 reformulation as well as two heuristic-based approaches are developed. Encouraging computational results on clustering real DNA microarray data sets are presented. In addition, we also discuss theoretical computational complexity issues related to biclustering.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Similar content being viewed by others

References

  • Baz M, Hunsaker B, Brooks JP, Gosavi A (2007) Automated tuning of optimization parameters. Technical Report, University of Pittsburgh

  • Ben-Dor A, Chor B, Karp R, Yakhini Z (2000b) Discovering local structure in gene expression data: the order-preserving submatrix problem. J Comput Biol 10:373–384

    Article  Google Scholar 

  • Boyd S, Vandenberghe L (2004) Convex optimization. Cambridge University Press, Cambridge

    MATH  Google Scholar 

  • Busygin S, Jacobsen G, Krämer E (2002) Double conjugated clustering applied to leukemia microarray data. In: SDM 2002 workshop on clustering high dimensional data and its applications

  • Busygin S, Prokopyev OA, Pardalos PM (2005) Feature selection for consistent biclustering via fractional 0–1 programming. J Combin Optim 10(1):7–21

    Article  MATH  MathSciNet  Google Scholar 

  • Busygin S, Prokopyev OA, Pardalos PM (2008) Biclustering in data mining. Comput Oper Res 35:2964–2987

    Article  MATH  MathSciNet  Google Scholar 

  • Cheng Y, Church GM (2000) Biclustering of expression data. In: Proceedings of the 8th international conference on intelligent systems for molecular biology, pp 93–103

  • Cho H, Dhillon IS, Guan Y, Sra S (2004) Minimum sum-squared residue co-clustering of gene expression data. In: Proceedings of the 4th SIAM international conference on data mining (SDM), April 22–24, Lake Buena Vista, Florida

  • Coclustering Software. http://www.cs.utexas.edu/users/dml/Software/cocluster.html

  • Dhillon IS (2001) Co-clustering documents and words using bipartite spectral graph partitioning. In: Proceedings of the 7th ACM SIGKDD international conference on knowledge discovery and data mining (KDD), August 26–29, San Francisco, CA

  • Dhillon IS, Guan Y (2003) Information theoretic clustering of sparse co-occurrence data. In: Proceedings of the third IEEE international conference on data mining (ICDM’03), November 19–22, Melbourne, FL

  • Dhillon IS, Mallela S, Modha D (2003) Information-theoretic co-clustering. In: Proceedings of the ninth ACM SIGKDD international conference on knowledge discovery and data mining, August 24–27, Washington, DC

  • Garey MR, Johnson DS (1979) Computers and intractability: a guide to the theory of NP-completeness. Freeman, San Francisco

    MATH  Google Scholar 

  • Golub TR, Slonim DK, Tamayo P, Huard C, Gaasenbeek M, Mesirov JP, Coller H, Loh ML, Downing JR, Caligiuri MA, Bloomfield CD, Lander ES (1999) Molecular classification of cancer: class discovery and class prediction by gene expression monitoring. Science 286:531–537

    Article  Google Scholar 

  • Hansen P, Poggi de Aragão M, Ribeiro CC (1991) Hyperbolic 0–1 programming and query optimization in information retrieval. Math Program 52:256–263

    Google Scholar 

  • Hartigan JA (1972) Direct clustering of a data matrix. J Am Stat Assoc 67:123–129

    Article  Google Scholar 

  • Hsiao L-L, Dangond F, Yoshida T, Hong R, Jensen RV, Misra J, Dillon W, Lee KF, Clark KE, Haverty P, Weng Z, Mutter G, Frosch MP, MacDonald ME, Milford EL, Crum CP, Bueno R, Pratt RE, Mahadevappa M, Warrington JA, Stephanopoulos G, Stephanopoulos G, Gullans SR (2001) A compendium of gene expression in normal human tissues. Physiol Genomics 7:97–104

    Google Scholar 

  • HuGE Index.org Website. http://www.hugeindex.org

  • Kluger Y, Basri R, Chang JT, Gerstein M (2003) Spectral biclustering of microarray data: coclustering genes and conditions. Genome Res 13:703–716

    Article  Google Scholar 

  • Madeira S, Oliviera AL (2004) Biclustering algorithms for biological data analysis: a survey. IEEE/ACM Trans Comput Biol Bioinform 1:24–45

    Article  Google Scholar 

  • Martin R (1999) Large scale linear and integer optimization: a unified approach. Kluwer Academic, Boston

    MATH  Google Scholar 

  • National Human Genome Research Institute (2008) DNA microarray fact sheet. http://www.genome.gov/page.cfm?pageID=10000533#2, last accessed January

  • Prokopyev OA, Huang H-X, Pardalos PM (2005a) On complexity of unconstrained hyperbolic 0–1 programming problems. Oper Res Lett 33:312–318

    Article  MATH  MathSciNet  Google Scholar 

  • Prokopyev OA, Meneses C, Oliveira CAS, Pardalos PM (2005b) On multiple-ratio hyperbolic 0–1 programming problems. Pac J Optim 1(2):327–345

    MATH  MathSciNet  Google Scholar 

  • Saipe S (1975) Solving a (0,1) hyperbolic program by branch and bound. Naval Res Logist Q 22:497–515

    Article  MATH  Google Scholar 

  • Tawarmalani M, Ahmed S, Sahinidis N (2002) Global optimization of 0–1 hyperbolic programs. J Glob Optim 24:385–416

    Article  MATH  MathSciNet  Google Scholar 

  • Wu T-H (1997) A note on a global approach for general 0–1 fractional programming. Eur J Oper Res 101:220–223

    Article  MATH  Google Scholar 

  • Yang J, Wang W, Wang H, Yu P (2002) δ-clusters: capturing subspace correlation in a large data set. In: Proceedings of the 18th IEEE international conference on data engineering, pp 517–528

Download references

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Oleg A. Prokopyev.

Rights and permissions

Reprints and permissions

About this article

Cite this article

Trapp, A., Prokopyev, O.A. & Busygin, S. Finding checkerboard patterns via fractional 0–1 programming. J Comb Optim 20, 1–26 (2010). https://doi.org/10.1007/s10878-008-9186-5

Download citation

  • Published:

  • Issue Date:

  • DOI: https://doi.org/10.1007/s10878-008-9186-5

Keywords

Navigation