Constrained query of order-preserving submatrix in gene expression data

Jiang, Tao; Li, Zhanhuai; Shang, Xuequn; Chen, Bolin; Li, Weibang; Yin, Zhilei

doi:10.1007/s11704-016-5487-5


Research Article
Published: 25 May 2016

Frontiers of Computer Science, Volume 10, pages 1052–1066 (2016)



Abstract

The order-preserving submatrix (OPSM) has become an important model for biologically meaningful subspace clusters, capturing the general tendency of gene expression across a subset of conditions. With advances in microarray and analysis techniques, large volumes of gene expression data and OPSM mining results are being produced. An OPSM query can efficiently retrieve relevant OPSMs from these large collections. However, improving the relevance of OPSM query results remains difficult in real-life exploratory data analysis. First, it is hard to capture subjective aspects of interestingness, e.g., the analyst's expectations given his or her domain knowledge. Second, even when these expectations can be specified declaratively, it is still challenging to use them during the computation of OPSM queries. To the best of our knowledge, existing methods focus mainly on batch OPSM mining, and few address OPSM queries. To solve these problems, this paper proposes two constrained OPSM query methods, which exploit user-defined constraints to search for relevant results in two kinds of indices introduced here. Extensive experiments on real datasets demonstrate that queries based on the multi-dimension index (cIndex) and the enumerating sequence index (esIndex) perform better than brute-force search.
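To make the notion of an OPSM query concrete, the following is a minimal Python sketch of the brute-force baseline mentioned in the abstract, not the paper's cIndex or esIndex methods. It assumes the standard OPSM definition (a set of rows whose values strictly increase along a given ordering of a subset of columns). The function names, the toy matrix, and the `min_rows` size constraint are hypothetical and chosen only for illustration.

```python
from typing import List, Sequence


def follows_order(row: Sequence[float], cols: Sequence[int]) -> bool:
    """Return True if the row's values strictly increase along the column order."""
    return all(row[a] < row[b] for a, b in zip(cols, cols[1:]))


def brute_force_query(
    matrix: Sequence[Sequence[float]],
    cols: Sequence[int],
    min_rows: int = 1,
) -> List[int]:
    """Scan every row and return the indices of rows that follow the column order.

    min_rows is a hypothetical user-defined constraint: if fewer rows
    match, the result is treated as uninteresting and an empty list is returned.
    """
    rows = [i for i, row in enumerate(matrix) if follows_order(row, cols)]
    return rows if len(rows) >= min_rows else []


# Toy expression matrix: rows are genes, columns are conditions.
expression = [
    [1.0, 4.0, 2.5, 3.0],
    [0.2, 0.9, 0.5, 0.7],
    [3.0, 1.0, 2.0, 0.5],
]

# Query: which genes increase across conditions 0 -> 2 -> 3 -> 1?
result = brute_force_query(expression, cols=[0, 2, 3, 1])
print(result)
```

The scan touches every row for every query, which is the cost an index-based approach would aim to avoid.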



Author information

Authors and Affiliations

School of Computer Science and Technology, Northwestern Polytechnical University, Xi’an, 710072, China
Tao Jiang, Zhanhuai Li, Xuequn Shang, Bolin Chen, Weibang Li & Zhilei Yin


Corresponding author

Correspondence to Zhanhuai Li.

Additional information

Tao Jiang is a PhD candidate at the School of Computer Science and Technology, Northwestern Polytechnical University, China. He is a student member of the China Computer Federation, the Association for Computing Machinery, and the IEEE Computer Society. His current research interests include biological data mining, big data analysis, and data management.

Zhanhuai Li is a professor at the School of Computer Science and Technology, Northwestern Polytechnical University (NWPU), China. He is vice chairman of the Database Technical Committee, China Computer Federation. He received his MS and PhD degrees from NWPU. His research interests include data management and data mining.

Xuequn Shang is a professor and vice dean at the School of Computer Science and Technology, Northwestern Polytechnical University, China. She is a senior member of the China Computer Federation. She received her PhD degree from the University of Magdeburg, Germany, in 2005. Her current research interests include data mining, bioinformatics, and data management.

Bolin Chen is an associate professor at the School of Computer Science and Technology, Northwestern Polytechnical University, China. He received his PhD degree from the University of Saskatchewan, Saskatoon, Canada. His current research interests include bioinformatics, computational and systems biology, data mining, and data management.

Weibang Li is a PhD candidate at the School of Computer Science and Technology, Northwestern Polytechnical University, China. He received his BS degree from Nanjing University of Aeronautics and Astronautics, China, in 2003. His current research interests include data quality management, big data analysis, and cloud computing.

Zhilei Yin is a PhD candidate at the School of Computer Science and Technology, Northwestern Polytechnical University, China. He is a lecturer at Zhengzhou University of Light Industry, China. His current research interests include bioinformatics, data mining, database management, and machine learning.

Electronic supplementary material

Supplementary material, approximately 194 KB.


About this article

Cite this article

Jiang, T., Li, Z., Shang, X. et al. Constrained query of order-preserving submatrix in gene expression data. Front. Comput. Sci. 10, 1052–1066 (2016). https://doi.org/10.1007/s11704-016-5487-5


Received: 24 November 2015
Accepted: 11 March 2016
Published: 25 May 2016
Issue Date: December 2016
DOI: https://doi.org/10.1007/s11704-016-5487-5
