A new approach for the deep order preserving submatrix problem based on sequential pattern mining

Xue, Yun; Li, Tiechen; Liu, Zhiwen; Pang, Chaoyi; Li, Meihang; Liao, Zhengling; Hu, Xiaohui

doi:10.1007/s13042-015-0384-z

Original Article
Published: 06 June 2015

Volume 9, pages 263–279, (2018)
International Journal of Machine Learning and Cybernetics

Yun Xue¹,
Tiechen Li¹,
Zhiwen Liu¹,
Chaoyi Pang²,
Meihang Li¹,
Zhengling Liao¹ &
Xiaohui Hu¹

Abstract

As an effective biclustering model, the order-preserving submatrix (OPSM) has been widely applied to mining biological gene expression data, where it captures the general tendency of gene expression under a subset of experimental conditions. Recently, biologists have sought deep OPSMs, which have long patterns and comparatively few supporting rows and are important for interpreting gene regulatory networks. However, traditional exact mining algorithms based on the Apriori principle cannot handle the deep OPSM problem: they typically rely on a large minimum support threshold for pattern pruning and therefore inevitably miss some significant deep OPSMs. In this paper, a new exact algorithm is proposed for mining deep OPSMs. First, all common subsequences shared by every pair of rows are found; then the row sets corresponding to the same common subsequence are formed; finally, all deep OPSMs whose support exceeds the given threshold are obtained. Experiments on both real and synthetic data sets show that the new algorithm is capable of mining all deep OPSMs at a small support. Under different minimum support thresholds, the algorithm outperforms traditional sequential pattern mining algorithms.
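
The three steps named in the abstract can be illustrated with a small, naive Python sketch. This is not the authors' implementation, only one way to read the description. It assumes that each row is first converted to the sequence of its column indices sorted by increasing value, that ties are broken by column index, and that a pattern is kept when its number of supporting rows is at least `min_support`. The function names and the `min_len` parameter are hypothetical. The pairwise enumeration is exponential in the worst case, so the sketch is only suitable for tiny matrices.

```python
from collections import defaultdict
from itertools import combinations


def row_to_sequence(row):
    """Column indices ordered by increasing value (assumption: ties broken by index)."""
    return tuple(sorted(range(len(row)), key=lambda c: (row[c], c)))


def common_subsequences(a, b, min_len):
    """All common subsequences of two column permutations with length >= min_len."""
    pos_b = {col: i for i, col in enumerate(b)}
    found = set()

    def extend(prefix, start, last_pos):
        if len(prefix) >= min_len:
            found.add(prefix)
        for k in range(start, len(a)):
            p = pos_b[a[k]]
            if p > last_pos:  # a[k] also comes later in b, so order is preserved
                extend(prefix + (a[k],), k + 1, p)

    extend((), 0, -1)
    return found


def mine_deep_opsms(matrix, min_support, min_len):
    """Map each column pattern to the sorted list of rows that follow it."""
    seqs = [row_to_sequence(r) for r in matrix]
    rows_by_pattern = defaultdict(set)
    # Step 1: common subsequences of every pair of rows.
    for i, j in combinations(range(len(seqs)), 2):
        for pattern in common_subsequences(seqs[i], seqs[j], min_len):
            # Step 2: group rows that share the same common subsequence.
            rows_by_pattern[pattern].update((i, j))
    # Step 3: keep patterns with enough supporting rows (assumed ">=").
    return {p: sorted(rows) for p, rows in rows_by_pattern.items()
            if len(rows) >= min_support}


if __name__ == "__main__":
    example = [
        [1, 2, 3, 4],
        [10, 20, 30, 5],
        [2, 4, 6, 8],
        [4, 3, 2, 1],
    ]
    for pattern, rows in mine_deep_opsms(example, min_support=2, min_len=3).items():
        print(pattern, rows)
```

Because any pattern followed by two or more rows is a common subsequence of every pair of those rows, taking the union over pairs recovers the complete row set for each pattern, which is why no minimum support is needed during the pairwise step.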



Acknowledgments

The authors gratefully thank the colleagues who took an interest in this work and provided strong technical support. The work is supported by the Guangdong Science and Technology Department under Grant Nos. 2009B090300336, 2012B091100349, and 2012B091100490; the Guangdong Economy & Trade Committee under Grant No. GDEID2010IS034; the Guangzhou Yuexiu District Science and Technology Bureau under Grant No. 2012-GX-004; and the National Natural Science Foundation of China (Grant No. 3100958).

Conflict of interests

The authors declare that there is no conflict of interest regarding the publication of this paper.

Author information

Authors and Affiliations

School of Physics and Telecommunication Engineering, South China Normal University, Guangzhou, 510006, China
Yun Xue, Tiechen Li, Zhiwen Liu, Meihang Li, Zhengling Liao & Xiaohui Hu
The Australian E-Health Research Centre, CSIRO, Adelaide, Australia
Chaoyi Pang

Corresponding author

Correspondence to Yun Xue.

About this article

Cite this article

Xue, Y., Li, T., Liu, Z. et al. A new approach for the deep order preserving submatrix problem based on sequential pattern mining. Int. J. Mach. Learn. & Cyber. 9, 263–279 (2018). https://doi.org/10.1007/s13042-015-0384-z

Received: 27 January 2015
Accepted: 28 May 2015
Published: 06 June 2015
Issue Date: February 2018
DOI: https://doi.org/10.1007/s13042-015-0384-z
