Abstract
Gene expression profiling empowers many biological studies across diverse fields by comprehensively characterizing cellular status under different experimental conditions. Despite recent advances in high-throughput technologies, profiling the whole genome remains challenging and expensive. Because the expression patterns of different genes are highly correlated, this issue can be addressed by a cost-effective approach that measures only a small subset of genes, called landmark genes, as representatives of the entire genome and estimates the remaining ones, called target genes, with a computational model. Several shallow and deep regression models have been proposed in the literature for inferring the expression of target genes. However, the shallow models suffer from underfitting because their capacity is insufficient to capture the complex nature of gene expression data, and the existing deep models are prone to overfitting because they do not exploit the interrelations among target genes during learning. To address these challenges, we formulate gene expression inference as a multi-task learning problem and propose a novel deep multi-task learning algorithm that automatically learns the biological interrelations among target genes and uses this information to improve prediction. In particular, we employ a multi-layer sub-network with low-dimensional latent variables to learn the interrelations among target genes (i.e., distinct predictive tasks), and impose it as a seamless, easy-to-implement regularization on deep models. Unlike conventional multi-task learning methods, which are often complicated and can only handle tens or hundreds of tasks, our algorithm effectively learns interrelations over the large-scale (\(\sim\)10,000) set of tasks arising in gene expression inference and does not suffer from cost-prohibitive operations. Experimental results on two large-scale datasets demonstrate the superiority of our method over existing gene expression inference models and alternative multi-task learning algorithms.
This work was partially supported by NSF IIS 1836938, DBI 1836866, IIS 1845666, IIS 1852606, IIS 1838627, IIS 1837956, and NIH AG049371.
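To make the setup concrete, below is a minimal sketch of the kind of architecture the abstract describes, assuming a PyTorch implementation: a shared deep regressor maps landmark-gene expressions to all target genes, while a small sub-network holds a low-dimensional latent embedding per target gene (task) and couples the output weights of related tasks through a sampled pairwise penalty. The layer sizes, the pair-sampling regularizer, and all names (MultiTaskExpressionNet, relatedness_penalty, task_dim, etc.) are illustrative assumptions, not the authors' exact model.

```python
# Hedged sketch, not the paper's implementation. Shapes and hyperparameters are placeholders.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiTaskExpressionNet(nn.Module):
    def __init__(self, n_landmark=943, n_target=9520, hidden=3000, task_dim=32):
        super().__init__()
        # Shared trunk: landmark-gene profile -> hidden representation.
        self.trunk = nn.Sequential(
            nn.Linear(n_landmark, hidden), nn.LeakyReLU(0.1),
            nn.Linear(hidden, hidden), nn.LeakyReLU(0.1),
        )
        # One output weight vector per target gene, i.e. per predictive task.
        self.head = nn.Linear(hidden, n_target)
        # Sub-network state: a low-dimensional latent variable per task, used to
        # express how related two tasks are.
        self.task_embed = nn.Parameter(torch.randn(n_target, task_dim) * 0.01)

    def forward(self, x_landmark):
        return self.head(self.trunk(x_landmark))

    def relatedness_penalty(self, n_pairs=4096):
        # Sample random task pairs; tasks whose latent embeddings are close are
        # pulled toward similar output weights (graph-Laplacian-style coupling).
        i = torch.randint(0, self.head.out_features, (n_pairs,))
        j = torch.randint(0, self.head.out_features, (n_pairs,))
        sim = torch.exp(-((self.task_embed[i] - self.task_embed[j]) ** 2).sum(1))
        w_diff = ((self.head.weight[i] - self.head.weight[j]) ** 2).sum(1)
        return (sim * w_diff).mean()

# One training step with synthetic placeholder data.
model = MultiTaskExpressionNet()
opt = torch.optim.Adam(model.parameters(), lr=1e-4)
x = torch.randn(64, 943)    # batch of landmark-gene expressions
y = torch.randn(64, 9520)   # corresponding target-gene expressions
loss = F.mse_loss(model(x), y) + 0.1 * model.relatedness_penalty()
opt.zero_grad()
loss.backward()
opt.step()
```

Sampling task pairs keeps the coupling cost proportional to the number of sampled pairs rather than quadratic in the roughly 10,000 tasks, which is one plausible way to realize the abstract's claim of avoiding cost-prohibitive operations.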
Copyright information
© 2020 Springer Nature Switzerland AG
About this paper
Cite this paper
Dizaji, K.G., Chen, W., Huang, H. (2020). Deep Large-Scale Multi-task Learning Network for Gene Expression Inference. In: Schwartz, R. (ed.) Research in Computational Molecular Biology. RECOMB 2020. Lecture Notes in Computer Science, vol. 12074. Springer, Cham. https://doi.org/10.1007/978-3-030-45257-5_2
DOI: https://doi.org/10.1007/978-3-030-45257-5_2
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-45256-8
Online ISBN: 978-3-030-45257-5