
Deep Large-Scale Multi-task Learning Network for Gene Expression Inference

  • Conference paper
  • In: Research in Computational Molecular Biology (RECOMB 2020)

Part of the book series: Lecture Notes in Computer Science (LNBI, volume 12074)

Abstract

Gene expression profiling empowers many biological studies in various fields by comprehensively characterizing cellular status under different experimental conditions. Despite recent advances in high-throughput technologies, profiling the whole-genome set remains challenging and expensive. Because the expression patterns of different genes are highly correlated, this issue can be addressed by a cost-effective approach that measures only a small subset of genes, called landmark genes, as representatives of the entire genome and estimates the remaining ones, called target genes, with a computational model. Several shallow and deep regression models have been presented in the literature for inferring the expression of target genes. However, shallow models suffer from underfitting because they lack the capacity to capture the complex nature of gene expression data, while existing deep models are prone to overfitting because they do not exploit the interrelations of target genes in the learning framework. To address these challenges, we formulate gene expression inference as a multi-task learning problem and propose a novel deep multi-task learning algorithm that automatically learns the biological interrelations among target genes and uses this information to enhance prediction. In particular, we employ a multi-layer sub-network with low-dimensional latent variables to learn the interrelations among target genes (i.e., distinct predictive tasks), and impose a seamless, easy-to-implement regularization on deep models. Unlike conventional multi-task learning methods, which typically handle only tens or hundreds of tasks, the proposed algorithm effectively learns the interrelations across the large-scale (~10,000) set of tasks in the gene expression inference problem without resorting to cost-prohibitive operations. Experimental results on two large-scale datasets demonstrate the superiority of our method over existing gene expression inference models and alternative multi-task learning algorithms.
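
To make the formulation concrete, the following is a minimal PyTorch sketch of the idea described above: a deep regression network maps landmark-gene expressions to target-gene expressions, while a small sub-network decodes a low-dimensional latent variable per target gene (task) and regularizes the output-layer weights so that related tasks share information. The layer sizes (including the 943-landmark / 9,520-target split typical of the LINCS/D-GEX setting), the reconstruction-style penalty tying the output weights to the learned task embeddings, and all hyper-parameters below are illustrative assumptions, not the authors' exact architecture or regularizer.

    # Minimal sketch (assumptions noted above), not the paper's exact model.
    import torch
    import torch.nn as nn

    class MultiTaskGeneNet(nn.Module):
        def __init__(self, n_landmark=943, n_target=9520, hidden=3000, task_dim=50):
            super().__init__()
            # Shared backbone over landmark-gene expressions.
            self.backbone = nn.Sequential(
                nn.Linear(n_landmark, hidden), nn.LeakyReLU(0.1),
                nn.Linear(hidden, hidden), nn.LeakyReLU(0.1),
            )
            # One output weight vector per target gene (i.e., per task).
            self.head = nn.Linear(hidden, n_target)
            # Low-dimensional latent variable per task, decoded by a small
            # sub-network into a vector matching the head's weight rows.
            self.task_latent = nn.Parameter(torch.randn(n_target, task_dim) * 0.01)
            self.task_decoder = nn.Sequential(
                nn.Linear(task_dim, 256), nn.LeakyReLU(0.1),
                nn.Linear(256, hidden),
            )

        def forward(self, x):
            return self.head(self.backbone(x))

        def task_regularizer(self):
            # Encourage each task's output weights to be explained by its
            # low-dimensional latent code; related tasks are coupled through
            # the shared decoder (an assumed, easy-to-implement surrogate).
            reconstructed = self.task_decoder(self.task_latent)  # (n_target, hidden)
            return ((self.head.weight - reconstructed) ** 2).mean()

    # Illustrative training step with random stand-in data.
    model = MultiTaskGeneNet()
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
    x = torch.randn(32, 943)    # landmark-gene expressions
    y = torch.randn(32, 9520)   # target-gene expressions
    optimizer.zero_grad()
    loss = nn.functional.mse_loss(model(x), y) + 0.1 * model.task_regularizer()
    loss.backward()
    optimizer.step()

The key design choice sketched here is that the number of regularization parameters grows only with the latent dimension per task, which is what keeps a multi-task formulation tractable at the scale of roughly 10,000 target genes.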

This work was partially supported by NSF IIS 1836938, DBI 1836866, IIS 1845666, IIS 1852606, IIS 1838627, IIS 1837956, and NIH AG049371.

Notes

  1. http://www.lincsproject.org/.
  2. https://cbcl.ics.uci.edu/public_data/D-GEX/.

Author information

Corresponding author

Correspondence to Heng Huang.

Copyright information

© 2020 Springer Nature Switzerland AG

About this paper

Cite this paper

Dizaji, K.G., Chen, W., Huang, H. (2020). Deep Large-Scale Multi-task Learning Network for Gene Expression Inference. In: Schwartz, R. (eds) Research in Computational Molecular Biology. RECOMB 2020. Lecture Notes in Computer Science, vol 12074. Springer, Cham. https://doi.org/10.1007/978-3-030-45257-5_2

  • DOI: https://doi.org/10.1007/978-3-030-45257-5_2

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-45256-8

  • Online ISBN: 978-3-030-45257-5

  • eBook Packages: Computer Science, Computer Science (R0)
