Abstract
In gene expression data analysis, traditional computational techniques are increasingly being improved by incorporating prior biological knowledge from open-access repositories. In this work, we propose using prior knowledge as a heuristic in a method for inferring gene-gene associations from gene expression profiles. As the source of prior knowledge we use the Gene Ontology, an open-access ontology in which genes are annotated according to their biological function, together with a pairwise Gene-Ontology-based measure between genes. We compared the performance of our proposal with benchmark methods for gene network inference: it outperforms them in some cases and obtains similar, competitive results in others, with the added advantage of providing simple and interpretable models, a desirable feature for health-related Artificial Intelligence models, as stated by the European Union.
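The idea of a pairwise Gene-Ontology-based measure acting as a heuristic can be illustrated with a minimal sketch. The abstract does not specify the measure or how it is applied, so everything below is an assumption made for illustration: the Jaccard index over GO term sets stands in for the actual measure, the gene names, annotations and threshold are toy values, and the helper names `go_similarity` and `candidate_pairs` are hypothetical.

```python
from itertools import combinations

# Hypothetical GO annotations: gene -> set of GO term identifiers (toy data).
annotations = {
    "geneA": {"GO:0006412", "GO:0003735", "GO:0005840"},
    "geneB": {"GO:0006412", "GO:0003735"},
    "geneC": {"GO:0007049"},
}


def go_similarity(terms_a, terms_b):
    """Jaccard index of two GO term sets (an assumed stand-in measure)."""
    if not terms_a or not terms_b:
        return 0.0
    return len(terms_a & terms_b) / len(terms_a | terms_b)


def candidate_pairs(annotations, threshold=0.5):
    """Return gene pairs whose GO similarity reaches the threshold, best first."""
    scored = [
        (a, b, go_similarity(annotations[a], annotations[b]))
        for a, b in combinations(sorted(annotations), 2)
    ]
    kept = [pair for pair in scored if pair[2] >= threshold]
    return sorted(kept, key=lambda pair: pair[2], reverse=True)


# Only pairs that pass the prior-knowledge filter would be handed to the
# expression-based inference step (not shown here).
pairs = candidate_pairs(annotations)
```

In this sketch the prior knowledge only narrows the set of gene pairs that an expression-based inference step would then examine; how the published method combines the two sources may well differ.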





Notes
RNA: ribonucleic acid
Acknowledgements
We would like to thank the Spanish Ministry of Science and Innovation for its financial support under project TIN2017-88209-C2-2-R.
About this article
Cite this article
Nepomuceno-Chamorro, I.A., Nepomuceno, J.A., Galván-Rojas, J.L. et al. Using prior knowledge in the inference of gene association networks. Appl Intell 50, 3882–3893 (2020). https://doi.org/10.1007/s10489-020-01705-4
DOI: https://doi.org/10.1007/s10489-020-01705-4