Abstract
The analysis of gene expression data plays a crucial role in disease diagnosis. However, such analysis is complex because the gene space is high-dimensional. This work shows that semantically related genes can be grouped using the biological knowledge in the Gene Ontology (GO) database. The GO defines knowledge phrases, called GO terms, that describe the biological functions of genes. These knowledge phrases are typically used when discussing individual components of the ontologies. Here, the term frequency-inverse document frequency (tf-idf) is exploited as a distance measure between two genes, and this distance is used in a structure analysis of four gene sets causally associated with pain and the chronification of pain, hearing loss, cancer, and drug addiction. The analysis employs the Databionic swarm (DBS). The results show distinctive homogenous groups in each gene set, outperforming related work. This work outlines the successful application of an information retrieval dissimilarity measure and structure analysis to gene sets using the knowledge base of the Gene Ontology. Furthermore, it enables a separate analysis of gene subspaces in the future.
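The idea of a tf-idf-based distance between genes can be illustrated with a minimal sketch. It assumes that each gene is treated as a "document" whose "words" are its annotated GO terms, and that the distance is one minus the cosine similarity of the tf-idf vectors. The abstract does not state the exact formula, so this choice, the function names, and the gene and term labels (GENE_A, T1, and so on) are hypothetical and not taken from the paper.

```python
import math
from collections import Counter


def tfidf_vectors(gene_terms):
    """Return a sparse tf-idf vector (dict) per gene.

    gene_terms maps a gene label to the list of GO terms annotated to it.
    Assumption: genes play the role of documents, GO terms the role of words.
    """
    n_genes = len(gene_terms)
    # Document frequency: in how many genes does each term occur?
    df = Counter(t for terms in gene_terms.values() for t in set(terms))
    vectors = {}
    for gene, terms in gene_terms.items():
        counts = Counter(terms)
        total = sum(counts.values())
        vectors[gene] = {
            t: (c / total) * math.log(n_genes / df[t])
            for t, c in counts.items()
        }
    return vectors


def cosine_distance(u, v):
    """One minus cosine similarity of two sparse vectors (assumed measure)."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        # A gene whose terms all occur in every gene carries no information.
        return 1.0
    return 1.0 - dot / (norm_u * norm_v)


# Hypothetical toy annotations, not real GO data.
gene_terms = {
    "GENE_A": ["T1", "T2", "T3"],
    "GENE_B": ["T1", "T2", "T4"],
    "GENE_C": ["T5", "T6"],
}
vectors = tfidf_vectors(gene_terms)
d_ab = cosine_distance(vectors["GENE_A"], vectors["GENE_B"])
d_ac = cosine_distance(vectors["GENE_A"], vectors["GENE_C"])
print(d_ab < d_ac)  # genes sharing GO terms end up closer together
```

A pairwise distance matrix built this way could then be handed to a distance-based clustering method such as the DBS mentioned above. The sketch stops before that step, because the abstract gives no detail on how the clustering is configured.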
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Thrun, M.C. (2022). Knowledge-Based Identification of Homogenous Structures in Gene Sets. In: Rocha, A., Adeli, H., Dzemyda, G., Moreira, F. (eds) Information Systems and Technologies. WorldCIST 2022. Lecture Notes in Networks and Systems, vol 468. Springer, Cham. https://doi.org/10.1007/978-3-031-04826-5_9
DOI: https://doi.org/10.1007/978-3-031-04826-5_9
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-04825-8
Online ISBN: 978-3-031-04826-5
eBook Packages: Intelligent Technologies and Robotics (R0)