Knowledge-Based Identification of Homogenous Structures in Gene Sets

Thrun, Michael C.

doi:10.1007/978-3-031-04826-5_9

Michael C. Thrun, ORCID: orcid.org/0000-0001-9542-5543

Part of the book series: Lecture Notes in Networks and Systems (LNNS, volume 468)

Included in the following conference series:

World Conference on Information Systems and Technologies

Abstract

The analysis of gene expression data plays a crucial role in disease diagnosis. However, such analysis is complex because of the high-dimensional gene space. This work shows that semantically related genes can be grouped using the biological knowledge stored in the Gene Ontology (GO) database. The GO defines knowledge phrases, called GO terms, that describe the biological functions of genes. These knowledge phrases are typically used when discussing individual components of the ontologies. Term frequency-inverse document frequency (tf-idf) is exploited to define a distance between two genes, and this distance is used for structure analysis of four gene sets causally associated with pain and the chronification of pain, hearing loss, cancer, and drug addiction. The analysis employs the Databionic swarm (DBS). The results show distinct homogeneous groups in each gene set, outperforming related work. This work outlines the successful application of an information retrieval dissimilarity measure and structure analysis to gene sets using the knowledge base of the Gene Ontology. Furthermore, it enables a separate analysis of gene subspaces in the future.
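The abstract does not specify how tf-idf weights are turned into a gene-to-gene distance. The following is a minimal sketch under stated assumptions: each gene is treated as a "document" whose "words" are its GO term annotations, and the dissimilarity is assumed to be the cosine distance between tf-idf vectors (the chapter's exact measure may differ). The gene and term names are hypothetical toy data, and the DBS structure analysis that would consume the resulting distance matrix is not reproduced here.

```python
import math
from collections import Counter


def tfidf_vectors(annotations):
    """Map each gene to a sparse tf-idf vector over its GO term annotations.

    annotations: dict of gene -> list of GO term identifiers (assumed input format).
    """
    n_genes = len(annotations)
    # Document frequency: number of genes annotated with each term.
    df = Counter()
    for terms in annotations.values():
        df.update(set(terms))
    vectors = {}
    for gene, terms in annotations.items():
        tf = Counter(terms)
        total = len(terms)
        # Term frequency times inverse document frequency; a term shared by
        # every gene gets idf = log(1) = 0 and so carries no weight.
        vectors[gene] = {
            term: (count / total) * math.log(n_genes / df[term])
            for term, count in tf.items()
        }
    return vectors


def cosine_distance(u, v):
    """Assumed dissimilarity: 1 minus the cosine similarity of two sparse vectors."""
    dot = sum(w * v.get(term, 0.0) for term, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 1.0  # convention for genes with no informative terms
    return 1.0 - dot / (norm_u * norm_v)


def distance_matrix(vectors):
    """Full pairwise distance matrix, e.g. as input to a structure analysis."""
    genes = sorted(vectors)
    return genes, [[cosine_distance(vectors[a], vectors[b]) for b in genes] for a in genes]


# Hypothetical toy annotations (not data from the chapter).
annotations = {
    "g1": ["t1", "t2", "t3"],
    "g2": ["t1", "t2", "t4"],
    "g3": ["t5", "t6"],
}
vectors = tfidf_vectors(annotations)
genes, D = distance_matrix(vectors)
```

Genes g1 and g2 share two terms and therefore end up closer to each other than either is to g3, which shares none; rare terms receive a higher idf and so contribute more to the similarity than widely used ones.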




Author information

Authors and Affiliations

Faculty of Mathematics and Computer Science, University of Marburg, Marburg, Germany
Michael C. Thrun


Corresponding author

Correspondence to Michael C. Thrun.

Editor information

Editors and Affiliations

ISEG, Universidade de Lisboa, Lisbon, Portugal
Alvaro Rocha
College of Engineering, The Ohio State University, Columbus, OH, USA
Hojjat Adeli
Institute of Data Science and Digital Technologies, Vilnius University, Vilnius, Lithuania
Gintautas Dzemyda
DCT, Universidade Portucalense, Porto, Portugal
Fernando Moreira


About this paper

Cite this paper

Thrun, M.C. (2022). Knowledge-Based Identification of Homogenous Structures in Gene Sets. In: Rocha, A., Adeli, H., Dzemyda, G., Moreira, F. (eds) Information Systems and Technologies. WorldCIST 2022. Lecture Notes in Networks and Systems, vol 468. Springer, Cham. https://doi.org/10.1007/978-3-031-04826-5_9


DOI: https://doi.org/10.1007/978-3-031-04826-5_9
Published: 11 May 2022
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-04825-8
Online ISBN: 978-3-031-04826-5
eBook Packages: Intelligent Technologies and Robotics (R0)
