Knowledge-Based Identification of Homogenous Structures in Gene Sets

  • Conference paper
Information Systems and Technologies (WorldCIST 2022)

Part of the book series: Lecture Notes in Networks and Systems ((LNNS,volume 468))

Abstract

The analysis of gene expression data plays a crucial role in disease diagnosis. However, such analysis is complex because the gene space is high-dimensional. This work shows that semantically related genes can be grouped using the biological knowledge in the Gene Ontology (GO) database. The GO defines knowledge phrases, called GO terms, that describe the biological functions of genes. These knowledge phrases are typically used when discussing individual components of the ontologies. Here, the term frequency-inverse document frequency (tf-idf) is exploited to define a distance measure between two genes, and this distance is used in a structure analysis of four gene sets causally associated with pain and the chronification of pain, hearing loss, cancer, and drug addiction. The analysis employs the Databionic swarm (DBS). The results show distinctive homogenous groups in each set of genes, outperforming related work. This work outlines the successful application of an information retrieval dissimilarity measure and structure analysis to gene sets using the knowledge base of the Gene Ontology. Furthermore, it enables a separate analysis of gene subspaces in the future.
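To make the idea of a tf-idf-based gene distance concrete, the following is a minimal sketch in which each gene is treated as a "document" whose "words" are its annotated GO terms. The abstract does not state the exact weighting or distance formula, so the cosine distance between tf-idf vectors used here is an assumption, and the gene names and GO identifiers are hypothetical placeholders rather than data from the paper.

```python
import math
from collections import Counter

# Hypothetical toy annotations: gene -> list of GO term identifiers.
# These names and identifiers are placeholders for illustration only.
annotations = {
    "geneA": ["GO:0000001", "GO:0000002", "GO:0000003"],
    "geneB": ["GO:0000001", "GO:0000002"],
    "geneC": ["GO:0000004", "GO:0000005"],
}


def tfidf_vectors(docs):
    """Return a sparse tf-idf vector (dict: term -> weight) per gene."""
    n_docs = len(docs)
    # Document frequency: in how many genes each GO term occurs.
    doc_freq = Counter(term for terms in docs.values() for term in set(terms))
    vectors = {}
    for gene, terms in docs.items():
        counts = Counter(terms)
        vectors[gene] = {
            term: (count / len(terms)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        }
    return vectors


def cosine_distance(u, v):
    """Cosine distance between sparse vectors (assumed choice of distance)."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 1.0  # no informative terms: treat as maximally distant
    return 1.0 - dot / (norm_u * norm_v)


vectors = tfidf_vectors(annotations)
genes = sorted(vectors)
# Pairwise distance matrix that a clustering method could take as input.
distances = {
    (g, h): cosine_distance(vectors[g], vectors[h]) for g in genes for h in genes
}
```

In this toy setting, genes that share GO terms (geneA and geneB) end up closer to each other than genes with disjoint annotations (geneA and geneC). A distance matrix of this kind is the sort of input a distance-based structure analysis would consume; the Databionic swarm itself is not reproduced here.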

Author information

Corresponding author

Correspondence to Michael C. Thrun.

Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Thrun, M.C. (2022). Knowledge-Based Identification of Homogenous Structures in Gene Sets. In: Rocha, A., Adeli, H., Dzemyda, G., Moreira, F. (eds) Information Systems and Technologies. WorldCIST 2022. Lecture Notes in Networks and Systems, vol 468. Springer, Cham. https://doi.org/10.1007/978-3-031-04826-5_9
