Genome-Wide Feature Selection of Robust mRNA Biomarkers for Body Fluid Identification

  • Conference paper
Data Mining and Big Data (DMBD 2022)

Part of the book series: Communications in Computer and Information Science ((CCIS,volume 1745))

Abstract

Tracing the origins of body fluids, which can provide information linking sample donors with criminal acts, is one of the primary challenges facing forensic medicine. Gene expression profiling methods have been widely developed to identify biomarkers for body fluid identification. In this study, we systematically investigated large-scale, multi-category, high-throughput gene expression data and, using decision tree models, identified 36 high-potential body fluid-specific mRNAs with robust discriminability. Robustly expressed reference genes were selected for normalization, which further improved accuracy. Results on independent datasets suggested that our biomarkers perform robustly and generalize well. In addition, simulated data indicated that our biomarkers could also be employed for accurate deconvolution of body fluid mixtures. We believe our methods may facilitate body fluid identification and provide insights into forensic crime scene reconstruction.
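
The abstract does not describe how the decision tree models were used, so the following is only a minimal sketch of one way a tree-based criterion can rank candidate genes, not the authors' pipeline. It scores each gene by the Gini impurity decrease of the best single-threshold split (a depth-one decision tree, or "stump") separating one target fluid from all others. The gene names, expression values, and the function names `gini`, `best_stump`, and `rank_genes` are hypothetical and chosen purely for illustration.

```python
from typing import Dict, List, Tuple


def gini(labels: List[bool]) -> float:
    """Gini impurity of a list of binary labels (0.0 for an empty list)."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2.0 * p * (1.0 - p)


def best_stump(values: List[float], labels: List[bool]) -> Tuple[float, float]:
    """Return (impurity decrease, threshold) of the best single split."""
    n = len(values)
    parent = gini(labels)
    best_gain, best_thr = 0.0, float("nan")
    ordered = sorted(set(values))
    # Candidate thresholds are midpoints between consecutive distinct values.
    for lo, hi in zip(ordered, ordered[1:]):
        thr = (lo + hi) / 2.0
        left = [y for v, y in zip(values, labels) if v <= thr]
        right = [y for v, y in zip(values, labels) if v > thr]
        child = (len(left) * gini(left) + len(right) * gini(right)) / n
        gain = parent - child
        if gain > best_gain:
            best_gain, best_thr = gain, thr
    return best_gain, best_thr


def rank_genes(
    expr: Dict[str, List[float]], fluids: List[str], target: str
) -> List[Tuple[str, float, float]]:
    """Rank genes by how well one threshold separates `target` from the rest."""
    labels = [f == target for f in fluids]
    scored = [(gene, *best_stump(vals, labels)) for gene, vals in expr.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Toy data (assumption): six samples, three made-up genes, arbitrary units.
toy_fluids = ["blood", "blood", "saliva", "saliva", "semen", "semen"]
toy_expr = {
    "GENE_A": [9.1, 8.7, 1.2, 0.9, 1.5, 1.1],  # high only in blood
    "GENE_B": [5.0, 5.2, 5.1, 4.9, 5.3, 5.0],  # flat, reference-like
    "GENE_C": [0.8, 1.0, 7.9, 8.3, 1.1, 0.7],  # high only in saliva
}

for gene, gain, thr in rank_genes(toy_expr, toy_fluids, "blood"):
    print(f"{gene}: impurity decrease {gain:.3f} at threshold {thr:.2f}")
```

In this toy setting, a gene expressed in only one fluid reaches the maximum possible impurity decrease, while a flat, reference-like gene scores near zero. A real analysis would use full decision trees, cross-validation, and normalized expression values rather than a single stump on raw numbers.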

Acknowledgements

We are grateful to all the participants in this study. This work was supported by the Beijing Natural Science Foundation (No. 7212065). We are also grateful to the GTEx program, GEO, the HPA database, and the corresponding data contributors for making their extensive databases and resources available. The Genotype-Tissue Expression (GTEx) Project was supported by the Common Fund of the Office of the Director of the National Institutes of Health, and by NCI, NHGRI, NHLBI, NIDA, NIMH, and NINDS. The GTEx data used for the analyses described in this manuscript were obtained from the GTEx Portal on 05/04/2019 and from dbGaP accession number phs000424.v7.p2 on 05/04/2019.

Author information

Corresponding authors

Correspondence to Yingnan Bian or Ence Yang.

Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.

About this paper

Cite this paper

He, G., Xiao, L., Bian, Y., Yang, E. (2022). Genome-Wide Feature Selection of Robust mRNA Biomarkers for Body Fluid Identification. In: Tan, Y., Shi, Y. (eds) Data Mining and Big Data. DMBD 2022. Communications in Computer and Information Science, vol 1745. Springer, Singapore. https://doi.org/10.1007/978-981-19-8991-9_3

  • DOI: https://doi.org/10.1007/978-981-19-8991-9_3

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-19-8990-2

  • Online ISBN: 978-981-19-8991-9

  • eBook Packages: Computer Science, Computer Science (R0)