Abstract
Tracing the origins of body fluids, which can provide information linking sample donors with criminal acts, is one of the primary challenges facing forensic medicine. Gene expression profiling methods have been widely developed to identify biomarkers for body fluid identification. In this study, we systematically investigated large-scale, multi-category, high-throughput gene expression data and used decision tree models to identify 36 high-potential body fluid-specific mRNAs with robust discriminability. Stably expressed reference genes were selected for normalization, which further improved accuracy. Results on independent datasets suggested that our biomarkers perform robustly and generalize well. In addition, simulated data indicated that our biomarkers could also be employed for accurate deconvolution of body fluid mixtures. We believe our methods may facilitate body fluid identification and provide insights into forensic crime scene reconstruction.
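The mixture deconvolution mentioned above can be illustrated with a minimal sketch. It assumes a linear mixing model in which the expression of each marker in a mixed stain is a weighted sum of its expression in the pure fluids. The abstract does not state which deconvolution algorithm was used, so the projected-gradient solver, the `deconvolve` helper, the marker names, and all signature values below are hypothetical and serve only to show the idea.

```python
# Hypothetical signature matrix: rows are marker genes, columns are body fluids.
# All numbers are invented for illustration and are not taken from the study.
FLUIDS = ["blood", "saliva", "semen"]
SIGNATURES = [
    [9.0, 0.5, 0.2],  # marker_A, mostly expressed in blood
    [0.3, 8.0, 0.4],  # marker_B, mostly expressed in saliva
    [0.1, 0.6, 7.0],  # marker_C, mostly expressed in semen
    [4.0, 0.2, 0.3],  # marker_D, a second blood-leaning marker
]


def deconvolve(signatures, mixture, steps=2000):
    """Estimate non-negative fluid fractions x that minimize ||S x - m||^2.

    Uses projected gradient descent (clip negatives to zero after each step),
    then rescales the estimate so that the fractions sum to 1.
    """
    n_genes = len(signatures)
    n_fluids = len(signatures[0])
    # Conservative step size: 1 / squared Frobenius norm of S, which is
    # no larger than 1 / largest eigenvalue of S^T S.
    lr = 1.0 / sum(v * v for row in signatures for v in row)
    x = [1.0 / n_fluids] * n_fluids
    for _ in range(steps):
        # residual = S x - m
        residual = [
            sum(signatures[g][f] * x[f] for f in range(n_fluids)) - mixture[g]
            for g in range(n_genes)
        ]
        # gradient of 0.5 * ||S x - m||^2 is S^T residual
        grad = [
            sum(signatures[g][f] * residual[g] for g in range(n_genes))
            for f in range(n_fluids)
        ]
        x = [max(0.0, x[f] - lr * grad[f]) for f in range(n_fluids)]
    total = sum(x)
    return [v / total for v in x] if total > 0 else x


# Simulate a stain that is 70% blood and 30% saliva, then recover the fractions.
true_fractions = [0.7, 0.3, 0.0]
mixture = [
    sum(row[f] * true_fractions[f] for f in range(len(FLUIDS)))
    for row in SIGNATURES
]
estimate = deconvolve(SIGNATURES, mixture)
for fluid, fraction in zip(FLUIDS, estimate):
    print(f"{fluid}: {fraction:.3f}")
```

In practice the signature matrix would be built from normalized expression of the selected markers in pure-fluid samples, and real stains would add noise and degradation that this noiseless simulation ignores.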
Acknowledgements
We are grateful to all the participants in this study. This work was supported by the Beijing Natural Science Foundation (No. 7212065). We are also grateful to the GTEx program, GEO, the HPA database, and the corresponding data contributors for making their extensive databases and resources available. The Genotype-Tissue Expression (GTEx) Project was supported by the Common Fund of the Office of the Director of the National Institutes of Health, and by NCI, NHGRI, NHLBI, NIDA, NIMH, and NINDS. The data used for the analyses described in this manuscript from GTEx were obtained from: the GTEx Portal on 05/04/2019 and dbGaP accession number phs000424.v7.p2 on 05/04/2019.
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
He, G., Xiao, L., Bian, Y., Yang, E. (2022). Genome-Wide Feature Selection of Robust mRNA Biomarkers for Body Fluid Identification. In: Tan, Y., Shi, Y. (eds) Data Mining and Big Data. DMBD 2022. Communications in Computer and Information Science, vol 1745. Springer, Singapore. https://doi.org/10.1007/978-981-19-8991-9_3
DOI: https://doi.org/10.1007/978-981-19-8991-9_3
Publisher Name: Springer, Singapore
Print ISBN: 978-981-19-8990-2
Online ISBN: 978-981-19-8991-9
eBook Packages: Computer Science, Computer Science (R0)