ABSTRACT
Log fold change (LFC) is a common measure in differential expression analysis for examining differences in gene expression between two experimental classes, as in data generated by microarray or bulk RNA sequencing. Many single-cell RNA-seq (scRNA-seq) datasets, however, are labelled with three or more classes, such as cell types, cell states, or cell stages. Several differential expression methods have been introduced to select differentially expressed genes (DEGs) among classes in scRNA-seq data while accounting for technical and biological variation, but these methods apply only to comparisons between two classes. Methods that select DEGs through multiclass comparisons have also been introduced in the literature, but they use measures other than LFC. This study therefore extends the widely used LFC measure into a multiclass DEG selection method. Majority voting is used to aggregate the DEGs from every pairwise class comparison. To evaluate and validate the genes selected by the multiclass LFC method, cell type classification was performed using the selected genes. The results show that the proposed method can reduce the number of genes to as low as 26.05% of those in the initial scRNA-seq data. Moreover, the selected genes classify cells into their respective cell types more accurately (accuracy of 0.9425) than an existing scRNA-seq gene selection method (accuracy of 0.9137).
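The idea of aggregating pairwise LFC comparisons by majority vote can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function name `multiclass_lfc_select`, the LFC threshold of 1.0, the pseudocount of 1.0, the use of per-class mean expression, and the strict-majority rule (a gene is kept when it is a DEG in more than half of the class pairs) are all assumptions made for illustration, since the abstract does not specify them.

```python
from itertools import combinations

import numpy as np


def multiclass_lfc_select(X, labels, lfc_threshold=1.0, pseudocount=1.0):
    """Hypothetical sketch: pairwise LFC plus majority voting.

    X is a (cells x genes) matrix of non-negative expression values and
    labels holds one class label per cell. The threshold, pseudocount and
    voting rule are illustrative assumptions, not values from the paper.
    """
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    classes = np.unique(labels)

    # Mean expression of every gene within each class.
    means = {c: X[labels == c].mean(axis=0) for c in classes}

    pairs = list(combinations(classes, 2))
    votes = np.zeros(X.shape[1], dtype=int)
    for a, b in pairs:
        # Log2 fold change between the two class means; the pseudocount
        # avoids division by zero for genes with no expression.
        lfc = np.log2((means[a] + pseudocount) / (means[b] + pseudocount))
        # A gene earns one vote when it is a DEG for this class pair.
        votes += (np.abs(lfc) >= lfc_threshold).astype(int)

    # Keep genes that are DEGs in a strict majority of class pairs.
    selected = np.flatnonzero(votes > len(pairs) / 2)
    return selected, votes


# Toy data: 6 cells (2 per class) x 4 genes, values chosen by hand.
X_demo = np.array([
    [10.0, 1.0, 5.0, 0.0],   # class A
    [12.0, 1.0, 5.0, 0.0],   # class A
    [1.0, 1.0, 5.0, 0.8],    # class B
    [1.0, 1.0, 5.0, 0.8],    # class B
    [1.0, 9.0, 5.0, 2.0],    # class C
    [1.0, 11.0, 5.0, 2.0],   # class C
])
labels_demo = np.array(["A", "A", "B", "B", "C", "C"])

selected, votes = multiclass_lfc_select(X_demo, labels_demo)
print("votes per gene:", votes)
print("selected gene indices:", selected)
```

In this toy example the first two genes each separate one class from the other two, so they win two of the three pairwise votes and are kept. The third gene is constant and the fourth changes only gradually across classes, so neither reaches a majority. Choosing a different threshold or voting rule would change which genes survive.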