DOI: 10.1145/3638569.3638571
research-article

A Multiclass Method for Selecting Differentially-Expressed and Cell-Type-Discriminative Genes from scRNA-Seq Data

Published: 07 March 2024

ABSTRACT

Log fold change (LFC) is a common measure in differential expression analysis for examining differences in gene expression between two experimental classes, as in data generated by microarray or bulk RNA sequencing. Many single-cell RNA-seq (scRNA-seq) datasets, however, are labelled with three or more classes representing cell types, cell states, or cell stages. Several differential expression methods have been introduced to select differentially expressed genes (DEGs) among classes in scRNA-seq data while accounting for technical and biological variation, but these methods support comparisons between only two classes. Methods that select DEGs through multiclass comparisons have also been introduced in the literature, but they use measures other than LFC. This study therefore aims to extend the LFC measure into a multiclass DEG selection method. Majority voting is used to aggregate the DEGs from every pairwise class comparison. The selected genes are evaluated and validated by using them for cell type classification. The results show that the proposed method can reduce the number of genes to as low as 26.05% of those in the initial scRNA-seq data. Moreover, the selected genes classify cells into their respective cell types more accurately (an accuracy of 0.9425) than the existing scRNA-seq gene selection method (an accuracy of 0.9137).
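
The pairwise-LFC-plus-majority-voting idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `multiclass_lfc_select`, the use of per-class mean expression, the pseudocount of 1, the |LFC| threshold of 1, and the "more than half of the class pairs" voting rule are all assumptions made for illustration, since the abstract does not specify them.

```python
from itertools import combinations

import numpy as np


def multiclass_lfc_select(X, labels, lfc_threshold=1.0, pseudocount=1.0):
    """Hypothetical multiclass LFC gene selection with majority voting.

    X      : (cells x genes) matrix of non-negative expression values.
    labels : class label (e.g. cell type) for each cell.
    Returns the indices of selected genes and the per-gene vote counts.
    """
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    classes = np.unique(labels)

    # Mean expression of every gene within each class.
    means = {c: X[labels == c].mean(axis=0) for c in classes}

    pairs = list(combinations(classes, 2))
    votes = np.zeros(X.shape[1], dtype=int)
    for a, b in pairs:
        # Log2 fold change between the two classes; the pseudocount
        # (an assumption) avoids taking the log of zero.
        lfc = np.log2(means[a] + pseudocount) - np.log2(means[b] + pseudocount)
        # A gene earns one vote when it passes the threshold for this pair.
        votes += (np.abs(lfc) >= lfc_threshold).astype(int)

    # Assumed majority rule: keep genes that pass in more than half of the pairs.
    selected = np.flatnonzero(votes > len(pairs) / 2)
    return selected, votes
```

In this sketch, the returned gene indices could then be used to subset the expression matrix before training a cell type classifier, which mirrors the evaluation strategy the abstract describes.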


            • Published in

              ICCBB '23: Proceedings of the 2023 7th International Conference on Computational Biology and Bioinformatics
              December 2023
              108 pages
              ISBN:9798400716331
              DOI:10.1145/3638569

              Copyright © 2023 ACM


              Publisher

              Association for Computing Machinery

              New York, NY, United States


              Qualifiers

              • research-article
              • Research
              • Refereed limited