Genome-wide exploratory analysis for NARAC dataset with preparation for haplotype block partitioning through minor allele frequency quality control viewpoint

  • Research
  • Iran Journal of Computer Science

Abstract

This article provides a detailed description, analysis, and visualization of a case–control genome-wide genotypic dataset from the North American Rheumatoid Arthritis Consortium (NARAC). The dataset is summarized by the number of females and males among cases and controls and by the percentage of missing data. Alleles and genotypes are counted, and the minor allele frequency (MAF) is calculated for each single nucleotide polymorphism (SNP). The SNPs are then classified into four categories by MAF: very rare, rare, low frequency, and common. The chromosomal locations of the SNPs in each category are examined to determine the proportion that fall in coding regions versus other regions. Each category shows a different proportion in each consequence-annotation region. The composition of the data in terms of alleles and genotypes is found to be greatly disproportionate. The results give a clear picture of the data and its MAF, which can be compared with other datasets, and can help researchers gain a comprehensive and accurate understanding of such case–control datasets.
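
As a rough illustration of the per-SNP quantities named in the abstract (missing-data percentage, allele counts, MAF, and the four MAF categories), the following Python sketch may help. It is not the authors' code. The genotype encoding (two-character calls such as "AG", with "00" for a missing call) and the category cut-offs in ASSUMED_THRESHOLDS are assumptions made for this example only; the abstract names the categories but does not state their boundaries.

```python
from collections import Counter

# ASSUMPTION: hypothetical upper bounds for each category; the abstract does
# not give the cut-offs, so replace these with the values used in the study.
ASSUMED_THRESHOLDS = (
    (0.001, "very rare"),
    (0.01, "rare"),
    (0.05, "low frequency"),
)


def missing_rate(genotypes, missing="00"):
    """Fraction of genotype calls equal to the (assumed) missing code."""
    if not genotypes:
        return None
    return sum(call == missing for call in genotypes) / len(genotypes)


def minor_allele_frequency(genotypes, missing="00"):
    """MAF of one biallelic SNP from two-character calls such as 'AG'.

    Missing calls are skipped. Returns None when no allele was observed
    and 0.0 when only one allele was observed (monomorphic SNP).
    """
    counts = Counter()
    for call in genotypes:
        if call != missing:
            counts.update(call)  # counts each allele character separately
    total = sum(counts.values())
    if total == 0:
        return None
    if len(counts) < 2:
        return 0.0
    # The second most frequent allele is the minor allele of a biallelic SNP.
    return counts.most_common(2)[1][1] / total


def maf_category(maf, thresholds=ASSUMED_THRESHOLDS):
    """Map a MAF value to one of the four categories named in the abstract."""
    for upper_bound, label in thresholds:
        if maf < upper_bound:
            return label
    return "common"


# Toy genotype calls for a single SNP (illustrative data, not from NARAC).
calls = ["AA", "AG", "GG", "AA", "00"]
maf = minor_allele_frequency(calls)
print(missing_rate(calls), maf, maf_category(maf))
```

In the toy example, allele A is observed five times and allele G three times among eight non-missing alleles, so G is the minor allele. Keeping the thresholds as a parameter lets the same function be reused with whichever category boundaries a given study adopts.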

Data availability

Due to the subject confidentiality agreement, the data used during the current study are not publicly accessible but are available upon reasonable request from the first author.

Acknowledgements

The authors would like to acknowledge the Genetic Analysis Workshop Grant [R01 GM031575] for providing the NARAC dataset. This work was made possible by funds from the National Institutes of Health [NO1-AR-2-2263 and RO1-AR-44422] and the National Arthritis Foundation (NAF).

Funding

The authors received no financial support for the research, authorship, and/or publication of this article.

Author information

Contributions

Conceptualization: MNS, AMS and HFAH. Data curation: FSI and MNS. Formal analysis: FSI and MNS. Investigation: FSI and MNS. Methodology: MNS, AMS and HFAH. Resources: MNS. Software: FSI and MNS. Supervision: MNS, AMS and HFAH. Validation: GWZ and MNS. Visualization: FSI. Writing and original draft: GWZ, FSI and MNS. Writing, review, and editing: GWZ, MNS, AMS and HFAH. All authors reviewed the manuscript.

Corresponding author

Correspondence to Galena W. Zareef.

Ethics declarations

Conflict of interest

The authors declare that they have neither affiliations nor involvement in any organization or entity that has a financial stake in the subject matter or materials discussed in this manuscript.

Ethical approval

Not applicable.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Saad, M.N., Zareef, G.W., Ibrahim, F.S. et al. Genome-wide exploratory analysis for NARAC dataset with preparation for haplotype block partitioning through minor allele frequency quality control viewpoint. Iran J Comput Sci 6, 387–396 (2023). https://doi.org/10.1007/s42044-023-00147-8

  • DOI: https://doi.org/10.1007/s42044-023-00147-8
