DOI: 10.1145/2649387.2649395
short-paper

Mining massive SNP data for identifying associated SNPs and uncovering gene relationships

Published: 20 September 2014

ABSTRACT

Studies of correlations among single nucleotide polymorphisms (SNPs) have focused on SNPs located on the same chromosome, since SNPs on different chromosomes are expected to segregate randomly. However, previous studies suggest that SNPs can be associated with each other over long distances and even across different chromosomes. To facilitate the study of SNP associations, our goal is to find SNPs that coexist in a significant number of samples regardless of their genomic distance, and subsequently to study the relationships among these associated SNPs and their corresponding genes. Mining co-occurrent SNP associations is computationally challenging, which motivates us to design an efficient data mining algorithm, FCIRC, for mining SNP associations from massive SNP data. By applying our method to both the original SNP data and random chromosome permutation data, we demonstrate that it is able to find non-random SNP associations across multiple chromosomes. Many of the large number of associated SNPs identified by our method involve multiple chromosomes. Some SNP associations also suggest novel relationships among the corresponding genes, and some may imply biological and disease mechanisms related to those genes.
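
To make the notion of co-occurrent SNPs concrete, the following is a minimal brute-force sketch in Python, not the FCIRC algorithm, whose details the abstract does not give. It assumes each sample is represented as the set of SNP identifiers it carries and counts how many samples contain each SNP pair, keeping pairs that reach a support threshold. The function name `cooccurring_pairs`, the parameter `min_support`, and the toy identifiers are all hypothetical.

```python
from collections import Counter
from itertools import combinations


def cooccurring_pairs(samples, min_support):
    """Return SNP pairs that appear together in at least min_support samples.

    Illustrative brute force only: it enumerates every pair in every sample,
    which would not scale to massive SNP data.
    """
    counts = Counter()
    for snps in samples:
        # Sort so that each unordered pair maps to one canonical key.
        for pair in combinations(sorted(set(snps)), 2):
            counts[pair] += 1
    return {pair: n for pair, n in counts.items() if n >= min_support}


# Hypothetical toy data: identifiers are made up and prefixed with a
# chromosome label only to show that pairs may span chromosomes.
samples = [
    {"chr1:rsA", "chr7:rsB", "chr2:rsC"},
    {"chr1:rsA", "chr7:rsB"},
    {"chr1:rsA", "chr7:rsB", "chr9:rsD"},
    {"chr2:rsC", "chr9:rsD"},
]

result = cooccurring_pairs(samples, min_support=3)
print(result)
```

In this toy input only the pair on chromosomes 1 and 7 reaches the threshold. Pairwise enumeration grows quadratically with the number of SNPs per sample, which illustrates why a dedicated mining algorithm is needed at scale.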


Published in

BCB '14: Proceedings of the 5th ACM Conference on Bioinformatics, Computational Biology, and Health Informatics
September 2014
851 pages
ISBN: 9781450328944
DOI: 10.1145/2649387
General Chairs: Pierre Baldi, Wei Wang

Copyright © 2014 ACM

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery, New York, NY, United States

Acceptance Rates

Overall Acceptance Rate: 254 of 885 submissions, 29%