Skip to main content

Perfect Population Classification on Hapmap Data with a Small Number of SNPs

  • Conference paper
Neural Information Processing (ICONIP 2007)

Part of the book series: Lecture Notes in Computer Science ((LNTCS,volume 4985))

Included in the following conference series:

Abstract

The single nucleotide polymorphisms (SNPs) are believed to determine human differences and, to some degree, provide biomedical researchers a possibility of predicting risks of some diseases and explaining patients’ different responses to drug regimens. With the availability of millions of SNPs in the Hapmap Project, although large amount of information about SNPs is available, the tremendous size also causes a major challenge for research on SNPs. Inspired from the recent research work on population classification by Park et al (2006), we attempt to find as few SNPs as possible from the original nearly 4 millions SNPs to classify the 3 populations in the Hapmap genotype data. In this paper, we propose to first use a modified t-test measure to rank SNPs, and then combine the ranking result with a classifier, e.g., the support vector machine, to find the optimal SNP subset. Compared with Park et al’s result, our proposed method is more efficient in ranking features and classifying the three populations, i.e., we obtained perfect classification using only 11 SNPs in comparison with 82 SNPs used by Park et al.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Chapter
USD 29.95
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
eBook
USD 129.00
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
Softcover Book
USD 169.99
Price excludes VAT (USA)
  • Compact, lightweight edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Preview

Unable to display preview. Download preview PDF.

Unable to display preview. Download preview PDF.

References

  1. Bafna, V., Halldorsson, B., Schwartz, R., Clark, A., Istrail, S.: Haplotypes and Informative SNP selection: Don’t block out information. In: Proc. of RECOMB, pp. 19–27 (2003)

    Google Scholar 

  2. Celedon, J.C.: Candidate genes, SNPs, Haplotypes and linkage disequilibrium. Powerpoint presentation (2004), http://innateimmunity.net/files/CANDGENES/siframes.html

  3. Devore, J., Peck, R.: Statistics:the exploration and analysis of data, 3rd edn. Duxbury Press, Pacific Grove (1997)

    Google Scholar 

  4. Duerinck, K.F.: (2001), http://www.duerinck.com/snp.html

  5. Francois, R., Langrognet, F.: Double Cross Validation for Model Based Classification, User (2006), http://www.r-project.org/user-2006/Abstracts/Francois+Langrognet.pdf

  6. Guyon, I., Elisseeff, A.: An Introduction to Variable and Feature Selection. Journal of Machine Learning Research 3, 1157–1182 (2003)

    Article  MATH  Google Scholar 

  7. Halldrsson, B., Bafna, V., Lippert, R., Schwartz, R., de la Vega, F., Clark, A., Istrail, S.: Optimal haplotype blockfree selection of tagging snps for genome-wide association studies. Genome research 14, 1633–1640 (2004)

    Article  Google Scholar 

  8. Halperin, E., Kimmel, G., Shamir, R.: Tag SNP selection in genotype data for maximizig SNP prediction accuracy. Bioinformatics 199, 195–203 (2005)

    Article  Google Scholar 

  9. Hsu, C.W., Chang, C.C., Lin, C.J.: A practical guide to support vector classification. Technical report, Department of Computer Science and Information Engineering, National Taiwan University, Taipei (2003)

    Google Scholar 

  10. Human genome project information (2006), http://www.ornl.gov/sci/techresources/Human_Genome/faq/snps.html

  11. Jaeger, J., Sengupta, R., Ruzzo, W.L.: Improved Gene Selection For Classification Of Microarrays. Pac. Symp. Biocomput., 53–64 (2003)

    Google Scholar 

  12. Keerthi, S.S., Lin, C.-J.: Asymptotic behaviors of support vector machines with Gaussian kernel. Neural Computation 15, 1667–1689 (2003)

    Article  MATH  Google Scholar 

  13. Levner, I.: Feature selection and nearest centroid classification for protein mass spectrometry. BMC Bioinformatics 6, 68 (2005)

    Article  Google Scholar 

  14. Liu, B., Wan, C.R., Wang, L.P.: An efficient semi-unsupervised gene selection method via spectral biclustering. IEEE Trans. on Nano-Bioscience 5, 110–114 (2006)

    Google Scholar 

  15. Mitra, Pabitra, Murthy, C.A., Pal, S.K.: Unsupervised feature selection using feature similarity. IEEE trans. on Pattern analysis and machine intelligence 3, 301–312 (2002)

    Article  Google Scholar 

  16. Park, J.S., Hwang, S.H., Lee, Y.S., Kim, S.C.: SNP@Ethnos: a database of ethnically variant single-nucleotide polymorphisms. Nucleic Acids Research 0, D1–D5 (2006)

    Google Scholar 

  17. Phuong, T.M., Lin, Z., Altman, R.B.: Choosing SNPs using Feature Selection. In: Proc IEEE Comput Syst Bioinform Conf. 2005 (CSB 2005), pp. 301–309 (2005)

    Google Scholar 

  18. Pritchard, J.K., Przeworski, M.: Linkage disequilibrium in humans: models and data. Am. J. Hum. Genet. 69, 1–14 (2001)

    Article  Google Scholar 

  19. Rosenberg, N.A., et al.: Informativeness of genetic markers for inference of ancestry. Am. J. Hum. Genet. 73, 1402–1422 (2003)

    Article  Google Scholar 

  20. Rosenberg, N.A.: Algorithms for selecting informative marker panels for population assignment. Journal of computational biology 9, 1183–1201 (2005)

    Article  Google Scholar 

  21. Su, Y., Murali, T.M., Pavlovic, V., Schaffer, M., Kasif, S.: RankGene: Identifcation of Diagnostic Genes Based on Expression Data. Bioinformatics 19, 1578–1579 (2003)

    Article  Google Scholar 

  22. The International HapMap Consortium: The international Hapmap Project. Nature 426, 789–796 (2003), www.hapmap.org/genotypes

    Google Scholar 

  23. Tibshirani, R., Hastie, T., Narasimhan, B., Chu, G.: Diagnosis of multiple cancer types by shrunken centroids of gene expression. PNAS 99, 6567–6572 (2002)

    Article  Google Scholar 

  24. Trochim, W.M.: The Research Methods Knowledge Base, 2nd edn. Atomic Dog Publishing (2004), http://www.socialresearchmethods.net/kb/

  25. Vapnik, V.: Statistical learning theory. Wiley, NewYork (1998)

    MATH  Google Scholar 

  26. Wang, L.P.: Support Vector Machines: Theory and Applications. Springer, Heidelberg (2005)

    MATH  Google Scholar 

  27. Wang, L.P., Chu, F., Xie, W.: Accurate cancer classification using expressions of very few genes. IEEE Transactions on Bioinformatics and Computational Biology 4, 40–53 (2007)

    Article  Google Scholar 

  28. Wang, L.P., Fu, X.J.: Data Mining with Computational Intelligence. Springer, Berlin (2005)

    MATH  Google Scholar 

  29. Welch, B.L.: The generalizaition of student’s problem when several different population are involved. Biomethika 34, 28–35 (1947)

    MATH  MathSciNet  Google Scholar 

  30. Wright, S.: The interpretation of population structure by F-statistics with special regard to systems of mating. Evolution 19, 395–420 (1965)

    Article  Google Scholar 

  31. Wu, B., Abbott, T., Fishman, D., McMurray, W., Mor, G., Stone, K., Ward, D., Williams, K., Zhao, H.: Comparison of statistical methods for classifcation of ovarian cancer using mass spectrometry data. BioInformatics 19, 1636–1643 (2003)

    Article  Google Scholar 

  32. Zhen, L., Altman, R.B.: Finding Haplotype Tagging SNPs by Use of Principle Components Analysis. Am. J. Hum. Genet. 75, 850–861 (2004)

    Article  Google Scholar 

Download references

Author information

Authors and Affiliations

Authors

Editor information

Masumi Ishikawa Kenji Doya Hiroyuki Miyamoto Takeshi Yamakawa

Rights and permissions

Reprints and permissions

Copyright information

© 2008 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Zhou, N., Wang, L. (2008). Perfect Population Classification on Hapmap Data with a Small Number of SNPs. In: Ishikawa, M., Doya, K., Miyamoto, H., Yamakawa, T. (eds) Neural Information Processing. ICONIP 2007. Lecture Notes in Computer Science, vol 4985. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-69162-4_82

Download citation

  • DOI: https://doi.org/10.1007/978-3-540-69162-4_82

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-69159-4

  • Online ISBN: 978-3-540-69162-4

  • eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics