Statistical estimation of diagnosis with genetic markers based on decision tree analysis of complex disease

https://doi.org/10.1016/j.compbiomed.2009.07.015

Abstract

To explore combinations of genetic markers and to estimate their joint action, decision trees are built on the basis of marker frequencies in both disease and control groups. Youden's index (0.1–0.9 for a single marker) is calculated for genetic markers with different diagnostic capacities. When 23 single genetic markers, each with diagnostic power 0.10, are combined, the resulting diagnostic power is 0.5. Medium diagnostic power (Youden's index 0.7) can be obtained by combining four low-power diagnostic items. High diagnostic power (Youden's index 0.9) can be obtained by combining either eight low-power items or four medium-power ones. This implies that selecting about 100 genetic markers, each differing in capacity to distinguish between the disease and control groups by (say) 10%, will meet the requirement for clinical diagnosis. Thus, diagnosis of complex diseases by genetic markers is possible through the discovery and characterization of markers throughout the human genome and the development of genotyping technology.

Introduction

Analysis of genetic linkage has been highly successful in mapping the genes responsible for Mendelian diseases. Many disorders with an underlying single-gene mode of inheritance have been identified by positional cloning. In the past decade, attempts have been made to extend this approach to multifactorial disorders and other health-related traits. It has proved difficult, however, to find strong and replicable linkages since a significant feature of a complex disease is the modest contribution of each susceptibility gene to its onset [1], [2]. Therefore, diagnosis of complex diseases using single genes is limited in efficacy.

Recent progress in genotyping techniques enables us to use short tandem repeats (STRs) or single nucleotide polymorphisms (SNPs) as allelic markers of complex diseases [3], [4]. By combining several genetic markers, it is possible to improve diagnostic power enough to meet the requirements for clinical diagnosis. Consequently, it is necessary to analyze the frequency indices and to evaluate the joint action of multiple genetic markers.

Multivariate logistic regression is the first choice for multivariate analysis [5]. However, it requires original data and thus is not suitable for theoretical illustration. In this paper we have chosen the decision tree analytical method [6], [7], which solves this problem satisfactorily and lays a theoretical foundation for research and the clinical application of genetic markers.

Setting of decision tree

We assume three genetic markers, X1, X2 and X3, whose genotype frequencies follow those expected under the hypothesis of panmixia (Hardy–Weinberg equation) and whose frequencies are PX1, PX2 and PX3, respectively, and establish the decision tree shown in Fig. 1.

Calculations are performed using a VB program, shown in Fig. 2; the program code is listed in Appendix A.
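The structure of such a tree can be illustrated with a minimal sketch. It is written in Python and is not the VB program of Appendix A. It assumes the markers are independent, and the function name `leaf_probabilities` and the example frequencies are hypothetical. Each leaf of the tree corresponds to one presence/absence pattern of X1, X2 and X3, and its probability is the product of the branch probabilities along the path.

```python
from itertools import product


def leaf_probabilities(freqs):
    """Return {pattern: probability} for every leaf of a binary decision tree.

    freqs[i] is the probability that marker i is present. Markers are
    treated as independent, which is an assumption of this sketch.
    A pattern is a tuple of 1 (marker present) and 0 (marker absent).
    """
    leaves = {}
    for pattern in product((1, 0), repeat=len(freqs)):
        p = 1.0
        for present, f in zip(pattern, freqs):
            p *= f if present else 1.0 - f
        leaves[pattern] = p
    return leaves


# Hypothetical values standing in for PX1, PX2 and PX3.
leaves = leaf_probabilities([0.5, 0.4, 0.3])
for pattern, p in leaves.items():
    print(pattern, round(p, 4))
```

Three markers give 2^3 = 8 leaves, and the leaf probabilities sum to 1.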

We can suppose three observation indices X1, X2 and X3, whose positive rates are Ptx1, Ptx2 and Ptx3, respectively, in a test group and Pcx1, Pcx2 and

Results

The analysis indicates that diagnostic power increases to 0.5 when 23 single genetic markers, each with diagnostic power 0.10, are combined. Medium diagnostic power (Youden's index 0.7) can be obtained by combining four low-power diagnostic items. High diagnostic power (Youden's index 0.9) can be obtained by combining either eight low-power items or four medium-power ones. Details are listed in Table 1.
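How combining markers can raise Youden's index may be sketched as follows. This is an illustration under stated assumptions, not the author's procedure, and it is not expected to reproduce Table 1. The assumptions are that markers are independent within each group and that a leaf is called positive when its pattern is more probable in the test group than in the control group. The function name `youden_combined` and the rates 0.55 and 0.45 are hypothetical. Youden's index is sensitivity + specificity − 1, which for a single marker equals its positive rate in the test group minus its positive rate in the control group.

```python
from itertools import product


def youden_combined(pt_rates, pc_rates):
    """Youden's index of a decision tree built on several binary markers.

    pt_rates[i] / pc_rates[i]: positive rate of marker i in the test
    (disease) group and in the control group. Assumptions of this sketch:
    markers are independent, and a leaf is labelled positive when it is
    more probable in the test group than in the control group.
    """
    j = 0.0
    for pattern in product((1, 0), repeat=len(pt_rates)):
        p_test = 1.0
        p_ctrl = 1.0
        for present, pt, pc in zip(pattern, pt_rates, pc_rates):
            p_test *= pt if present else 1.0 - pt
            p_ctrl *= pc if present else 1.0 - pc
        if p_test > p_ctrl:
            # A positive leaf adds p_test to sensitivity and removes
            # p_ctrl from specificity.
            j += p_test - p_ctrl
    return j


# One hypothetical marker with positive rates 0.55 vs 0.45 (index 0.10),
# then three such markers combined.
j1 = youden_combined([0.55], [0.45])
j3 = youden_combined([0.55] * 3, [0.45] * 3)
print(round(j1, 4), round(j3, 4))
```

Full enumeration visits 2^n leaves, so this direct form is practical only for small numbers of markers.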

Discussion

During the past 10 years, microsatellites have been the most widely used markers in linkage and association studies because of their high degree of heterozygosity. More recently, SNPs have become more popular in association studies. In the course of the Human Genome Project, more than 1.68 million SNPs have been identified. Mapping the genetic basis underlying common multifactorial diseases such as cancer through whole genome association studies has attracted much attention in recent years [9],

Conflict of interest statement

I, the undersigned author, certify that I have no commercial associations that pose a conflict of interest in connection with the submitted article. I declare that the above statement is true on behalf of all the authors related to this study.

Liu Hui (1960), male, born in Li, Hebei province, Han nationality, China, graduated from Dalian Medical College with an MS. He is a Professor in the College of Medical Laboratory, Dalian Medical University, Dalian, China. His research focuses on the molecular basis of complex disease and on bioinformatics.

References (10)

  • Y. Suh et al. SNP discovery in associating genetic variation with human disease phenotypes. Mutat. Res. (2005)

  • P. Sham. Shifting paradigms in gene-mapping methodology for complex traits. Pharmacogenomics (2001)

  • M.B. Bracken. Genomic epidemiology of complex disease: the need for an electronic evidence-based approach to research synthesis. Am. J. Epidemiol. (2005)

  • J. Brohede et al. PPC: an algorithm for accurate estimation of SNP allele frequencies in small equimolar pools of DNA using data from high density microarrays. Nucleic Acids Res. (2005)

  • T.H. Nguyen et al. Frequency finder: a multi-source web application for collection of public allele frequencies of SNP markers. Bioinformatics (2004)

There are more references available in the full text version of this article.
