Copyright © 2007 The American Society of Human Genetics. All rights reserved.
The American Journal of Human Genetics, Volume 80, Issue 5, 921-930, 1 May 2007
doi:10.1086/516842
Article
Michael P. Epstein (a, *), Andrew S. Allen (c, *), and Glen A. Satten (b)
a From the Department of Human Genetics, Emory University, Atlanta, GA
b Centers for Disease Control and Prevention, Atlanta, GA
c Department of Biostatistics and Bioinformatics and Duke Clinical Research Institute, Duke University, Durham, NC (A.S.A.)
Address for correspondence and reprints: Dr. Michael P. Epstein, Department of Human Genetics, Emory University School of Medicine, 615 Michael Street, Suite 301, Atlanta, GA 30322.
Abstract
Population stratification remains an important issue in case-control studies of disease-marker association, even within populations considered to be genetically homogeneous. Campbell et al. (Nature Genetics 2005;37:868–872) illustrated this by showing that stratification induced a spurious association between the lactase gene (LCT) and tall/short status in a European American sample. Furthermore, existing approaches for controlling stratification by use of substructure-informative loci (e.g., genomic control, structured association, and principal components) could not resolve this confounding. To address this problem, we propose a simple two-step procedure. In the first step, we model the odds of disease, given data on substructure-informative loci (excluding the test locus). For each participant, we use this model to calculate a stratification score, which is that participant’s estimated odds of disease calculated using his or her substructure-informative–loci data in the disease-odds model. In the second step, we assign subjects to strata defined by stratification score and then test for association between the disease and the test locus within these strata. The resulting association test is valid even in the presence of population stratification. Our approach is computationally simple and less model dependent than are existing approaches for controlling stratification. To illustrate these properties, we apply our approach to the data from Campbell et al. and find no association between the LCT locus and tall/short status. Using simulated data, we show that our approach yields a more appropriate correction for stratification than does principal components or genomic control.
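The abstract describes the two steps but leaves several modeling choices open. The sketch below is a minimal Python/NumPy illustration under stated assumptions, not the authors' implementation. It assumes that the disease-odds model is a logistic regression (fitted here by Newton-Raphson), that strata are quintiles of the stratification score, and that the within-strata test is a Cochran-Mantel-Haenszel chi-square on test-locus carrier status. The function names (`fit_logistic`, `stratification_scores`, `assign_strata`, `cmh_statistic`) and the simulated data are hypothetical.

```python
import numpy as np


def fit_logistic(X, y, n_iter=50, tol=1e-8):
    """Fit logistic regression coefficients by Newton-Raphson.

    X must already contain an intercept column; y holds 0/1 outcomes.
    """
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(X @ beta)))        # fitted disease probabilities
        grad = X.T @ (y - p)                         # score vector
        hess = X.T @ (X * (p * (1.0 - p))[:, None])  # information matrix
        step = np.linalg.solve(hess, grad)
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    return beta


def stratification_scores(Z, disease):
    """Step 1 (assumed logistic model): regress disease on the
    substructure-informative loci Z, with the test locus excluded, and
    return each subject's estimated odds of disease."""
    X = np.column_stack([np.ones(len(disease)), Z])
    beta = fit_logistic(X, disease.astype(float))
    return np.exp(X @ beta)                          # odds = exp(linear predictor)


def assign_strata(scores, n_strata=5):
    """Step 2a (assumed quantile strata): label subjects 0..n_strata-1
    by the quantile of their stratification score."""
    cuts = np.quantile(scores, np.linspace(0.0, 1.0, n_strata + 1)[1:-1])
    return np.searchsorted(cuts, scores, side="right")


def cmh_statistic(disease, carrier, strata):
    """Step 2b (assumed test): Cochran-Mantel-Haenszel chi-square (1 df,
    no continuity correction) for disease vs. carrier status within strata."""
    num, var = 0.0, 0.0
    for s in np.unique(strata):
        m = strata == s
        n = m.sum()
        if n < 2:
            continue                                 # a 1-subject stratum carries no information
        d, g = disease[m], carrier[m]
        n_case, n_carrier = np.sum(d == 1), np.sum(g == 1)
        a = np.sum((d == 1) & (g == 1))              # carrier cases in this stratum
        num += a - n_case * n_carrier / n            # observed minus expected
        var += (n_case * (n - n_case) * n_carrier * (n - n_carrier)
                / (n ** 2 * (n - 1)))                # hypergeometric variance
    return num ** 2 / var


if __name__ == "__main__":
    # Hypothetical simulation: a hidden population label drives both disease
    # risk and the test-locus carrier rate; the test locus has no causal effect.
    rng = np.random.default_rng(0)
    n, n_loci = 4000, 20
    pop = rng.random(n) < 0.4
    freqs = np.where(pop, 0.8, 0.2)[:, None] * np.ones((1, n_loci))
    Z = rng.binomial(2, freqs)                       # informative-locus genotypes (0/1/2)
    disease = (rng.random(n) < np.where(pop, 0.6, 0.2)).astype(int)
    carrier = (rng.random(n) < np.where(pop, 0.7, 0.2)).astype(int)

    strata = assign_strata(stratification_scores(Z, disease))
    print("unstratified:", cmh_statistic(disease, carrier, np.zeros(n, dtype=int)))
    print("score-stratified:", cmh_statistic(disease, carrier, strata))
```

In this toy setting the unstratified statistic (a single table) is inflated by the hidden population structure. Stratifying on the estimated disease odds groups subjects with similar ancestry, so the stratified statistic should behave roughly like a null chi-square. The number of strata is a tunable assumption: a stratum that straddles two subpopulations can leave some residual confounding.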