Copyright © 2007 The American Society of Human Genetics. All rights reserved.
The American Journal of Human Genetics, Volume 81, Issue 5, 895-905, 1 November 2007
doi:10.1086/521372
Article
Gad Kimmel (a, c), Michael I. Jordan (a, b), Eran Halperin (c), Ron Shamir (d), and Richard M. Karp (a, c)
a Computer Science Division University of California Berkeley, and International Computer Science Institute
b Department of Statistics University of California Berkeley, and International Computer Science Institute
c University of California Berkeley, and International Computer Science Institute, Berkeley
d School of Computer Science, Sackler Faculty of Exact Sciences, Tel Aviv University, Tel Aviv
Address for correspondence and reprints: Dr. Gad Kimmel, Computer Science Division, University of California Berkeley, Berkeley, CA 94720
Abstract
Population stratification can be a serious obstacle in the analysis of genomewide association studies. We propose a method for evaluating the significance of association scores in whole-genome cohorts with stratification. Our approach is a randomization test akin to a standard permutation test. It conditions on the genotype matrix and thus takes into account not only the population structure but also the complex linkage disequilibrium structure of the genome. As we show in simulation experiments, our method achieves higher power and significantly better control over false-positive rates than do existing methods. In addition, it can be easily applied to whole-genome association studies.
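For orientation, the following is a minimal sketch of the standard permutation test that the abstract uses as its point of comparison. It is not the authors' method: it shuffles phenotype labels uniformly, which assumes no population structure, whereas the proposed randomization test conditions on the genotype matrix to account for stratification. The function names (`trend_score`, `max_score`, `permutation_pvalue`), the genotype coding (0/1/2), and the choice of n times squared correlation as the association score are all assumptions made for illustration.

```python
import random


def trend_score(snp, phenotype):
    """Association score for one SNP: n * r^2 between genotype and phenotype."""
    n = len(phenotype)
    mean_g = sum(snp) / n
    mean_y = sum(phenotype) / n
    cov = sum((g - mean_g) * (y - mean_y) for g, y in zip(snp, phenotype))
    var_g = sum((g - mean_g) ** 2 for g in snp)
    var_y = sum((y - mean_y) ** 2 for y in phenotype)
    if var_g == 0 or var_y == 0:
        return 0.0  # monomorphic SNP or constant phenotype carries no signal
    return n * cov * cov / (var_g * var_y)


def max_score(genotypes, phenotype):
    """Largest score over all SNPs; `genotypes` is a list of per-SNP lists."""
    return max(trend_score(snp, phenotype) for snp in genotypes)


def permutation_pvalue(genotypes, phenotype, n_perm=1000, seed=0):
    """Genome-wide p-value of the best score under uniform label shuffling.

    Assumption: individuals are exchangeable, i.e. there is no stratification.
    The genotype matrix is held fixed, so linkage disequilibrium between
    SNPs is preserved in every permuted data set.
    """
    rng = random.Random(seed)
    observed = max_score(genotypes, phenotype)
    shuffled = list(phenotype)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if max_score(genotypes, shuffled) >= observed:
            exceed += 1
    # Add-one correction keeps the estimate strictly positive.
    return (exceed + 1) / (n_perm + 1)
```

Taking the maximum score over all SNPs in each permuted data set corrects for multiple testing without assuming that the tests are independent. When cases and controls are drawn unevenly from subpopulations, uniform shuffling is no longer a valid null distribution, which is the problem the abstract addresses.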