Copyright © 2008 The American Society of Human Genetics. All rights reserved.
The American Journal of Human Genetics, Volume 82, Issue 2, 386-397, 8 February 2008
doi:10.1016/j.ajhg.2007.10.010
Article
Lydia Coulter Kwee¹, Dawei Liu³, Xihong Lin⁴, Debashis Ghosh⁵ and Michael P. Epstein²,*
1 Department of Biostatistics, Emory University, Atlanta, GA 30322, USA
2 Department of Human Genetics, Emory University, Atlanta, GA 30322, USA
3 Center for Statistical Sciences, Brown University, Providence, RI 02912, USA
4 Department of Biostatistics, Harvard University, Boston, MA 02115, USA
5 Department of Statistics, Pennsylvania State University, University Park, PA 16802, USA
* Corresponding author
Abstract
Association mapping of complex traits typically employs tagSNP genotype data to identify a trait locus within a region of interest. However, considerable debate exists regarding the most powerful strategy for using such tagSNP data for inference. A popular approach tests each tagSNP within the region individually, but such tests can lose power as a result of incomplete linkage disequilibrium between the genotyped tagSNP and the trait locus. Alternatively, one can jointly test all tagSNPs within the region (by using genotypes or haplotypes), but such multivariate tests have large degrees of freedom that can also compromise power. Here, we consider a semiparametric model for quantitative-trait mapping that uses genetic information from multiple tagSNPs simultaneously but produces a test statistic with reduced degrees of freedom compared to existing multivariate approaches. We fit this model by using a dimension-reducing technique called least-squares kernel machines, which we show is equivalent to fitting a specific linear mixed model (which can be fit with standard software packages such as SAS and R). Using simulated SNP data based on real data from the International HapMap Project, we demonstrate that our approach often outperforms the popular approach of single-tagSNP testing for association mapping of quantitative traits. Our approach is also flexible: it readily accommodates covariates and, where of interest, high-dimensional interactions among tagSNPs and environmental predictors.
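To make the idea of a multi-SNP test with reduced degrees of freedom concrete, the following is a minimal sketch of a variance-component score test of the kind implied by the linear-mixed-model representation above. It rests on several assumptions that are ours, not the article's: an identity-by-state (IBS) kernel over tagSNP genotypes coded 0/1/2, an intercept-only null model with no covariates, and a scaled chi-square (Satterthwaite) approximation that treats the estimated residual variance as fixed. All function names (`ibs_kernel`, `chi2_sf`, `kernel_score_test`) are hypothetical, and the authors' exact kernel and test may differ.

```python
import math


def ibs_kernel(g1, g2):
    """Identity-by-state similarity of two genotype vectors coded 0/1/2
    (minor-allele counts): the average over SNPs of (2 - |g1 - g2|) / 2."""
    return sum(2 - abs(a - b) for a, b in zip(g1, g2)) / (2 * len(g1))


def chi2_sf(x, df):
    """Upper tail P(chi2_df > x) = Q(df/2, x/2), the regularized upper
    incomplete gamma, via the standard series / continued-fraction split."""
    a, x = df / 2.0, x / 2.0
    if x <= 0:
        return 1.0
    log_prefactor = -x + a * math.log(x) - math.lgamma(a)
    if x < a + 1:
        # Series for the lower regularized gamma P(a, x); return 1 - P.
        term = total = 1.0 / a
        ap = a
        for _ in range(1000):
            ap += 1
            term *= x / ap
            total += term
            if abs(term) < abs(total) * 1e-15:
                break
        return 1.0 - total * math.exp(log_prefactor)
    # Modified Lentz continued fraction for Q(a, x).
    tiny = 1e-300
    b = x + 1 - a
    c, d = 1 / tiny, 1 / b
    h = d
    for i in range(1, 1000):
        an = -i * (i - a)
        b += 2
        d = an * d + b
        d = tiny if abs(d) < tiny else d
        c = b + an / c
        c = tiny if abs(c) < tiny else c
        d = 1 / d
        delta = d * c
        h *= delta
        if abs(delta - 1) < 1e-15:
            break
    return math.exp(log_prefactor) * h


def kernel_score_test(y, genotypes):
    """Score-type test of H0: no genetic effect (variance component tau = 0).
    Q = r' K r / sigma2, with r the residuals of an intercept-only fit.
    Under H0, Q is a quadratic form in normals with matrix M = P K P
    (P = centering projection); it is matched to kappa * chi2_nu by its
    first two moments. nu plays the role of effective degrees of freedom."""
    n = len(y)
    ybar = sum(y) / n
    r = [yi - ybar for yi in y]
    sigma2 = sum(ri * ri for ri in r) / (n - 1)  # treated as known (simplification)
    K = [[ibs_kernel(gi, gj) for gj in genotypes] for gi in genotypes]
    q = sum(r[i] * K[i][j] * r[j] for i in range(n) for j in range(n)) / sigma2
    # M = P K P: double-center the kernel matrix.
    row = [sum(K[i]) / n for i in range(n)]
    col = [sum(K[i][j] for i in range(n)) / n for j in range(n)]
    grand = sum(row) / n
    M = [[K[i][j] - row[i] - col[j] + grand for j in range(n)] for i in range(n)]
    tr_m = sum(M[i][i] for i in range(n))
    tr_m2 = sum(M[i][j] * M[j][i] for i in range(n) for j in range(n))
    if tr_m <= 0:
        raise ValueError("kernel carries no genotype variation")
    kappa = tr_m2 / tr_m          # scale of the approximating chi-square
    nu = tr_m ** 2 / tr_m2        # effective degrees of freedom
    return q, kappa, nu, chi2_sf(q / kappa, nu)


# Toy usage: six individuals, three tagSNPs (illustrative values only).
genos = [[0, 1, 2], [0, 1, 1], [2, 2, 0], [1, 0, 0], [2, 1, 0], [0, 2, 2]]
y = [1.2, 0.9, 3.1, 2.0, 2.8, 0.7]
q, kappa, nu, p = kernel_score_test(y, genos)
print(f"Q={q:.3f}, kappa={kappa:.3f}, nu={nu:.2f}, p={p:.4f}")
```

The effective degrees of freedom `nu` come from the eigenstructure of the kernel rather than the raw number of tagSNPs, which is one way to see how a kernel-based statistic can carry fewer degrees of freedom than a full multivariate genotype test when tagSNPs are correlated.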
Multipoint Approximations of Identity-by-Descent Probabilities for Accurate Linkage Analysis of Distantly Related Individuals
The American Journal of Human Genetics, Volume 82, Issue 3, 3 March 2008, Pages 607-622
Cornelis A. Albers, Jim Stankovich, Russell Thomson, Melanie Bahlo and Hilbert J. Kappen
Abstract
We propose an analytical approximation method for the estimation of multipoint identity by descent (IBD) probabilities in pedigrees containing a moderate number of distantly related individuals. We show that in large pedigrees where cases are related through untyped ancestors only, it is possible to formulate the hidden Markov model of the Lander-Green algorithm in terms of the IBD configurations of the cases. We use a first-order Markov approximation to model the changes in this IBD-configuration variable along the chromosome. In simulated and real data sets, we demonstrate that estimates of parametric and nonparametric linkage statistics based on the first-order Markov approximation are accurate. The computation time is exponential in the number of cases instead of in the number of meioses separating the cases. We have implemented our approach in the computer program ALADIN (accurate linkage analysis of distantly related individuals). ALADIN can be applied to general pedigrees and marker types and has the ability to model marker-marker linkage disequilibrium with a clustered-markers approach. Using ALADIN is straightforward: it requires no parameters to be specified and accepts standard input files.
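The first-order Markov approximation in this abstract can be illustrated with a deliberately simplified sketch: a two-state hidden Markov chain on a pairwise IBD indicator (0 = not IBD, 1 = IBD) along the chromosome, with posterior IBD probabilities computed by scaled forward-backward. This is not ALADIN's model. ALADIN's state space is the IBD configuration of all cases and its transitions are derived from the pedigree, whereas the stationary IBD probability `pi`, the switching rate `lam`, and the per-marker emission likelihoods here are hypothetical inputs chosen for illustration.

```python
import math


def transition(pi, lam, d):
    """2x2 transition matrix over genetic distance d (Morgans) for a
    first-order Markov chain on the IBD indicator with stationary IBD
    probability pi and a hypothetical switching rate lam."""
    e = math.exp(-lam * d)
    return [[1 - pi * (1 - e), pi * (1 - e)],
            [(1 - pi) * (1 - e), pi + (1 - pi) * e]]


def posterior_ibd(positions, emissions, pi, lam):
    """Scaled forward-backward. emissions[t] = (P(data_t | not IBD),
    P(data_t | IBD)). Returns P(IBD at marker t | all data) and the
    log-likelihood of the marker data."""
    n = len(positions)
    alpha, scales = [], []
    a = [(1 - pi) * emissions[0][0], pi * emissions[0][1]]
    for t in range(n):
        if t > 0:
            T = transition(pi, lam, positions[t] - positions[t - 1])
            prev = alpha[-1]
            a = [sum(prev[i] * T[i][j] for i in range(2)) * emissions[t][j]
                 for j in range(2)]
        s = sum(a)
        alpha.append([v / s for v in a])  # normalized forward probabilities
        scales.append(s)
    beta = [[1.0, 1.0] for _ in range(n)]
    for t in range(n - 2, -1, -1):
        T = transition(pi, lam, positions[t + 1] - positions[t])
        beta[t] = [sum(T[i][j] * emissions[t + 1][j] * beta[t + 1][j]
                       for j in range(2)) / scales[t + 1] for i in range(2)]
    post = []
    for t in range(n):
        w0, w1 = alpha[t][0] * beta[t][0], alpha[t][1] * beta[t][1]
        post.append(w1 / (w0 + w1))
    return post, sum(math.log(s) for s in scales)


# Three markers 5 cM apart; only the middle marker is informative.
positions = [0.0, 0.05, 0.10]
emissions = [(1.0, 1.0), (0.1, 0.9), (1.0, 1.0)]
post, loglik = posterior_ibd(positions, emissions, pi=0.25, lam=2.0)
print([round(p, 3) for p in post], round(loglik, 3))
```

Because the chain is first order, evidence for IBD at one marker spreads to its neighbours with a strength that decays with genetic distance, which is the behaviour the approximation is meant to capture at much lower cost than the full Lander-Green state space.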
Estimating Odds Ratios in Genome Scans: An Approximate Conditional Likelihood Approach
The American Journal of Human Genetics, Volume 82, Issue 5, 9 May 2008, Pages 1064-1074
Arpita Ghosh, Fei Zou and Fred A. Wright
Abstract
In modern whole-genome scans, the use of stringent thresholds to control the genome-wide testing error distorts the estimation process, producing estimated effect sizes that may be on average far greater in magnitude than the true effect sizes. We introduce a method, based on the estimate of genetic effect and its standard error as reported by standard statistical software, to correct for this bias in case-control association studies. Our approach is widely applicable, is far easier to implement than competing approaches, and may often be applied to published studies without access to the original data. We evaluate the performance of our approach via extensive simulations for a range of genetic models, minor allele frequencies, and genetic effect sizes. Compared to the naive estimation procedure, our approach reduces the bias and the mean squared error, especially for modest effect sizes. We also develop a principled method to construct confidence intervals for the genetic effect that acknowledges the conditioning on statistical significance. Our approach is described in the specific context of odds ratios and logistic modeling but is more widely applicable. Application to recently published data sets demonstrates the relevance of our approach to modern genome scans.
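The correction described in this abstract uses only a reported estimate and its standard error. A minimal sketch of that idea, under our own assumptions: the log odds ratio estimate is treated as normal with its reported standard error taken as known, selection is a two-sided test at level `alpha` (the default 5e-8 is our illustrative choice, not from the abstract), and the likelihood of z = beta_hat / se is renormalized by the probability of passing the threshold. Only a conditional maximum-likelihood estimate is shown, found by a plain grid search; the function names are hypothetical and this is not the authors' software.

```python
import math
from statistics import NormalDist

_STD = NormalDist()  # standard normal


def cond_loglik(mu, z, c):
    """Log of phi(z - mu) / P(|Z| > c) for Z ~ N(mu, 1), up to a constant:
    the likelihood of z given that it passed the threshold c."""
    selection_prob = _STD.cdf(mu - c) + _STD.cdf(-mu - c)
    return -0.5 * (z - mu) ** 2 - math.log(selection_prob)


def conditional_mle(beta_hat, se, alpha=5e-8, grid=20000):
    """Bias-reduced estimate of beta for a hit significant at level alpha.
    The maximizer of the conditional likelihood lies strictly between 0 and
    |z| (the derivative is positive at 0 and negative at |z|), so a grid
    over [0, |z|] suffices."""
    c = _STD.inv_cdf(1 - alpha / 2)
    z = beta_hat / se
    if abs(z) <= c:
        raise ValueError("estimate did not pass the significance threshold")
    sign = 1.0 if z > 0 else -1.0
    za = abs(z)
    best_mu, best_ll = 0.0, -math.inf
    for k in range(grid + 1):
        mu = za * k / grid
        ll = cond_loglik(mu, za, c)
        if ll > best_ll:
            best_mu, best_ll = mu, ll
    return sign * best_mu * se


# Illustrative hit: naive OR = 1.3 with SE(log OR) = 0.04.
naive = math.log(1.3)
corrected = conditional_mle(naive, 0.04)
print(f"naive OR={math.exp(naive):.3f}, corrected OR={math.exp(corrected):.3f}")
```

The correction shrinks the estimate toward zero most strongly when z only barely exceeds the threshold and becomes negligible when z is far beyond it, consistent with the abstract's statement that the gains are largest for modest effect sizes.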