Copyright © 2008 The American Society of Human Genetics. All rights reserved.
The American Journal of Human Genetics, Volume 82, Issue 6, 1316-1333, 6 June 2008
doi:10.1016/j.ajhg.2008.05.008
Article
Lude Franke (1, 3), Carolien G.F. de Kovel (1), Yurii S. Aulchenko (2), Gosia Trynka (3), Alexandra Zhernakova (1), Karen A. Hunt (4), Hylke M. Blauw (5), Leonard H. van den Berg (5), Roel Ophoff (1, 6), Panagiotis Deloukas (7), David A. van Heel (4) and Cisca Wijmenga (1, 3)
1 Complex Genetics Section, DBG-Department of Medical Genetics, University Medical Centre Utrecht, 3584 CG Utrecht, The Netherlands
2 Department of Epidemiology & Biostatistics, Erasmus MC Rotterdam, 3000 CA Rotterdam, The Netherlands
3 Genetics Department, University Medical Centre Groningen and University of Groningen, 9700 RB Groningen, The Netherlands
4 Institute of Cell and Molecular Science, Barts and The London School of Medicine and Dentistry, London, E1 2AT, UK
5 Department of Neurology, Rudolf Magnus Institute of Neuroscience, University Medical Center Utrecht, 3584 CX Utrecht, The Netherlands
6 Center for Neurobehavioral Genetics, David Geffen School of Medicine, University of California, Los Angeles, CA 90095, USA
7 Wellcome Trust Sanger Institute, Hinxton, Cambridge CB10 1SA, UK
Corresponding author
Abstract
Copy-number variation (CNV) is a major contributor to human genetic variation. Recently, CNV associations with human disease have been reported. Many genome-wide association (GWA) studies in complex diseases have been performed with sets of biallelic single-nucleotide polymorphisms (SNPs), but the available CNV methods are still limited. We present a new method (TriTyper) that can infer genotypes in case-control data sets for deletion CNVs, or SNPs with an extra, untyped allele, at a high-resolution, single-SNP level. By accounting for linkage disequilibrium (LD), as well as intensity data, calling accuracy is improved. Analysis of 3102 unrelated individuals of European descent, genotyped with Illumina Infinium BeadChips, resulted in the identification of 1880 SNPs with a common untyped allele, and these SNPs are in strong LD with neighboring biallelic SNPs. Simulations indicate our method has superior power to detect associations compared to biallelic SNPs that are in LD with these SNPs, yet without increasing type I errors, as shown in a GWA analysis in celiac disease. Genotypes for 1204 triallelic SNPs could be fully imputed, with only biallelic-genotype calls, permitting association analysis of these SNPs in many published data sets. We estimate that 682 of the 1655 unique loci reflect deletions; this is on average 99 deletions per individual, four times more than are detected by other methods. Whereas the identified loci are strongly enriched for known deletions, 61% have not been reported before. Genes overlapping with these loci more often have paralogs (p = 0.006) and biologically interact with fewer genes than expected (p = 0.004).
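As a rough illustration of why a SNP with an extra, untyped allele can be mistaken for an ordinary biallelic SNP, the sketch below computes expected genotype frequencies under Hardy-Weinberg equilibrium for three alleles and then collapses them to the calls a purely biallelic caller might produce. This is not the TriTyper algorithm: the function names, the null-allele label "0", the allele frequencies, and the assumption that a null allele contributes no intensity signal are all illustrative assumptions.

```python
from itertools import combinations_with_replacement


def hwe_genotype_freqs(allele_freqs):
    """Expected genotype frequencies under Hardy-Weinberg equilibrium.

    allele_freqs maps an allele label to its population frequency.
    """
    freqs = {}
    for a, b in combinations_with_replacement(sorted(allele_freqs), 2):
        p, q = allele_freqs[a], allele_freqs[b]
        # Homozygotes occur at p^2, heterozygotes at 2pq.
        freqs[(a, b)] = p * p if a == b else 2 * p * q
    return freqs


def apparent_biallelic_calls(genotype_freqs, null="0"):
    """Collapse true genotypes to what a biallelic caller would report.

    Assumption (illustrative): the null allele gives no signal, so a
    hemizygous A/0 sample looks like AA and a 0/0 sample gives no call.
    """
    calls = {}
    for (a, b), f in genotype_freqs.items():
        seen = [x for x in (a, b) if x != null]
        if not seen:
            call = "no call"
        elif len(seen) == 1:
            call = seen[0] * 2  # hemizygous looks homozygous
        else:
            call = a + b
        calls[call] = calls.get(call, 0.0) + f
    return calls


# Hypothetical allele frequencies: two typed alleles plus a null allele.
true_freqs = hwe_genotype_freqs({"A": 0.6, "B": 0.3, "0": 0.1})
apparent = apparent_biallelic_calls(true_freqs)
for call in sorted(apparent):
    print(call, round(apparent[call], 4))
```

Under these assumed frequencies the apparent homozygote classes are inflated by the hemizygous samples, which is the kind of distortion that intensity data and LD with neighboring SNPs can help to resolve.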