Copyright © 2003 The American Society of Human Genetics. All rights reserved.
The American Journal of Human Genetics, Volume 73, Issue 2, 285-300, 1 August 2003
doi:10.1086/377138
Andrew G. Clark1,3, Rasmus Nielsen2, James Signorovitch2, Tara C. Matise4, Stephen Glanowski3, Jeremy Heil3, Emily S. Winn-Deen3,5, Arthur L. Holden6 and Eric Lai7
1 Department of Molecular Biology and Genetics, Cornell University, Ithaca, NY
2 Department of Biological Statistics and Computational Biology, Cornell University, Ithaca, NY
3 Celera Genomics, Rockville, MD
4 Department of Genetics, Rutgers University, Piscataway, NJ
5 Roche Molecular Systems, Pleasanton, CA
6 First Genetic Trust, Deerfield, IL; and
7 GlaxoSmithKline, Research Triangle Park, NC
Address for correspondence and reprints: Dr. Andrew G. Clark, Department of Molecular Biology and Genetics, 107 Biotechnology Building, Cornell University, Ithaca, NY 14853
Abstract
The prospect of using linkage disequilibrium (LD) for fine-scale mapping in humans has attracted considerable attention. During the validation of a set of single-nucleotide polymorphisms (SNPs) for linkage analysis, data were produced for 4,833 SNPs in 538 clusters, providing a rich picture of the local attributes of LD across the genome. LD estimates may be biased depending on how SNPs are first identified, and a particular problem of ascertainment bias arises when SNPs identified in small, heterogeneous panels are subsequently typed in larger population samples. Understanding and correcting ascertainment bias is essential for a useful quantitative assessment of the landscape of LD across the human genome. Heterogeneity along the genome in the population recombination rate, ρ = 4Nr, reflects how variable the marker density will have to be for optimal coverage. We find that ascertainment-corrected ρ varies along the genome by more than two orders of magnitude, implying great differences in the recombinational history of different portions of our genome. The distribution of ρ̂ is unimodal, and we show that this is compatible with a wide range of mixtures of hotspots in a background of variable recombination rate. Although ρ̂ is significantly correlated across the three population samples, some regions of the genome exhibit population-specific spikes or troughs in ρ that are too large to be explained by sampling. This result is consistent with differences in the genealogical depth of local genomic regions, a finding that has direct bearing on the design and utility of LD mapping and on the National Institutes of Health HapMap project.
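The ascertainment problem can be made concrete with a standard population-genetics calculation. This is an illustration, not the authors' correction procedure. A site with allele frequency p is discovered in a panel of n chromosomes only if the panel happens to be polymorphic, which occurs with probability 1 − pⁿ − (1 − p)ⁿ. A small panel therefore under-samples rare variants. The minimal sketch below assumes independent sampling and error-free sequencing. It also evaluates ρ = 4Nr for hypothetical values of N and r. The function names are illustrative, not from the article.

```python
# Minimal sketch, under assumptions not taken from the article:
# a biallelic site with population allele frequency p, a discovery panel of
# n chromosomes sampled independently, and error-free sequencing.


def ascertainment_probability(p: float, n: int) -> float:
    """Probability that a site with allele frequency p is polymorphic, and
    hence discovered, in a panel of n chromosomes: 1 - p**n - (1 - p)**n."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie in [0, 1]")
    if n < 2:
        # One chromosome (or none) can never reveal a polymorphism.
        return 0.0
    return 1.0 - p ** n - (1.0 - p) ** n


def population_recombination_rate(n_pop: float, r: float) -> float:
    """rho = 4Nr, where N is the population size and r the per-generation
    recombination rate over the interval of interest."""
    return 4.0 * n_pop * r


if __name__ == "__main__":
    # Rare alleles are far less likely than common ones to be discovered
    # in a small panel, which skews the frequency spectrum of typed SNPs.
    for p in (0.01, 0.05, 0.5):
        print(f"p={p}: P(discovered in 8 chromosomes) = "
              f"{ascertainment_probability(p, 8):.3f}")
    # Hypothetical inputs: N = 10,000 and r = 1e-8 per bp per generation
    # over a 1-kb interval (r = 1e-5), giving rho of about 0.4.
    print(f"rho = {population_recombination_rate(10_000, 1e-8 * 1_000):.2f}")
```

With an 8-chromosome panel, roughly 8% of sites at p = 0.01 are discovered, compared with over 99% at p = 0.5. This gap is the kind of distortion that an ascertainment correction has to undo.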
Improving Power in Contrasting Linkage-Disequilibrium Patterns between Cases and Controls
The American Journal of Human Genetics, Volume 80, Issue 5, 1 May 2007, Pages 911-920
Tao Wang, Xiaofeng Zhu and Robert C. Elston
Abstract
Genetic association studies offer an opportunity to find genetic variants underlying complex human diseases. The success of this approach depends on the linkage disequilibrium (LD) between markers and the disease variant(s) in a local region of the genome. Because the LD pattern among markers in a region harboring a disease mutation may differ between cases and controls, comparing a measure of this LD can, in some scenarios, help to map disease mutations. For example, using the composite correlation to characterize the LD among markers, Zaykin et al. recently suggested an “LD contrast” test and showed that it has high power under certain haplotype-driven disease models. Furthermore, individual variants observed at different positions in a gene are likely to act jointly to influence the phenotype, and the LD contrast test is also useful for detecting such joint action. However, the LD among markers that is introduced by mutations and their joint action is usually confounded by background LD, measured at the population level, especially in a local region harboring disease mutations. Because the commonly used measures of LD, such as the composite correlation, capture both effects, they may not be optimal for detecting association when background LD is high. Here, we describe a test that improves on the LD contrast test by taking background LD into account. Because the proposed test is developed in a regression framework, it is very flexible and can be extended to continuous traits and to incorporate covariates. Our simulation results demonstrate the validity of the proposed method and its substantially higher power relative to current methods. Finally, we illustrate our new method by applying it to real data from the International Collaborative Study on Hypertension in Blacks.
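The basic idea of contrasting LD between cases and controls can be sketched for a single pair of markers. This is a deliberately simplified illustration, not the regression-based test described above or Zaykin et al.'s multi-marker test. The sketch assumes 0/1/2 genotype coding and estimates the composite LD between two markers as the Pearson correlation of allele counts. It then compares cases with controls via Fisher's z-transform. It does not adjust for background LD, which is exactly the limitation the abstract addresses. All function names and the toy data are hypothetical.

```python
# Simplified two-marker illustration of an LD contrast between cases and
# controls. It is NOT the regression-based test of Wang et al. nor the
# multi-marker test of Zaykin et al.
# Assumptions: genotypes coded as 0/1/2 allele counts; composite LD between
# two markers estimated by the Pearson correlation of those counts; cases
# and controls are independent samples; neither marker is monomorphic.
import math
from statistics import NormalDist


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("a monomorphic marker has undefined correlation")
    return sxy / math.sqrt(sxx * syy)


def ld_contrast(cases, controls, eps=1e-9):
    """cases, controls: lists of (g1, g2) genotype pairs at two markers.
    Returns (r_case, r_control, z, two-sided p) for H0: equal correlation."""
    r_case = pearson([g1 for g1, _ in cases], [g2 for _, g2 in cases])
    r_ctrl = pearson([g1 for g1, _ in controls], [g2 for _, g2 in controls])
    # Clamp away from +/-1 so Fisher's z-transform (atanh) stays finite.
    zc = math.atanh(max(-1 + eps, min(1 - eps, r_case)))
    zk = math.atanh(max(-1 + eps, min(1 - eps, r_ctrl)))
    # atanh(r) has approximate sampling variance 1 / (n - 3).
    se = math.sqrt(1.0 / (len(cases) - 3) + 1.0 / (len(controls) - 3))
    z = (zc - zk) / se
    p = 2.0 * (1.0 - NormalDist().cdf(abs(z)))
    return r_case, r_ctrl, z, p


if __name__ == "__main__":
    # Toy data, invented for illustration only.
    cases = [(0, 0), (1, 1), (2, 2), (0, 0), (1, 1),
             (2, 2), (0, 1), (1, 0), (2, 1), (1, 2)]
    controls = [(0, 2), (1, 1), (2, 0), (0, 0), (2, 2),
                (1, 0), (0, 1), (2, 1), (1, 2), (1, 1)]
    print(ld_contrast(cases, controls))
```

Because the composite correlation here mixes disease-induced LD with background LD, a large z can arise simply from differing population structure. Conditioning on background LD is meant to remove that confounding.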
Selection of Genetic Markers for Association Analyses, Using Linkage Disequilibrium and Haplotypes
The American Journal of Human Genetics, Volume 73, Issue 1, 1 July 2003, Pages 115-130
Zhaoling Meng, Dmitri V. Zaykin, Chun-Fang Xu, Michael Wagner and Margaret G. Ehm
Abstract
The genotyping of closely spaced single-nucleotide polymorphism (SNP) markers frequently yields highly correlated data, owing to extensive linkage disequilibrium (LD) between markers. The extent of LD varies widely across the genome and drives the number of frequent haplotypes observed in small regions. Several studies have illustrated the possibility that LD or haplotype data could be used to select a subset of SNPs that optimizes the information retained in a genomic region while reducing the genotyping effort and simplifying the analysis. We propose a method based on the spectral decomposition of the matrices of pairwise LD between markers, and we select markers on the basis of their contributions to the total genetic variation. We also modify Clayton’s “haplotype tagging SNP” selection method, which utilizes haplotype information. For both methods, we propose sliding window–based algorithms that allow the methods to be applied to large chromosomal regions. Our procedures require genotype data on a small number of individuals for an initial set of SNPs; from these data, an optimal subset of SNPs is selected that can be efficiently genotyped in larger samples while retaining most of the genetic variation. We identify suitable parameter combinations for the procedures, and we show that a sample size of 50–100 individuals achieves consistent results in studies of simulated data sets in linkage equilibrium and LD. When applied to experimental data sets, both procedures were similarly effective at reducing the genotyping requirement while maintaining the genetic information content throughout the regions. We also show that haplotype-association results that Hosking et al. obtained near CYP2D6 were almost identical before and after marker selection.
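The spectral approach can be sketched on a small region. This is a loose, simplified reading of the abstract, not Meng et al.'s exact algorithm, and it omits their sliding windows. The sketch makes three assumptions:

- The pairwise LD matrix is the Pearson correlation matrix of 0/1/2 genotype counts.
- No SNP is monomorphic.
- Leading eigen-components are kept until a chosen fraction of the total variation (the matrix trace) is explained. For each kept component, the not-yet-chosen SNP with the largest absolute loading is selected.

The eigendecomposition uses cyclic Jacobi rotations so the example needs only the standard library. All names and the `var_fraction` parameter are hypothetical.

```python
# Simplified sketch of spectral SNP selection, loosely following the idea in
# the abstract; NOT Meng et al.'s exact algorithm (no sliding windows here).
# Assumptions: genotypes coded 0/1/2, no monomorphic SNPs, and the pairwise LD
# matrix taken to be the Pearson correlation matrix of genotype counts.
import math


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def jacobi_eigen(a, tol=1e-20, max_sweeps=100):
    """Eigen-decompose a symmetric matrix by cyclic Jacobi rotations.
    Returns (eigenvalues, V); column j of V is the eigenvector of value j."""
    n = len(a)
    a = [row[:] for row in a]
    v = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(max_sweeps):
        if sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j) < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                # Rotation angle chosen so that the (p, q) entry becomes zero.
                theta = (a[q][q] - a[p][p]) / (2.0 * a[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                for k in range(n):  # A <- A J (columns p, q)
                    akp, akq = a[k][p], a[k][q]
                    a[k][p], a[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):  # A <- J^T A (rows p, q)
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * apk - s * aqk, s * apk + c * aqk
                for k in range(n):  # V <- V J accumulates the eigenvectors
                    vkp, vkq = v[k][p], v[k][q]
                    v[k][p], v[k][q] = c * vkp - s * vkq, s * vkp + c * vkq
    return [a[i][i] for i in range(n)], v


def select_snps(genotypes, var_fraction=0.9):
    """genotypes: rows = individuals, columns = SNPs. Keeps leading components
    until var_fraction of the total variation is reached and picks, for each,
    the not-yet-chosen SNP with the largest absolute loading."""
    m = len(genotypes[0])
    cols = [[row[j] for row in genotypes] for j in range(m)]
    ld = [[1.0 if i == j else pearson(cols[i], cols[j]) for j in range(m)]
          for i in range(m)]
    vals, vecs = jacobi_eigen(ld)
    total = sum(vals)  # trace of a correlation matrix = number of SNPs
    chosen, explained = [], 0.0
    for j in sorted(range(m), key=lambda k: vals[k], reverse=True):
        if explained >= var_fraction * total:
            break
        explained += vals[j]
        for i in sorted(range(m), key=lambda k: abs(vecs[k][j]), reverse=True):
            if i not in chosen:
                chosen.append(i)
                break
    return sorted(chosen)


if __name__ == "__main__":
    # Toy data: SNPs 0 and 1 are in perfect LD; SNP 2 is uncorrelated with them.
    x = [0, 1, 2, 0, 2, 1, 0, 2, 1, 1]
    y = [2, 1, 0, 0, 2, 0, 1, 1, 2, 1]
    print(select_snps([[a, a, b] for a, b in zip(x, y)]))
```

In the toy example, the redundant pair collapses onto one component, so only one of SNPs 0 and 1 is retained alongside SNP 2. This mirrors the abstract's goal of reducing genotyping effort while keeping most of the variation.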