Sung K Kim (2006)
LINKAGE DISEQUILIBRIUM AND ITS APPLICATION TO MAPPING IN ARABIDOPSIS THALIANA
PhD thesis, UNIVERSITY OF SOUTHERN CALIFORNIA.
We present several studies related to the application of linkage disequilibrium (LD), the phenomenon of non-random association of alleles at two or more loci, to fine-mapping genes involved in complex traits. First, we develop and compare methods that estimate the sampling variance of a commonly used measure of LD, D. We find that our method of maximum likelihood estimation, together with Fisher's information theory, provides the best approximation to the sampling variance of D.
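As an illustration of the measure D discussed above, the following minimal sketch computes D and r² from two-locus haplotype counts, together with the standard large-sample (multinomial, delta-method) approximation to the variance of D. This is not the estimator developed in the thesis; the function names, the count layout, and the example counts are hypothetical and chosen only for illustration.

```python
def ld_statistics(n_AB, n_Ab, n_aB, n_ab):
    """Return (D, r2, var_D) from counts of the four two-locus haplotypes.

    Assumption: counts are of phased haplotypes, so the sample is
    multinomial over the four classes AB, Ab, aB, ab.
    """
    n = n_AB + n_Ab + n_aB + n_ab
    if n == 0:
        raise ValueError("no haplotypes supplied")

    p_AB = n_AB / n
    p_A = (n_AB + n_Ab) / n   # frequency of allele A at the first locus
    p_B = (n_AB + n_aB) / n   # frequency of allele B at the second locus

    # D: departure of the AB haplotype frequency from independence.
    d = p_AB - p_A * p_B

    # r^2: D^2 scaled by the product of allele-frequency variances.
    scale = p_A * (1 - p_A) * p_B * (1 - p_B)
    r2 = d * d / scale if scale > 0 else 0.0

    # Large-sample approximation (illustrative only, not the thesis method):
    # Var(D) ~ [pA qA pB qB + (1 - 2pA)(1 - 2pB) D - D^2] / n
    var_d = (scale + (1 - 2 * p_A) * (1 - 2 * p_B) * d - d * d) / n
    return d, r2, var_d


# Hypothetical counts, for illustration only.
d, r2, var_d = ld_statistics(40, 10, 10, 40)
```

With these made-up counts the allele frequencies are both 0.5, so the middle term of the variance expression vanishes and the approximation reduces to (pA qA pB qB − D²)/n.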
Second, we present work on LD and fine-scale mapping in Arabidopsis thaliana. We characterize LD using two complementary data sets and conclude that LD decays within 10 kb. We present results of a genome-wide association scan for complex traits involving floral induction and pathogen resistance. We conclude that, despite strong population structure, we successfully identified previously known genes. We attempt to reduce the effects of population stratification by implementing several methodologies and conclude that these methods reduce the level of inflation somewhat, but not completely. Our results suggest that further work on adjusting for spurious association is warranted. Next, we present a simulation study on the feasibility of using single-feature polymorphisms (SFPs), which are polymorphisms resulting from the use of high-density oligonucleotide tiling arrays for genotyping, for association mapping. We conclude that the large quantity of SFPs compensates for their inherent noisiness, which allows the use of these data for association mapping.
Lastly, we present preliminary results on the prospects for LD mapping, based on an analysis of polymorphisms obtained from 20 accessions using a hybridization-based whole-genome re-sequencing platform. We applied four tag-SNP selection algorithms to determine the reduction in the number of SNPs that need to be typed in order to fully represent the genome. We conclude that, even with a ~50% reduction in the number of SNPs typed, the entire set of ~80K polymorphisms is still captured. In addition, we explore the power to detect SNPs at various marker densities and demonstrate that a density of one SNP per 480 bp (about 250,000 SNPs) provides genome coverage that is nearly identical to that provided by the entire data set.
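The abstract does not name the four tag-SNP selection algorithms. To illustrate the general idea, the sketch below shows one common approach, greedy selection under a pairwise r² threshold. It is an assumption for illustration and not necessarily one of the methods used in the thesis. The function names, the threshold of 0.8, and the toy haplotypes are all hypothetical.

```python
def r_squared(x, y):
    """Pairwise r^2 between two biallelic SNPs coded 0/1 across haplotypes."""
    n = len(x)
    p_a = sum(x) / n
    p_b = sum(y) / n
    p_ab = sum(1 for a, b in zip(x, y) if a == 1 and b == 1) / n
    scale = p_a * (1 - p_a) * p_b * (1 - p_b)
    if scale == 0:          # a monomorphic SNP carries no LD information
        return 0.0
    d = p_ab - p_a * p_b
    return d * d / scale


def greedy_tag_snps(snps, threshold=0.8):
    """Pick tag SNPs so every SNP has r^2 >= threshold with some tag.

    Assumption: `snps` is a list of equal-length 0/1 lists, one per SNP.
    The threshold is an illustrative choice, not taken from the thesis.
    """
    m = len(snps)
    # covers[i]: SNPs that SNP i can represent (always including itself).
    covers = [
        {j for j in range(m)
         if j == i or r_squared(snps[i], snps[j]) >= threshold}
        for i in range(m)
    ]
    untagged = set(range(m))
    tags = []
    while untagged:
        # Take the SNP covering the most untagged SNPs; ties go to the
        # lowest index so the result is deterministic.
        best = max(range(m), key=lambda i: (len(covers[i] & untagged), -i))
        tags.append(best)
        untagged -= covers[best]
    return tags


# Toy data: SNPs 0, 1 and 2 are in perfect LD; SNP 3 is independent of them.
toy = [[0, 0, 1, 1], [0, 0, 1, 1], [1, 1, 0, 0], [0, 1, 0, 1]]
tags = greedy_tag_snps(toy)
```

In the toy data, two tags represent all four SNPs, which mirrors in miniature the kind of reduction in typed SNPs that the abstract reports.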