Sangam: A Confluence of Knowledge Streams

Bias, Precision and Power of Some Techniques in Genome-Wide Association Analysis

dc.contributor Boehnke, Michael Lee
dc.contributor Li, Jun
dc.contributor Abecasis, Goncalo
dc.contributor Little, Roderick J
dc.contributor Scott, Laura Jean
dc.contributor Zoellner, Sebastian K
dc.creator Yajnik, Pranav
dc.date 2020-10-04T23:28:46Z
dc.date NO_RESTRICTION
dc.date 2020
dc.date.accessioned 2022-05-19T13:30:33Z
dc.date.available 2022-05-19T13:30:33Z
dc.identifier http://hdl.handle.net/2027.42/163049
dc.identifier 0000-0001-5294-4036
dc.identifier Yajnik, Pranav; 0000-0001-5294-4036
dc.identifier.uri http://localhost:8080/xmlui/handle/CUHPOERS/117368
dc.description Genome-wide association studies (GWAS) have successfully identified thousands of genetic loci associated with a wide variety of human phenotypic traits. In this thesis, we evaluate the bias, precision and power of three statistical techniques employed in GWAS. In Chapter 2, we assess bias and power for adjusted-trait regression (ATR). ATR is a modification of the traditional ordinary least-squares estimation and F-test hypothesis testing techniques for quantitative trait multiple linear regression models. ATR involves performing bivariate correlation analysis between a genetic variant (or set of genetic variants) and a covariate-adjusted trait, obtained by regressing the trait on covariates. We show that ATR effect size estimates for single variant analysis are biased towards the null by a factor equal to the coefficient of determination obtained from the regression of the genetic variant onto the covariates. We derive the exact distributions of ATR test statistics and show that ATR is less powerful than traditional methods when the genetic variants are correlated with covariates. The loss of power increases as the stringency of Type 1 error control increases. The maximum possible power loss for the ATR multi-variant test is completely characterized by the canonical correlation between genetic variants and covariates. We show that, for typical covariates like genetic principal components, the loss of power will likely be low in practice. In Chapter 3, we assess three genetic imputation quality scores (allelic-RSQ, MACH-RSQ and INFO) as predictors of realized imputation quality (squared correlation between true genotypes and imputed dosages) for low-frequency and rare variants. We assess the impact of using different imputation algorithms (Beagle 4.2, minimac3 and IMPUTE 2) and reference panels (1000 Genomes [1KG] and Haplotype Reference Consortium [HRC]) on the relationship between imputation quality scores and realized quality.
We imputed genotypes into 8,378 participants using each imputation algorithm with the 1KG panel and minimac3 with the HRC panel. We show that MACH-RSQ and INFO are identical when calculated on the same data. We observe that allelic-RSQ predicts realized quality less well than MACH-RSQ/INFO for low-frequency and rare variants. Realized quality decreases as minor allele frequency (MAF) decreases. The mean absolute difference (MAD) between quality scores and realized quality increases as MAF decreases. Imputation with HRC resulted in better realized quality for low-frequency and rare variants than imputation with 1KG. However, the MAD between quality scores and realized quality for low-frequency and rare variants was similar for both panels. In Chapter 4, we assess the efficiency gained or lost by adding an external sample with missing case-control status to an (internal) case-control study sample. We propose a method for estimation and testing that accounts for the known (or presumed) proportion of cases in the external sample. Misspecification of the external sample case proportion leads to biased estimation; in particular, treating the external sample as a control sample leads to underestimation of the effect size. However, the proposed test controls Type 1 error regardless of the particular value chosen for the presumptive external sample case proportion. When the external participants are treated as controls, adding them improves power if the proportion of cases in the internal sample is at least twice that in the external sample.
dc.description PHD
dc.description Biostatistics
dc.description University of Michigan, Horace H. Rackham School of Graduate Studies
dc.description http://deepblue.lib.umich.edu/bitstream/2027.42/163049/1/pyajnik_1.pdf
dc.format application/pdf
dc.language en_US
dc.subject Genetic epidemiology
dc.subject Genome-wide association analysis
dc.subject Covariate adjusted trait
dc.subject Imputation quality
dc.subject External controls
dc.subject Genetics
dc.subject Statistics and Numeric Data
dc.subject Science
dc.title Bias, Precision and Power of Some Techniques in Genome-Wide Association Analysis
dc.type Thesis
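
The Chapter 2 claim about adjusted-trait regression (ATR) in the abstract above can be illustrated with a small simulation. This is a minimal sketch and not the thesis's own code: the sample size, effect sizes, the single continuous covariate, and the use of a continuous stand-in for the genetic variant are all assumptions made for illustration. It compares the full-model least-squares estimate with the ATR estimate and checks that, in this setup, the ATR estimate equals the full-model estimate shrunk by (1 - R^2), where R^2 is the coefficient of determination from regressing the variant on the covariates. That is one reading of "biased towards the null by a factor equal to the coefficient of determination".

```python
import numpy as np

# All numeric settings below are illustrative assumptions, not values from the thesis.
rng = np.random.default_rng(0)
n = 5000

c = rng.normal(size=n)                       # covariate (e.g. a principal component)
g = 0.6 * c + rng.normal(size=n)             # continuous stand-in for a variant, correlated with c
y = 0.5 * g + 1.0 * c + rng.normal(size=n)   # quantitative trait with true variant effect 0.5


def ols(X, y):
    """Return least-squares coefficients for y regressed on the columns of X."""
    return np.linalg.lstsq(X, y, rcond=None)[0]


ones = np.ones(n)
C = np.column_stack([ones, c])               # intercept + covariate

# Traditional approach: regress the trait on covariates and the variant jointly.
beta_full = ols(np.column_stack([ones, c, g]), y)[2]

# ATR: adjust the trait for covariates first, then regress the residual on the variant alone.
y_adj = y - C @ ols(C, y)
beta_atr = ols(np.column_stack([ones, g]), y_adj)[1]

# R^2 from regressing the variant on the covariates.
g_fit = C @ ols(C, g)
r2 = 1.0 - np.sum((g - g_fit) ** 2) / np.sum((g - g.mean()) ** 2)

print("full-model estimate:", beta_full)
print("ATR estimate:       ", beta_atr)
print("(1 - R^2) * full:   ", (1.0 - r2) * beta_full)
```

In this sketch the last two printed values agree up to floating-point error, because the in-sample identity follows from the Frisch-Waugh-Lovell decomposition when the covariate matrix includes an intercept. Setting the coefficient linking `g` to `c` to zero makes R^2 close to zero and the two estimates nearly coincide, consistent with the abstract's remark that power loss is small when variants are weakly correlated with covariates.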


Files in this item

Files Size Format View
pyajnik_1.pdf 33.20Mb application/pdf View/Open
