Bias, Precision and Power of Some Techniques in Genome-Wide Association Analysis

Yajnik, Pranav

dc.contributor	Boehnke, Michael Lee
dc.contributor	Li, Jun
dc.contributor	Abecasis, Goncalo
dc.contributor	Little, Roderick J
dc.contributor	Scott, Laura Jean
dc.contributor	Zoellner, Sebastian K
dc.creator	Yajnik, Pranav
dc.date	2020-10-04T23:28:46Z
dc.date	NO_RESTRICTION
dc.date	2020-10-04T23:28:46Z
dc.date	2020
dc.date.accessioned	2022-05-19T13:30:33Z
dc.date.available	2022-05-19T13:30:33Z
dc.identifier	http://hdl.handle.net/2027.42/163049
dc.identifier	0000-0001-5294-4036
dc.identifier	Yajnik, Pranav; 0000-0001-5294-4036
dc.identifier.uri	http://localhost:8080/xmlui/handle/CUHPOERS/117368
dc.description	Genome-wide association studies (GWAS) have successfully identified thousands of genetic loci associated with a wide variety of human phenotypic traits. In this thesis, we evaluate the bias, precision and power of three statistical techniques employed in GWAS. In Chapter 2, we assess bias and power for adjusted-trait regression (ATR). ATR modifies the traditional ordinary least-squares estimation and F-test procedures for multiple linear regression models of quantitative traits. ATR involves performing bivariate correlation analysis between a genetic variant (or set of genetic variants) and a covariate-adjusted trait, obtained by regressing the trait on covariates. We show that ATR effect size estimates for single-variant analysis are biased towards the null by a factor equal to the coefficient of determination obtained from the regression of the genetic variant on the covariates. We derive the exact distributions of ATR test statistics and show that ATR is less powerful than traditional methods when the genetic variants are correlated with covariates. The loss of power increases as the stringency of Type 1 error control increases. The maximum possible power loss for the ATR multi-variant test is completely characterized by the canonical correlation between genetic variants and covariates. We show that, for typical covariates like genetic principal components, the loss of power will likely be low in practice. In Chapter 3, we assess three genetic imputation quality scores (allelic-RSQ, MACH-RSQ and INFO) as predictors of realized imputation quality (squared correlation between true genotypes and imputed dosages) for low-frequency and rare variants. We assess the impact of using different imputation algorithms (Beagle 4.2, minimac3 and IMPUTE 2) and reference panels (1000 Genomes [1KG] and Haplotype Reference Consortium [HRC]) on the relationship between imputation quality scores and realized quality.
We imputed genotypes into 8,378 participants using each imputation algorithm with the 1KG panel and minimac3 with the HRC panel. We show that MACH-RSQ and INFO are identical when calculated on the same data. We observe that allelic-RSQ predicts realized quality less accurately than MACH-RSQ/INFO for low-frequency and rare variants. Realized quality decreases as minor allele frequency (MAF) decreases. The mean absolute difference (MAD) between quality scores and realized quality increases as MAF decreases. Imputation with HRC resulted in better realized quality for low-frequency and rare variants compared to imputation with 1KG. However, the MAD between quality scores and realized quality for low-frequency and rare variants was similar for both panels. In Chapter 4, we assess the efficiency gained or lost by adding an external sample with missing case-control status to an (internal) case-control study sample. We propose a method for estimation and testing that accounts for the known (or presumed) proportion of cases in the external sample. Misspecification of the external sample case proportion leads to biased estimation; in particular, treating the external sample as a control sample leads to underestimation of the effect size. However, the proposed test controls Type 1 error regardless of the particular value chosen for the presumptive external sample case proportion. When the external participants are treated as controls, the addition of external participants improves power if the proportion of cases in the internal sample is at least twice that in the external sample.
dc.description	PHD
dc.description	Biostatistics
dc.description	University of Michigan, Horace H. Rackham School of Graduate Studies
dc.description	http://deepblue.lib.umich.edu/bitstream/2027.42/163049/1/pyajnik_1.pdf
dc.format	application/pdf
dc.language	en_US
dc.subject	Genetic epidemiology
dc.subject	Genome-wide association analysis
dc.subject	Covariate adjusted trait
dc.subject	Imputation quality
dc.subject	External controls
dc.subject	Genetics
dc.subject	Statistics and Numeric Data
dc.subject	Science
dc.title	Bias, Precision and Power of Some Techniques in Genome-Wide Association Analysis
dc.type	Thesis
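
The Chapter 2 statement about ATR bias in the abstract can be illustrated with a small simulation. The following is a minimal sketch, not code from the thesis: it assumes a single covariate, a continuous stand-in for the genotype dosage, and arbitrary effect sizes, and the function name `ols_and_atr` is hypothetical. It reads the abstract's claim as the ATR estimate being shrunk towards zero by the fraction R², where R² is the coefficient of determination from regressing the variant on the covariates. Under that reading, the ATR estimate equals (1 - R²) times the ordinary least-squares estimate, which follows from the Frisch-Waugh-Lovell theorem and holds exactly in any sample.

```python
import numpy as np

def ols_and_atr(y, g, C):
    """Return (beta_ols, beta_atr, r2) for trait y, one variant g, covariates C."""
    n = len(y)
    X_c = np.column_stack([np.ones(n), C])   # intercept plus covariates

    # Traditional approach: regress y on intercept, covariates and variant
    # jointly, and take the coefficient of the variant (last column).
    X_full = np.column_stack([X_c, g])
    beta_ols = np.linalg.lstsq(X_full, y, rcond=None)[0][-1]

    # ATR step 1: covariate-adjusted trait = residuals of y on the covariates.
    y_adj = y - X_c @ np.linalg.lstsq(X_c, y, rcond=None)[0]

    # ATR step 2: simple regression of the adjusted trait on the variant.
    g_c = g - g.mean()
    beta_atr = (g_c @ y_adj) / (g_c @ g_c)

    # R^2 from regressing the variant on the covariates.
    g_res = g - X_c @ np.linalg.lstsq(X_c, g, rcond=None)[0]
    r2 = 1.0 - (g_res @ g_res) / (g_c @ g_c)
    return beta_ols, beta_atr, r2

# Simulated data (all values below are illustrative assumptions).
rng = np.random.default_rng(0)
n = 5000
c = rng.normal(size=n)                        # one covariate
g = 0.6 * c + rng.normal(size=n)              # variant correlated with covariate
y = 0.5 * g + 1.0 * c + rng.normal(size=n)    # quantitative trait

beta_ols, beta_atr, r2 = ols_and_atr(y, g, c)
print(f"OLS estimate:        {beta_ols:.4f}")
print(f"ATR estimate:        {beta_atr:.4f}")
print(f"(1 - R^2) * OLS:     {(1 - r2) * beta_ols:.4f}")
```

When the variant is uncorrelated with the covariates, R² is close to zero and the two estimates nearly coincide, which is consistent with the abstract's remark that the loss is likely small for covariates such as genetic principal components.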

Files in this item

File	Size	Format
pyajnik_1.pdf	33.20 MB	application/pdf

This item appears in the following Collection(s)

Deep Blue Repositories - University of Michigan [17189]
