Semiparametric and Nonparametric Methods for Complex Data

Kim, Byung-Jun

dc.contributor	Statistics
dc.contributor	Kim, Inyoung
dc.contributor	Terrell, George R.
dc.contributor	Deng, Xinwei
dc.contributor	Du, Pang
dc.creator	Kim, Byung-Jun
dc.date	2020-06-27T08:00:24Z
dc.date	2020-06-26
dc.date.accessioned	2023-02-28T18:21:14Z
dc.date.available	2023-02-28T18:21:14Z
dc.identifier	vt_gsexam:26831
dc.identifier	http://hdl.handle.net/10919/99155
dc.identifier.uri	http://localhost:8080/xmlui/handle/CUHPOERS/269689
dc.description	Complex data have become increasingly varied in research fields such as epidemiology, genomics, and analytical chemistry with the development of science, technology, and study design schemes over the past few decades. For example, in epidemiology, the matched case-crossover study design is used to investigate the association between clustered binary disease outcomes and a covariate measured with error within a certain period by stratifying on subjects' conditions. In genomics, highly correlated and high-dimensional (HCHD) data are used to identify important genes and their interaction effects on diseases. In analytical chemistry, multiple time series data are generated to recognize complex patterns among multiple classes. Because of this diversity, this dissertation addresses three problems in analyzing such complex data. We contribute semiparametric and nonparametric methods for the following problems: the first is to propose a method for testing the significance of a functional association under the matched study; the second is to develop a method to simultaneously identify important variables and build a network in HCHD data; the third is to propose a multi-class dynamic model for recognizing a pattern in time-trend analysis. For the first topic, we propose a semiparametric omnibus test for testing the significance of a functional association between the clustered binary outcomes and covariates with measurement error, taking into account the effect modification of matching covariates. The test is flexible in that it does not require a specific form of the alternative hypothesis. The advantages of our omnibus test are demonstrated through simulation studies and 1-4 bidirectional matched data analyses from an epidemiology study.
For the second topic, we propose a joint semiparametric kernel machine network approach to provide a connection between variable selection and network estimation. Our approach is a unified and integrated method that can simultaneously identify important variables and build a network among them. We develop our approach under a semiparametric kernel machine regression framework, which allows for the possibility that each variable has a nonlinear effect and that the variables interact with each other in complicated ways. We demonstrate our approach using simulation studies and a real application to genetic pathway analysis. Lastly, for the third topic, we propose a Bayesian focal-area detection method for a multi-class dynamic model under a Bayesian hierarchical framework. Two-step Bayesian sequential procedures are developed to estimate patterns and detect focal intervals, which can be used for gas chromatography. We demonstrate the performance of our proposed method using a simulation study and a real application to gas chromatography on the Fast Odor Chromatographic Sniffer (FOX) system.
dc.description	Doctor of Philosophy
dc.description	Complex data have become increasingly varied in research fields such as epidemiology, genomics, and analytical chemistry with the development of science, technology, and study design schemes over the past few decades. For example, in epidemiology, the matched case-crossover study design is used to investigate the association between clustered binary disease outcomes and a covariate measured with error within a certain period by stratifying on subjects' conditions. In genomics, highly correlated and high-dimensional (HCHD) data are used to identify important genes and their interaction effects on diseases. In analytical chemistry, multiple time series data are generated to recognize complex patterns among multiple classes. Because of this diversity, we encounter three problems in analyzing the following three types of data: (1) matched case-crossover data, (2) HCHD data, and (3) time-series data. We contribute to the development of statistical methods to deal with such complex data. First, under the matched study, we discuss hypothesis testing to effectively determine the association between observed factors and the risk of the disease of interest. Because, in practice, we do not know the specific form of the association, it can be challenging to set a specific alternative hypothesis. To reflect reality, we also consider the possibility that some observations are measured with error. Accounting for these measurement errors, we develop a testing procedure under the matched case-crossover framework. This testing procedure has the flexibility to make inferences under various hypothesis settings. Second, we consider data in which the number of variables is very large compared to the sample size and the variables are correlated with each other. In this case, our goal is to identify the variables that are important for the outcome among a large number of variables and to build their network.
For example, identifying a few genes associated with diabetes among the whole genome can be used to develop biomarkers. With the approach proposed in the second project, we can identify differentially expressed and important genes and their network structure with consideration for the outcome. Lastly, we consider the scenario in which patterns of interest change over time, with an application to gas chromatography. We propose an efficient detection method to distinguish the patterns of multi-level subjects in time-trend analysis. Our proposed method can give valuable information for an efficient search for distinguishable patterns, reducing the burden of examining all observations in the data.
dc.format	ETD
dc.format	application/pdf
dc.language	en
dc.publisher	Virginia Tech
dc.rights	In Copyright
dc.rights	http://rightsstatements.org/vocab/InC/1.0/
dc.subject	Bayesian Hierarchical Model
dc.subject	Fused Lasso
dc.subject	Gaussian graphical model
dc.subject	High-dimensional regression
dc.subject	Kernel machine learning based regression
dc.subject	Matched case-control study
dc.subject	Measurement error in covariates
dc.subject	Multivariate analysis
dc.subject	Semiparametric regression
dc.title	Semiparametric and Nonparametric Methods for Complex Data
dc.type	Dissertation

Files in this item

Files	Size	Format	View
Kim_B_D_2020.pdf	4.327Mb	application/pdf	View/Open

This item appears in the following Collection(s)

Virginia Tech Electronic Theses and Dissertations [4810]
Doctoral Dissertations
