Sangam: A Confluence of Knowledge Streams

Robust Feature Screening Procedures for Mixed Type of Data


dc.contributor Statistics
dc.contributor Du, Pang
dc.contributor Deng, Xinwei
dc.contributor Hong, Yili
dc.contributor Kim, Inyoung
dc.creator Sun, Jinhui
dc.date 2016-12-17T09:00:16Z
dc.date 2016-12-16
dc.date.accessioned 2023-03-01T08:10:34Z
dc.date.available 2023-03-01T08:10:34Z
dc.identifier vt_gsexam:9678
dc.identifier http://hdl.handle.net/10919/73709
dc.identifier.uri http://localhost:8080/xmlui/handle/CUHPOERS/276638
dc.description High-dimensional data are frequently collected in many fields of scientific research and technological development. Traditional best subset selection, which uses L_0-penalized regularization, is computationally too expensive for many modern statistical applications. Many variable selection approaches based on various forms of penalized least squares or penalized likelihood have been developed to select significant variables and estimate their effects simultaneously in high-dimensional statistical inference. However, in modern applications in areas such as genomics and proteomics, ultra-high dimensional data are often collected, where the dimension of the data may grow exponentially with the sample size. In such problems, regularization methods can become computationally unstable or even infeasible. To deal with ultra-high dimensionality, Fan and Lv (2008) proposed a variable screening procedure via correlation learning to reduce dimensionality in sparse ultra-high dimensional models. Since then, many authors have further developed the procedure and applied it to various statistical models. However, they have all focused on a single type of predictor; that is, the predictors are either all continuous or all discrete. In practice, we often collect mixed-type data, which contain both continuous and discrete predictors. For example, in genetic studies, we can collect information on both gene expression profiles and single nucleotide polymorphism (SNP) genotypes. Furthermore, outliers are often present in the observations because of experimental errors and other reasons, and the true trend underlying the data might not follow the parametric models assumed in many existing screening procedures. Hence, a screening procedure that is robust against outliers and model misspecification is desired. In my dissertation, I shall propose a robust feature screening procedure for mixed-type data.
To gain insight into screening for individual types of data, I first study feature screening procedures for a single type of data in Chapter 2, based on marginal quantities. For each type of data, new feature screening procedures are proposed, and simulation studies are performed to compare their performance with that of existing procedures. The aim is to identify the best robust screening procedure for each type of data. In Chapter 3, I combine these best screening procedures to form the robust feature screening procedure for mixed-type data. Its performance will be assessed by simulation studies. I shall further illustrate the proposed procedure through the analysis of a real example.
dc.description Ph. D.
dc.format ETD
dc.format application/pdf
dc.publisher Virginia Tech
dc.rights In Copyright
dc.rights http://rightsstatements.org/vocab/InC/1.0/
dc.subject ultra-high dimensional variable selection
dc.subject feature screening
dc.subject mixed type of data
dc.title Robust Feature Screening Procedures for Mixed Type of Data
dc.type Dissertation
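
The abstract above refers to variable screening via correlation learning (Fan and Lv, 2008). As a rough illustration of that general idea only, the following minimal sketch ranks predictors by the absolute marginal Pearson correlation with the response and keeps the top d. It is not the robust mixed-type procedure proposed in the dissertation, which the record does not spell out. The function name `correlation_screen`, the cutoff `d`, and the simulated data are all assumptions made for this example.

```python
import numpy as np


def correlation_screen(X, y, d):
    """Keep the d predictors with the largest absolute marginal correlation.

    Hypothetical helper for illustration: X is an (n, p) array of continuous
    predictors, y is a length-n response, d is a user-chosen cutoff.
    Assumes no column of X is constant (otherwise the denominator is zero).
    """
    Xc = X - X.mean(axis=0)          # center each predictor
    yc = y - y.mean()                # center the response
    denom = np.sqrt((Xc ** 2).sum(axis=0)) * np.sqrt((yc ** 2).sum())
    corr = np.abs(Xc.T @ yc) / denom  # |Pearson correlation| per column
    keep = np.argsort(corr)[::-1][:d]  # indices sorted by decreasing |corr|
    return keep, corr


# Simulated sparse setting (assumed for the example): p much larger than n,
# and only the first two predictors actually affect the response.
rng = np.random.default_rng(0)
n, p = 100, 2000
X = rng.standard_normal((n, p))
y = 3.0 * X[:, 0] - 2.5 * X[:, 1] + rng.standard_normal(n)

keep, corr = correlation_screen(X, y, d=20)
print(sorted(keep.tolist()))
```

Because plain Pearson correlation is sensitive to outliers and is designed for continuous predictors, this sketch also shows why the abstract asks for robust alternatives and for separate handling of discrete predictors.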


Files in this item

Files Size Format View
Sun_J_D_2016.pdf 358.1Kb application/pdf View/Open
