Optimality and sub-optimality of PCA I: Spiked random matrix models

Perry, Amelia E.; Wein, Alexander Spence; Bandeira, Afonso S.; Moitra, Ankur

dc.contributor	Massachusetts Institute of Technology. Department of Mathematics
dc.creator	Perry, Amelia E.
dc.creator	Wein, Alexander Spence
dc.creator	Bandeira, Afonso S.
dc.creator	Moitra, Ankur
dc.date	2020-05-21T20:31:23Z
dc.date	2018-08
dc.date	2017-07
dc.date	2019-11-15T17:41:52Z
dc.date.accessioned	2023-03-01T18:12:17Z
dc.date.available	2023-03-01T18:12:17Z
dc.identifier	0090-5364
dc.identifier	https://hdl.handle.net/1721.1/125398
dc.identifier	Perry, Amelia et al. "Optimality and sub-optimality of PCA I: Spiked random matrix models." Annals of Statistics 46, 5 (October 2018), 2416-2451. © 2018 Institute of Mathematical Statistics.
dc.identifier.uri	http://localhost:8080/xmlui/handle/CUHPOERS/279144
dc.description	A central problem of random matrix theory is to understand the eigenvalues of spiked random matrix models, introduced by Johnstone, in which a prominent eigenvector (or “spike”) is planted into a random matrix. These distributions form natural statistical models for principal component analysis (PCA) problems throughout the sciences. Baik, Ben Arous and Péché showed that the spiked Wishart ensemble exhibits a sharp phase transition asymptotically: when the spike strength is above a critical threshold, it is possible to detect the presence of a spike based on the top eigenvalue, and below the threshold the top eigenvalue provides no information. Such results form the basis of our understanding of when PCA can detect a low-rank signal in the presence of noise. However, under structural assumptions on the spike, not all information is necessarily contained in the spectrum. We study the statistical limits of tests for the presence of a spike, including nonspectral tests. Our results leverage Le Cam's notion of contiguity and include: (i) For the Gaussian Wigner ensemble, we show that PCA achieves the optimal detection threshold for certain natural priors for the spike. (ii) For any non-Gaussian Wigner ensemble, PCA is sub-optimal for detection. However, an efficient variant of PCA achieves the optimal threshold (for natural priors) by pre-transforming the matrix entries. (iii) For the Gaussian Wishart ensemble, the PCA threshold is optimal for positive spikes (for natural priors) but this is not always the case for negative spikes. Keywords: Random matrix; principal component analysis; hypothesis testing; deformed Wigner; spiked covariance; contiguity; power envelope; phase transition
dc.description	NSF CAREER Award (Grant CCF-1453261)
dc.description	NSF Large (Grant CCF-156523)
dc.format	application/pdf
dc.language	en
dc.publisher	Institute of Mathematical Statistics
dc.relation	http://dx.doi.org/10.1214/17-aos1625
dc.relation	Annals of Statistics
dc.rights	Creative Commons Attribution-Noncommercial-Share Alike
dc.rights	http://creativecommons.org/licenses/by-nc-sa/4.0/
dc.source	arXiv
dc.title	Optimality and sub-optimality of PCA I: Spiked random matrix models
dc.type	Article
dc.type	http://purl.org/eprint/type/JournalArticle

Files in this item

Files	Size	Format
1807.00891.pdf	715.5Kb	application/pdf

This item appears in the following Collection(s)

MIT Open Access Articles [2860]
