Sangam: A Confluence of Knowledge Streams

Distribution-Based Clustering: Using Ecology To Refine the Operational Taxonomic Unit

dc.contributor Massachusetts Institute of Technology. Department of Biological Engineering
dc.contributor Massachusetts Institute of Technology. Department of Civil and Environmental Engineering
dc.contributor Alm, Eric J.
dc.contributor Preheim, Sarah Pacocha
dc.contributor Perrotta, Allison Rose
dc.contributor Martin-Platero, Antonio M.
dc.contributor Gupta, Anika
dc.creator Preheim, Sarah Pacocha
dc.creator Perrotta, Allison Rose
dc.creator Martin-Platero, Antonio M.
dc.creator Gupta, Anika
dc.creator Alm, Eric J.
dc.date 2014-11-05T20:30:29Z
dc.date 2013-08
dc.date 2013-01
dc.date.accessioned 2023-03-01T18:09:50Z
dc.date.available 2023-03-01T18:09:50Z
dc.identifier 0099-2240
dc.identifier http://hdl.handle.net/1721.1/91469
dc.identifier Preheim, S. P., A. R. Perrotta, A. M. Martin-Platero, A. Gupta, and E. J. Alm. “Distribution-Based Clustering: Using Ecology To Refine the Operational Taxonomic Unit.” Applied and Environmental Microbiology 79, no. 21 (August 23, 2013): 6593–6603.
dc.identifier https://orcid.org/0000-0001-8294-9364
dc.identifier https://orcid.org/0000-0003-4378-9542
dc.identifier.uri http://localhost:8080/xmlui/handle/CUHPOERS/278990
dc.description 16S rRNA sequencing, commonly used to survey microbial communities, begins by grouping individual reads into operational taxonomic units (OTUs). There are two major challenges in calling OTUs: identifying bacterial population boundaries and differentiating true diversity from sequencing errors. Current approaches to identifying taxonomic groups or eliminating sequencing errors rely on sequence data alone, but both of these activities could be informed by the distribution of sequences across samples. Here, we show that using the distribution of sequences across samples can help identify population boundaries even in noisy sequence data. The logic underlying our approach is that bacteria in different populations will often be highly correlated in their abundance across different samples. Conversely, 16S rRNA sequences derived from the same population, whether slightly different copies in the same organism, variation of the 16S rRNA gene within a population, or sequences generated randomly in error, will have the same underlying distribution across sampled environments. We present a simple OTU-calling algorithm (distribution-based clustering) that uses both genetic distance and the distribution of sequences across samples and demonstrate that it is more accurate than other methods at grouping reads into OTUs in a mock community. Distribution-based clustering also performs well on environmental samples: it is sensitive enough to differentiate between OTUs that differ by a single base pair yet predicts fewer overall OTUs than most other methods. The program can decrease the total number of OTUs with redundant information and improve the power of many downstream analyses to describe biologically relevant trends.
dc.description United States. Dept. of Energy (Office of Science, contract no. DE-AC02-05CH11231)
dc.format application/pdf
dc.language en_US
dc.publisher American Society for Microbiology
dc.relation http://dx.doi.org/10.1128/AEM.00342-13
dc.relation Applied and Environmental Microbiology
dc.rights Creative Commons Attribution-Noncommercial-Share Alike
dc.rights http://creativecommons.org/licenses/by-nc-sa/4.0/
dc.source Preheim
dc.title Distribution-Based Clustering: Using Ecology To Refine the Operational Taxonomic Unit
dc.type Article
dc.type http://purl.org/eprint/type/JournalArticle


Files in this item

Files Size Format
Alm_Distributio ... figs tables supp info.pdf 2.902Mb application/pdf
