Sangam: A Confluence of Knowledge Streams

Thousands of missed genes found in bacterial genomes and their analysis with COMBREX

dc.creator Wood, Derrick E
dc.creator Lin, Henry
dc.creator Levy-Moonshine, Ami
dc.creator Swaminathan, Rajiswari
dc.creator Chang, Yi-Chien
dc.creator Anton, Brian P
dc.creator Osmani, Lais
dc.creator Steffen, Martin
dc.creator Kasif, Simon
dc.creator Salzberg, Steven L
dc.date 2021-09-28T18:41:24Z
dc.date 2012-10-30
dc.date.accessioned 2022-05-20T08:38:52Z
dc.date.available 2022-05-20T08:38:52Z
dc.identifier https://doi.org/10.13016/i3dm-vkvw
dc.identifier Wood, D.E., Lin, H., Levy-Moonshine, A. et al. Thousands of missed genes found in bacterial genomes and their analysis with COMBREX. Biol Direct 7, 37 (2012).
dc.identifier http://hdl.handle.net/1903/28038
dc.identifier.uri http://localhost:8080/xmlui/handle/CUHPOERS/117665
dc.description The dramatic reduction in the cost of sequencing has allowed many researchers to join in the effort of sequencing and annotating prokaryotic genomes. Annotation methods vary considerably and may fail to identify some genes. Here we draw attention to a large number of likely genes missing from annotations using common tools such as Glimmer and BLAST. By analyzing 1,474 prokaryotic genome annotations in GenBank, we identify 13,602 likely missed genes that are homologs to non-hypothetical proteins, and 11,792 likely missed genes that are homologs only to hypothetical proteins, yet have supporting evidence of their protein-coding nature from COMBREX, a newly created gene function database. We also estimate the likelihood that each potential missing gene found is a genuine protein-coding gene using COMBREX. Our analysis of the causes of missed genes suggests that larger annotation centers tend to produce annotations with fewer missed genes than smaller centers, and many of the missed genes are short genes <300 bp. Over 1,000 of the likely missed genes could be associated with phenotype information available in COMBREX. 359 of these genes, found in pathogenic organisms, may be potential targets for pharmaceutical research. The newly identified genes are available on COMBREX’s website.
dc.description https://doi.org/10.1186/1745-6150-7-37
dc.format application/pdf
dc.language en_US
dc.publisher Springer Nature
dc.relation College of Computer, Mathematical & Natural Sciences
dc.relation Computer Science
dc.relation Digital Repository at the University of Maryland
dc.relation University of Maryland (College Park, MD)
dc.subject Genome Annotation
dc.subject Annotate Gene
dc.subject Prokaryotic Genome
dc.subject True Gene
dc.subject Significant Sequence Similarity
dc.title Thousands of missed genes found in bacterial genomes and their analysis with COMBREX
dc.type Article


Files in this item

Files               Size     Format
1745-6150-7-37.pdf  808.2Kb  application/pdf
