Transposable elements (TEs) are DNA sequences capable of moving within a genome. Although TEs are often detrimental, multiple examples demonstrate their potential to contribute to the evolution of their hosts, including in the evolution of insecticide resistance, mammalian pregnancy, and adaptive immunity. Despite these examples, open questions remain regarding the extent to which TEs provide a general mechanism for host evolution, and how important TEs are as a source of genetic variation compared with other sources.
To address broad-scale questions on TE-host dynamics, comparative genomic studies can be used to assess the contributions of TEs to host genomes, and to uncover the interactions that often lead to large differences in TE abundance and diversity among species, even within a single genus. Such studies are timely given the increasing availability of high-quality genome assemblies for non-model organisms. However, processing large numbers of genome assemblies to understand TE-host dynamics requires effective automated TE annotation methodologies, as manual annotation is unviable for studies considering more than a handful of genome assemblies. To address this, in Chapter Two, a fully automated TE curation and annotation pipeline named ‘Earl Grey’ is developed to tackle some of the core issues in TE annotation: poorly defined TE boundaries and redundant consensus sequences in TE consensus libraries, and fragmented and overlapping TE annotations. Earl Grey aims to outperform other widely used TE annotation tools, and produces outputs in standard formats for compatibility with downstream analyses, along with summary figures. Because it can analyse large numbers of genomes without manual intervention, Earl Grey is an effective tool for large-scale comparative analyses. The intention is for Earl Grey to continue incorporating new modules based on user feedback and advances in TE annotation methodologies, with the goal of becoming a community-led TE annotation and curation tool.
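One of the problems named above, fragmented and overlapping TE annotations, can be illustrated with a generic interval-merging step. The Python below is a minimal sketch under stated assumptions and is not Earl Grey's implementation: the record layout (sequence ID, start, end, family), the use of 1-based inclusive coordinates, the family names, and the rule that only hits of the same family lying within `max_gap` bp of each other are joined are all hypothetical choices made for illustration.

```python
from typing import List, Tuple

# Hypothetical annotation record: (sequence ID, start, end, TE family),
# using 1-based, inclusive coordinates as in GFF.
Hit = Tuple[str, int, int, str]


def merge_hits(hits: List[Hit], max_gap: int = 0) -> List[Hit]:
    """Merge hits of the same family on the same sequence that overlap,
    abut, or are separated by at most `max_gap` bp (illustrative rule only)."""
    merged: List[Hit] = []
    # Sort so that hits of one family on one sequence are adjacent and ordered by start.
    for seqid, start, end, family in sorted(hits, key=lambda h: (h[0], h[3], h[1], h[2])):
        if merged:
            last_seq, last_start, last_end, last_fam = merged[-1]
            gap = start - last_end - 1  # negative when the two hits overlap
            if seqid == last_seq and family == last_fam and gap <= max_gap:
                # Extend the previous record instead of emitting a new fragment.
                merged[-1] = (last_seq, last_start, max(last_end, end), last_fam)
                continue
        merged.append((seqid, start, end, family))
    return merged
```

With toy coordinates, two overlapping `familyA` hits at 100-500 and 450-900 and a third `familyA` hit starting at 950 collapse into a single record spanning 100-1200 when `max_gap=100`, while an overlapping `familyB` hit is left untouched. Overlaps between different families would need a separate rule (for example, assigning the contested bases to one family), which this sketch does not attempt.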
In Chapter Three, a recent high-quality chromosome-level assembly of the monarch butterfly (Danaus plexippus) is used for an in-depth exploration of how TEs shape the host genome, examining TE expansions, host interactions, and TE removal, whilst two other Danaus genomes provide a comparative context. The TE content of the monarch was found to be much lower than that annotated in the original draft genome assembly (this study: 6.21%; draft genome: 13.06%), and also much lower than in other Danaus genomes (D. melanippus: 11.87%; D. chrysippus: 33.97%). The reduction relative to the original draft genome is attributed to an improved understanding of TE structure, which led to the exclusion of regions previously annotated erroneously as TEs, along with the conservative annotation approach taken in this study. The differences among Danaus species were not due to variation in DNA deletion rates, but are hypothesised to result from expansions of lineage-specific TEs. Three newly identified TE families, r2-hero_dPle (LINE), penelope-1_dPle (Penelope-like), and hase2-1_dPle (SINE), contribute over one third of the total TE content in the monarch. Historical bursts of LINE activity are evident in the monarch genome, whereas over the last ~500,000 years just two novel Tc1 families (tc1-1_dPle and tc1-2_dPle) have expanded rapidly. TE content was unevenly distributed among genomic compartments, with a strong negative correlation between gene density and TE density. Six gene hotspots containing putatively important host functions were significantly depleted of TEs, potentially reflecting the deleterious consequences of TE insertions and selection against TEs detrimental to host fitness.
There is evidence of LINE and Penelope-like element removal via both dissociation of transcription machinery and genomic deletions, and the presence of swathes of small fragments suggests rapid turnover of non-LTR TEs. This contrasts with patterns observed in mammals, where lower rates of TE turnover result in the accumulation of more ancient TEs. Together, the findings presented in Chapter Three demonstrate the dynamic nature of the interactions between TEs and their host genomes: ongoing TE activity has the potential to considerably alter host genomes over evolutionary timescales, driving significant differences even among very closely related species.
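The negative correlation between gene density and TE density reported for the monarch is the kind of relationship typically measured by dividing a genome into windows and computing a rank correlation across them. The Python below sketches Spearman's rank correlation, using average ranks for ties; the windowing, the per-window density values, and the choice of Spearman's rho are assumptions for illustration, not a description of the analysis performed in Chapter Three.

```python
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # sorted positions i..j share ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho, computed as the Pearson correlation of average ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    if sx == 0 or sy == 0:
        raise ValueError("Spearman's rho is undefined for constant input")
    return cov / (sx * sy)
```

For example, with hypothetical per-window gene densities `[0.1, 0.4, 0.2, 0.8, 0.6]` and TE densities `[0.30, 0.12, 0.25, 0.05, 0.10]`, the two series are perfectly anti-ranked, so `spearman` returns -1 up to floating-point rounding. A rank-based measure is used here because window densities are rarely normally distributed; in practice a library routine with a significance test would normally be preferred.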
Aphids are destructive crop pests, and several species have evolved resistance to multiple insecticides. TEs have also been implicated in the evolution of insecticide resistance. Consequently, in Chapter Four, 21 available aphid genomes are analysed to explore the extent to which TEs act as a general source of genomic novelty for host evolution. TEs were found to be significantly enriched at xenobiotic resistance loci (XRL) compared with other genes, at levels similar to those observed at housekeeping genes. However, unlike at housekeeping genes, the maintenance of TEs around XRL is unlikely to arise through constitutive expression and the associated open chromatin, because XRL are not expressed in germline cells; it is more likely to arise via selection for insertions in these regions. Further, TEs are also enriched around cytochrome P450 genes with known functions in the detoxification of synthetic insecticides in three aphids of agricultural importance (Acyrthosiphon pisum, Myzus cerasi, and Myzus persicae). Together, these findings suggest that TE insertions around XRL in aphids are maintained by selection because they benefit the host.
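Enrichment of TEs around a focal gene set such as XRL, relative to other genes, can be sketched as a comparison of TE coverage in gene flanks against random gene sets of the same size drawn from a background. The Python below is a hedged illustration only: the flank size, the use of covered base pairs as the TE measure, the assumption that TE intervals do not overlap one another, and the one-sided permutation test are hypothetical choices, not the method used in Chapter Four.

```python
import random
from typing import List, Sequence, Tuple

# Hypothetical interval type: 1-based, inclusive (start, end) on one sequence.
Interval = Tuple[int, int]


def covered_bp(window: Interval, tes: Sequence[Interval]) -> int:
    """Count bp of `window` covered by TE intervals (assumed not to overlap each other)."""
    w_start, w_end = window
    total = 0
    for t_start, t_end in tes:
        overlap = min(w_end, t_end) - max(w_start, t_start) + 1
        if overlap > 0:
            total += overlap
    return total


def flank_te_fraction(gene: Interval, tes: Sequence[Interval], flank: int) -> float:
    """Fraction of the upstream and downstream flanks (each up to `flank` bp) covered by TEs."""
    if flank < 1:
        raise ValueError("flank must be at least 1 bp")
    start, end = gene
    upstream = (max(1, start - flank), start - 1)  # clipped at the sequence start
    downstream = (end + 1, end + flank)  # not clipped: sequence length is not tracked here
    flank_len = (upstream[1] - upstream[0] + 1) + (downstream[1] - downstream[0] + 1)
    return (covered_bp(upstream, tes) + covered_bp(downstream, tes)) / flank_len


def permutation_p(focal: List[float], background: List[float],
                  n_perm: int = 10000, seed: int = 1) -> float:
    """One-sided empirical p-value: how often a random background subset of the
    same size has a mean TE fraction at least as high as the focal set."""
    rng = random.Random(seed)
    observed = sum(focal) / len(focal)
    hits = 0
    for _ in range(n_perm):
        sample = rng.sample(background, len(focal))
        if sum(sample) / len(sample) >= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return (hits + 1) / (n_perm + 1)
```

In practice the background should exclude the focal genes, and genes on different sequences would need their TE intervals looked up per sequence; both are omitted here for brevity.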
In Chapter Five, 88 high-quality genome assemblies are analysed to uncover the physiological and ecological determinants of variation in TE content across butterflies. TE content, as a proportion of total genome size, is highly variable across butterflies, ranging from 6.21% in Danaus plexippus to 67.55% in Satyrium esculi. A strong phylogenetic signal was found in both TE abundance and diversity, indicating that closely related species have similar TE landscapes and that the TE content of extant butterflies therefore reflects their evolutionary history. Three life history traits were strongly correlated with TE abundance (forewing size, voltinism, and species distribution), and two with TE diversity (voltinism and species distribution). All strongly correlated life history traits were negatively associated with TE abundance and diversity. For voltinism, this supported the hypothesis that multivoltinism results in more efficient purging of TE insertions from host genomes. However, this direction contrasts with the hypotheses for forewing size and species distribution: larger butterflies and those with larger distributions were expected to have higher TE abundance and diversity, because greater ecological opportunity was expected to raise the potential for novel TEs to invade host genomes through processes such as horizontal transfer. Overall, these findings highlight the significant impact that the ecological and physiological characteristics of species can have on TE abundance and diversity.
This thesis aims to provide methodologies for the automated annotation and curation of TEs, and to apply these to further our understanding of TE-host dynamics. The development of an improved automated TE annotation tool highlights the continued need for enhanced methodologies to advance the field of TE biology. Community-led efforts could provide significant benefit on this front by combining expertise to produce a consistent TE annotation tool of value both within TE biology and more widely. Limited understanding of the processes leading to large differences in TE content among closely related species, of the maintenance of TEs around genes associated with processes under strong selection, and of the ecological drivers influencing TE diversity and abundance highlights the need for further large-scale comparative studies.
Biotechnology & Biological Sciences Research Council (BBSRC)