Resequencing of select cotton diploids

Insights into the evolution of cotton diploids and polyploids from whole-genome re-sequencing

Justin T. Page1, Mark D. Huynh2, Zach S. Liechty2, Kara Grupp3, David Stelly4, Amanda Hulse4, Hamid Ashrafi5, Allen Van Deynze5, Jonathan F. Wendel3, and Joshua A. Udall1,2

Understanding the composition, evolution, and function of the Gossypium hirsutum (cotton) allopolyploid genome is complicated by the joint presence in its nucleus of two genomes (the At and Dt genomes) derived from the progenitor A- and D-genome diploids involved in the ancestral allopolyploidization. To better understand the allopolyploid genome, we re-sequenced the genomes of extant diploid relatives that contain the A1 (G. herbaceum), A2 (G. arboreum), or D5 (G. raimondii) genomes. We conducted a comparative analysis using deep re-sequencing of multiple accessions of each diploid species and identified 24 million SNPs between the A- and D-diploid genomes. These analyses facilitated the construction of a robust index of conserved SNPs between the A- and D-genomes at all detected polymorphic loci. This index is widely applicable to read-mapping efforts in other diploid and allopolyploid Gossypium accessions. Further analysis revealed the locations of putative duplications and deletions in the A-genome relative to the D-genome reference sequence. Among the ~25,400 deleted regions, 978 genes had more than 50% of their sequence deleted, including many genes involved in starch synthesis. In the polyploid genome, we also detected 1,472 conversion events between homoeologous chromosomes, including events that overlapped 113 genes. Continued characterization of the Gossypium genomes will further enhance our ability to manipulate fiber and agronomic production of cotton.

The manuscript is currently under review.
The homoeo-SNPs can be viewed in GBrowse at CottonGen; select the A/D SNP track to display the SNP results from this work.

Dgenome2_13.snp2.0.txt: SNPs between A and D diploids
Dgenome2_13.snp2.1.txt: SNPs between Maxxa's At and Dt genomes
psA.gene.fasta.txt, psA.cds.fasta.txt, psA.pep.fasta.txt: sequences modified to look like the A allele of Dgenome2_13.snp2.0
psAt.gene.fasta.txt, psAt.cds.fasta.txt, psAt.pep.fasta.txt: sequences modified to look like the At allele of Dgenome2_13.snp2.1
psDt.gene.fasta.txt, psDt.cds.fasta.txt, psDt.pep.fasta.txt: sequences modified to look like the Dt allele of Dgenome2_13.snp2.1
Dgenome2_13.gene.fasta.txt, Dgenome2_13.cds.fasta.txt, Dgenome2_13.pep.fasta.txt: unmodified sequences (match the D allele of Dgenome2_13.snp2.0)
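
The "modified to look like the A allele" files above are described as the D-genome sequences with SNP alleles substituted in. A minimal Python sketch of that kind of substitution follows. The column layout of the SNP files is not described on this page, so the sketch takes SNPs as (position, D allele, A allele) tuples; that tuple layout, the function name apply_snps, and the toy sequence are assumptions made for illustration, not the authors' pipeline.

```python
def apply_snps(reference, snps):
    """Return a copy of `reference` with each SNP's alternate allele substituted.

    `snps` is an iterable of (position, ref_allele, alt_allele) tuples with
    1-based positions. This layout is hypothetical; it is NOT the documented
    format of the Dgenome2_13.snp2.* files.
    """
    seq = list(reference)
    for pos, ref, alt in snps:
        idx = pos - 1  # convert the 1-based position to a 0-based index
        if seq[idx].upper() != ref.upper():
            # Guard against applying a SNP list to the wrong sequence.
            raise ValueError(
                f"reference mismatch at position {pos}: "
                f"expected {ref}, found {seq[idx]}"
            )
        seq[idx] = alt
    return "".join(seq)


# Toy D-genome fragment and two made-up SNPs (not real data).
d_seq = "ATGGCCTTA"
a_like = apply_snps(d_seq, [(3, "G", "A"), (8, "T", "C")])
print(a_like)  # ATAGCCTCA
```

Checking the reference allele before substituting is a deliberate choice: it fails loudly if the SNP coordinates and the sequence come from different reference versions.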

Duplications and Deletions:
A.peaks.bed.txt, A1.peaks.bed.txt, A2.peaks.bed.txt, F1_1.peaks.bed.txt: duplications shared by all A diploids, all A1s, all A2s, or in F1, respectively; deletions shared by all A diploids, all A1s, all A2s, or in F1

Conversion Events:
Adom.conv.bed.txt: At-biased conversion blocks
Ddom.conv.bed.txt: Dt-biased conversion blocks
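
The BED files listed on this page can be intersected with gene coordinates to find, for example, which genes a conversion block overlaps. The Python sketch below assumes the files follow the standard BED convention (whitespace-separated chromosome, start, end; 0-based, half-open intervals). The chromosome names, coordinates, and gene names are made up for illustration, and read_bed and overlaps are hypothetical helper names.

```python
import io


def read_bed(handle):
    """Yield (chrom, start, end) tuples from BED lines.

    Assumes standard BED: 0-based start, half-open end.
    """
    for line in handle:
        line = line.strip()
        # Skip blank lines and optional BED header lines.
        if not line or line.startswith(("#", "track", "browser")):
            continue
        fields = line.split()
        yield fields[0], int(fields[1]), int(fields[2])


def overlaps(a, b):
    """True if two (chrom, start, end) half-open intervals share a base."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


# Made-up conversion blocks standing in for a file such as Adom.conv.bed.txt.
conv = io.StringIO("Chr01\t100\t250\nChr01\t900\t950\nChr02\t10\t40\n")
# Made-up gene models: (chrom, start, end, name).
genes = [("Chr01", 200, 600, "geneA"), ("Chr02", 40, 90, "geneB")]

blocks = list(read_bed(conv))
hit_genes = sorted(
    {name for chrom, start, end, name in genes
     for blk in blocks if overlaps(blk, (chrom, start, end))}
)
print(hit_genes)  # ['geneA']
```

With half-open intervals, a block ending at 40 and a gene starting at 40 do not overlap, which is why geneB is not reported. For genome-scale files, a dedicated interval tool would be faster than this all-pairs comparison.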