Cotton Research

Check out our Structural Genomics website for our latest research focus on understanding genome evolution in polyploid cottons.

Cotton is known to many by the clothes we wear, but it also has important industrial uses, including paper money! What you probably don't know is that the word 'cotton' refers to fibers that grow on the seed coats of four different plant species. When we talk about cotton, we are usually referring to Gossypium hirsutum (tetraploid AD1-genome) from Central America, since it currently represents 90-95% of all cotton produced worldwide. Before the Spanish Conquest of Central and South America, Europe imported cotton from India and Africa, where indigenous cultures had separately domesticated G. herbaceum (diploid A1-genome) and G. arboreum (diploid A2-genome). Diploid cotton is still produced in those regions, though it is largely consumed domestically and exported only in limited amounts. The fourth species that produces cotton is G. barbadense (tetraploid AD2-genome) from South America. G. barbadense is often called Pima cotton, Egyptian cotton, or Sea-Island cotton. It has longer and stronger fibers than G. hirsutum, so fewer fibers are needed to make a thread, which allows a higher density of threads in textiles where a high thread count is desirable. Unfortunately, G. barbadense does not produce as many flowers (or bolls) as G. hirsutum, and consequently its yield per acre is substantially lower.

A broader use of the word 'cotton' includes other species within the genus Gossypium. One species of particular note is G. raimondii (diploid D-genome) from Peru. This D-genome and the A-genomes of the diploids mentioned above are closely related to the two co-resident genomes of the tetraploids G. hirsutum and G. barbadense.

Cotton plant

You may have noticed above that G. hirsutum and G. barbadense are both New World tetraploids containing an A- and a D-genome. It is no coincidence that these two tetraploid species, with their duplicated genomes, have longer fibers, higher fiber strength, and greater yield than the diploids, but we still do not know exactly why. Some of the research in my lab aims to figure out why these tetraploids are more productive than diploids. Other research areas address questions of basic biology, such as how duplicate genes are expressed and regulated in the polyploid genome.

One of the most important historical events in cotton research occurred in December 2012, when the diploid genome of G. raimondii was published in Nature. I'm very proud of my students who contributed to its characterization. It is a big deal for many reasons, one being that we (cotton researchers) now have a common genomic reference to use for all of our subsequent work. We are using the cotton reference genome to address several questions:

1) When we sequence tetraploid cotton, we are sequencing two genomes at the same time (A and D) because they cannot be separated during DNA extraction. When we look at a DNA sequence, we cannot tell whether it was derived from the A-genome or the D-genome without some prior knowledge. We created a program called PolyCat to categorize the reads of tetraploid cotton. This program will work with any other polyploid genome.

2) When we sequence other diploid varieties of both the A- and D-genomes, what types and what degree of nucleotide variation do we discover? This is the focus of the "Diploid cotton re-sequencing effort".

3) When we assess gene expression in a model tissue (i.e., petals), how do the two genomes interact during gene expression to produce the proteins essential for this reproductive organ? We are not particularly interested in petal biology per se, but rather in describing how transcription is regulated when two copies of every gene exist. This is the focus of the "Petal expression paper".

4) Are differences in gene expression explained by DNA methylation? This is the focus of our "Cotton petal DNA methylation paper".
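To make the read-categorization problem in item 1 more concrete, here is a minimal Python sketch of one way a read could be assigned to the A- or D-genome. It is not PolyCat's actual code or algorithm. It assumes the "prior knowledge" is a table of positions where the two genomes are known to differ, and that each read has already been aligned, without gaps, to a single reference. The table `HOMOEO_SNPS`, the functions `categorize_read` and `tally`, the category labels, and all positions and bases are hypothetical and invented for illustration.

```python
from collections import Counter

# Hypothetical table of sites where the two genomes differ:
# reference position -> (A-genome base, D-genome base).
# Positions and bases are invented for illustration only.
HOMOEO_SNPS = {
    105: ("A", "G"),
    110: ("C", "T"),
    412: ("G", "C"),
}


def categorize_read(start, sequence, snp_index=HOMOEO_SNPS):
    """Label an ungapped aligned read.

    Returns 'A' or 'D' if every informative site agrees with that genome,
    'X' if the sites conflict, and 'N' if the read covers no informative site.
    """
    votes = Counter()
    for offset, base in enumerate(sequence):
        alleles = snp_index.get(start + offset)
        if alleles is None:
            continue  # this position does not distinguish the genomes
        a_base, d_base = alleles
        if base == a_base:
            votes["A"] += 1
        elif base == d_base:
            votes["D"] += 1
        # any other base (e.g. a sequencing error) casts no vote
    if votes["A"] and votes["D"]:
        return "X"
    if votes["A"]:
        return "A"
    if votes["D"]:
        return "D"
    return "N"


def tally(reads, snp_index=HOMOEO_SNPS):
    """Count how many reads fall into each category."""
    return Counter(categorize_read(start, seq, snp_index) for start, seq in reads)


# Toy reads as (alignment start, sequence) pairs.
EXAMPLE_READS = [
    (100, "TTTTTATTTTCT"),  # A-genome base at both 105 and 110
    (100, "TTTTTGTTTTTT"),  # D-genome base at both 105 and 110
    (100, "TTTTTATTTTTT"),  # A base at 105 but D base at 110
    (300, "ACGTACGT"),      # covers no informative site
    (410, "AAGAA"),         # A-genome base at 412
]

if __name__ == "__main__":
    print(tally(EXAMPLE_READS))
```

Under these assumptions, per-gene counts of 'A' and 'D' reads from an expression experiment would give a rough picture of how much each genome contributes to a gene's transcripts, which is the kind of comparison item 3 asks about.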