PolyCat: A Resource for Genome Categorization of Sequencing Reads From Allopolyploid Organisms
Read mapping is a fundamental part of next-generation genomic research but is complicated
by genome duplication in many plants. Categorizing DNA sequence reads into their respective genomes
enables current methods to analyze polyploid genomes as if they were diploid. We present PolyCat, a pipeline
for mapping and categorizing all types of next-generation sequence data produced from allopolyploid organisms.
PolyCat uses GSNAP’s single-nucleotide polymorphism (SNP)-tolerant mapping to minimize the mapping
efficiency bias caused by SNPs between genomes. PolyCat then uses SNPs between genomes to categorize
reads according to their respective genomes. Bisulfite-treated reads have a significant reduction in nucleotide
complexity because nucleotide conversion events are confounded with transition substitutions. PolyCat includes
special provisions to properly handle bisulfite-treated data. We demonstrate the functionality of PolyCat on
allotetraploid cotton, Gossypium hirsutum, and create a functional SNP index for efficiently mapping sequence
reads to the D-genome sequence of G. raimondii. PolyCat is appropriate for all allopolyploids and all types of
next-generation genome analysis, including differential expression (RNA sequencing), differential methylation
(bisulfite sequencing), differential DNA-protein binding (chromatin immunoprecipitation sequencing), and population
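
As an illustration of the categorization idea described in the abstract, the following is a minimal Python sketch, not PolyCat's actual implementation. It assumes a hypothetical in-memory SNP index (`SNP_INDEX`) mapping reference positions to the base expected in each genome, and a read represented as a dictionary of position-to-base observations. The category labels (`A`, `D`, `X` for conflicting evidence, `N` for no evidence) and the function names are illustrative assumptions. For bisulfite-treated reads, the sketch simply ignores every transition SNP (C/T or G/A), because such a SNP cannot be told apart from a bisulfite conversion event; a real pipeline would likely make this decision per strand.

```python
from collections import Counter

# Hypothetical SNP index: reference position -> (A-genome base, D-genome base).
SNP_INDEX = {
    105: ("A", "G"),
    230: ("C", "T"),
    310: ("T", "G"),
}


def is_transition(base_1, base_2):
    """Return True if the two bases form a C/T or G/A pair.

    After bisulfite treatment, unmethylated C reads as T (G as A on the
    opposite strand), so these SNPs are confounded with conversion events.
    """
    return {base_1, base_2} in ({"C", "T"}, {"G", "A"})


def categorize_read(read_bases, snp_index, bisulfite=False):
    """Assign a read to a genome from the SNP positions it overlaps.

    read_bases: dict of reference position -> base observed in the read.
    Returns "A", "D", "X" (conflicting evidence), or "N" (no evidence).
    """
    votes = Counter()
    for position, base in read_bases.items():
        if position not in snp_index:
            continue  # position carries no genome-specific information
        a_base, d_base = snp_index[position]
        if bisulfite and is_transition(a_base, d_base):
            continue  # simplification: skip transition SNPs on either strand
        if base == a_base:
            votes["A"] += 1
        elif base == d_base:
            votes["D"] += 1
    if votes["A"] and votes["D"]:
        return "X"
    if votes["A"]:
        return "A"
    if votes["D"]:
        return "D"
    return "N"


read = {105: "A", 230: "C", 400: "T"}
print(categorize_read(read, SNP_INDEX))
print(categorize_read(read, SNP_INDEX, bisulfite=True))
```

In this toy example the same read is assigned to a genome when treated as ordinary sequence data but becomes uncategorizable when treated as bisulfite data, because both SNPs it overlaps are transitions. This is the loss of nucleotide complexity that the abstract refers to.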