PolyCat: A Resource for Genome Categorization of Sequencing Reads From Allopolyploid Organisms


Read mapping is a fundamental part of next-generation genomic research but is complicated by genome duplication in many plants. Categorizing DNA sequence reads into their respective genomes enables current methods to analyze polyploid genomes as if they were diploid. We present PolyCat, a pipeline for mapping and categorizing all types of next-generation sequence data produced from allopolyploid organisms. PolyCat uses GSNAP's single-nucleotide polymorphism (SNP)-tolerant mapping to minimize the mapping efficiency bias caused by SNPs between genomes. PolyCat then uses SNPs between genomes to categorize reads according to their respective genomes. Bisulfite-treated reads have a significant reduction in nucleotide complexity because nucleotide conversion events are confounded with transition substitutions. PolyCat includes special provisions to properly handle bisulfite-treated data. We demonstrate the functionality of PolyCat on allotetraploid cotton, Gossypium hirsutum, and create a functional SNP index for efficiently mapping sequence reads to the D-genome sequence of G. raimondii. PolyCat is appropriate for all allopolyploids and all types of next-generation genome analysis, including differential expression (RNA sequencing), differential methylation (bisulfite sequencing), differential DNA-protein binding (chromatin immunoprecipitation sequencing), and population



The PDF of this paper is here and here on ResearchGate.
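The SNP-based categorization described in the abstract can be illustrated with a minimal Python sketch. This is not PolyCat's code: the index layout, the function names, the ungapped-alignment assumption, the output labels (`A`, `D`, `X` for conflicting evidence, `N` for no evidence), and the strand-based rule for bisulfite reads are all assumptions made for illustration. The bisulfite rule treats C as indistinguishable from T on plus-strand reads (and G from A on minus-strand reads), so SNPs that differ only by that transition are skipped.

```python
from collections import Counter

# Hypothetical SNP index: reference position -> allele in each subgenome.
# Positions and alleles are invented for illustration only.
SNP_INDEX = {
    5: {"A": "G", "D": "T"},
    8: {"A": "C", "D": "T"},
    12: {"A": "A", "D": "G"},
}


def collapse(base, strand):
    """Merge bases that bisulfite conversion makes indistinguishable.

    Assumption: on '+' reads an unmethylated C reads as T, and on '-'
    reads a G reads as A. Untreated reads (strand=None) are unchanged.
    """
    if strand == "+" and base == "C":
        return "T"
    if strand == "-" and base == "G":
        return "A"
    return base


def categorize_read(read_start, read_seq, snp_index, bisulfite_strand=None):
    """Return 'A', 'D', 'X' (conflicting evidence), or 'N' (no evidence).

    read_start is the 0-based reference position of the first read base;
    the alignment is assumed to be ungapped.
    """
    votes = Counter()
    for offset, base in enumerate(read_seq):
        alleles = snp_index.get(read_start + offset)
        if alleles is None:
            continue  # not a known SNP position
        a_allele = collapse(alleles["A"], bisulfite_strand)
        d_allele = collapse(alleles["D"], bisulfite_strand)
        if a_allele == d_allele:
            continue  # SNP is confounded with bisulfite conversion
        observed = collapse(base, bisulfite_strand)
        if observed == a_allele:
            votes["A"] += 1
        elif observed == d_allele:
            votes["D"] += 1
    if votes["A"] and votes["D"]:
        return "X"
    if votes["A"]:
        return "A"
    if votes["D"]:
        return "D"
    return "N"


if __name__ == "__main__":
    # The same read is informative when untreated but not when
    # bisulfite-treated on the plus strand (C/T SNP at position 8).
    print(categorize_read(8, "T", SNP_INDEX))
    print(categorize_read(8, "T", SNP_INDEX, bisulfite_strand="+"))
```

In this sketch a single supporting SNP is enough to assign a read; a real pipeline would also need to handle gapped alignments, base qualities, and paired-end reads.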