A total of 4,180 accessions were collected, including 3,085 previously published accessions (PRJNA336461, PRJNA375965, PRJNA399050, PRJNA257154, PRJNA473334, PRJNA605345, PRJNA576032, PRJNA530048, PRJNA414461) [1-12] and 310 provided by the Xinjiang Academy of Agricultural and Reclamation Science and Shihezi University.
All resequencing reads from the 4,180 accessions were mapped to the reference genome (Gossypium hirsutum (AD1) 'TM-1' genome WHU_updated v1) [13]. After genotype imputation and filtering (MAF > 0.05, missing rate < 0.1), 14,285,086 high-quality variants were obtained, including 12,903,345 SNPs and 1,381,741 InDels.
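The filtering thresholds above can be illustrated with a minimal sketch. It assumes biallelic variants with diploid-style calls encoded as alternate-allele counts (0, 1, or 2) and `None` for a missing call. This encoding and the function names `maf_and_missing` and `passes_filter` are hypothetical, chosen for illustration only, and do not represent the pipeline actually used.

```python
from typing import Optional, Sequence, Tuple


def maf_and_missing(genotypes: Sequence[Optional[int]]) -> Tuple[float, float]:
    """Return (minor allele frequency, missing rate) for one biallelic variant.

    Assumed encoding: alternate-allele count per sample (0, 1, or 2),
    with None marking a missing call.
    """
    called = [g for g in genotypes if g is not None]
    missing_rate = (len(genotypes) - len(called)) / len(genotypes)
    if not called:
        # No usable calls: frequency is undefined, report 0.0 so the variant fails.
        return 0.0, missing_rate
    alt_freq = sum(called) / (2 * len(called))
    return min(alt_freq, 1 - alt_freq), missing_rate


def passes_filter(
    genotypes: Sequence[Optional[int]],
    min_maf: float = 0.05,
    max_missing: float = 0.1,
) -> bool:
    """Keep a variant only if MAF > min_maf and missing rate < max_missing."""
    maf, missing_rate = maf_and_missing(genotypes)
    return maf > min_maf and missing_rate < max_missing


# Toy variants (invented values, not data from this study).
common = [0] * 10 + [1] * 6 + [2] * 4   # well-represented alternate allele
rare = [0] * 19 + [1]                   # alternate allele seen once in 20 samples
sparse = [0, 1, 2, None, None, 0, 1, 2, 0, 1]  # two of ten calls missing
```

Here `common` would be retained, while `rare` fails the MAF threshold and `sparse` fails the missing-rate threshold. In practice such filters are usually applied with dedicated tools on VCF files rather than in plain Python.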
Gene coding sequences contain 242,859 SNPs and 10,584 InDels, including 152,192 nonsynonymous SNPs and 7,139 disruptive in-frame InDels; 13,468 variants are predicted to have potentially large effects, such as stop codon gain or loss.
Different variants within the same block can combine into multiple haplotypes. Differences in haplotype frequency between subpopulations or regions often reflect differences in germplasm characteristics and breeding direction. Differences in variant genotypes or haplotypes may alter the expression of nearby genes, which in turn contributes to phenotypic differences between materials. We therefore calculated haplotype frequencies for the different regions and subpopulations, and collected high-quality phenotypic data for the following cotton traits: seed cotton weight (SCW, g), lint weight (LW, g), lint percentage (LP, %), effective boll number (EBN, bolls), plant height (PH, cm), first fruit spur height (FFSH, cm), fruit spur branch number (FSBN, branches), first fruit branch position (FFBP, nodes), flowering period (FP, days), whole growth period (WGP, days), fiber upper half mean length (FUHML, mm), fiber length (FL, mm), fiber elongation (FE, %), fiber strength (FS, cN/tex), micronaire value (MV), fiber uniformity (FU, %), short fiber rate (FR, %), Verticillium wilt disease index (DI, %), days to flowering (FD, days), and leaf pubescence amount (LPA, count/cm²), together with expression data from different tissues. We display the differences in haplotype frequency among regions and subpopulations, as well as the differences in phenotype and gene expression between materials carrying different haplotypes, using violin plots.
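The haplotype frequency comparison described above can be sketched as a simple per-group tally. This is a minimal illustration that assumes each accession has already been assigned one haplotype label for a given block. The function name `haplotype_frequencies`, the group names, and the haplotype labels are all hypothetical and not taken from the study.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Tuple


def haplotype_frequencies(
    records: Iterable[Tuple[str, str]],
) -> Dict[str, Dict[str, float]]:
    """Compute per-group haplotype frequencies.

    records: (group, haplotype) pairs, one per accession, where group is a
    region or subpopulation label.
    Returns {group: {haplotype: fraction of that group's accessions}}.
    """
    counts: Dict[str, Counter] = defaultdict(Counter)
    for group, haplotype in records:
        counts[group][haplotype] += 1

    frequencies = {}
    for group, counter in counts.items():
        total = sum(counter.values())
        frequencies[group] = {hap: n / total for hap, n in counter.items()}
    return frequencies


# Toy input (invented labels, not data from this study).
records = [
    ("RegionA", "Hap1"),
    ("RegionA", "Hap1"),
    ("RegionA", "Hap2"),
    ("RegionB", "Hap2"),
    ("RegionB", "Hap2"),
    ("RegionB", "Hap1"),
    ("RegionB", "Hap2"),
]
freqs = haplotype_frequencies(records)
```

With this toy input, Hap1 accounts for two thirds of RegionA but only one quarter of RegionB, which is the kind of contrast the frequency comparison is meant to expose. Grouping phenotype or expression values by the same haplotype labels would give the per-haplotype distributions that a violin plot displays.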