The collected accessions show rich genetic diversity and a wide geographical distribution. The collection comprises a total of 4,180 accessions: 3,743 G. hirsutum, 393 G. barbadense, 7 G. tomentosum, 6 G. darwinii, 6 G. mustelinum and 25 other accessions [1-10]. Of these, 2,723 accessions came from Asia (2,619 from China and 104 from other parts of Asia), 726 from North America, 204 from South America, 141 from Europe, 82 from Africa and 36 from Oceania; the origin of the remaining 268 is unknown.
In this database, the 4,180 cotton accessions are divided into eight groups, named G0-G7, according to their SNP genotypes and origins. G7 (n=400) contains most of the G. barbadense accessions. G0 (n=39) consists of wild G. hirsutum accessions from America. G1 (n=243) consists of G. hirsutum landraces from Central America. G2 (n=317) consists mainly of G. hirsutum landraces from southern China. G3-G6 comprise the cultivated G. hirsutum accessions: most accessions from Northwest China (NWC) and North China (NC) were grouped into G3 (n=538); G4 (n=795) contains accessions from three historical Chinese cotton planting areas; G5 (n=728) contains accessions from the Yangtze River region (YZR); and G6 (n=1,120) contains accessions from the Yangtze River region of China and from the United States.
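The counts reported above can be cross-checked with a short tally. The sketch below is only an illustration: the numbers are copied from the text, while the dictionary and variable names are hypothetical and do not correspond to any field in the database.

```python
# Counts copied from the text; all names here are illustrative assumptions.
species = {
    "G. hirsutum": 3743, "G. barbadense": 393, "G. tomentosum": 7,
    "G. darwinii": 6, "G. mustelinum": 6, "other": 25,
}
regions = {
    "China": 2619, "Asia (other)": 104, "North America": 726,
    "South America": 204, "Europe": 141, "Africa": 82,
    "Oceania": 36, "unknown": 268,
}
groups = {
    "G0": 39, "G1": 243, "G2": 317, "G3": 538,
    "G4": 795, "G5": 728, "G6": 1120, "G7": 400,
}

TOTAL = 4180  # total number of accessions stated in the text

# Each breakdown should account for every accession exactly once.
totals = {
    "species": sum(species.values()),
    "regions": sum(regions.values()),
    "groups": sum(groups.values()),
}
for name, value in totals.items():
    print(f"{name}: {value} (matches total: {value == TOTAL})")

# Size of the cultivated G. hirsutum groups (G3-G6) taken together.
cultivated = sum(groups[g] for g in ("G3", "G4", "G5", "G6"))
print(f"cultivated (G3-G6): {cultivated}")
```

All three breakdowns (by species, by region and by group) sum to the stated total of 4,180, so the figures are internally consistent.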
To identify genomic regions affected by domestication and selection, genetic diversity (π), Tajima's D, the pairwise fixation statistic (FST) and XP-CLR values were calculated. Average pairwise FST values among subgroups showed that genetic divergence among the cultivated subgroups (G3-G6) was low (0.007-0.028) compared with that between cultivated accessions and G1 (0.229-0.263) and that between the landrace groups (0.189), whereas divergence between cultivated accessions and G2 was intermediate (0.036-0.047).
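To make two of these statistics concrete, the following is a minimal sketch of how π and a pairwise FST could be computed from biallelic SNP data. The text does not state which estimator or software was used, so everything here is an assumption: π is taken as the mean per-SNP expected heterozygosity with the usual n/(n-1) sample-size correction, FST uses Hudson's estimator computed as a ratio of sums over sites, and the function names and the toy 0/1 haplotype matrices are hypothetical.

```python
from typing import List, Sequence

Haplotypes = Sequence[Sequence[int]]  # rows = sampled chromosomes, columns = SNPs (0/1)


def allele_freq(haps: Haplotypes) -> List[float]:
    """Alternate-allele frequency at each SNP."""
    n = len(haps)
    return [sum(column) / n for column in zip(*haps)]


def nucleotide_diversity(haps: Haplotypes) -> float:
    """Mean per-SNP pi for biallelic sites: 2p(1-p) * n/(n-1)."""
    n = len(haps)
    freqs = allele_freq(haps)
    return sum(2 * p * (1 - p) * n / (n - 1) for p in freqs) / len(freqs)


def hudson_fst(haps1: Haplotypes, haps2: Haplotypes) -> float:
    """Hudson's FST between two groups, as a ratio of sums over SNPs."""
    n1, n2 = len(haps1), len(haps2)
    numerator = 0.0
    denominator = 0.0
    for p1, p2 in zip(allele_freq(haps1), allele_freq(haps2)):
        # Squared frequency difference, corrected for within-group sampling noise.
        numerator += (
            (p1 - p2) ** 2
            - p1 * (1 - p1) / (n1 - 1)
            - p2 * (1 - p2) / (n2 - 1)
        )
        # Probability that two alleles drawn from different groups differ.
        denominator += p1 * (1 - p2) + p2 * (1 - p1)
    return numerator / denominator


# Toy data (made up): two groups of four chromosomes typed at three SNPs.
group_a = [[0, 0, 1], [0, 0, 1], [0, 1, 1], [0, 0, 1]]
group_b = [[1, 1, 0], [1, 1, 0], [1, 0, 0], [1, 1, 0]]

print("pi (group_a):", nucleotide_diversity(group_a))
print("FST (a vs b):", hudson_fst(group_a, group_b))
```

In this toy example the two groups are nearly fixed for opposite alleles, so FST is high; values close to zero, such as those reported among G3-G6, indicate that allele frequencies are nearly identical between groups.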