CottonMD

A Multiomics Database for cotton biological study


About Variation

  Variation


A total of 4,180 accessions were collected, including 3,085 previously published accessions (PRJNA336461, PRJNA375965, PRJNA399050, PRJNA257154, PRJNA473334, PRJNA605345, PRJNA576032, PRJNA530048, PRJNA414461)[1-12] and 310 provided by Xinjiang Academy of Agricultural and Reclamation Science and Shihezi University.

All resequencing reads of the 4,180 accessions were mapped to the reference genome (Gossypium hirsutum (AD1) 'TM-1' genome WHU_updated v1)[13]. After genotype imputation and filtering (MAF > 0.05, missing rate < 0.1), 14,285,086 high-quality variants were obtained, including 12,903,345 SNPs and 1,381,741 InDels.
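The filtering thresholds above can be illustrated with a minimal sketch. It assumes genotypes are coded as alternate-allele counts (0, 1, 2) with None for a missing call; the variant names, the toy genotypes, and the helper functions are hypothetical and are not the pipeline used to build the database.

```python
from typing import Dict, List, Optional, Sequence

# Assumption: one genotype per accession, coded as the alternate-allele
# count (0, 1 or 2); None marks a missing call.
Genotype = Optional[int]


def missing_rate(calls: Sequence[Genotype]) -> float:
    """Fraction of accessions without a genotype call."""
    return sum(c is None for c in calls) / len(calls)


def minor_allele_frequency(calls: Sequence[Genotype]) -> float:
    """Frequency of the rarer allele among called diploid genotypes."""
    called = [c for c in calls if c is not None]
    if not called:
        return 0.0
    alt_freq = sum(called) / (2 * len(called))
    return min(alt_freq, 1.0 - alt_freq)


def passes_filter(calls: Sequence[Genotype],
                  min_maf: float = 0.05,
                  max_missing: float = 0.1) -> bool:
    """Keep a variant only if MAF > min_maf and missing rate < max_missing."""
    return (minor_allele_frequency(calls) > min_maf
            and missing_rate(calls) < max_missing)


# Hypothetical toy variants, not real CottonMD data.
variants: Dict[str, List[Genotype]] = {
    "toy_snp_common": [0, 1, 2, 1, 0, 1, 2, 0, 1, 1],
    "toy_snp_rare": [0] * 19 + [1],
    "toy_snp_gappy": [0, 1, None, None, 2, 1, 0, 1, 2, 1],
}

kept = [name for name, calls in variants.items() if passes_filter(calls)]
print(kept)
```

In this toy set only the common variant is kept: the rare one falls below the MAF threshold and the gappy one exceeds the missing-rate threshold.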

Gene coding sequences contain 242,859 SNPs and 10,584 InDels, including 152,192 nonsynonymous SNPs, 7,139 disruptive inframe InDels, and 13,468 variants with potentially large effects, such as stop codon gain or loss.

Different variants within the same block can combine into multiple haplotypes. Differences in haplotype frequency between subpopulations or regions often reflect differences in germplasm characteristics and breeding direction. Differences in variant genotypes or haplotypes may also alter the expression of nearby genes, which in turn contributes to phenotypic differences between accessions.

For this reason, we calculated haplotype frequencies for different regions and subpopulations, and collected high-quality phenotypic data for the following cotton traits: Seed cotton weight (SCW) (g); Lint weight (LW) (g); Lint percentage (LP) (%); Effective boll number (EBN) (bolls); Plant height (PH) (cm); First fruit spur height (FFSH) (cm); Fruit spur branch number (FSBN) (branches); First fruit branch position (FFBP) (nodes); Flowering period (FP) (days); Whole growth period (WGP) (days); Fiber upper half mean length (FUHML) (mm); Fiber length (FL) (mm); Fiber elongation (FE) (%); Fiber strength (FS) (cN/tex); Micronaire value (MV); Fiber uniformity (FU) (%); Short fiber rate (FR) (%); Verticillium wilt disease index (DI) (%); Days to flowering (FD) (days); and Leaf pubescence amount (LPA) (count/cm²). Expression data from different tissues were also collected. We display the differences in haplotype frequency between regions and subpopulations, and use violin plots to show the differences in phenotype and gene expression between accessions with different haplotypes.
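As a rough illustration of the haplotype statistics described above, the following sketch computes haplotype frequencies per subpopulation and groups trait values by haplotype, which is the kind of input a violin plot would take. The accession IDs, group names, haplotype labels, and trait values are all hypothetical toy data, and the functions are assumptions for illustration rather than the database's own code.

```python
from collections import Counter, defaultdict
from statistics import median

# Hypothetical toy records: (accession, subpopulation, haplotype, trait value).
# The trait values are invented, e.g. a fiber length in mm.
records = [
    ("acc01", "group_A", "Hap1", 29.1),
    ("acc02", "group_A", "Hap1", 28.7),
    ("acc03", "group_A", "Hap2", 30.4),
    ("acc04", "group_A", "Hap1", 29.5),
    ("acc05", "group_B", "Hap2", 30.9),
    ("acc06", "group_B", "Hap2", 31.2),
    ("acc07", "group_B", "Hap1", 28.9),
    ("acc08", "group_B", "Hap3", 27.8),
]


def haplotype_frequencies(rows):
    """Return {subpopulation: {haplotype: frequency}}."""
    counts = defaultdict(Counter)
    for _acc, group, hap, _value in rows:
        counts[group][hap] += 1
    freqs = {}
    for group, counter in counts.items():
        total = sum(counter.values())
        freqs[group] = {hap: n / total for hap, n in counter.items()}
    return freqs


def trait_values_by_haplotype(rows):
    """Return {haplotype: [trait values]}, one list per violin."""
    values = defaultdict(list)
    for _acc, _group, hap, value in rows:
        values[hap].append(value)
    return dict(values)


freqs = haplotype_frequencies(records)
by_hap = trait_values_by_haplotype(records)

for group, table in sorted(freqs.items()):
    print(group, table)
for hap, vals in sorted(by_hap.items()):
    print(hap, "n =", len(vals), "median =", median(vals))
```

Each list in `by_hap` could be passed to a plotting library to draw one violin per haplotype; the same grouping would apply to expression values from a given tissue.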

  References


[1] Guo C, Pan Z, You C, et al. Association mapping and domestication analysis to dissect genetic improvement process of upland cotton yield-related traits in China[J]. Journal of Cotton Research, 2021, 4 (2): 12.
[2] Nie X, Wen T, Shao P, et al. High-density genetic variation maps reveal the correlation between asymmetric interspecific introgressions and improvement of agronomic traits in Upland and Pima cotton varieties developed in Xinjiang, China[J]. The Plant Journal, 2020, 103 (2): 677-689.
[3] He S, Sun G, Geng X, et al. The genomic basis of geographic differentiation and fiber improvement in cultivated cotton[J]. Nature Genetics, 2021, 53 (6): 916-924.
[4] Wang M, Tu L, Lin M, et al. Asymmetric subgenome selection and cis-regulatory divergence during cotton domestication[J]. Nature Genetics, 2017, 49 (4): 579.
[5] Ma Z, Zhang Y, Wu L, et al. High-quality genome assembly and resequencing of modern cotton cultivars provide resources for crop improvement[J]. Nature Genetics, 2021, 53 (9): 1385-1391.
[6] Dai P, Sun G, Jia Y, et al. Extensive haplotypes are associated with population differentiation and environmental adaptability in Upland cotton (Gossypium hirsutum)[J]. Theoretical and Applied Genetics, 2020, 133 (12): 3273-3285.
[7] Fang L, Wang Q, Hu Y, et al. Genomic analyses in cotton identify signatures of selection and loci associated with fiber quality and yield traits[J]. Nature Genetics, 2017, 49 (7): 1089.
[8] Ma Z, He S, Wang X, et al. Resequencing a core collection of upland cotton identifies genomic variation and loci influencing fiber quality and yield[J]. Nature Genetics, 2018.
[9] Fang L, Gong H, Hu Y, et al. Genomic insights into divergence and dual domestication of cultivated allotetraploid cottons[J]. Genome Biology, 2017, 18 (1): 1-13.
[10] Li J, Yuan D, Wang P, et al. Cotton pan-genome retrieves the lost sequences and genes during domestication and selection[J]. Genome Biology, 2021, 22 (1): 1-26.
[11] Li B, Chen L, Sun W, et al. Phenomics-based GWAS analysis reveals the genetic architecture for drought resistance in cotton[J]. Plant Biotechnology Journal, 2020, 18 (12): 2533-2544.
[12] Yuan D, Grover C E, Hu G, et al. Parallel and intertwining threads of domestication in allopolyploid cotton[J]. Advanced Science, 2021: 2003634.
[13] Huang G, Wu Z, Percy R G, et al. Genome sequence of Gossypium herbaceum and genome updates of Gossypium arboreum and Gossypium hirsutum provide insights into cotton A-genome evolution[J]. Nature Genetics, 2020, 52 (5).