Asteraceae multi-omics information resource
A multi-omics data platform for Asteraceae plant genetics and breeding research

About genomics (Download methods)

We collected 132 genomes from 74 species within the Asteraceae family, in which 4,408,432 genes have been annotated (see the “Genome” module). We provide gene annotations from seven perspectives, including homology with Arabidopsis and sunflower, gene and transcription factor families (PlantTFDB [1] and iTAK [2]), Gene Ontology (GO) terms [3], KEGG pathways (https://www.genome.jp/kegg/) and Pfam domains [4] (Table 1). Users can quickly retrieve 11,475,543 annotation records for individual genes or gene sets by searching with gene IDs, GO/KEGG/Pfam identifiers, or transcription factor/gene family names. Synteny analysis was carried out using MCScanX [5], and users can browse the alignment results among genomes in the genome synteny module.

We selected genomes from 43 Asteraceae species to establish a robust pan-genome based on two criteria: i) a BUSCO completeness score greater than 80% [6], and ii) the use of RNA-seq data for gene structure prediction. The protein sequences of genes from the 43 Asteraceae genomes were collected and used as input to OrthoFinder v2.5.4 [7]. This analysis produced a pan-genome comprising 95,770 gene clusters (groups of homologous genes).
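The two selection criteria for the pan-genome can be illustrated with a minimal Python sketch. The record fields, genome identifiers, and BUSCO values below are hypothetical and serve only to show the filter logic; they are not taken from the platform.

```python
from dataclasses import dataclass


@dataclass
class GenomeRecord:
    genome_id: str              # hypothetical identifier
    busco_complete_pct: float   # BUSCO completeness, in percent
    rnaseq_supported: bool      # True if RNA-seq was used for gene structure prediction


def select_for_pangenome(records, min_busco=80.0):
    """Keep genomes with BUSCO completeness above min_busco and RNA-seq-supported annotation."""
    return [
        r for r in records
        if r.busco_complete_pct > min_busco and r.rnaseq_supported
    ]


# Made-up example records
records = [
    GenomeRecord("genome_A", 95.2, True),
    GenomeRecord("genome_B", 78.9, True),   # fails the BUSCO criterion
    GenomeRecord("genome_C", 91.0, False),  # fails the RNA-seq criterion
    GenomeRecord("genome_D", 80.0, True),   # exactly 80% is not "greater than 80%"
]

selected = select_for_pangenome(records)
print([r.genome_id for r in selected])
```

Only the protein sets of the genomes that pass both checks would then be passed on to orthology clustering.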

We used BRAKER3 [8] to perform gene structure annotation on 24 high-quality Asteraceae genomes that lacked gene annotations. BRAKER3 integrates three annotation methods: ab initio gene prediction, homologous protein-based gene prediction, and RNA-seq-based gene prediction. The homologous protein library includes protein sequences from Arabidopsis, sunflower, cultivated Chrysanthemum, and lettuce, while the transcriptome data comprise samples from various tissues and treatments, each with a mapping rate exceeding 80%.
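As a rough illustration of how such a run might be assembled, the sketch below keeps only RNA-seq alignments whose mapping rate exceeds 80% and builds a BRAKER command line from them. The file names, species label, and mapping rates are hypothetical. The option names (--genome, --prot_seq, --bam, --species) follow the BRAKER documentation as we understand it and should be checked against `braker.pl --help` for the installed version. The command is only printed, not executed, and this is not necessarily the exact invocation used for the platform.

```python
import shlex


def filter_by_mapping_rate(samples, min_rate=0.80):
    """Return BAM paths whose mapping rate exceeds min_rate.

    samples: dict mapping BAM path -> mapping rate as a fraction (0..1).
    """
    return [bam for bam, rate in samples.items() if rate > min_rate]


def build_braker_command(genome_fasta, protein_fasta, bam_files, species):
    """Assemble a BRAKER command that combines RNA-seq and protein evidence."""
    if not bam_files:
        raise ValueError("at least one RNA-seq BAM file is required")
    return [
        "braker.pl",
        f"--genome={genome_fasta}",
        f"--prot_seq={protein_fasta}",
        f"--bam={','.join(bam_files)}",
        f"--species={species}",
    ]


# Hypothetical inputs
samples = {
    "leaf.bam": 0.93,
    "root.bam": 0.72,    # dropped: mapping rate too low
    "flower.bam": 0.88,
}
bams = filter_by_mapping_rate(samples)
cmd = build_braker_command(
    "genome.softmasked.fa", "homolog_proteins.fa", bams, "example_species"
)
print(shlex.join(cmd))
```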

Variations, including 30,507,963 SNPs and 12,257,327 InDels, were identified from 2,392 transcriptomes. To explore the impact of these variations on gene expression, we developed the 'Genome Variations' page. This page integrates genetic variation and gene expression data, providing descriptions of effect annotations and associations between genotype and gene expression. Users can enter a gene name, gene ID, or chromosome region to identify candidate variations that correlate with gene expression using the Single-locus module.
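The idea of associating a genotype with gene expression at a single locus can be sketched as follows. The statistical test used by the platform is not specified here, so this example assumes a simple Pearson correlation between alternate-allele dosage (0, 1, or 2) and expression level; the data are invented for illustration.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 3:
        raise ValueError("need two sequences of equal length with at least 3 values")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0  # no variation in genotype or expression: no measurable association
    return sxy / sqrt(sxx * syy)


def mean_expression_by_genotype(dosages, expression):
    """Average expression for each genotype class (0, 1, or 2 alternate alleles)."""
    groups = {}
    for dosage, value in zip(dosages, expression):
        groups.setdefault(dosage, []).append(value)
    return {d: sum(v) / len(v) for d, v in sorted(groups.items())}


# Invented data: one SNP genotyped in six samples, with expression of one gene
dosages = [0, 0, 1, 1, 2, 2]
expression = [1.0, 1.2, 2.1, 1.9, 3.0, 3.2]

r = pearson_r(dosages, expression)
group_means = mean_expression_by_genotype(dosages, expression)
print(f"r = {r:.3f}", group_means)
```

A variant whose genotype classes show clearly different mean expression, and a correlation far from zero, would be a candidate for follow-up.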

SNP calling was performed using the Genome Analysis Toolkit (v4.1.4.1) [9]. SNPs from the joint genotyping were further filtered to exclude sites with a minor allele frequency (MAF) < 0.05 and sites with missing data. The annotations and effects of SNPs on gene function were predicted using SnpEff (v5.0) [10].
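The filtering rule can be expressed as a short Python sketch. It assumes biallelic sites in diploid samples, with genotypes coded as alternate-allele counts (0, 1, 2) and None for a missing call. The site names and genotypes are invented, and in practice this step would be applied to the VCF produced by joint genotyping rather than to in-memory lists.

```python
def minor_allele_frequency(genotypes):
    """MAF of a biallelic site from diploid alternate-allele counts (None = missing)."""
    called = [g for g in genotypes if g is not None]
    if not called:
        return 0.0
    alt_freq = sum(called) / (2 * len(called))
    return min(alt_freq, 1.0 - alt_freq)


def passes_filters(genotypes, min_maf=0.05):
    """Exclude sites with any missing genotype or with MAF below min_maf."""
    if any(g is None for g in genotypes):
        return False
    return minor_allele_frequency(genotypes) >= min_maf


# Invented example sites
sites = {
    "snp_1": [0, 1, 2, 1, 0, 1],                     # common variant, kept
    "snp_2": [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1],   # MAF = 1/24, below 0.05
    "snp_3": [0, 1, None, 2],                        # missing data, excluded
}

kept = [name for name, gts in sites.items() if passes_filters(gts)]
print(kept)
```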

Table 1 The original sources of 132 genomes
Species | Genome ID | Genome version | Genome size (Gb/Mb) | Chromosomes+Scaffolds+Contigs (FASTA files) | Scaffold N50 | CDS numbers | Protein-coding gene numbers | GFF3 files | Reference
Table 2 Gene annotation for 54 species of Asteraceae
Species | Version | Gene | Arabidopsis | Sunflower | GO | KEGG | Pfam | TF (PlantTFDB) | TF (iTAK) | BUSCO | Pan-genome
Table 3 The original sources of data used in the Genome Variations module
Sample | PRJ | SRR | Tissue | Treatment type | Treatment group | Treatment time | Reference
References
  1. Jin, J., Tian, F., Yang, D.C., Meng, Y.Q., Kong, L., Luo, J. and Gao, G. (2017) PlantTFDB 4.0: toward a central hub for transcription factors and regulatory interactions in plants. Nucleic Acids Res, 45, D1040-D1045.
  2. Zheng, Y., Jiao, C., Sun, H., Rosli, H.G., Pombo, M.A., Zhang, P., Banf, M., Dai, X., Martin, G.B., Giovannoni, J.J. et al. (2016) iTAK: A Program for Genome-wide Prediction and Classification of Plant Transcription Factors, Transcriptional Regulators, and Protein Kinases. Molecular Plant, 9, 1667-1670.
  3. Aleksander, S.A., Balhoff, J., Carbon, S., Cherry, J.M., Drabkin, H.J., Ebert, D., Feuermann, M., Gaudet, P., Harris, N.L., Hill, D.P. et al. (2023) The Gene Ontology knowledgebase in 2023. Genetics, 224.
  4. Jones, P., Binns, D., Chang, H.-Y., Fraser, M., Li, W., McAnulla, C., McWilliam, H., Maslen, J., Mitchell, A., Nuka, G. et al. (2014) InterProScan 5: genome-scale protein function classification. Bioinformatics, 30, 1236-1240.
  5. Wang, Y., Tang, H., DeBarry, J.D., Tan, X., Li, J., Wang, X., Lee, T.-H., Jin, H., Marler, B., Guo, H. et al. (2012) MCScanX: a toolkit for detection and evolutionary analysis of gene synteny and collinearity. Nucleic Acids Res, 40, e49.
  6. Raghavan, V., Kraft, L., Mesny, F. and Rigerte, L. (2022) A simple guide to de novo transcriptome assembly and annotation. Briefings in Bioinformatics, 23.
  7. Emms, D.M. and Kelly, S. (2019) OrthoFinder: phylogenetic orthology inference for comparative genomics. Genome Biology, 20.
  8. Gabriel, L., Brůna, T., Hoff, K.J., Ebel, M., Lomsadze, A., Borodovsky, M. et al. (2024) BRAKER3: Fully automated genome annotation using RNA-seq and protein evidence with GeneMark-ETP, AUGUSTUS and TSEBRA. bioRxiv, 2023.06.10.544449.
  9. McKenna, A., Hanna, M., Banks, E., Sivachenko, A., Cibulskis, K., Kernytsky, A., Garimella, K., Altshuler, D., Gabriel, S., Daly, M. et al. (2010) The Genome Analysis Toolkit: a MapReduce framework for analyzing next-generation DNA sequencing data. Genome Res, 20, 1297-1303.
  10. Cingolani, P., Platts, A., Wang, L.L., Coon, M., Nguyen, T., Wang, L., Land, S.J., Lu, X. and Ruden, D.M. (2012) A program for annotating and predicting the effects of single nucleotide polymorphisms, SnpEff. Fly, 6, 80-92.