Genome information
Flowering genes in Arabidopsis thaliana
Flowering genes in Glycine max and Oryza sativa
The Pfam of flowering genes in Arabidopsis thaliana, Glycine max, and Oryza sativa
The Blast results of candidate flowering genes
The Pfam results of candidate flowering genes
Collection of flowering genes in Arabidopsis thaliana, Glycine max, and Oryza sativa
In PlantCFG, flowering genes are defined as genes whose mutation and/or overexpression alters flowering time. To collect flowering genes in Arabidopsis thaliana, we merged the results of three complementary approaches: i) we started with the gene list published by Bouche et al. (2016); ii) we searched the GO and MapMan annotation files for all Arabidopsis thaliana genes associated with the keywords ‘flowering’ and ‘vernalization’; iii) we searched the UniProt database (http://www.uniprot.org) for Arabidopsis thaliana proteins associated with the same keywords. To confirm that mutation or overexpression of these genes affects flowering time, we retrieved the relevant publications by querying PubMed with the Arabidopsis thaliana gene name of each identified gene. The resulting compilation of more than 27,296 full-text PDF files was then searched for information about the phenotypes of the mutants and overexpressors, using contextual searches in FileLocator Pro (Version 8.5, https://www.mythicsoft.com/). Each phenotype was linked to the corresponding publication(s), and genes with a documented flowering-time phenotype were defined as flowering genes. The flowering genes of Glycine max were collected mainly from two review articles (Du et al., 2023; Lin et al., 2021); we also retrieved the publications cited in these reviews and collected the flowering genes they report. The flowering genes of Oryza sativa were collected mainly from three review articles (Chen et al., 2022; Osnato, 2023; Zhou et al., 2021), again supplemented with the flowering genes reported in the publications cited in these reviews.
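The merge-then-verify logic of the Arabidopsis thaliana collection can be sketched as follows. This is only an illustration under stated assumptions: the function name, the placeholder gene identifiers, and the placeholder publication identifiers are hypothetical, and the literature evidence is assumed to have already been extracted into a simple mapping.

```python
def collect_flowering_genes(published, annotation_hits, uniprot_hits, phenotype_evidence):
    """Merge three candidate lists and keep genes with literature support.

    published, annotation_hits, uniprot_hits: iterables of gene identifiers
    from the three collection approaches.
    phenotype_evidence: hypothetical mapping of gene identifier to the set of
    publications reporting a flowering-time phenotype for that gene.
    """
    # Union of the three complementary sources.
    candidates = set(published) | set(annotation_hits) | set(uniprot_hits)
    # Keep only candidates with at least one supporting publication.
    return {
        gene: sorted(phenotype_evidence[gene])
        for gene in sorted(candidates)
        if phenotype_evidence.get(gene)
    }
```

Genes that appear in any of the three sources but have no supporting publication are dropped, which mirrors the verification step described above.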
Identification of candidate flowering genes in other species
We followed the method of Jia et al. (2022) to identify candidate flowering genes in other species. The method combines a sequence similarity-based approach with a conserved domain-based approach, as described below. In the sequence similarity-based approach, i) the protein sequences of the flowering genes in Arabidopsis thaliana, Glycine max, and Oryza sativa were used as queries in BLASTP (2.10.0+) searches against the protein sequences of each of the other species, with the thresholds E-value < 1e-10 and identity > 50%; ii) all protein sequences of the other species were searched with BLASTP against the protein sequences of all genes in Arabidopsis thaliana, Glycine max, and Oryza sativa, and for each sequence with a match only the best alignment was kept; iii) based on the results of i) and ii), the homologous genes in the other species were extracted. In the conserved domain-based approach, i) the protein sequences of the flowering genes in Arabidopsis thaliana, Glycine max, and Oryza sativa were searched against the local Pfam-A database using HMMER 3.2.1 (Finn et al., 2011) hmmscan with an E-value threshold of 1e-5; ii) HMMER 3.2.1 hmmbuild was used to construct HMM models, and all protein sequences of the other species were searched against these models with HMMER 3.2.1 hmmsearch; iii) the homologues in the other species were extracted. A homologous gene pair identified by the sequence similarity-based approach was defined as ‘high confidence’ if it was also supported by the conserved domain-based approach.
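The filtering and confidence assignment can be sketched in Python as below. Several points are assumptions rather than details stated here: the BLASTP results are assumed to be in the default 12-column tabular format (-outfmt 6), the "best alignment" is assumed to be the one with the highest bit score, steps i) and ii) are assumed to be combined as a reciprocal check (a pair is kept when the best reverse hit of the candidate is the original flowering gene), and the label "BLAST only" is a hypothetical name for pairs without domain support.

```python
import csv

# Column order of BLAST tabular output (-outfmt 6, default fields).
BLAST6_COLUMNS = [
    "qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
    "qstart", "qend", "sstart", "send", "evalue", "bitscore",
]

def read_blast6(lines):
    """Yield (query, subject, identity, evalue, bitscore) from tabular BLAST lines."""
    for row in csv.reader(lines, delimiter="\t"):
        if not row:
            continue  # skip blank lines
        rec = dict(zip(BLAST6_COLUMNS, row))
        yield (rec["qseqid"], rec["sseqid"], float(rec["pident"]),
               float(rec["evalue"]), float(rec["bitscore"]))

def forward_pairs(hits, max_evalue=1e-10, min_identity=50.0):
    """Step i): flowering gene -> candidate pairs passing both thresholds."""
    return {(q, s) for q, s, pident, evalue, _ in hits
            if evalue < max_evalue and pident > min_identity}

def best_hits(hits):
    """Step ii): for each query, keep the subject with the highest bit score."""
    best = {}
    for q, s, _, _, bits in hits:
        if q not in best or bits > best[q][1]:
            best[q] = (s, bits)
    return {q: s for q, (s, _) in best.items()}

def classify(forward, reverse_best, domain_supported):
    """Step iii) plus confidence: keep reciprocal pairs, flag domain support."""
    result = {}
    for ref_gene, candidate in forward:
        if reverse_best.get(candidate) != ref_gene:
            continue  # candidate's best reverse hit is a different gene
        supported = (ref_gene, candidate) in domain_supported
        result[(ref_gene, candidate)] = "high confidence" if supported else "BLAST only"
    return result
```

Here `domain_supported` stands for the set of gene pairs recovered by the hmmscan, hmmbuild, and hmmsearch steps; how those pairs are derived from the HMMER output is not shown.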
Description of the methods used to improve the accuracy of homologous gene identification
i) Both the sequence similarity and conserved domain methods were used to classify the reliability of homologous gene pairs. ii) Candidate flowering genes in 112 species were identified based on the flowering genes of Arabidopsis thaliana, Glycine max, and Oryza sativa, and the reliability of each homologous gene was determined from the number of these three species that support it. iii) A phylogenetic analysis function for gene families is provided in the phylogenetic module, so that shared homologous genes can be identified from the positions of candidate flowering genes in the phylogenetic tree.
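The species-level support count in ii) amounts to counting, for each candidate gene, how many of the three source species contribute at least one homologous flowering gene. A minimal sketch, assuming the homologous pairs are available as (species, flowering gene, candidate gene) tuples; the function name and tuple layout are hypothetical:

```python
from collections import defaultdict

REFERENCE_SPECIES = ("Arabidopsis thaliana", "Glycine max", "Oryza sativa")

def support_by_species(pairs):
    """Count how many reference species support each candidate gene.

    pairs: iterable of (reference_species, reference_gene, candidate_gene).
    Several flowering genes from the same species count only once.
    """
    supporters = defaultdict(set)
    for species, _reference_gene, candidate in pairs:
        if species in REFERENCE_SPECIES:
            supporters[candidate].add(species)
    return {candidate: len(species) for candidate, species in supporters.items()}
```

A candidate with a count of 3 is supported by all three species, and a count of 2 or more corresponds to support from at least two species.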
The statistical results and supporting evidence of each method
i) In the “Confidence level” column of the Gene list table in the Search by species module, a total of 52,647 (64.1%) homologous gene pairs were supported by both BLAST and Pfam methods. ii) In the “Support” column of the Gene list table in the Search by species module, a total of 2,695 homologous genes were supported by all three species. Moreover, a total of 12,353 homologous genes were supported by at least two species. iii) In the phylogenetic tree of the SPL transcription factor family in Arabidopsis thaliana and Oryza sativa, AtSPL3, AtSPL4, AtSPL5, and LOC_Os07g32170 (OsSPL13) are on the same branch, indicating that OsSPL13 is a shared homologous gene of AtSPL3, AtSPL4, and AtSPL5. These results are consistent with previous results (Wang et al., 2021; Yao et al., 2022). Moreover, AtSPL9, AtSPL15, LOC_Os08g39890 (OsSPL14), and LOC_Os09g31438 (OsSPL17) are on the same branch, and these results are consistent with previously published results (Sun et al., 2021). The phylogenetic tree of the GRAS transcription factor family in Arabidopsis thaliana and Glycine max includes 5 Arabidopsis thaliana flowering genes (AtRGA1, AtGAI, AtRGL1, AtRGL2, and AtRGL3) and 7 Glycine max homologous genes, and their positions in the phylogenetic tree are consistent with previous results (Wang et al., 2020).
System Architecture and Software for Database Construction
PlantCFG was built with MySQL (https://www.mysql.com), PHP 7.4 (http://www.php.net), ThinkPHP 5.0 (http://www.thinkphp.cn/), and Apache 2.4 (http://www.apache.org), and all procedures run on a Linux Ubuntu 7.5 (https://ubuntu.com/) operating system. To provide a user-friendly web interface, HTML5, CSS3, jQuery (http://jquery.com), and Bootstrap (https://getbootstrap.com) were used for the database pages. The statistical figures in the “Search by gene family” module are visualized with Highcharts (https://www.highcharts.com/) and ECharts (https://echarts.apache.org/zh/index.html). Gene statistics are managed with Handsontable (https://handsontable.com/) and DataTables (https://datatables.net). Protein sequences in the “Sequence identity” module were aligned using MAFFT (Katoh et al., 2005). The maximum-likelihood (ML) tree in the “Phylogenetic” module was constructed using FastTree (v.2.1.11) (Price et al., 2009). The collinearity relationships of candidate flowering genes between species in the “Synteny” module were generated using bam2x (https://github.com/kentnf/bam2x.js).
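The alignment and tree-building steps could be chained from Python roughly as follows. This is a sketch, not the PlantCFG implementation: the function names and file names are hypothetical, the executable names `mafft` and `FastTree` depend on the local installation, and the MAFFT `--auto` option is an assumption because the parameters actually used are not stated here.

```python
import subprocess

def pipeline_commands(fasta, alignment, mafft="mafft", fasttree="FastTree"):
    """Return the MAFFT and FastTree command lines as argument lists.

    Both programs write their result to standard output, so the caller
    redirects stdout to the alignment file and the tree file.
    """
    align_cmd = [mafft, "--auto", str(fasta)]   # --auto is an assumed setting
    tree_cmd = [fasttree, str(alignment)]       # protein input is FastTree's default
    return align_cmd, tree_cmd

def align_and_build_tree(fasta, alignment, tree):
    """Align protein sequences, then infer a tree in Newick format."""
    align_cmd, tree_cmd = pipeline_commands(fasta, alignment)
    with open(alignment, "w") as out:
        subprocess.run(align_cmd, stdout=out, check=True)
    with open(tree, "w") as out:
        subprocess.run(tree_cmd, stdout=out, check=True)
    return tree
```

Calling `align_and_build_tree("family.fasta", "family.aln", "family.nwk")` requires both programs to be installed and on the PATH.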
References
1. Bouche, F., Lobet, G., Tocquin, P. and Perilleux, C. (2016) FLOR-ID: an interactive database of flowering-time gene networks in Arabidopsis thaliana. Nucleic Acids Res 44, D1167-1171.
2. Chen, R., Deng, Y., Ding, Y., Guo, J., Qiu, J., Wang, B., Wang, C., Xie, Y., Zhang, Z., Chen, J., Chen, L., Chu, C., He, G., He, Z., Huang, X., Xing, Y., Yang, S., Xie, D., Liu, Y. and Li, J. (2022) Rice functional genomics: decades' efforts and roads ahead. Sci China Life Sci 65, 33-92.
3. Du, H., Fang, C., Li, Y., Kong, F. and Liu, B. (2023) Understandings and future challenges in soybean functional genomics and molecular breeding. J Integr Plant Biol 65, 468-495.
4. Finn, R.D., Clements, J. and Eddy, S.R. (2011) HMMER web server: interactive sequence similarity searching. Nucleic Acids Res 39, W29-37.
5. Jia, Q., Brown, R., Kollner, T.G., Fu, J., Chen, X., Wong, G.K., et al. (2022) Origin and early evolution of the plant terpene synthase family. Proc Natl Acad Sci U S A 119, e2100361119.
6. Katoh, K., Kuma, K., Toh, H. and Miyata, T. (2005) MAFFT version 5: improvement in accuracy of multiple sequence alignment. Nucleic Acids Res 33, 511-518.
7. Lin, X., Liu, B., Weller, J.L., Abe, J. and Kong, F. (2021) Molecular mechanisms for the photoperiodic regulation of flowering in soybean. J Integr Plant Biol 63, 981-994.
8. Osnato, M. (2023) Evolution of flowering time genes in rice: From the paleolithic to the anthropocene. Plant Cell Environ 46, 1046-1059.
9. Price, M.N., Dehal, P.S. and Arkin, A.P. (2009) FastTree: computing large minimum evolution trees with profiles instead of a distance matrix. Mol Biol Evol 26, 1641-1650.
10. Sun, H., Mei, J., Zhao, W., Hou, W., Zhang, Y., Xu, T., Wu, S. and Zhang, L. (2021) Phylogenetic Analysis of the SQUAMOSA Promoter-Binding Protein-Like Genes in Four Ipomoea Species and Expression Profiling of the IbSPLs During Storage Root Development in Sweet Potato (Ipomoea batatas). Front Plant Sci 12, 801061.
11. Wang, L., Ding, X., Gao, Y. and Yang, S. (2020) Genome-wide identification and characterization of GRAS genes in soybean (Glycine max). BMC Plant Biol 20, 415.
12. Yao, W., Li, C., Fu, H., Yang, M., Wu, H., Ding, Y., Li, L. and Lin, S. (2022) Genome-Wide Analysis of SQUAMOSA-Promoter-Binding Protein-like Family in Flowering Pleioblastus pygmaeus. Int J Mol Sci 23.
13. Zhou, S., Zhu, S., Cui, S., Hou, H., Wu, H., Hao, B., Cai, L., Xu, Z., Liu, L., Jiang, L., Wang, H. and Wan, J. (2021) Transcriptional and post-transcriptional regulation of heading date in rice. New Phytol 230, 943-956.