Tag: Plant genetics



  • Structural variation in the pangenome of wild and domesticated barley


    Plant growth and high-molecular-weight DNA isolation

    Twenty-five seeds each from the selected accessions (Supplementary Tables 1 and 7) were sown in 16-cm-diameter pots with compost soil. Plants were grown under greenhouse conditions with sodium halogen artificial lighting, at 21 °C during the day for 16 h and 18 °C at night for 8 h. Leaves (8 g) were collected from 7-day-old seedlings, ground with liquid nitrogen to a fine powder and stored at −80 °C.

    High-molecular-weight (HMW) DNA was purified from the powder, essentially as described (ref. 56). In brief, nuclei were isolated, digested with proteinase K and lysed with SDS. Here, a standard watercolour brush with synthetic hair (size 8) was used to re-suspend the nuclei for digestion and lysis. HMW DNA was purified using phenol–chloroform extraction and precipitation with ethanol as described (ref. 56). Subsequently, the HMW DNA was dissolved in 50 ml of TE (pH 8.0) and precipitated by the addition of 5 ml of 3 M sodium acetate (pH 5.2) and 100 ml of ice-cold ethanol. The suspension was mixed by slow circular movements resulting in the formation of a white precipitate (HMW DNA), which was collected using a wide-bore 5 ml pipette tip and transferred for 30 s into a tube containing 5 ml of 75% ethanol. The washing was repeated twice. The HMW DNA was transferred into a 2 ml tube using a wide-bore tip, collected with a polystyrene spatula, air-dried in a fresh 2 ml tube and dissolved in 500 µl of 10 mM Tris-Cl (pH 8.0). For quantification, the Qubit dsDNA High Sensitivity Assay Kit (Thermo Fisher Scientific) was used. The DNA size-profile was recorded using the Femto Pulse system and the Genomic DNA 165 kb kit (Agilent). In typical experiments the peak of the size-profile of the HMW DNA for library preparation was around 165 kb.

    DNA library preparation and PacBio HiFi sequencing

    For fragmentation of the HMW DNA into 20 kb fragments, a Megaruptor 3 device (speed: 30) was used (Diagenode). A minimum of two HiFi SMRTbell libraries were prepared for each barley genotype following essentially the manufacturer’s instructions and the SMRTbell Express Template Prep Kit (Pacific Biosciences). The final HiFi libraries were size-selected (narrow-size range: 18–21 kb) using the SageELF system with a 0.75% Agarose Gel Cassette (Sage Sciences) according to standard manufacturer protocols.

    HiFi circular consensus sequencing (CCS) reads were generated by operating the PacBio Sequel IIe instrument (Pacific Biosciences) following the manufacturer’s instructions. Per genotype, about four 8M SMRT cells (average yield: 24 gigabases of HiFi CCS reads per 8M SMRT cell) were sequenced to obtain a haploid genome coverage of about 20-fold. In typical experiments the concentration of the HiFi library on plate was 80–95 pM. We used 30 h movie time, 2 h pre-extension and sequencing chemistry v.2.0. The resulting raw data were processed using the CCS4 algorithm (https://github.com/PacificBiosciences/ccs).

    Hi-C library preparation and Illumina sequencing

    In situ Hi-C libraries were prepared from 1-week-old barley seedlings on the basis of the previously published protocol (ref. 13). Dovetail Omni-C data were generated for Bowman, Aizu6, Golden Melon and 10TJ18 as per the manufacturer’s instructions (https://dovetailgenomics.com/products/omni-c-product-page/). Sequencing and Hi-C raw data processing were performed as described before (refs. 57, 58).

    Genome sequence assembly and validation

    PacBio HiFi reads were assembled using hifiasm (v.0.11-r302; ref. 59). Pseudomolecule construction was done with the TRITEX pipeline (ref. 60). Chimeric contigs and orientation errors were identified through manual inspection of Hi-C contact matrices. Genome completeness and consensus accuracy were evaluated using Merqury (v.1.3; ref. 61). Levels of duplication and heterozygosity were assessed with Merqury and FindGSE (v.1.94; ref. 62). Further, we estimated heterozygosity in the HiFi reads with a k-mer approach. We selected 35,202 bi-allelic SNPs from a genebank genomic study (ref. 3). For each SNP we extracted the flanking sequences (±15 bp around the SNP position) and placed either allele in the middle to obtain 31-mers for the reference and alternative alleles. The FASTA sequences of the k-mers are available from https://bitbucket.org/ipkdg/het_estimation. We counted the occurrence of these k-mers in the HiFi FASTQ files using BBDuk (https://jgi.doe.gov/data-and-tools/software-tools/bbtools/bb-tools-user-guide/bbduk-guide/) with the parameter ‘rpkm’. Genotype calling and the heterozygosity estimation were done in R. The full workflow is available from https://bitbucket.org/ipkdg/het_estimation.
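    The construction of allele-specific 31-mers described above can be sketched as follows. This is a minimal illustration in Python, not the authors' workflow (which is in the linked repository and uses BBDuk and R); the function name `allele_kmers` and the 0-based coordinate convention are assumptions made for the example.

```python
def allele_kmers(sequence, snp_index, ref, alt, flank=15):
    """Build the reference and alternative k-mers centred on a SNP.

    sequence  : reference sequence as a string
    snp_index : 0-based position of the SNP (assumed convention)
    ref, alt  : single-base reference and alternative alleles
    flank     : bases taken on each side (15 gives 31-mers)
    """
    if sequence[snp_index] != ref:
        raise ValueError("reference allele does not match the sequence")
    start = snp_index - flank
    end = snp_index + flank + 1
    if start < 0 or end > len(sequence):
        raise ValueError("SNP is too close to the end of the sequence")
    left = sequence[start:snp_index]
    right = sequence[snp_index + 1:end]
    # Same flanks, different middle base: one k-mer per allele.
    return left + ref + right, left + alt + right


# Toy example: a C/T SNP at index 15 of a 31-bp sequence.
toy = "A" * 15 + "C" + "G" * 15
ref_kmer, alt_kmer = allele_kmers(toy, 15, "C", "T")
```

    Counting how often each of the two k-mers occurs in the reads of an accession then indicates which allele, or both, is present at that site.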

    Single-copy pangenome construction

    The single-copy regions in each chromosome-level assembly were identified by filtering out 31-mers occurring more than once in the genomic regions with BBDuk (BBMap_37.93, https://jgi.doe.gov/data-and-tools/software-tools/bbtools). BBMap was used to count k-mer occurrences in each genome with the parameter –mincount 2. Then, non-unique genomic regions (that is, those composed of k-mers occurring at least twice) were masked by BBDuk on the basis of k-mer counts. Single-copy regions were extracted in BED format (with the command ‘bedtools complement’) and their sequences were retrieved using BEDTools (v.2.29.2; ref. 63). The single-copy sequences were clustered using MMseqs2 (Many-against-Many sequence searching; ref. 64) with the parameter ‘--cluster-mode’ and a sequence identity threshold of over 95%. A representative from each cluster (the largest in a cluster) was selected to estimate the pangenome size.
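    The masking logic can be illustrated with a small Python sketch. It is a simplification under stated assumptions: it works on one in-memory sequence, ignores reverse complements, and stands in for the BBTools and BEDTools steps rather than reproducing them. The function name `single_copy_intervals` is hypothetical.

```python
from collections import Counter


def single_copy_intervals(seq, k=31):
    """Return half-open (start, end) intervals not covered by any repeated k-mer."""
    n = len(seq)
    # Count every k-mer in the sequence.
    counts = Counter(seq[i:i + k] for i in range(n - k + 1))
    # Mask every base covered by a k-mer that occurs at least twice.
    masked = [False] * n
    for i in range(n - k + 1):
        if counts[seq[i:i + k]] >= 2:
            for j in range(i, i + k):
                masked[j] = True
    # The complement of the masked bases gives the single-copy intervals.
    intervals = []
    start = None
    for pos, is_masked in enumerate(masked):
        if not is_masked and start is None:
            start = pos
        elif is_masked and start is not None:
            intervals.append((start, pos))
            start = None
    if start is not None:
        intervals.append((start, n))
    return intervals


# Toy example with k=4: "ACGT" occurs twice, so bases 0-7 are masked.
print(single_copy_intervals("ACGTACGTTTGCA", k=4))
```

    The returned intervals play the role of the BED records produced by ‘bedtools complement’ in the text.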

    Illumina resequencing

    A total of 1,000 PGRs and 315 elite barley cultivars (Supplementary Table 6) were used for whole-genome resequencing. Illumina Nextera libraries were prepared and sequenced on an Illumina NovaSeq 6000 at IPK Gatersleben (Supplementary Table 6).

    SNP and SV calling

    Reciprocal genome alignment, in which each of the pangenome assemblies was aligned to the MorexV3 assembly with the latter acting either as alignment query or reference, was done with Minimap2 (v.2.20; ref. 65). From the resultant two alignment tables, indels were called by Assemblytics (v.1.2.1; ref. 66), and only deletions were selected in both alignments for conversion into presence/absence variants relative to the Morex reference genome. Further, balanced rearrangements (inversions, translocations) were scanned for with SyRI (ref. 67). To call SNPs, raw sequencing reads were trimmed using cutadapt (v.3.3; ref. 68) and aligned to the MorexV3 reference genome using Minimap2 (v.2.20; ref. 65). The resulting alignments were sorted with Novosort (v.3.09.01) (http://www.novocraft.com). BCFtools (v.1.9; ref. 69) was used to call SNPs and short indels. A genome-wide association study was performed in GEMMA (v.0.98.1; ref. 70) using default parameters with a mixed linear model and an estimated kinship matrix. Read depth was calculated at each complex locus in each accession. The raw HiFi reads were aligned to the respective genome using minimap2 (ref. 71) and the median depth per locus was calculated using mosdepth (v.0.2.6; ref. 72).

    Linkage disequilibrium in the Barke × HID055 population

    Linkage disequilibrium between each pair of SNPs (both intrachromosomal and interchromosomal) was calculated as the squared Pearson product-moment correlation between the quantitative identity-by-descent (IBD) matrix scores presented in Additional File 1 of ref. 73 (https://datadryad.org/stash/dataset/doi:10.5061/dryad.36rm1). The linkage disequilibrium plot was created with SAS PROC TEMPLATE and SGRENDER (SAS Institute) on the genetic map from ref. 18.
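    As an illustration of the statistic, the squared Pearson correlation between the IBD scores of two SNPs could be computed as below. This is a hedged sketch in plain Python; the original analysis was carried out in SAS, and the function name `pearson_r2` and the toy score vectors are assumptions for the example.

```python
def pearson_r2(x, y):
    """Squared Pearson product-moment correlation between two score vectors."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two vectors of equal length with at least 2 values")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        # A SNP without variation has no defined correlation.
        return float("nan")
    return (sxy * sxy) / (sxx * syy)


# Toy IBD scores for two SNPs across four lines (hypothetical values).
snp_a = [0.0, 1.0, 0.0, 1.0]
snp_b = [0.0, 0.0, 1.0, 1.0]
ld = pearson_r2(snp_a, snp_b)
```

    Applying the function to every pair of SNPs, within and between chromosomes, gives the matrix that the linkage disequilibrium plot displays.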

    Preparation and Illumina sequencing of narrow-size whole-genome sequencing libraries for core50

    First, 10 µg of DNA in 130 µl was sheared in tubes (Covaris microTUBE AFA Fiber Pre-Slit Snap Cap) to an average size of approximately 250 bp using a Covaris S220 focused-ultrasonicator (peak incident power: 175 W; duty factor: 10%; cycles per burst: 200; time: 180 s) according to standard manufacturer protocols (Covaris). The sheared DNA was size-selected using a BluePippin device and a 1.5% agarose cassette with internal R2 marker (Sage Sciences). A tight size setting at 260 bp was used for the purification of fragments in the narrow range of 200–300 bp (typical yield: 1–3 µg). The size-selected DNA was used for the preparation of PCR-free whole-genome sequencing (WGS) libraries using the Roche KAPA Hyper Prep kit according to the manufacturer’s protocols (Roche Diagnostics). A total of 10–12 libraries were provided with unique barcodes, pooled at equimolar concentrations and quantified by quantitative PCR using the KAPA Library Quantification Kit for Illumina Platforms according to standard protocols (Roche Diagnostics). The pools were sequenced (2 × 151 bp, paired-end) using four S4 XP flowcells and the Illumina NovaSeq 6000 system (Illumina) at IPK Gatersleben.

    Contig assembly of core50 sequencing data

    Raw reads were demultiplexed on the basis of index sequences and duplicate reads were removed from the sequencing data using Fastuniq (ref. 74). The read1 and read2 sequences were merged on the basis of the overlap using bbmerge.sh from bbmap (v.37.28; ref. 75). The merged reads were error-corrected using BFC (v.181; ref. 76). The error-corrected merged reads were used as an input for Minia3 (v.3.2.0; ref. 77) to assemble reads into unitigs with the following parameters: -no-bulge-removal -no-tip-removal -no-ec-removal -out-compress 9 -debloom original. The Minia3 source was compiled to enable k-mer sizes up to 512 as described in the Minia3 manual. Iterative Minia3 runs with increasing k-mer sizes (100, 150, 200, 250 and 300) were used for assembly generation as provided in the GATB Minia pipeline (https://github.com/GATB/gatb-minia-pipeline). In the first iteration, a k-mer size of 50 was used to assemble input reads into unitigs. In the next runs, the input reads as well as the assembly of the previous iteration were used as input for the Minia3 assembler. BUSCO analysis was conducted on the contig assemblies using BUSCO (v.3.0.2) with the embryophyta_odb9 dataset (ref. 14). In addition, high-confidence gene models from the Morex V3 reference (ref. 9) were aligned to the contig assemblies to assess completeness, with the parameters of greater than or equal to 90% query coverage and greater than or equal to 97% identity.

    Pangenome accessions in diversity space

    Pseudo-FASTQ paired-end reads (tenfold coverage) were generated from the 76 pangenome assemblies with fastq_generator (https://github.com/johanzi/fastq_generator) and aligned to the MorexV3 reference genome sequence assembly9 using Minimap2 (v.2.24-r1122, ref. 65). SNPs were called together with short-read data (Supplementary Table 6) using BCFtools78 v.1.9 with the command ‘mpileup -q 20 -Q20 --excl-flags 3332’. To plot the diversity space of cultivated barley, the resultant variant matrix was merged with that of 19,778 domesticated barleys from ref. 3 (genotyping-by-sequencing (GBS) data). SNPs with more than 20% missing or more than 20% heterozygous calls were discarded. Principal component analysis was done with smartpca79 v.7.2.1. To represent the diversity of wild barleys, we used published GBS and WGS data of 412 accessions of that taxon8,54. Variant calling for GBS data was done with BCFtools78 (v.1.9) using the command ‘mpileup -q 20 -Q20’. The resultant variant matrix was filtered as follows: (1) only bi-allelic SNP sites were kept; (2) homozygous genotype calls were retained if their read depth was greater than or equal to 2 and less than or equal to 50 and set to missing otherwise; (3) heterozygous genotype calls were retained if the read depth of both alleles was greater than or equal to 2 and set to missing otherwise. SNPs with more than 20% missing, more than 20% heterozygous calls or a minor allele frequency below 5% were discarded. Principal component analysis was done with smartpca79 v.7.2.1. A matrix of pairwise genetic distances on the basis of identity-by-state (IBS) was computed with Plink2 (v.2.00a3.3LM, ref. 80) and used to construct a neighbour-joining tree with Fneighbor (http://emboss.toulouse.inra.fr/cgi-bin/emboss/fneighbor) in the EMBOSS package81. The tree was visualized with Interactive Tree Of Life (iTOL)82.
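
    The genotype- and site-level filters described for the wild barley data can be sketched as follows. This is a minimal illustration, not the code used in the study: the function names are hypothetical, genotypes are assumed to be encoded as '0/0', '0/1' and '1/1', the homozygous read depth is assumed to be the sum of both allele depths, and the heterozygous fraction is assumed to be taken over all samples.

```python
from typing import List, Optional

MIN_DEPTH = 2    # minimum read depth stated in the text
MAX_DEPTH = 50   # maximum read depth for homozygous calls stated in the text


def filter_call(genotype: str, ref_depth: int, alt_depth: int) -> Optional[str]:
    """Return the genotype if it passes the depth rules, else None (missing)."""
    if genotype in ("0/0", "1/1"):
        # Assumption: "read depth" of a homozygous call is the total depth.
        depth = ref_depth + alt_depth
        return genotype if MIN_DEPTH <= depth <= MAX_DEPTH else None
    if genotype == "0/1":
        # Both alleles of a heterozygous call need sufficient support.
        ok = ref_depth >= MIN_DEPTH and alt_depth >= MIN_DEPTH
        return genotype if ok else None
    return None


def keep_site(calls: List[Optional[str]], max_missing: float = 0.20,
              max_het: float = 0.20, min_maf: float = 0.05) -> bool:
    """Apply the missingness, heterozygosity and allele-frequency filters."""
    n = len(calls)
    called = [c for c in calls if c is not None]
    if n == 0 or not called:
        return False
    missing_frac = (n - len(called)) / n
    # Assumption: heterozygous fraction is relative to all samples.
    het_frac = sum(c == "0/1" for c in called) / n
    alt_freq = sum(c.count("1") for c in called) / (2 * len(called))
    maf = min(alt_freq, 1.0 - alt_freq)
    return missing_frac <= max_missing and het_frac <= max_het and maf >= min_maf
```

    For the domesticated barley matrix, which was filtered without a minor allele frequency cutoff, the same sketch would be called with min_maf=0.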

    Haplotype representation

    Pangenome assemblies were mapped to MorexV3 as described above (‘Pangenome accessions in diversity space’). Read depth was calculated with SAMtools78 v.1.16.1. Genotype calls were set to missing if they were supported by fewer than two reads. IBS was calculated with Plink2 (v.2.00a3.3LM, ref. 80) in 1 Mb windows (shift: 0.5 Mb) using the command ‘--sample-diff counts-only counts-cols=ibs0,ibs1’. Windows in which either of the two accessions in the comparison had twofold coverage over less than 200 kb were set to missing. The number of differences (d) in a window was calculated as ibs0 + ibs1/2, where ibs0 is the number of homozygous differences and ibs1 that of heterozygous ones. This distance was normalized for coverage by the formula d/i × 1 Mb, where i is the size in bp of the region covered at least twofold in both accessions in the comparison. In each window, we determined for each accession in the PGR and cultivar panels the closest pangenome accession according to the coverage-normalized IBS distance. Only accessions with fewer than 10% missing windows due to low coverage were considered, leaving 899 PGRs and 264 cultivars.
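
    The window-level distance and the choice of the closest pangenome accession can be illustrated with a short sketch. The function and argument names are hypothetical; only the formula d = ibs0 + ibs1/2, the normalization d/i × 1 Mb and the 200 kb coverage cutoff are taken from the text.

```python
from typing import Dict, Optional

WINDOW_BP = 1_000_000      # window size used for normalization
MIN_COVERED_BP = 200_000   # minimum twofold-covered length per accession


def window_distance(ibs0: int, ibs1: int, covered_a_bp: int,
                    covered_b_bp: int, covered_both_bp: int) -> Optional[float]:
    """Coverage-normalized IBS distance for one window, or None if missing."""
    if min(covered_a_bp, covered_b_bp) < MIN_COVERED_BP or covered_both_bp <= 0:
        return None
    d = ibs0 + ibs1 / 2            # heterozygous differences count half
    return d * WINDOW_BP / covered_both_bp


def closest_accession(distances: Dict[str, Optional[float]]) -> Optional[str]:
    """Name of the pangenome accession with the smallest non-missing distance."""
    valid = {name: dist for name, dist in distances.items() if dist is not None}
    if not valid:
        return None
    return min(valid, key=valid.get)
```

    A window with 100 homozygous and 50 heterozygous differences over 500 kb of jointly covered sequence gives d = 125 and a normalized distance of 250 differences per Mb.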

    The distance to the closest pangenome accession was plotted with the R package ggplot2 to determine the threshold for similarity (Extended Data Fig. 2d).

    Transcriptome sequencing for gene annotation

    Data for transcript evidence-based genome annotation were provided by the International Barley Pan-Transcriptome Consortium, and a detailed description of sample preparation and sequencing is provided elsewhere83. In brief, the 20 genotypes sequenced for the first version of the barley pangenome8 were used for transcriptome sequencing. Five separate tissues were sampled for each genotype. These were: embryo (including mesocotyl and seminal roots), seedling shoot, seedling root, inflorescence and caryopsis. Three biological replicates were sampled from each tissue type, amounting to 330 samples. Four samples failed quality control and were excluded.

    Preparation of the strand-specific dUTP RNA-seq libraries and Illumina paired-end 150 bp sequencing were carried out by Novogene. In addition, PacBio Iso-Seq sequencing was carried out using a PacBio Sequel IIe sequencer at IPK Gatersleben. For this, a single sample per genotype was obtained by pooling equal amounts of RNA from a single replicate from all five tissues. Each sample was sequenced on an individual 8M SMRT cell.

    De novo gene annotation

    Structural gene annotation was done by combining de novo gene calling and homology-based approaches with RNA-seq, Iso-Seq and protein datasets (Extended Data Fig. 3a). Using evidence derived from expression data, RNA-seq data were first mapped using STAR84 (v.2.7.8a) and subsequently assembled into transcripts by StringTie85 (v.2.1.5, parameters -m 150 -t -f 0.3). Triticeae protein sequences from available public datasets (UniProt86, https://www.uniprot.org, 10 May 2016) were aligned against the genome sequence using GenomeThreader87 (v.1.7.1; arguments -startcodon -finalstopcodon -species rice -gcmincoverage 70 -prseedlength 7 -prhdist 4). Iso-Seq datasets were aligned to the genome assembly using GMAP88 (v.2018-07-04). All assembled transcripts from RNA-seq, Iso-Seq and aligned protein sequences were combined using Cuffcompare89 (v.2.2.1) and subsequently merged with StringTie (v.2.1.5, parameters --merge -m 150) into a pool of candidate transcripts. TransDecoder (v.5.5.0; http://transdecoder.github.io) was used to identify potential ORFs and to predict protein sequences within the candidate transcript set.

    Ab initio annotation was initially done using Augustus90 (v.3.3.3). GeneMark91 (v.4.35) was additionally used to further improve structural gene annotation. To avoid potential over-prediction, we generated guiding hints using the above-described RNA-seq, protein and Iso-Seq datasets as described before92. A specific Augustus model for barley was built by generating a set of gene models with full support from RNA-seq and Iso-Seq. Augustus was trained and optimized following a published protocol92. All structural gene annotations were joined using EVidenceModeler93 (v.1.1.1), and weights were adjusted according to the input source: ab initio (Augustus: 5, GeneMark: 2), homology-based (10). Additionally, two rounds of PASA94 (v.2.4.1) were run to identify untranslated regions and isoforms using the above-described Iso-Seq datasets.

    We used BLASTP95 (ncbi-blast-2.3.0+, parameters -max_target_seqs 1 -evalue 1e-05) to compare potential protein sequences with a trusted set of reference proteins (UniProt Magnoliophyta, reviewed/Swiss-Prot, downloaded on 3 August 2016; https://www.uniprot.org). This differentiated candidates into complete and valid genes, non-coding transcripts, pseudogenes and TEs. In addition, we used PTREP (release 19; http://botserv2.uzh.ch/kelldata/trep-db/index.html), a database of hypothetical proteins containing deduced amino acid sequences in which internal frameshifts have been removed in many cases. This step is particularly useful for the identification of divergent TEs with no significant similarity at the DNA level. Best hits were selected for each predicted protein from each of the three databases. Only hits with an e-value below 10 × 10⁻¹⁰ were considered. Furthermore, functional annotation of all predicted protein sequences was done using the AHRD pipeline (https://github.com/groupschoof/AHRD).

    Proteins were further classified into two confidence classes: high and low. Hits with subject coverage (for protein references) or query coverage (transposon database) above 80% were considered significant and protein sequences were classified as high-confidence using the following criteria: protein sequence was complete and had a subject and query coverage above the threshold in the UniMag database or no BLAST hit in UniMag but in UniPoa and not PTREP; a low-confidence protein sequence was incomplete and had a hit in the UniMag or UniPoa database but not in PTREP. Alternatively, it had no hit in UniMag, UniPoa or PTREP, but the protein sequence was complete. In a second refinement step, low-confidence proteins with an AHRD score of 3* were promoted to high-confidence.

    Gene projections

    Gene contents of the remaining 56 barley genotypes were modelled by the projection of high-confidence genes on the basis of evidence-based gene annotations of the 20 barley genotypes described above. The approach was similar to and built upon a previously described method8. To reduce computational load, 760,078 high-confidence genes of the 20 barley annotations were clustered by cd-hit96 requiring 100% protein sequence similarity and a maximal size difference of four amino acids. The resulting 223,182 source genes were subsequently used for all downstream projections as the non-redundant transcript set representative for the evidence-based annotations. For each source, its maximal attainable score was determined by global protein self-alignment using the Needleman–Wunsch algorithm as implemented in Biopython97 v.1.8 and the blosum62 substitution matrix98 with a gap open and extension penalty of 10.0 and 0.5, respectively.

    Next, we surveyed each barley genome sequence using minimap2 (ref. 65) with options ‘-ax splice:hq’ and ‘-uf’ for genomic matches of source transcripts. Each match was scored by its pairwise protein alignment with the source sequence that triggered the match. Only complete matches with start and stop codons and a score greater than or equal to 0.85 of the source self-score (see above) were retained. The source models were classified into four bins by decreasing confidence qualities: with or without pfam domains, plastid- and transposon-related genes. Projections were performed stepwise for the four qualities, starting from the highest to the lowest. In each quality group, matches were then added into the projected annotation if they did not overlap with any previously inserted model by their coding region. Insertion order progressed from the top to the lowest scoring match. In addition, we tracked the number of insertions for each source by its identifier. For the two top quality categories, we performed two rounds of projections, first inserting each source maximally only once followed by rounds allowing one source inserted multiple times into the projected annotation. To consolidate the 20 evidence-based, initial annotations for any genes potentially missed, we used an identical approach but inserted any non-overlapping matches starting from the previous RNA-seq-based annotation. A detailed description of the projection workflow, parameters and code is provided at the GitHub repository (https://github.com/GeorgHaberer/gene_projection/tree/main/panhordeum). An overview of the projection scheme can be found in the parent directory of the repository. Because complex loci contain numerous pseudogenes, the loci were searched by BLASTN99 for sequences homologous to annotated genes but not present in the set of annotated genes. Pseudogenes were accepted if they covered at least 80% of a gene homologue.
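
    The stepwise insertion logic can be sketched as below. This is an illustrative reconstruction under stated assumptions, not the code from the repository cited above: the Match class and project function are hypothetical names, matches are ranked by their raw alignment score, and overlap is tested on the coding-region interval only.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

MIN_SCORE_FRACTION = 0.85  # threshold relative to the source self-score


@dataclass
class Match:
    source_id: str     # identifier of the source gene that triggered the match
    chrom: str
    start: int         # coding-region start, inclusive
    end: int           # coding-region end, inclusive
    score: float       # protein alignment score against the source
    self_score: float  # maximal attainable score of the source
    complete: bool     # has both start and stop codons


def overlaps(a: Match, b: Match) -> bool:
    return a.chrom == b.chrom and a.start <= b.end and b.start <= a.end


def project(matches: List[Match], annotation: Optional[List[Match]] = None,
            max_per_source: Optional[int] = None) -> List[Match]:
    """Greedily insert non-overlapping matches from highest to lowest score."""
    result = list(annotation or [])
    counts: Dict[str, int] = {}
    for kept in result:
        counts[kept.source_id] = counts.get(kept.source_id, 0) + 1
    candidates = [m for m in matches
                  if m.complete and m.score >= MIN_SCORE_FRACTION * m.self_score]
    for m in sorted(candidates, key=lambda x: x.score, reverse=True):
        if max_per_source is not None and counts.get(m.source_id, 0) >= max_per_source:
            continue  # this source has reached its insertion limit
        if any(overlaps(m, kept) for kept in result):
            continue  # locus already occupied by an earlier insertion
        result.append(m)
        counts[m.source_id] = counts.get(m.source_id, 0) + 1
    return result
```

    In this sketch, the two rounds described for the top quality categories correspond to a first call with max_per_source=1 followed by a second call that passes the result as annotation and leaves max_per_source unset.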

    Definition of core, cloud and shell genes

    Phylogenetic HOGs on the basis of the primary protein sequences from 76 annotated barley genotypes were calculated using Orthofinder100 v.2.5.5 (standard parameters). The scripts for calculation of core/shell and cloud genes have been deposited in the repository https://github.com/PGSB-HMGU/BPGv2. Core HOGs contain at least one gene model from all 76 barley genotypes included in the comparison. Shell HOGs contain gene models from at least two barley genotypes and at most 75 barley genotypes. Genes not included in any HOG (‘singletons’), or clustered with genes only from the same genotype, were defined as cloud genes. GENESPACE101 was used to determine syntenic relationships between the chromosomes of all 76 genotypes.
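
    The three categories follow directly from the number of distinct genotypes represented in a HOG. The sketch below is a minimal illustration with a hypothetical function name; it is not the deposited script.

```python
from typing import Iterable

N_GENOTYPES = 76  # number of annotated barley genotypes in the comparison


def classify_hog(gene_genotypes: Iterable[str], n_genotypes: int = N_GENOTYPES) -> str:
    """Classify a HOG (or singleton) from the genotype of each member gene."""
    n_present = len(set(gene_genotypes))
    if n_present == n_genotypes:
        return "core"    # at least one gene model from every genotype
    if n_present >= 2:
        return "shell"   # between 2 and n_genotypes - 1 genotypes
    return "cloud"       # singleton, or genes from a single genotype only
```

    A group holding two paralogues from the same genotype is therefore classified as cloud, whereas a group with one gene from each of two genotypes is shell.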

    Annotation of TEs

    The 20 barley accessions with expression data were softmasked for transposons before the de novo gene detection using the REdat_9.7_Triticeae section of the PGSB transposon library102. Vmatch (http://www.vmatch.de) was used as matching tool with the following parameters: identity ≥70%, minimal hit length 75 bp, seedlength 12 bp (vmatch -d -p -l 75 -identity 70 -seedlength 12 -exdrop 5 -qmaskmatch tolower). The percentage masked was around 84% and almost identical for all 20 accessions.

    Full-length long terminal repeat retrotransposon candidate elements were detected de novo for each of the 76 barley accessions by their structural hallmarks with LTRharvest103 followed by LTRdigest104. Both programs are contained in genometools87 (http://github.com/genometools/genometools, v.1.5.10). LTRharvest identifies within the specified parameters long terminal repeats and target site duplications whereas LTRdigest was used to determine polypurine tracts and primer binding sites. The transfer RNA library needed as input for the primer binding sites was beforehand created by running tRNAscan-SE-1.3 (ref. 105) on each assembly. The parameter settings for LTRharvest were: ‘-overlaps best -seed 30 -minlenltr 100 -maxlenltr 2000 -mindistltr 3000 -maxdistltr 25000 -similar 85 -mintsd 4 -maxtsd 20 -motif tgca -motifmis 1 -vic 60 -xdrop 5 -mat 2 -mis -2 -ins -3 -del -3 -longoutput’; for LTRdigest: ‘-pptlen 8 30 -uboxlen 3 30 -pptradius 30 -pbsalilen 10 30 -pbsoffset 0 10 -pbstrnaoffset 0 30 -pbsmaxedist 1 -pbsradius 30’. The insertion age of each long terminal repeat retrotransposon instance was calculated from the divergence of its 5′ and 3′ long terminal repeat sequences using a random mutation rate of 1.3 × 10⁻⁸ (ref. 106).
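
    The text does not spell out the age formula. A commonly used form, assumed here, is T = K/(2r), where K is the divergence between the two long terminal repeats and r is the mutation rate per site per year; the factor of two reflects that both repeats accumulate substitutions independently after insertion. The sketch below uses the uncorrected proportion of mismatches as K, which is also an assumption, and hypothetical function names.

```python
MUTATION_RATE = 1.3e-8  # substitutions per site per year, from the text


def ltr_divergence(ltr5: str, ltr3: str) -> float:
    """Proportion of mismatches between two pre-aligned LTR sequences.

    Assumes equal-length aligned strings; columns with a gap ('-') are skipped.
    """
    if len(ltr5) != len(ltr3):
        raise ValueError("sequences must be aligned to equal length")
    compared = mismatches = 0
    for a, b in zip(ltr5.upper(), ltr3.upper()):
        if a == "-" or b == "-":
            continue
        compared += 1
        mismatches += a != b
    if compared == 0:
        raise ValueError("no comparable alignment columns")
    return mismatches / compared


def insertion_age_years(divergence: float, rate: float = MUTATION_RATE) -> float:
    """Assumed formula T = K / (2r); not confirmed by the source."""
    return divergence / (2 * rate)
```

    Under these assumptions, a divergence of 2.6% between the two repeats corresponds to an insertion age of about one million years.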

    Whole-genome pangenome graphs

    Genome graphs were constructed using Minigraph19 v.0.20-r559. Other graph construction tools (PGGB107, Minigraph-Cactus108) turned out to be computationally prohibitive for a genome of this size and complexity, combined with the large number of accessions used in this investigation. Minigraph does not support small variants (less than 50 bp), thus graph complexity is lower than with other tools. However, even with Minigraph, graph construction at the whole-genome level was computationally prohibitive and thus graphs had to be computed separately for each chromosome, precluding detection of interchromosomal translocations.

    Graph construction was initiated using the Morex V3 assembly9 as a reference. The remaining assemblies were added into the graph sequentially, in order of descending dissimilarity to Morex. SVs were called after each iteration using gfatools bubble (v.0.5-r250-dirty, https://github.com/lh3/gfatools). Following graph construction, the input sequences of all accessions were mapped back to the graph using Minigraph with the ‘--call’ option enabled, which generates a path through the graph for each accession. The resulting BED format files were merged using Minigraph’s mgutils.js utility script to convert them to P lines and then combined with the primary output of Minigraph in the proprietary rGFA format (https://github.com/lh3/gfatools/blob/master/doc/rGFA.md). Graphs were then converted from rGFA format to GFA format (https://github.com/GFA-spec/GFA-spec/blob/master/GFA1.md) using the ‘convert’ command from the vg toolkit109 v.1.46.0 ‘Altamura’. This step ensures that graphs are compatible with the wider universe of graph processing tools, most of which require GFA format as input. Chromosome-level graphs were then joined into a whole-genome graph using vg combine. The combined graph was indexed using vg index and vg gbwt, two components of the vg toolkit109.

    General statistics for the whole-genome graph were computed with vg stats. Graph growth was computed using the heaps command from the ODGI toolkit110 v.0.8.2-0-g8715c55, followed by plotting with its companion script heaps_fit.R. The latter also computes values for gamma, the slope coefficient of Heaps’ law, which allows the classification of pangenome graphs into open or closed pangenomes, that is, a prediction of whether the addition of further accessions would increase the size of the pangenome111.

    SV statistics were computed on the basis of the final BED file produced after the addition of the last line to the graph. A custom shell script was used to classify variants according to the Minigraph custom output format. This allows the extraction of simple, that is, non-nested, indels (relative to the MorexV3 graph backbone), as well as simple inversions. The remaining SVs fall into the ‘complex’ category in which there can be multiple levels of nesting of different variant types and this precluded further, more fine-grained classification. To compute overlap with the SVs from Assemblytics, a custom script was used to extract the variant coordinates from both sets, and bedtools intersect63 was then used to compute their intersection on the basis of a spatial overlap of 70%.

    To elucidate the effect of a graph-based reference on short-read mapping, we obtained WGS Illumina reads from five barley samples (Extended Data Fig. 4b) in the European Nucleotide Archive and mapped these onto the whole-genome graph using vg giraffe112. For comparison with the standard approach of mapping reads to a linear single genome reference, we mapped the same reads to the MorexV3 reference genome sequence assembly9 with bwa mem113 v.0.7.17-r1188. Mapping statistics were computed with vg109 stats and samtools78 stats (v.1.9), respectively.

    To elucidate tool bias as a confounding factor in the comparison between the mappings, we first produced a linearized version of the pangenome graph using gfatools gfa2fa (https://github.com/lh3/gfatools) and then mapped the WGS reads from all five accessions to this new reference sequence, using BWA mem as before for the cv. Morex V3 reference sequence. This allows a more appropriate comparison between the single cultivar reference sequence and the pangenome sequence without being affected by algorithmic differences between the tools used (BWA/giraffe). Mappings were filtered to retain only reads with zero mismatches, using sambamba114. For the graph mappings, the ‘Total perfect’ statistic from the vg stats output of the GAM files was used.

    To investigate the srh1 paths in the pangenome graph, we first extracted all nodes from the graph into a FASTA file and then used the enhancer region identified in cv. Barke as associated with the long-haired srh1 phenotype (chr5H:496,182,748-496,187,020) as query in a BLAST search against the nodes. This recovered five nodes with an identity percentage value of greater than 98%. We then used vg find from the vg toolkit v.1.56.0 (ref. 109) to extract a subgraph from the full graph (with a graph context of five steps either side) using the node identifiers. The subgraph was then plotted using odgi viz from the ODGI toolkit v.0.8.3-26-gbc7742ed (ref. 110).

    To genotype samples from the core800 collection against the srh1 region of the graph, we first identified a small set of four samples each with either the short- or long-haired phenotype, picked at random from a group of core800 samples that all shared the same WGS read depth (5×). These samples were HOR_1102, HOR_17654, HOR_4065, HOR_1264, HOR_14704, HOR_7629, HOR_17678 and HOR_11406. We then mapped their Illumina WGS reads to the full pangenome graph using vg giraffe112 and extracted a subgraph of the mappings with vg chunk109. The subgraph was then genotyped using vg pack and vg call with cv. Barke as the reference accession, following the approach proposed in ref. 115. Variants in the resulting VCF files were identified using a simple grep command with the identifiers of the five nodes recovered with the Barke sequence as described above. Scripts used here are available at https://github.com/mb47/minigraph-barley/tree/main/scripts/srh1_analysis.

    Analysis of the Mla locus

    The coordinates and sequences of the 32 genes present at the Mla locus were extracted from the MorexV3 genome sequence assembly9. To find the corresponding position and copy number in each of the 76 genomes, we used BLAST95 (-perc_identity: 90, -word_size: 11, all other parameters set as default). The expected BLAST result for a perfectly conserved allele is a long fragment (exon_1) of 2,015 bp followed by a gap of approximately 1,000 bp due to the intron and another fragment (exon_2) of 820 bp. To count copies, BLAST hits for a single gene were first merged if two segments lay within 1.1 kb of each other; a merged hit was then counted as a copy only if it covered the total length of the query. To analyse the structural variation across all 76 accessions, the non-filtered BLAST results were plotted in a region of −20,000 and +500,000 base pairs around the start of the BPM gene HORVU.MOREX.r3.1HG0004540 that was used as an anchor (present in all 76 lines; Supplementary Figs. 5 and 6). To detect the different Mla alleles, three different thresholds of -perc_identity for the BLAST were used: 100, 99 and 98.
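
    The copy-counting rule can be sketched as follows. The function names and the tuple layout are hypothetical, strand is ignored, and the query length recovered by a merged hit is assumed to be the sum of the aligned lengths of its segments.

```python
from typing import List, Tuple

MAX_GAP_BP = 1_100  # segments closer than this are merged (1.1 kb, from the text)

# (subject_start, subject_end, aligned_query_length) for one query gene
Segment = Tuple[int, int, int]


def merge_segments(segments: List[Segment], max_gap: int = MAX_GAP_BP) -> List[Segment]:
    """Merge BLAST segments on the same sequence that lie within max_gap."""
    merged: List[Segment] = []
    for start, end, qlen in sorted(segments):
        if merged and start - merged[-1][1] <= max_gap:
            prev_start, prev_end, prev_qlen = merged[-1]
            merged[-1] = (prev_start, max(prev_end, end), prev_qlen + qlen)
        else:
            merged.append((start, end, qlen))
    return merged


def count_copies(segments: List[Segment], query_length: int,
                 max_gap: int = MAX_GAP_BP) -> int:
    """Count merged hits that recover the full length of the query."""
    return sum(1 for _, _, qlen in merge_segments(segments, max_gap)
               if qlen >= query_length)
```

    With the expected pattern of a 2,015 bp and an 820 bp fragment separated by about 1,000 bp, the two fragments merge into one hit of 2,835 aligned bases, whereas an isolated 2,015 bp fragment elsewhere is not counted.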

    Scan for structurally complex loci

    We used a pipeline developed in ref. 27 that performs sequence-agnostic identification of long, duplication-prone regions (henceforth, complex regions) in a reference genome, followed by identification of gene families with a statistical tendency to occur within complex regions. The pipeline assumes that a candidate long, duplication-prone region will contain an elevated concentration of locally repeated sequences in the kb-scale length range. We first aligned the MorexV3 genome sequence assembly9 against itself using lastz116 (v.1.04.03; arguments: ‘--notransition --step=500 --gapped’). For practicality purposes, this was done in 2 Mb blocks with a 200 kb overlap, and any overlapping complex regions identified in multiple windows were merged. For each window, we ignored the trivial end-to-end alignment, and, of the remaining alignments, retained only those longer than 5 kb and falling fully within 200 kb of one another. An alignment ‘density’ was calculated over the chromosome by calculating, at ‘interrogation points’ spaced equally at 1 kb intervals along the length of the chromosome, an alignment density score which is simply the sum of all the lengths of any of the filtered alignments spanning that interrogation point. A Gaussian kernel density (bandwidth 10 kb) was calculated over these interrogation points, weighted by their scores. To allow comparability between windows, the interrogation point densities were normalized by the sum of scores in the window. Runs of interrogation points at which the density surpassed a minimum density threshold were flagged as complex regions. A few minor adjustments to these regions (merging of overlapping regions, and trimming the end coordinates to ensure the stretches always begin and end in repeated sequence) yielded the final tabulated list of complex regions and their positions in the MorexV3 genome assembly (Supplementary Table 8). The method was implemented in R, making use of the package data.table. Genes in each long, duplication-prone region were clustered with UCLUST117 (v.11, default parameters) using a protein clustering distance cutoff of 0.5 and for each cluster the most frequent functional description as per the MorexV3 gene annotation9 was assigned as the functional description of the cluster. Self-alignment for characterization of evolutionary variability (Supplementary Fig. 7) was performed using lastz116 (v.1.04.03; settings ‘--self --notransition --gapped --nochain --gfextend --step=50’).
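
    The density calculation within one window can be sketched as below. The original method was implemented in R with data.table; this Python version is only an illustration with hypothetical function names, and the density threshold, which the text does not state, is left as a free parameter.

```python
import math
from typing import List, Tuple

STEP = 1_000        # spacing of interrogation points (1 kb)
BANDWIDTH = 10_000  # Gaussian kernel bandwidth (10 kb)


def point_scores(alignments: List[Tuple[int, int]], window_start: int,
                 window_end: int, step: int = STEP):
    """Score each interrogation point by the summed length of spanning alignments."""
    points = list(range(window_start, window_end + 1, step))
    scores = [sum(e - s for s, e in alignments if s <= p <= e) for p in points]
    return points, scores


def weighted_density(points: List[int], scores: List[int],
                     bandwidth: int = BANDWIDTH) -> List[float]:
    """Score-weighted Gaussian kernel density, normalized by the sum of scores."""
    total = sum(scores)
    if total == 0:
        return [0.0] * len(points)
    norm = 1.0 / (bandwidth * math.sqrt(2 * math.pi))
    weighted = [(p, w) for p, w in zip(points, scores) if w]
    return [norm * sum(w * math.exp(-0.5 * ((x - p) / bandwidth) ** 2)
                       for p, w in weighted) / total
            for x in points]


def complex_regions(points: List[int], density: List[float],
                    threshold: float) -> List[Tuple[int, int]]:
    """Runs of consecutive interrogation points with density above threshold."""
    regions, run_start, last = [], None, None
    for p, d in zip(points, density):
        if d > threshold:
            run_start = p if run_start is None else run_start
            last = p
        elif run_start is not None:
            regions.append((run_start, last))
            run_start = None
    if run_start is not None:
        regions.append((run_start, last))
    return regions
```

    The merging of regions found in overlapping 2 Mb windows and the trimming of end coordinates described above are not covered by this sketch.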

    Molecular dating of divergence times of duplicated genes in complex loci

    For molecular dating of gene duplications, we used segments of up to 4 kb, starting 1 kb upstream of duplicated genes in complex loci. We presumed that these segments comprise only intergenic sequences, which are free from selection pressure and thus evolve at a neutral rate of 1.3 × 10⁻⁸ substitutions per site per year106. The upstream sequences of all duplicated genes of the respective complex locus were then aligned pairwise with the program Water from the EMBOSS package81 (obtained from Ubuntu repositories, https://ubuntu.com). This was done for all gene copies of all barley accessions for which multiple gene copies were found. Molecular dating of the pairwise alignments was done as previously described118 using the substitution rate of 1.3 × 10⁻⁸ substitutions per site per year106.

    Amy1_1 analysis in pangenome assemblies

    The amy1_1 gene copy HORVU.MOREX.PROJ.6HG00545380 was used for BLAST against all 76 genome assemblies. Full-length sequences with identity over 95% were extracted and used for further analyses. Unique sequences were identified by clustering at 100% identity using CD-Hit96 and were aligned using MAFFT119 v.7.490. Sequence variants among amy1_1 gene copies at genomic DNA, coding sequence (CDS) and respective protein level were collected and amy1_1 haplotypes (that is, the combinations of copies) in each genotype assembly were summarized using R120 v.4.2.2. A Barke-specific SNP locus (GGCGCCAGGCATGATCGGGTGGTGGCCAGCCAAGGCGGTGACCTTCGTGGACAACCACGACACCGGCTCCACGCAGCACATGTGGCCCTTCCCTTCTGACA[A/G]GGTCATGCAGGGATATGCGTACATACTCACGCACCCAGGGACGCCATGCATCGTGAGTTCGTCGTACCAATACATCACATCTCAATTTTCTTTTCTTGTTTCGTTCATAA) for amy1_1 haplotype cluster ProtHap3 (Supplementary Table 21) was identified and used for KASP marker development (LGC Biosearch Technologies).

    Comparative analysis of the amy1_1 locus structure

    On the basis of the genome annotation of cv. Morex, 15 gene sequences on either side of amy1_1 gene copy HORVU.MOREX.PROJ.6HG00545440 were extracted. The 31 genes were compared against the 76 genome assemblies using NCBI-BLAST95 (BLASTN, word_size of 11 and percent identity of 90, other parameters as default). Alignment plots were generated from the BLAST result coordinates by scaling on the basis of the mid-point between HORVU.MOREX.r3.6HG0617300/HORVU.MOREX.PROJ.6HG00545250 and HORVU.MOREX.r3.6HG0617710/HORVU.MOREX.PROJ.6HG00545670. All BLAST results in the region (±1 Mb) around this mid-point were plotted using R120.

    Amy1_1 PacBio amplicon sequencing

    Genomic DNA from 1-week-old Morex seedling leaves was extracted with DNeasy Plant Mini Kit (QIAGEN). On the basis of the MorexV3 genome sequence assembly9, amy1_1 full-length copy-specific primers were designed using Primer3 (ref. 121) (https://primer3.ut.ee/): 6F: GTAGCAGTGCAGCGTGAAGTC; 80F: AGACATCGTTAACCACACATGC; 82F: GTTTCTCGTCCCTTTGCCTTAA; 33R: GATCTGGATCGAAGGAGGGC; 79R: TCATACATGGGACCAGATCGAG; 80R: ACGTCAAGTTAGTAGGTAGCCC. All forward primers were tagged with the bridge sequence [AmC6]gcagtcgaacatgtagctgactcaggtcac (indicated by a T preceding the primer name), whereas reverse primers were tagged with [AmC6]tggatcacttgtgcaagcatcacatcgtag to allow annealing to barcoding primers. These bridge sequence-tagged gene-specific primers were used in pairs with each other, targeting 1–2 copies of 3–6 kb amy1_1 genes, including upstream and downstream 500–1000 bp regions: T6F + T33R, T6F + T79R, T80F + T80R and T82F + T80R. A two-step PCR protocol was conducted. The first step PCR reaction was prepared in a 25 μl volume using 2 μl of DMSO, 0.3 μl of Q5 polymerase (New England Biolabs), 1 μl of amy1_1-specific primer pair (10 μM each), 2 μl of gDNA, 0.5 μl of dNTPs (10 mM), 5 μl of Q5 buffer and H2O. The PCR programme was as follows: initial denaturation at 98 °C/1 min followed by 25–28 cycles of 98 °C/30 s, 58 °C/30 s and 72 °C/3 min for extension, with a final extension step of 72 °C/2 min. The second PCR step (barcoding PCR) was prepared in the same way using 1 μl of the first PCR product as DNA template, barcoding primers (Pacific Biosciences) and the PCR programme reduced to 20 cycles. After quality check on 1% agarose gel, all barcoded PCR products were mixed and purified with AMPure PB (Pacific Biosciences). The SMRTbell library preparation and sequencing were carried out at BGI Tech Solutions. Sequencing data were analysed using SMRT Link v.10.2. To minimize PCR chimeric noise, CCSs were first constructed for each molecule. Second, long amplicon analysis was carried out on the basis of subreads from 50 bp windows spanning peak positions of all CCS lengths. Final consensus sequences for each amy1_1 were determined with the aid of size estimation from agarose gel imaging.

    Amy1_1 SNP haplotype analysis and k-mer-based copy number estimation

    SNP haplotypes were analysed in 1,315 PGRs and elite cultivars in the extended amy1_1 cluster region (MorexV3 chr6H: 516,385,490–517,116,415 bp). SNPs with more than 20% missing data among the analysed lines or a minor allele frequency of less than 0.01 were removed from downstream analyses. The data were converted to 0, 1 and 2 format using VCFtools122 and samples were clustered using the pheatmap package (https://cran.r-project.org/web/packages/pheatmap/pheatmap.pdf) in the R statistical environment57. A sequential clustering approach was used to achieve the desired separation. At each step, two extreme clusters were selected and then samples from each cluster were clustered separately. The process was repeated until the desired separation was achieved on the basis of visual inspection.

    K-mers (k = 21) were generated from the Morex amy1_1 gene family members’ conserved region using jellyfish123 v.2.2.10. After removing k-mers with counts from regions other than amy1_1 in the Morex V3 genome assembly, k-mers were counted in the Illumina raw reads (Supplementary Table 6) using Seal (BBtools, https://jgi.doe.gov/data-and-tools/software-tools/bbtools/). All k-mer counts were normalized to counts per MorexV3 genome and amy1_1 copy number was estimated as the median count of all k-mers from each accession in R.
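
    The copy number estimate reduces to a median over normalized k-mer counts. The sketch below is illustrative only (the original analysis was done in R); the function name is hypothetical, and the normalization is assumed to be a division by the expected k-mer depth of a single-copy sequence in the same sample, which is one possible reading of ‘counts per MorexV3 genome’.

```python
from statistics import median
from typing import Dict


def estimate_copy_number(kmer_counts: Dict[str, int], single_copy_depth: float) -> float:
    """Median of amy1_1-specific k-mer counts after depth normalization.

    single_copy_depth is an assumed input: the k-mer depth expected for a
    sequence present once per genome in this accession's read set.
    """
    if single_copy_depth <= 0:
        raise ValueError("single_copy_depth must be positive")
    if not kmer_counts:
        raise ValueError("no k-mer counts supplied")
    return median(count / single_copy_depth for count in kmer_counts.values())
```

    Using the median rather than the mean keeps the estimate robust against individual k-mers whose counts are inflated by residual matches elsewhere in the genome.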

    Estimation ability was validated by comparing copy number from pangenome assemblies and short-read sequencing data (Extended Data Fig. 8c). For 1,000 PGRs, countries (with at least 10 accessions) were colour-shaded on the basis of their proportions of accessions with amy1_1 copy number greater than 5 on a world map using the R package maptools (https://cran.r-project.org/web/packages/maptools/index.html).

    To construct a network from SNP haplotypes, all 371 amy1_1 copies (except ORF 89, 90 and 93; Supplementary Table 14) were aligned using MAFFT119 v.7.490. Median-joining haplotype networks were generated using PopART124 with an epsilon value of 0.

    Local pangenome graph for amy1_1

    The coordinates of amy1_1 copies in 76 genome assemblies were obtained by BLAST searches with the Morex allele of HORVU.MOREX.PROJ.6HG00545380. The genomic intervals surrounding amy1_1 from 10 kb upstream of the first copy to 10 kb downstream of the last copy were extracted from corresponding assemblies and used for further analyses. We applied PGGB (v.0.4.0, https://github.com/pangenome/pggb) for 76 amy1_1 sequences with parameters ‘-n 76 -t 20 -p 90 -s 1000 -N’. The graph was visualized using Bandage125 (v.0.8.1). ODGI (v.0.7.3, command ‘paths’)110 was used to get a sparse distance matrix for paths with the parameter ‘-d’. The resultant distance matrix was plotted with the R package pheatmap (https://cran.r-project.org/web/packages/pheatmap/pheatmap.pdf). Six representative sequences of amy1_1 were aligned against Morex by BLAST+ (v.2.13.0)99.

    AMY1_1 protein structure and protein folding simulation

    The published protein structure of α-amylase AMY1_1 from accession Menuet, in complex with the pseudo-tetrasaccharide acarbose (PDB: 1BG9; ref. 42), was used to simulate the structural context of the amino acid variants identified in barley accessions Morex, Barke and RGT Planet. The amino acid sequences of the crystallized AMY1_1 protein from Menuet and the Morex reference copy amy1_1 HORVU.MOREX.PROJ.6HG00545380 used in this study are identical. The protein was visualized using PyMol 2.5.5 (Schrödinger). The Dynamut2 webserver126 was used to predict changes in protein stability and dynamics by introducing amino acid variants identified in the Morex, Barke and RGT Planet genome assemblies.

    Development of diverse amy1_1 haplotype barley NILs

    NILs with different amy1_1 haplotypes were derived from crosses between RGT Planet as recipient and Barke or Morex amy1_1 cluster donor parents (ProtHap3, ProtHap4 and ProtHap0, respectively; Supplementary Table 21), followed by two subsequent backcrosses to RGT Planet and one selfing step (BC2S1) to retrieve homozygous plants at the amy1_1 locus. A total of four amy1_1–Barke NILs (ProtHap3) and one amy1_1–Morex NIL (ProtHap0) were developed and tested against RGT Planet (ProtHap4) replicates. Plants were grown in a greenhouse at 18 °C under 16/8-h light/dark cycles. Foreground and background molecular markers were used in each generation to assist plant selection. Respective BC2S1 plants were genotyped with the Barley Illumina 15K array (SGS Institut Fresenius, TraitGenetics Section, Germany) and grown to maturity. Grains were collected and further propagated in field plots in consecutive years in various locations (Nørre Aaby, Denmark; Lincoln, New Zealand; Maule, France). Grains from field plots were collected and threshed using a Wintersteiger Elite plot combiner, and sorted by size (threshold, 2.5 mm) using a Pfeuffer SLN3 sample cleaner (Pfeuffer).

    Micro-malting and α-amylase activity analysis

    Non-dormant barley samples of RGT Planet and respective NILs with different amy1_1 haplotypes (50 g each, graded greater than 2.5 mm) were micro-malted in perforated stainless-steel boxes. The barley samples were steeped at 15 °C by submersion of the boxes in water. Steeping took place for 6 h on day one, 3 h on day two and 1 h on day three, followed by air rests, to reach 35%, 40% and 45% water content, respectively. The actual water uptake of individual samples was determined as the weight difference between initial water content, measured with a Foss 1241 NIT instrument, and the sample weight after surface water removal. During air rest, metal beakers were placed into a germination box at 15 °C. Following the last steep, the barley samples were germinated for 3 d at 15 °C. Finally, barley samples were kiln-dried in an MMK Curio kiln (Curio Group) using a two-step ramping profile. The first ramping step started at a set point of 27 °C with a linear ramping at 2 °C h−1 to the breakpoint at 55 °C using 100% fresh air. The second linear ramping was at 4 °C h−1, reaching a maximum at 85 °C. This temperature was kept constant for 90 min using 50% air recirculation. The kilned samples were then deculmed using a manual root removal system (Wissenschaftliche Station für Brauerei). α-Amylase activity was measured using the Ceralpha method (Ceralpha Method MR-CAAR4, Megazyme) modified for Gallery Plus Beermaster (Thermo Fisher Scientific).
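
    The two-step kilning profile described above implies a total run time that can be worked out from the stated set points. The sketch below is an arithmetic illustration only: the function and parameter names are hypothetical, and it assumes that the second ramp starts immediately at the 55 °C breakpoint and that the 90-min hold starts as soon as 85 °C is reached, which the text does not state explicitly.

```python
def kiln_duration(start=27.0, break_temp=55.0, maximum=85.0,
                  rate1=2.0, rate2=4.0, hold_hours=1.5):
    """Total kilning time in hours for a two-step linear ramp plus a hold."""
    ramp1 = (break_temp - start) / rate1    # 27 -> 55 degC at 2 degC per hour
    ramp2 = (maximum - break_temp) / rate2  # 55 -> 85 degC at 4 degC per hour
    return ramp1 + ramp2 + hold_hours


def kiln_temperature(t_hours, start=27.0, break_temp=55.0, maximum=85.0,
                     rate1=2.0, rate2=4.0, hold_hours=1.5):
    """Set-point temperature (degC) t_hours after the start of kilning."""
    ramp1 = (break_temp - start) / rate1
    ramp2 = (maximum - break_temp) / rate2
    if t_hours < 0 or t_hours > ramp1 + ramp2 + hold_hours:
        raise ValueError("time is outside the kilning programme")
    if t_hours <= ramp1:
        return start + rate1 * t_hours
    if t_hours <= ramp1 + ramp2:
        return break_temp + rate2 * (t_hours - ramp1)
    return maximum  # hold phase at the maximum temperature
```

    Under these assumptions the first ramp takes 14 h, the second 7.5 h and the hold 1.5 h, giving 23 h in total.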

    Amy1_1 gene expression of RGT Planet and amy1_1–Barke NIL during micro-malting

    Samples (50 g each, graded greater than 2.5 mm) were micro-malted as described in the previous section. During micro-malting, grains were sampled at 24 h, 48 h and 72 h. Grain samples were first freeze-dried at −80 °C and then milled at room temperature. Total RNA was isolated from 20–200 mg of flour using the Spectrum Plant Total RNA Kit (Sigma Aldrich) and cleaned using RNA Clean & Concentrator (ZYMO Research) following a published protocol (ref. 127). For RNA-seq analysis, libraries were prepared and single-end sequenced with a length of 75 bp as described in ref. 127. Gene expression was quantified as transcripts per million (TPM) using kallisto (ref. 128; v.0.48.0) with 100 bootstraps.

    Rachilla hair ploidy measurements

    Ploidy assessment was performed on rachillae collected from barley spikes at developmental stage (ref. 129) approximately Waddington 9.0. Once isolated, rachillae were fixed with 50% ethanol/10% acetic acid for 16 h after which they were stained with 1 µM DAPI in 50 mM phosphate buffer (pH 7.2) supplemented with 0.05% Triton X100. Probes were analysed with a Zeiss LSM780 confocal laser scanning microscope using a ×20 NA 0.8 objective, zoom ×4 and image size 512 × 512 pixels. DAPI was visualized with a 405 nm laser line in combination with a 405–475 nm bandpass filter. The pinhole was set to ensure the whole nucleus was measured in one scan. Size and fluorescence intensity of the nuclei were measured with ZEN black (ZEISS) software. For data normalization, small, round nuclei of the epidermal proper were used for 2C (diploid) calibration.
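
    The 2C calibration step can be illustrated with a short sketch. It assumes that DAPI fluorescence scales linearly with DNA content and uses the median intensity of the reference nuclei as the 2C level; both choices, as well as the function name and the example numbers, are assumptions for illustration and are not taken from the text.

```python
from statistics import median


def estimate_c_values(nucleus_intensities, reference_2c_intensities):
    """Convert DAPI fluorescence intensities to approximate C-values.

    The median intensity of the reference nuclei is taken to represent 2C,
    and all other nuclei are scaled linearly against it.
    """
    if not reference_2c_intensities:
        raise ValueError("at least one 2C reference nucleus is required")
    reference = median(reference_2c_intensities)
    if reference <= 0:
        raise ValueError("reference intensity must be positive")
    return [2.0 * value / reference for value in nucleus_intensities]


# Hypothetical intensities in arbitrary units.
example_c_values = estimate_c_values([100.0, 200.0, 400.0], [90.0, 100.0, 110.0])
```

    With these made-up numbers, a nucleus twice as bright as the reference would be read as 4C and one four times as bright as 8C.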

    Scanning electron microscopy

    Sample preparation and recording by scanning electron microscopy were essentially performed as described previously (ref. 130). In brief, samples were fixed overnight at 4 °C in 50 mM phosphate buffer (pH 7.2) containing 2% v/v glutaraldehyde and 2% v/v formaldehyde. After washing with distilled water and dehydration in an ascending ethanol series, samples were critical-point-dried in a Bal-Tec critical-point dryer (Leica Microsystems, https://www.leica-microsystems.com). Dried specimens were attached to carbon-coated aluminium sample blocks and gold-coated in an Edwards S150B sputter coater (Edwards High Vacuum, http://www.edwardsvacuum.com). Probes were examined in a Zeiss Gemini30 scanning electron microscope (Carl Zeiss, https://www.zeiss.de) at 5 kV acceleration voltage. Images were digitally recorded.

    Linkage mapping of SHORT RACHILLA HAIR 1 (HvSRH1)

    Initial linkage mapping was performed using GBS data of a large ‘Morex’ × ‘Barke’ F8 recombinant inbred line (RIL) population (ref. 47; European Nucleotide Archive project PRJEB14130). The GBS data of 163 RILs, phenotyped for rachilla hair in the F11 generation, and the two parental genotypes were extracted from the variant matrix using VCFtools (ref. 122) and filtered as described previously (ref. 3) for a minimum depth of sequencing to accept heterozygous and homozygous calls of 4 and 6, respectively, a minimum mapping quality score of the SNPs of 30, a minimal fraction of homozygous calls of 30% and a maximum fraction of missing data of 25%. The linkage map was built with the R package ASMap (ref. 131) using the MSTMap algorithm (ref. 132) and the Kosambi mapping function, forcing the linkage group to split according to the physical chromosomes. The linkage mapping was done with R/qtl (ref. 133) using the binary model of the scanone function with the expectation maximization method (ref. 134). The significance threshold was calculated by running 1,000 permutations and the interval was determined by a logarithm of the odds drop of 1. To confirm consistency between the F8 RIL genotypes and F11 RIL phenotypes, three PCR Allele Competitive Extension (PACE) markers were designed through the 3CR Bioscience free assay design service, using polymorphisms between the genome assemblies of the two parents (Supplementary Table 24), and PACE genotyping was performed as described earlier (ref. 135). To reduce the Srh1 interval, 22 recombinant F8 RILs were sequenced by Illumina WGS, the sequencing reads were mapped to the MorexV3 reference genome sequence assembly (ref. 9) and SNPs were called. The 100 bp region around the flanking SNPs of the Srh1 interval as well as the sequence of the candidate gene HORVU.MOREX.r3.5HG0492730 were compared with the pangenome assemblies using BLASTN (ref. 99) to identify the corresponding coordinates and extract the respective intervals for comparison. Gene sequences were aligned with Muscle5 (ref. 136). Structural variation between intervals was assessed with LASTZ (ref. 116) v.1.04.03. The motif search was carried out with the EMBOSS (ref. 81) v.6.5.7 tool fuzznuc.
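
    Two of the quantities named in this section, the Kosambi mapping function and the LOD-drop interval, can be sketched in a few lines. This is a simplified illustration with hypothetical function names and toy data. It does not reproduce the ASMap or R/qtl implementations; in particular, the interval here is simply the contiguous run of positions around the peak whose LOD stays within the chosen drop, which may differ from how R/qtl reports interval bounds.

```python
import math


def kosambi_cm(recombination_fraction):
    """Kosambi map distance in centimorgans for a recombination fraction r."""
    r = recombination_fraction
    if not 0.0 <= r < 0.5:
        raise ValueError("recombination fraction must be in [0, 0.5)")
    # d = 1/4 * ln((1 + 2r) / (1 - 2r)) Morgans, i.e. 25 * ln(...) cM
    return 25.0 * math.log((1.0 + 2.0 * r) / (1.0 - 2.0 * r))


def lod_drop_interval(positions, lods, drop=1.0):
    """Positions bounding the contiguous region with LOD >= peak LOD - drop.

    positions and lods are parallel lists ordered along one chromosome.
    """
    if not lods or len(positions) != len(lods):
        raise ValueError("positions and lods must be non-empty and equal in length")
    peak = max(range(len(lods)), key=lods.__getitem__)
    cutoff = lods[peak] - drop
    left = peak
    while left > 0 and lods[left - 1] >= cutoff:
        left -= 1
    right = peak
    while right < len(lods) - 1 and lods[right + 1] >= cutoff:
        right += 1
    return positions[left], positions[right]


# Hypothetical LOD profile along one linkage group (positions in cM).
example_interval = lod_drop_interval([0, 5, 10, 15, 20], [1.0, 7.5, 8.0, 7.2, 3.0])
```

    In the toy profile the peak LOD is 8.0 at 10 cM, so a drop of 1 keeps every position with a LOD of at least 7.0.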

    Cas9-mediated mutagenesis

    Guide RNA (gRNA) target motifs in the ‘Golden Promise’ HvSrh1 candidate gene HORVU.GOLDEN_PROMISE.PROJ.5HG00440000.1 were selected by using the online tool WU-CRISPR (ref. 137) to induce translational frameshift mutations by insertion/deletion of nucleotides leading to loss-of-function of the gene. One pair of target motifs (gRNA1a: CCTCGCTGCCCGCCGACGC; gRNA1b: GACAAGACGAAGGCCGCGG) was selected within the HvSrh1 candidate gene on the basis of their position within the first half of the coding sequence and the two-dimensional minimum free energy structures of the cognate single-gRNAs (NNNNNNNNNNNNNNNNNNNNGUUUUAGAGCUAGAAAUAGCAAGUUAAAAUAAGGCUAGUCCGUUAUCAACUUGAAAAAGUGGCACCGAGUCGGUGCUUUU) as modelled by the RNAfold WebServer (ref. 138) and validated as suggested in ref. 139. gRNA-containing transformation vectors were cloned using the modular CasCADE vector system (https://doi.org/10.15488/13200). gRNA-specific sequences were ordered as DNA oligonucleotides (Supplementary Table 25) with specific overhangs for BsaI-based cloning into the gRNA-module vectors carrying the gRNA scaffold, driven by the Triticum aestivum U6 promoter. Golden Gate assembly of gRNAs and the cas9 module, driven by the Zea mays Polyubiquitin 1 (ZmUbi1) promoter, was performed according to the CasCADE protocol to generate the intermediate vector pHP21. To generate the binary vector pHP22, the gRNA and cas9 expression units were cloned using SfiI into the generic vector p6i-2x35S-TE9 (ref. 140), which harbours an hpt gene under control of a double-enhanced CaMV35S promoter in its transfer-DNA for plant selection. Agrobacterium-mediated DNA transfer to immature embryos of the spring barley Golden Promise was performed as previously described (ref. 141). In brief, immature embryos were excised from caryopses 12–14 d after pollination and co-cultivated with Agrobacterium strain AGL1 carrying pHP22 for 48 h. Then, the explants were cultivated for further callus formation under selective conditions using Timentin and hygromycin, which was followed by plant regeneration. The presence of T-DNA in regenerated plantlets was confirmed by hpt- and cas9-specific PCRs (primer sequences in Supplementary Table 25). Primary mutant plants (M1 generation) were identified by PCR amplification of the target region (primer sequences in Supplementary Table 25) followed by Sanger sequencing at LGC Genomics. Double or multiple peaks in the sequence chromatogram starting around the Cas9 cleavage site upstream of the target’s protospacer-adjacent motif were considered as an indication for chimeric and/or heterozygous mutants. Mutant plants were grown in a glasshouse until the formation of mature grains. M2 plants were grown in a climate chamber under speed breeding conditions (22 h light at 22 °C and 2 h dark at 19 °C, adapted from ref. 142) and genotyped by Sanger sequencing of PCR amplicons as given above. M2 grains were subjected to phenotyping.
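
    The goal of inducing frameshift mutations by insertion or deletion can be illustrated with a simple check on allele lengths. This is a minimal sketch with a hypothetical function name and made-up alleles; it assumes the indel lies entirely within the coding sequence and ignores effects on splicing or on start and stop codons.

```python
def is_frameshift(ref_allele, alt_allele):
    """Return True if the net length change is not a multiple of three.

    Insertions or deletions whose size is not divisible by 3 shift the
    reading frame downstream of the mutation site.
    """
    return (len(alt_allele) - len(ref_allele)) % 3 != 0


# Hypothetical alleles at an edited site.
one_bp_deletion = is_frameshift("ACGT", "ACT")       # 1-bp deletion
three_bp_deletion = is_frameshift("ACGTAC", "ACG")   # 3-bp in-frame deletion
```

    Here the 1-bp deletion changes the reading frame, whereas the 3-bp deletion removes one codon and leaves the frame intact.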

    FIND-IT library construction

    We constructed a FIND-IT library in cv. ‘Etincel’ (6-row winter malting barley; SECOBRA Recherches) as described in ref. 50. In short, we induced mutations by incubating 2.5 kg of ‘Etincel’ grain in water overnight at 8 °C, followed by incubation in 0.3 mM NaN3 at pH 3.0 for 2 h at 20 °C with continuous application of oxygen. After thoroughly washing with water, the grains were air-dried in a fume hood for 48 h. Mutagenized grains were sown in fields in Nørre Aaby, Denmark, and collected in bulk using a Wintersteiger Elite plot combiner. In the following generation, 2.5 kg of grain was sown in fields in Lincoln, New Zealand, and 188 pools of approximately 300 plants each were hand-harvested and threshed. A representative sample, 25% of each pool, was milled (Retsch GM200), and DNA was extracted from 25 g of the flour by LGC Genomics.

    FIND-IT screening

    The FIND-IT ‘Etincel’ library was screened as described in ref. 50 using a single assay for the isolation of the srh1P63S variant (ID no. CB-FINDit-Hv-014). The assay used forward primer 5′ AATCCTGCAGTCCTTGG 3′, reverse primer 5′ GAGGAGAAGAAGGAGCC 3′, mutant probe /5′6-FAM/CGTGGACGT/ZEN/CGACG/3′IABkFQ/ and wild-type probe /5′SUN/ACGTGGGCG/ZEN/TCGA/3′IABkFQ/ (Integrated DNA Technologies).

    4K SNP chip genotyping

    Genotyping, including DNA extraction from freeze-dried leaf material, was conducted by TraitGenetics. The srh1P63S mutant, the corresponding wild-type ‘Etincel’ and the srh1 pangenome accessions Morex, RGT Planet, HOR 13942, HOR 9043 and HOR 21599 were genotyped for background confirmation. Pairwise genetic distance of individuals was calculated as the average of their per-locus distances (ref. 143) using the R package stringdist (ref. 144; v.0.9.8). Principal coordinate analysis was done with the R (ref. 120; v.4.0.2) base function cmdscale on the basis of this genetic distance matrix. The first two principal coordinates were plotted with ggplot2 (https://ggplot2.tidyverse.org).
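
    The idea of a pairwise genetic distance computed as the average of per-locus distances can be sketched as follows. The per-locus distance used here (0 for identical genotype calls, 1 otherwise, with missing calls skipped) is an assumption for illustration; the measure defined in ref. 143 and computed with stringdist may differ. The function names, the `"NA"` missing-data code and the genotypes are hypothetical.

```python
def pairwise_distance(genotype_a, genotype_b, missing="NA"):
    """Average per-locus distance between two individuals.

    Each locus contributes 0 if the two calls are identical and 1 if they
    differ; loci with a missing call in either individual are skipped.
    """
    if len(genotype_a) != len(genotype_b):
        raise ValueError("genotype vectors must cover the same loci")
    compared = 0
    differing = 0
    for call_a, call_b in zip(genotype_a, genotype_b):
        if call_a == missing or call_b == missing:
            continue
        compared += 1
        if call_a != call_b:
            differing += 1
    if compared == 0:
        raise ValueError("no loci with calls in both individuals")
    return differing / compared


def distance_matrix(genotypes):
    """Return {(name_x, name_y): distance} for all ordered pairs of individuals."""
    names = list(genotypes)
    return {
        (x, y): pairwise_distance(genotypes[x], genotypes[y])
        for x in names
        for y in names
    }


# Hypothetical genotype calls at four loci.
example_matrix = distance_matrix({
    "mutant": ["AA", "CC", "GG", "NA"],
    "wild_type": ["AA", "CC", "GG", "TT"],
    "other": ["AA", "TT", "GG", "CC"],
})
```

    A symmetric matrix of this kind is the sort of input that a classical multidimensional scaling routine such as cmdscale takes.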

    Sanger sequencing

    gDNA of the srh1P63S variant and ‘Etincel’ was extracted from 1-week-old seedling leaves (DNeasy, Plant Mini Kit, Qiagen). Genomic DNA fragments for sequencing were amplified by PCR using gene-specific primers (forward primer 5′ TTGCACGATTCAAATGTGGT 3′, reverse primer 5′ TCACCGGGATCTCTCTGAAT 3′) and Taq DNA Polymerase (NEB), with an initial denaturation at 94 °C/3 min followed by 35 cycles of 94 °C/45 s, 55 °C/60 s and 72 °C/60 s for extension, and a final extension step of 72 °C/10 min. PCR products were purified using the NucleoSpin Gel and PCR Clean-Up Kit (Macherey-Nagel) according to the manufacturer’s instructions. Sanger sequencing was done at Eurofins Genomics Germany using a gene-specific sequencing primer (5′ AGAACGGAGAGGAGAGAAAGAAG 3′).

    RNA preparation, sequencing and data analysis

    Rachilla tissues from two contrast groups, Morex (short) and Barke (long), and Bowman (long) and BW-NIL-srh1 (short), were used for RNA-seq. The rachilla tissues were collected from the central spikelets of the respective genotypes at rachilla hair initiation (Waddington 8.0) and elongation (Waddington 9.5) stages. Total RNA was extracted using TRIzol reagent (Invitrogen) followed by 2-propanol precipitation. Genomic DNA residues were removed with DNase I (NEB, M0303L). High-throughput paired-end sequencing was conducted at Novogene (Cambridge, UK) with the Illumina NovaSeq 6000 PE150 platform. RNA-seq reads were trimmed for adaptor sequences with Trimmomatic (ref. 145; v.0.39) and the MorexV3 genome annotation was used as reference to estimate read abundance with kallisto (ref. 128). The raw read counts were normalized to TPM expression levels.
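
    The TPM normalization mentioned above follows a standard definition that can be written out in a few lines. The sketch below shows only that definition; it is not how kallisto computes or reports its estimates, and the function name and example values are hypothetical.

```python
def counts_to_tpm(counts, effective_lengths):
    """Convert estimated read counts to transcripts per million (TPM).

    Each count is divided by the transcript's effective length, and the
    resulting rates are rescaled so that they sum to one million.
    """
    if len(counts) != len(effective_lengths):
        raise ValueError("counts and effective_lengths must have equal length")
    if any(length <= 0 for length in effective_lengths):
        raise ValueError("effective lengths must be positive")
    rates = [count / length for count, length in zip(counts, effective_lengths)]
    total = sum(rates)
    if total == 0:
        raise ValueError("at least one transcript must have a non-zero count")
    return [rate / total * 1e6 for rate in rates]


# Hypothetical counts and effective lengths (bp) for two transcripts.
example_tpm = counts_to_tpm([100, 100], [1000, 2000])
```

    In the example both transcripts have the same count, but the shorter one receives twice the TPM because its reads come from half as many bases.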

    Messenger RNA in situ hybridization

    In situ hybridization was conducted in longitudinal sections and cross-sections derived from whole spikelet tissues of Bowman and Morex at rachilla hair elongation developmental stage (Waddington 9.5) with HvSRH1 sense and antisense probes (124 bp). The in situ hybridization was performed as described previously (ref. 146) with a few modifications.

    Reporting summary

    Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.


  • Promises and challenges of crop translational genomics


  • McCabe, E. R. B. Translational genomics in medical genetics. Genet. Med. 4, 468–471 (2002).

    Article 
    PubMed 

    Google Scholar
     

  • Zeggini, E., Gloyn, A. L., Barton, A. C. & Wain, L. V. Translational genomics and precision medicine: moving from the lab to the clinic. Science 365, 1409–1413 (2019).

    Article 
    ADS 
    PubMed 

    Google Scholar
     

  • Salentijn, E. M. J. et al. Plant translational genomics: from model species to crops. Mol. Breed. 20, 1–13 (2007).

    Article 

    Google Scholar
     

  • Cannon, S. B., May, G. D. & Jackson, S. A. Three sequenced legume genomes and many crop species: rich opportunities for translational genomics. Plant Physiol. 151, 970–977 (2009).

    Article 
    PubMed 
    PubMed Central 

    Google Scholar
     

  • Ronald, P. C. Lab to farm: applying research on plant genetics and genomics to crop improvement. PLoS Biol. 12, e1001878 (2014).

    Article 
    PubMed 
    PubMed Central 

    Google Scholar
     

  • Sun, Y., Shang, L., Zhu, Q.-H., Fan, L. & Guo, L. Twenty years of plant genome sequencing: achievements and challenges. Trends Plant Sci. 27, 391–401 (2022).

    Article 
    PubMed 

    Google Scholar
     

  • Bennetzen, J. L. & Ma, J. The genetic colinearity of rice and other cereals on the basis of genomic sequence analysis. Curr. Opin. Plant Biol. 6, 128–133 (2003).

    Article 
    PubMed 

    Google Scholar
     

  • Carlson, E. A. H. J. Muller’s contributions to mutation research. Mutat. Res. 752, 1–5 (2013).

    Article 
    ADS 
    PubMed 

    Google Scholar
     

  • Simmonds, N. W. Bandwagons I Have Known. Tropical Agriculture Association Newsletter December 1991, 7–10 (Tropical Agriculture Association International, 1991).

  • Davey, J. W. et al. Genome-wide genetic marker discovery and genotyping using next-generation sequencing. Nat. Rev. Genet. 12, 499–510 (2011).

    Article 
    ADS 
    PubMed 

    Google Scholar
     

  • Schneeberger, K. et al. SHOREmap: simultaneous mapping and mutation identification by deep sequencing. Nat. Methods 6, 550–551 (2009).

    Article 
    PubMed 

    Google Scholar
     

  • Yu, J. et al. A unified mixed-model method for association mapping that accounts for multiple levels of relatedness. Nat. Genet. 38, 203–208 (2006).

    Article 
    PubMed 

    Google Scholar
     

  • Rhie, A. et al. Towards complete and error-free genome assemblies of all vertebrate species. Nature 592, 737–746 (2021).

    Article 
    ADS 
    PubMed 
    PubMed Central 

    Google Scholar
     

  • Alkan, C., Coe, B. P. & Eichler, E. E. Genome structural variation discovery and genotyping. Nat. Rev. Genet. 12, 363–376 (2011).

    Article 
    PubMed 
    PubMed Central 

    Google Scholar
     

  • Ho, S. S., Urban, A. E. & Mills, R. E. Structural variation in the sequencing era. Nat. Rev. Genet. 21, 171–189 (2020).

    Article 
    PubMed 

    Google Scholar
     

  • Lei, L. et al. Plant pan-genomics comes of age. Annu. Rev. Plant Biol. 72, 411–435 (2021).

    Article 
    PubMed 

    Google Scholar
     

  • Orlando, L. et al. Ancient DNA analysis. Nat. Rev. Methods Primers 1, 14 (2021).

    Article 

    Google Scholar
     

  • Tanksley, S. D., Young, N. D., Paterson, A. H. & Bonierbale, M. W. RFLP mapping in plant breeding: new tools for an old science. Bio/Technology 7, 257–264 (1989).


    Google Scholar
     

  • Rafalski, J. A. Association genetics in crop improvement. Curr. Opin. Plant Biol. 13, 174–180 (2010).

    Article 
    PubMed 

    Google Scholar
     

  • Bernardo, R. Bandwagons I, too, have known. Theor. Appl. Genet. 129, 2323–2332 (2016).

    Article 
    PubMed 

    Google Scholar
     

  • Holland, J. B. Genetic architecture of complex traits in plants. Curr. Opin. Plant Biol. 10, 156–161 (2007).

    Article 
    PubMed 

    Google Scholar
     

  • Korte, A. & Farlow, A. The advantages and limitations of trait analysis with GWAS: a review. Plant Methods 9, 29 (2013).

    Article 
    PubMed 
    PubMed Central 

    Google Scholar
     

  • Distelfeld, A., Li, C. & Dubcovsky, J. Regulation of flowering in temperate cereals. Curr. Opin. Plant Biol. 12, 178–184 (2009).

    Article 
    PubMed 

    Google Scholar
     

  • Comadran, J. et al. Natural variation in a homolog of Antirrhinum CENTRORADIALIS contributed to spring growth habit and environmental adaptation in cultivated barley. Nat. Genet. 44, 1388–1392 (2012).

    Article 
    PubMed 

    Google Scholar
     

  • Cheng, S. et al. Harnessing landrace diversity empowers wheat breeding. Nature 632, 823–831 (2024).

  • Wulff, B. B. & Krattinger, S. G. The long road to engineering durable disease resistance in wheat. Curr. Opin. Biotechnol. 73, 270–275 (2022).

    Article 
    PubMed 

    Google Scholar
     

  • Athiyannan, N. et al. Long-read genome sequencing of bread wheat facilitates disease resistance gene cloning. Nat. Genet. 54, 227–231 (2022). A good example of how the recent progress in genome sequencing has made gene isolation easier.

    Article 
    PubMed 

    Google Scholar
     

  • Meuwissen, T. H., Hayes, B. J. & Goddard, M. Prediction of total genetic value using genome-wide dense marker maps. Genetics 157, 1819–1829 (2001).

    Article 
    PubMed 
    PubMed Central 

    Google Scholar
     

  • Lin, Z., Hayes, B. J. & Daetwyler, H. D. Genomic selection in crops, trees and forages: a review. Crop Pasture Sci. 65, 1177–1191 (2014).

    Article 

    Google Scholar
     

  • Rembe, M., Zhao, Y., Jiang, Y. & Reif, J. C. Reciprocal recurrent genomic selection: an attractive tool to leverage hybrid wheat breeding. Theor. Appl. Genet. 132, 687–698 (2019).

    Article 
    PubMed 

    Google Scholar
     

  • Poland, J. & Rutkoski, J. Advances and challenges in genomic selection for disease resistance. Annu. Rev. Phytopathol. 54, 79–98 (2016).

    Article 
    PubMed 

    Google Scholar
     

  • Zhou, Y. et al. Graph pangenome captures missing heritability and empowers tomato breeding. Nature 606, 527–534 (2022). Structural variants derived from pangenomes improve the accuracy of quantitative genetic analyses.

    Article 
    ADS 
    PubMed 
    PubMed Central 

    Google Scholar
     

  • Jensen, S. E. et al. A sorghum practical haplotype graph facilitates genome-wide imputation and cost-effective genomic prediction. Plant Genome 13, e20009 (2020).

    Article 
    PubMed 

    Google Scholar
     

  • Seyum, E. G. et al. Genomic selection in tropical perennial crops and plantation trees: a review. Mol. Breed. 42, 58 (2022).

    Article 
    PubMed 
    PubMed Central 

    Google Scholar
     

  • Wolfe, M. D. et al. Prospects for genomic selection in cassava breeding. Plant Genome 10, https://doi.org/10.3835/plantgenome2017.03.0015 (2017).

  • Flor, H. H. Current status of the gene-for-gene concept. Annu. Rev. Phytopathol. 9, 275–296 (1971).

    Article 

    Google Scholar
     

  • Tamborski, J. & Krasileva, K. V. Evolution of plant NLRs: from natural history to precise modifications. Annu. Rev. Plant Biol. 71, 355–378 (2020).

    Article 
    PubMed 

    Google Scholar
     

  • Barragan, A. C. & Weigel, D. Plant NLR diversity: the known unknowns of pan-NLRomes. Plant Cell 33, 814–831 (2021).

    Article 
    PubMed 
    PubMed Central 

    Google Scholar
     

  • Moore, J. W. et al. A recently evolved hexose transporter variant confers resistance to multiple pathogens in wheat. Nat. Genet. 47, 1494–1498 (2015).

    Article 
    PubMed 

    Google Scholar
     

  • Krattinger, S. G. et al. A putative ABC transporter confers durable resistance to multiple fungal pathogens in wheat. Science 323, 1360–1363 (2009).

    Article 
    ADS 
    PubMed 

    Google Scholar
     

  • Ercoli, M. F. et al. Plant immunity: rice XA21-mediated resistance to bacterial infection. Proc. Natl Acad. Sci. USA 119, e2121568119 (2022).

    Article 
    PubMed 
    PubMed Central 

    Google Scholar
     

  • Jupe, F. et al. Resistance gene enrichment sequencing (RenSeq) enables reannotation of the NB-LRR gene family from sequenced plant genomes and rapid mapping of resistance loci in segregating populations. Plant J. 76, 530–544 (2013).

    Article 
    PubMed 
    PubMed Central 

    Google Scholar
     

  • Hafeez, A. N. et al. Creation and judicious application of a wheat resistance gene atlas. Mol. Plant 14, 1053–1070 (2021).

    Article 
    PubMed 

    Google Scholar
     

  • Guo, Y. et al. Population genomics of Puccinia graminis f.sp. tritici highlights the role of admixture in the origin of virulent wheat rust races. Nat. Commun. 13, 6287 (2022).

    Article 
    ADS 
    PubMed 
    PubMed Central 

    Google Scholar
     

  • Seong, K. & Krasileva, K. V. Prediction of effector protein structures from fungal phytopathogens enables evolutionary analyses. Nat. Microbiol. 8, 174–187 (2023).

    Article 
    PubMed 
    PubMed Central 

    Google Scholar
     

  • Förderer, A. et al. A wheat resistosome defines common principles of immune receptor channels. Nature 610, 532–539 (2022).

    Article 
    ADS 
    PubMed 
    PubMed Central 

    Google Scholar
     

  • Zhao, Y.-B. et al. Pathogen effector AvrSr35 triggers Sr35 resistosome assembly via a direct recognition mechanism. Sci. Adv. 8, eabq5108 (2022).

    Article 
    ADS 
    MathSciNet 
    PubMed 
    PubMed Central 

    Google Scholar
     

  • Ma, S. et al. Direct pathogen-induced assembly of an NLR immune receptor complex to form a holoenzyme. Science 370, eabe3069 (2020).

    Article 
    PubMed 

    Google Scholar
     

  • Jumper, J. et al. Highly accurate protein structure prediction with AlphaFold. Nature 596, 583–589 (2021).

    Article 
    ADS 
    PubMed 
    PubMed Central 

    Google Scholar
     

  • Frankel, O. H. Genetic conservation: our evolutionary responsibility. Genetics 78, 53–65 (1974).

    Article 
    PubMed 
    PubMed Central 

    Google Scholar
     

  • Altieri, M. A. & Merrick, L. In situ conservation of crop genetic resources through maintenance of traditional farming systems. Econ. Bot. 41, 86–96 (1987).

    Article 

    Google Scholar
     

  • Meilleur, B. A. & Hodgkin, T. In situ conservation of crop wild relatives: status and trends. Biodivers. Conserv. 13, 663–684 (2004).

    Article 

    Google Scholar
     

  • Marden, E., Sackville Hamilton, R., Halewood, M. & McCouch, S. International agreements and the plant genetics research community: a guide to practice. Proc. Natl Acad. Sci. USA 120, e2205773119 (2023).

    Article 
    PubMed 
    PubMed Central 

    Google Scholar
     

  • Mascher, M. et al. Genebank genomics bridges the gap between the conservation of crop diversity and plant breeding. Nat. Genet. 51, 1076–1081 (2019).

    Article 
    PubMed 

    Google Scholar
     

  • Sansaloni, C. et al. Diversity analysis of 80,000 wheat accessions reveals consequences and opportunities of selection footprints. Nat. Commun. 11, 4572 (2020).

    Article 
    ADS 
    PubMed 
    PubMed Central 

    Google Scholar
     

  • Schulthess, A. W. et al. Genomics-informed prebreeding unlocks the diversity in genebanks for wheat improvement. Nat. Genet. 54, 1544–1552 (2022). Genomics helps to bridge the gap between the conservation of plant genetic resources and practical breeding.

    Article 
    PubMed 

    Google Scholar
     

  • Milner, S. G. et al. Genebank genomics highlights the diversity of a global barley collection. Nat. Genet. 51, 319–326 (2019).

    Article 
    PubMed 

    Google Scholar
     

  • Romay, M. C. et al. Comprehensive genotyping of the USA national maize inbred seed bank. Genome Biol. 14, R55 (2013).

    Article 
    PubMed 
    PubMed Central 

    Google Scholar
     

  • Wang, W. et al. Genomic variation in 3,010 diverse accessions of Asian cultivated rice. Nature 557, 43–49 (2018).

    Article 
    ADS 
    PubMed 
    PubMed Central 

    Google Scholar
     

  • McCouch, S. R., McNally, K. L., Wang, W. & Sackville Hamilton, R. Genomics of gene banks: a case study in rice. Am. J. Bot. 99, 407–423 (2012).

    Article 
    PubMed 

    Google Scholar
     

  • De Beukelaer, H., Davenport, G. F. & Fack, V. Core Hunter 3: flexible core subset selection. BMC Bioinformatics 19, 203 (2018).

    Article 
    PubMed 
    PubMed Central 

    Google Scholar
     

  • Yu, X. et al. Genomic prediction contributing to a promising global strategy to turbocharge gene banks. Nat. Plants 2, 16150 (2016).

    Article 
    PubMed 

    Google Scholar
     

  • Bhullar, N. K., Street, K., Mackay, M., Yahiaoui, N. & Keller, B. Unlocking wheat genetic resources for the molecular identification of previously undescribed functional alleles at the Pm3 resistance locus. Proc. Natl Acad. Sci. USA 106, 9519–9524 (2009).

    Article 
    ADS 
    PubMed 
    PubMed Central 

    Google Scholar
     

  • Milne, R. J. et al. The wheat Lr67 gene from the Sugar Transport Protein 13 family confers multipathogen resistance in barley. Plant Physiol. 179, 1285–1297 (2019).

    Article 
    PubMed 

    Google Scholar
     

  • Risk, J. M. et al. The wheat Lr34 gene provides resistance against multiple fungal pathogens in barley. Plant Biotechnol. J. 11, 847–854 (2013).

    Article 
    PubMed 

    Google Scholar
     

  • Luo, M. et al. A five-transgene cassette confers broad-spectrum resistance to a fungal rust pathogen in wheat. Nat. Biotechnol. 39, 561–566 (2021).

    Article 
    PubMed 

    Google Scholar
     

  • Wulff, B. B. & Moscou, M. J. Strategies for transferring resistance into wheat: from wide crosses to GM cassettes. Frontiers Plant Sci. 5, 692 (2014).

    Article 

    Google Scholar
     

  • Athiyannan, N. et al. Long-read genome sequencing of bread wheat facilitates disease resistance gene cloning. Nat. Genet. 54, 227–231 (2022).

    Article 
    PubMed 
    PubMed Central 

    Google Scholar
     

  • Wang, Y. et al. An unusual tandem kinase fusion protein confers leaf rust resistance in wheat. Nat. Genet. 55, 914–920 (2023).

    Article 
    PubMed 
    PubMed Central 

    Google Scholar
     

  • Cavalet-Giorsa, E. et al. Origin and evolution of the bread wheat D genome. Nature https://doi.org/10.1038/s41586-024-07808-z (2024).

  • Cardi, T. et al. CRISPR/Cas-mediated plant genome editing: outstanding challenges a decade after implementation. Trends Plant Sci. 28, 1144–1165 (2023).

    Article 
    PubMed 

    Google Scholar
     

  • Watson, A. et al. Speed breeding is a powerful tool to accelerate crop research and breeding. Nat. Plants 4, 23–29 (2018).

    Article 
    ADS 
    PubMed 

    Google Scholar
     

  • Cha, J.-K. et al. Speed vernalization to accelerate generation advance in winter cereal crops. Mol. Plant 15, 1300–1309 (2022).

    Article 
    PubMed 

    Google Scholar
     

  • Mascher, M. et al. A chromosome conformation capture ordered sequence of the barley genome. Nature 544, 427–433 (2017).

    Article 
    ADS 
    PubMed 

    Google Scholar
     

  • The International Wheat Genome Sequencing Consortium. Shifting the limits in wheat research and breeding using a fully annotated reference genome. Science 361, eaar7191 (2018). In the past, large international consortia were needed to assemble reference sequences of large crop genomes.

    Article 

    Google Scholar
     

  • Jayakodi, M. et al. The barley pan-genome reveals the hidden legacy of mutation breeding. Nature 588, 284–289 (2020).

    Article 
    ADS 
    PubMed 
    PubMed Central 

    Google Scholar
     

  • Zhou, Y. et al. Pan-genome inversion index reveals evolutionary insights into the subpopulation structure of Asian rice. Nat. Commun. 14, 1567 (2023).

    Article 
    ADS 
    PubMed 
    PubMed Central 

    Google Scholar
     

  • Przewieslik-Allen, A. M. et al. The role of gene flow and chromosomal instability in shaping the bread wheat genome. Nat. Plants 7, 172–183 (2021).

    Article 
    PubMed 

    Google Scholar
     

  • van Rengs, W. M. J. et al. A chromosome scale tomato genome built from complementary PacBio and Nanopore sequences alone reveals extensive linkage drag during breeding. Plant J. 110, 572–588 (2022).

    Article 
    PubMed 

    Google Scholar
     

  • Wendler, N. et al. Bulbosum to go: a toolbox to utilize Hordeum vulgare/bulbosum introgressions for breeding and beyond. Mol. Plant 8, 1507–1519 (2015).

    Article 
    PubMed 

    Google Scholar
     

  • Mieulet, D. et al. Unleashing meiotic crossovers in crops. Nat. Plants 4, 1010–1016 (2018). Single genes can have large effects on the recombination landscape.

    Article 
    PubMed 

    Google Scholar
     

  • Rönspies, M., Dorn, A., Schindele, P. & Puchta, H. CRISPR–Cas-mediated chromosome engineering for crop improvement and synthetic biology. Nat. Plants 7, 566–573 (2021).

    Article 
    PubMed 

    Google Scholar
     

  • Schmidt, C. et al. Changing local recombination patterns in Arabidopsis by CRISPR/Cas mediated chromosome engineering. Nat. Commun. 11, 4418 (2020).

    Article 
    ADS 
    PubMed 
    PubMed Central 

    Google Scholar
     

  • Schwartz, C. et al. CRISPR–Cas9-mediated 75.5-Mb inversion in maize. Nat. Plants 6, 1427–1431 (2020).

  • Bartlett, M. E., Moyers, B. T., Man, J., Subramaniam, B. & Makunga, N. P. The power and perils of de novo domestication using genome editing. Annu. Rev. Plant Biol. 74, 727–750 (2023).

  • Yu, H. & Li, J. Breeding future crops to feed the world through de novo domestication. Nat. Commun. 13, 1171 (2022).

  • Hanak, T., Madsen, C. K. & Brinch-Pedersen, H. Genome editing-accelerated re-domestication (GEaReD)—a new major direction in plant breeding. Biotechnol. J. 17, 2100545 (2022).

  • Zhang, S. et al. Sustained productivity and agronomic potential of perennial rice. Nat. Sustain. 6, 28–38 (2023).

  • Singh, D., Buhmann, A. K., Flowers, T. J., Seal, C. E. & Papenbrock, J. Salicornia as a crop plant in temperate regions: selection of genetically characterized ecotypes and optimization of their cultivation conditions. AoB Plants 6, plu071 (2014).

  • Lenser, T. & Theißen, G. Molecular mechanisms involved in convergent crop domestication. Trends Plant Sci. 18, 704–714 (2013).

  • Larson, S. et al. Genome mapping of quantitative trait loci (QTL) controlling domestication traits of intermediate wheatgrass (Thinopyrum intermedium). Theor. Appl. Genet. 132, 2325–2351 (2019).

  • Stetter, M. G., Gates, D. J., Mei, W. & Ross-Ibarra, J. How to make a domesticate. Curr. Biol. 27, R896–R900 (2017).

  • Abbo, S. et al. Plant domestication versus crop evolution: a conceptual framework for cereals and grain legumes. Trends Plant Sci. 19, 351–360 (2014).

  • Fuller, D. Q. Contrasting patterns in crop domestication and domestication rates: recent archaeobotanical insights from the Old World. Ann. Bot. 100, 903–924 (2007).

  • Lemmon, Z. H. et al. Rapid improvement of domestication traits in an orphan crop by genome editing. Nat. Plants 4, 766–770 (2018). Agronomically relevant traits in a minor crop were improved by targeted mutagenesis.

  • Li, T. et al. Domestication of wild tomato is accelerated by genome editing. Nat. Biotechnol. 36, 1160–1163 (2018).

  • Fernie, A. R. & Yan, J. De novo domestication: an alternative route toward new crops for the future. Mol. Plant 12, 615–631 (2019).

  • Bevan, M. W. et al. Genomic innovation for crop improvement. Nature 543, 346–354 (2017).

  • Khoury, C. K. et al. Crop genetic erosion: understanding and responding to loss of crop diversity. New Phytol. 233, 84–118 (2022).

  • Brown, W. L. Genetic diversity and genetic vulnerability—an appraisal. Econ. Bot. 37, 4–12 (1983).

  • Mayer, M. et al. Discovery of beneficial haplotypes for complex traits in maize landraces. Nat. Commun. 11, 4954 (2020).

  • Stephan, W. Genetic hitchhiking versus background selection: the controversy and its implications. Philos. Trans. R. Soc. B 365, 1245–1253 (2010).

  • Yang, J. et al. Incomplete dominance of deleterious alleles contributes substantially to trait variation and heterosis in maize. PLoS Genet. 13, e1007019 (2017).

  • Wang, L. et al. The interplay of demography and selection during maize domestication and expansion. Genome Biol. 18, 215 (2017).

  • Lozano, R. et al. Comparative evolutionary genetics of deleterious load in sorghum and maize. Nat. Plants 7, 17–24 (2021).

  • Liu, Q., Zhou, Y., Morrell, P. L. & Gaut, B. S. Deleterious variants in Asian rice and the potential cost of domestication. Mol. Biol. Evol. 34, 908–924 (2017).

  • Cingolani, P. et al. A program for annotating and predicting the effects of single nucleotide polymorphisms, SnpEff: SNPs in the genome of Drosophila melanogaster strain w1118; iso-2; iso-3. Fly 6, 80–92 (2012).

  • Ng, P. C. & Henikoff, S. SIFT: predicting amino acid changes that affect protein function. Nucleic Acids Res. 31, 3812–3814 (2003).

  • Khan, A. W. et al. Super-pangenome by integrating the wild side of a species for accelerated crop improvement. Trends Plant Sci. 25, 148–158 (2019).

  • Gao, H. et al. The landscape of tolerated genetic variation in humans and primates. Science 380, eabn8153 (2023).

  • Ramstein, G. P. & Buckler, E. S. Prediction of evolutionary constraint by genomic annotations improves functional prioritization of genomic variants in maize. Genome Biol. 23, 183 (2022).

  • Wallace, J. G., Rodgers-Melnick, E. & Buckler, E. S. On the road to breeding 4.0: unraveling the good, the bad, and the boring of crop quantitative genomics. Annu. Rev. Genet. 52, 421–444 (2018).

  • Roze, D. A simple expression for the strength of selection on recombination generated by interference among mutations. Proc. Natl Acad. Sci. USA 118, e2022805118 (2021).

  • Gabriel, W., Lynch, M. & Bürger, R. Muller’s ratchet and mutational meltdowns. Evolution 47, 1744–1757 (1993).

  • Naeem, M., Demirel, U., Yousaf, M. F., Caliskan, S. & Caliskan, M. E. Overview on domestication, breeding, genetic gain and improvement of tuber quality traits of potato using fast forwarding technique (GWAS): a review. Plant Breed. 140, 519–542 (2021).

  • Jansky, S. H. et al. Reinventing potato as a diploid inbred line–based crop. Crop Sci. 56, 1412–1422 (2016).

  • ter Steeg, E. M. S., Struik, P. C., Visser, R. G. F. & Lindhout, P. Crucial factors for the feasibility of commercial hybrid breeding in food crops. Nat. Plants 8, 463–473 (2022).

  • Zhou, Q. et al. Haplotype-resolved genome analyses of a heterozygous diploid potato. Nat. Genet. 52, 1018–1023 (2020).

  • Sun, H. et al. Chromosome-scale and haplotype-resolved genome assembly of a tetraploid potato cultivar. Nat. Genet. 54, 342–348 (2022).

  • Tang, D. et al. Genome evolution and diversity of wild and cultivated potatoes. Nature 606, 535–541 (2022). Initial analysis of a genus-wide pangenome of potato and its wild relatives.

  • Zhang, C. et al. The genetic basis of inbreeding depression in potato. Nat. Genet. 51, 374–378 (2019).

  • Wu, Y. et al. Phylogenomic discovery of deleterious mutations facilitates hybrid potato breeding. Cell 186, 2313–2328.e2315 (2023).

  • Ye, M. et al. Generation of self-compatible diploid potato by knockout of S-RNase. Nat. Plants 4, 651–654 (2018).

  • Mascher, M., Jayakodi, M. & Stein, N. The reinvention of potato. Cell Res. 31, 1144–1145 (2021).

  • Servin, B., Martin, O. C., Mézard, M. & Hospital, F. Toward a theory of marker-assisted gene pyramiding. Genetics 168, 513–523 (2004).

  • Hurni, S. et al. The powdery mildew resistance gene Pm8 derived from rye is suppressed by its wheat ortholog Pm3. Plant J. 79, 904–913 (2014).

  • Cordell, H. J. Epistasis: what it means, what it doesn’t mean, and statistical methods to detect it in humans. Hum. Mol. Genet. 11, 2463–2468 (2002).

  • Soyk, S. et al. Bypassing negative epistasis on yield in tomato imposed by a domestication gene. Cell 169, 1142–1155.e1112 (2017).

  • Soyk, S., Benoit, M. & Lippman, Z. B. New horizons for dissecting epistasis in crop quantitative trait variation. Annu. Rev. Genet. 54, 287–307 (2020).

  • Jiang, Y., Schmidt, R. H., Zhao, Y. & Reif, J. C. A quantitative genetic framework highlights the role of epistatic effects for grain-yield heterosis in bread wheat. Nat. Genet. 49, 1741–1746 (2017).

  • Bouché, F., Lobet, G., Tocquin, P. & Périlleux, C. FLOR-ID: an interactive database of flowering-time gene networks in Arabidopsis thaliana. Nucleic Acids Res. 44, D1167–D1171 (2016).

  • Chen, D., Yan, W., Fu, L.-Y. & Kaufmann, K. Architecture of gene regulatory networks controlling flower development in Arabidopsis thaliana. Nat. Commun. 9, 4534 (2018).

  • Ahsan, A. et al. Identification of epistasis loci underlying rice flowering time by controlling population stratification and polygenic effect. DNA Res. 26, 119–130 (2018).

  • Mathew, B., Léon, J., Sannemann, W. & Sillanpää, M. J. Detection of epistasis for flowering time using Bayesian multilocus estimation in a barley MAGIC population. Genetics 208, 525–536 (2018).

  • Durand, E. et al. Flowering time in maize: linkage and epistasis at a major effect locus. Genetics 190, 1547–1562 (2012).

  • Padmarasu, S., Himmelbach, A., Mascher, M. & Stein, N. In situ Hi-C for plants: an improved method to detect long-range chromatin interactions. Methods Mol. Biol. 1933, 441–472 (2019).

  • Liu, L. et al. Enhancing grain-yield-related traits by CRISPR–Cas9 promoter editing of maize CLE genes. Nat. Plants 7, 287–294 (2021).

  • Aguirre, L., Hendelman, A., Hutton, S. F., McCandlish, D. M. & Lippman, Z. B. Idiosyncratic and dose-dependent epistasis drives variation in tomato fruit size. Science 382, 315–320 (2023). On the molecular genetics of regulatory variation in tomato.

  • Boyle, E. A., Li, Y. I. & Pritchard, J. K. An expanded view of complex traits: from polygenic to omnigenic. Cell 169, 1177–1186 (2017).

  • Zhao, L. et al. Integrative analysis of reference epigenomes in 20 rice varieties. Nat. Commun. 11, 2658 (2020).

  • Han, T. et al. An epigenetic basis of inbreeding depression in maize. Sci. Adv. 7, eabg5442 (2021).

  • Thiel, J. et al. Transcriptional landscapes of floral meristems in barley. Sci. Adv. 7, eabf0832 (2021).

  • Zhang, T.-Q., Chen, Y., Liu, Y., Lin, W.-H. & Wang, J.-W. Single-cell transcriptome atlas and chromatin accessibility landscape reveal differentiation trajectories in the rice root. Nat. Commun. 12, 2053 (2021).

  • Watt, M. et al. Phenotyping: new windows into the plant for breeders. Annu. Rev. Plant Biol. 71, 689–712 (2020).

  • Araus, J. L. et al. Crop phenotyping in a context of global change: what to measure and how to do it. J. Integr. Plant Biol. 64, 592–618 (2022).

  • Sweet, D. D., Tirado, S. B., Springer, N. M., Hirsch, C. N. & Hirsch, C. D. Opportunities and challenges in phenotyping row crops using drone-based RGB imaging. Plant Phenome J. 5, e20044 (2022).

  • Barker, J. et al. Development of a field-based high-throughput mobile phenotyping platform. Comput. Electron. Agric. 122, 74–85 (2016).

  • Araus, J. L. & Cairns, J. E. Field high-throughput phenotyping: the new crop breeding frontier. Trends Plant Sci. 19, 52–61 (2014).

  • Heuermann, M. C., Knoch, D., Junker, A. & Altmann, T. Natural plant growth and development achieved in the IPK PhenoSphere by dynamic environment simulation. Nat. Commun. 14, 5783 (2023).

  • Perez de Souza, L., Alseekh, S., Scossa, F. & Fernie, A. R. Ultra-high-performance liquid chromatography high-resolution mass spectrometry variants for metabolomics research. Nat. Methods 18, 733–746 (2021).

  • Dubin, M. J. et al. DNA methylation in Arabidopsis has a genetic basis and shows evidence of local adaptation. eLife 4, e05255 (2015).

  • Nica, A. C. & Dermitzakis, E. T. Expression quantitative trait loci: present and future. Philos. Trans. R. Soc. B 368, 20120362 (2013).

  • Monroe, J. G. et al. Mutation bias reflects natural selection in Arabidopsis thaliana. Nature 602, 101–105 (2022).

  • Araus, J. L., Kefauver, S. C., Zaman-Allah, M., Olsen, M. S. & Cairns, J. E. Translating high-throughput phenotyping into genetic gain. Trends Plant Sci. 23, 451–466 (2018).

  • Hu, Y. & Schmidhalter, U. Opportunity and challenges of phenotyping plant salt tolerance. Trends Plant Sci. 28, 552–566 (2023).

  • Reynolds, M. et al. Breeder friendly phenotyping. Plant Sci. 295, 110396 (2020).

  • Awada, L., Phillips, P. W. B. & Smyth, S. J. The adoption of automated phenotyping by plant breeders. Euphytica 214, 148 (2018).

  • Papoutsoglou, E. A., Athanasiadis, I. N., Visser, R. G. F. & Finkers, R. The benefits and struggles of FAIR data: the case of reusing plant phenotyping data. Sci. Data 10, 457 (2023).

  • Papoutsoglou, E. A. et al. Enabling reusability of plant phenomic datasets with MIAPPE 1.1. New Phytol. 227, 260–273 (2020).

  • Selby, P. et al. BrAPI—an application programming interface for plant breeding applications. Bioinformatics 35, 4147–4155 (2019).

  • Bell, G., Hey, T. & Szalay, A. Beyond the data deluge. Science 323, 1297–1298 (2009).

  • Jones, J. W. et al. Brief history of agricultural systems modeling. Agric. Syst. 155, 240–254 (2017).

  • Chenu, K. et al. Contribution of crop models to adaptation in wheat. Trends Plant Sci. 22, 472–490 (2017).

  • De Souza, A. P. et al. Soybean photosynthesis and crop yield are improved by accelerating recovery from photoprotection. Science 377, 851–854 (2022).

  • Habier, D., Fernando, R. L. & Garrick, D. J. Genomic BLUP decoded: a look into the black box of genomic prediction. Genetics 194, 597–607 (2013).

  • Hammer, G., Messina, C., Wu, A. & Cooper, M. Biological reality and parsimony in crop models—why we need both in crop improvement! in silico Plants 1, diz010 (2019).

  • Roeder, A. H. K. et al. Fifteen compelling open questions in plant cell biology. Plant Cell 34, 72–102 (2021). A collection of thought-provoking perspectives on future directions in basic plant science.

  • Alexandratos, N. & Bruinsma, J. World agriculture towards 2030/2050: the 2012 revision. ESA Working Paper 12-03 (FAO, 2012).

  • Roser, M. Breaking out of the Malthusian trap: How pandemics allow us to understand why our ancestors were stuck in poverty. Our World in Data https://ourworldindata.org/breaking-the-malthusian-trap (2020).

  • Ritchie, H., Rosado, P. & Roser, M. Hunger and Undernourishment. Our World in Data https://ourworldindata.org/hunger-and-undernourishment (2023).

  • Ghazal, H. et al. Plant genomics in Africa: present and prospects. Plant J. 107, 21–36 (2021).

  • Jamnadass, R. et al. Enhancing African orphan crops with genomics. Nat. Genet. 52, 356–360 (2020).

  • VanBuren, R. et al. Exceptional subgenome stability and functional divergence in the allotetraploid Ethiopian cereal teff. Nat. Commun. 11, 884 (2020).

  • Wang, M. et al. Improved assembly and annotation of the sesame genome. DNA Res. 29, dsac041 (2022).

  • Qi, W. et al. The haplotype-resolved chromosome pairs of a heterozygous diploid African cassava cultivar reveal novel pan-genome and allele-specific transcriptome features. GigaScience 11, giac028 (2022).

  • Kuon, J.-E. et al. Haplotype-resolved genomes of geminivirus-resistant and geminivirus-susceptible African cassava cultivars. BMC Biol. 17, 75 (2019).

  • Varshney, R. K. et al. Achievements and prospects of genomics-assisted breeding in three legume crops of the semi-arid tropics. Biotechnol. Adv. 31, 1120–1134 (2013).

  • Mboowa, G., Sserwadda, I. & Aruhomukama, D. Genomics and bioinformatics capacity in Africa: no continent is left behind. Genome 64, 503–513 (2021).

  • Santantonio, N. et al. Strategies for effective use of genomic information in crop breeding programs serving Africa and South Asia. Front. Plant Sci. 11, 353 (2020).

  • Poore, J. & Nemecek, T. Reducing food’s environmental impacts through producers and consumers. Science 360, 987–992 (2018). This paper presents strong arguments for why environmental concerns matter to everyone, including plant breeders.

  • Teosinte Pollen Drive guides maize diversification and domestication by RNAi

  • Sandler, L. & Novitski, E. Meiotic drive as an evolutionary force. Am. Nat. 91, 105–110 (1957).

  • Presgraves, D. C. The molecular evolutionary basis of species formation. Nat. Rev. Genet. 11, 175–180 (2010).

  • Kistler, L. et al. Multiproxy evidence highlights a complex evolutionary legacy of maize in South America. Science 362, 1309–1313 (2018).

  • Schneider, K. L., Xie, Z., Wolfgruber, T. K. & Presting, G. G. Inbreeding drives maize centromere evolution. Proc. Natl Acad. Sci. USA 113, E987–E996 (2016).

  • Anderson, E. & Stebbins, G. L. Hybridization as an evolutionary stimulus. Evolution 8, 378–388 (1954).

  • Arnold, M. L. Transfer and origin of adaptations through natural hybridization: were Anderson and Stebbins right? Plant Cell 16, 562–570 (2004).

  • Bayes, J. J. & Malik, H. S. Altered heterochromatin binding by a hybrid sterility protein in Drosophila sibling species. Science 326, 1538–1541 (2009).

  • Tang, S. & Presgraves, D. C. Evolution of the Drosophila nuclear pore complex results in multiple hybrid incompatibilities. Science 323, 779–782 (2009).

  • Bomblies, K. et al. Autoimmune response as a mechanism for a Dobzhansky–Muller-type incompatibility syndrome in plants. PLoS Biol. 5, e236 (2007).

  • McLaughlin, R. N. Jr & Malik, H. S. Genetic conflicts: the usual suspects and beyond. J. Exp. Biol. 220, 6–17 (2017).

  • Lindholm, A. K. et al. The ecology and evolutionary dynamics of meiotic drive. Trends Ecol. Evol. 31, 315–326 (2016).

  • Fishman, L. & Saunders, A. Centromere-associated female meiotic drive entails male fitness costs in monkeyflowers. Science 322, 1559–1562 (2008).

  • Chmátal, L. et al. Centromere strength provides the cell biological basis for meiotic drive and karyotype evolution in mice. Curr. Biol. 24, 2295–2300 (2014).

  • Fishman, L. & McIntosh, M. Standard deviations: the biological bases of transmission ratio distortion. Annu. Rev. Genet. 53, 347–372 (2019).

  • Buckler, E. S. 4th et al. Meiotic drive of chromosomal knobs reshaped the maize genome. Genetics 153, 415–426 (1999).

  • Dawe, R. K. et al. A kinesin-14 motor activates neocentromeres to promote meiotic drive in maize. Cell 173, 839–850.e18 (2018).

  • Lyon, M. F. Transmission ratio distortion in mice. Annu. Rev. Genet. 37, 393–408 (2003).

  • McDermott, S. R. & Noor, M. A. F. The role of meiotic drive in hybrid male sterility. Philos. Trans. R. Soc. B 365, 1265–1272 (2010).

  • Herrmann, B. G., Koschorz, B., Wertz, K., McLaughlin, K. J. & Kispert, A. A protein kinase encoded by the t complex responder gene causes non-Mendelian inheritance. Nature 402, 141–146 (1999).

  • Bauer, H., Willert, J., Koschorz, B. & Herrmann, B. G. The t complex-encoded GTPase-activating protein Tagap1 acts as a transmission ratio distorter in mice. Nat. Genet. 37, 969–973 (2005).

  • Hartl, D. L. Genetic dissection of segregation distortion. I. Suicide combinations of SD genes. Genetics 76, 477–486 (1974).

  • Larracuente, A. M. & Presgraves, D. C. The selfish segregation distorter gene complex of Drosophila melanogaster. Genetics 192, 33–53 (2012).

  • Zanders, S. E. et al. Genome rearrangements and pervasive meiotic drive cause hybrid infertility in fission yeast. eLife 3, e02630 (2014).

  • Nuckolls, N. L. et al. wtf genes are prolific dual poison–antidote meiotic drivers. eLife 6, e26033 (2017).

  • Lewontin, R. C. & Dunn, L. C. The evolutionary dynamics of a polymorphism in the house mouse. Genetics 45, 705–722 (1960).

  • Hurst, L. D. & Pomiankowski, A. Causes of sex ratio bias may account for unisexual sterility in hybrids: a new explanation of Haldane’s rule and related phenomena. Genetics 128, 841–858 (1991).

  • Coughlan, J. M. The role of conflict in shaping plant biodiversity. New Phytol. https://doi.org/10.1111/nph.19233 (2023).

  • Phadnis, N. & Orr, H. A. A single gene causes both male sterility and segregation distortion in Drosophila hybrids. Science 323, 376–379 (2009).

  • Zhang, L., Sun, T., Woldesellassie, F., Xiao, H. & Tao, Y. Sex ratio meiotic drive as a plausible evolutionary mechanism for hybrid male sterility. PLoS Genet. 11, e1005073 (2015).

  • Kermicle, J. L. & Allen, J. P. Cross-incompatibility between maize and teosinte. Maydica 35, 399–408 (1990).


  • Lu, Y., Hokin, S. A., Kermicle, J. L., Hartwig, T. & Evans, M. M. S. A pistil-expressed pectin methylesterase confers cross-incompatibility between strains of Zea mays. Nat. Commun. 10, 2304 (2019).

  • Hufford, M. B. et al. The genomic signature of crop-wild introgression in maize. PLoS Genet. 9, e1003477 (2013).

  • Rojas-Barrera, I. C. et al. Contemporary evolution of maize landraces and their wild relatives influenced by gene flow with modern maize varieties. Proc. Natl Acad. Sci. USA 116, 21302–21311 (2019).

  • Wang, C. et al. A natural gene drive system confers reproductive isolation in rice. Cell 186, 3577–3592.e18 (2023).

  • Yang, Z. & Bielawski, J. P. Statistical methods for detecting molecular adaptation. Trends Ecol. Evol. 15, 496–503 (2000).

  • Yoshikawa, M., Peragine, A., Park, M. Y. & Poethig, R. S. A pathway for the biogenesis of trans-acting siRNAs in Arabidopsis. Genes Dev. 19, 2164–2175 (2005).

  • Parent, J.-S., Bouteiller, N., Elmayan, T. & Vaucheret, H. Respective contributions of Arabidopsis DCL2 and DCL4 to RNA silencing. Plant J. 81, 223–232 (2015).

  • Deleris, A. et al. Hierarchical action and inhibition of plant Dicer-like proteins in antiviral defense. Science 313, 68–71 (2006).

  • Bouché, N., Lauressergues, D., Gasciolli, V. & Vaucheret, H. An antagonistic function for Arabidopsis DCL2 in development and a new function for DCL4 in generating viral siRNAs. EMBO J. 25, 3347–3356 (2006).

  • Wu, Y.-Y. et al. DCL2- and RDR6-dependent transitive silencing of SMXL4 and SMXL5 in Arabidopsis dcl4 mutants causes defective phloem transport and carbohydrate over-accumulation. Plant J. 90, 1064–1078 (2017).

  • Taochy, C. et al. A genetic screen for impaired systemic RNAi highlights the crucial role of DICER-LIKE 2. Plant Physiol. 175, 1424–1437 (2017).

  • Mlotshwa, S. et al. DICER-LIKE2 plays a primary role in transitive silencing of transgenes in Arabidopsis. PLoS ONE 3, e1755 (2008).

  • Tagami, Y., Motose, H. & Watanabe, Y. A dominant mutation in DCL1 suppresses the hyl1 mutant phenotype by promoting the processing of miRNA. RNA 15, 450–458 (2009).

  • Welker, N. C. et al. Dicer’s helicase domain discriminates dsRNA termini to promote an altered reaction mode. Mol. Cell 41, 589–599 (2011).

  • Aderounmu, A. M., Aruscavage, P. J., Kolaczkowski, B. & Bass, B. L. Ancestral protein reconstruction reveals evolutionary events governing variation in Dicer helicase function. eLife 12, e85120 (2023).

  • Slotkin, R. K., Freeling, M. & Lisch, D. Heritable transposon silencing initiated by a naturally occurring transposon inverted duplication. Nat. Genet. 37, 641–644 (2005).

  • Bhutani, K. et al. Widespread haploid-biased gene expression enables sperm-level natural selection. Science 371, eabb1723 (2021).

  • Shan, X. et al. Mobilization of the active MITE transposons mPing and Pong in rice by introgression from wild rice (Zizania latifolia Griseb.). Mol. Biol. Evol. 22, 976–990 (2005).

  • Ding, L.-N. et al. Advances in plant GDSL lipases: from sequences to functional mechanisms. Acta Physiol. Plant 41, 151 (2019).

  • An, X. et al. ZmMs30 encoding a novel GDSL lipase is essential for male fertility and valuable for hybrid breeding in maize. Mol. Plant 12, 343–359 (2019).

  • Huo, Y. et al. IRREGULAR POLLEN EXINE2 encodes a GDSL lipase essential for male fertility in maize. Plant Physiol. 184, 1438–1454 (2020).

  • Zhao, J. et al. RMS2 encoding a GDSL lipase mediates lipid homeostasis in anthers to determine rice male fertility. Plant Physiol. 182, 2047–2064 (2020).

  • Tsugama, D., Fujino, K., Liu, S. & Takano, T. A GDSL-type esterase/lipase gene, GELP77, is necessary for pollen dissociation and fertility in Arabidopsis. Biochem. Biophys. Res. Commun. 526, 1036–1041 (2020).

    Article 
    CAS 
    PubMed 

    Google Scholar
     

  • Wu, H. et al. Plant 22-nt siRNAs mediate translational repression and stress adaptation. Nature 581, 89–93 (2020).

    Article 
    ADS 
    CAS 
    PubMed 

    Google Scholar
     

  • Borges, F. & Martienssen, R. A. The expanding world of small RNAs in plants. Nat. Rev. Mol. Cell Biol. 16, 727–741 (2015).

    Article 
    CAS 
    PubMed 
    PubMed Central 

    Google Scholar
     

  • Fang, X. & Qi, Y. RNAi in plants: an Argonaute-centered view. Plant Cell 28, 272–285 (2016).

    Article 
    CAS 
    PubMed 
    PubMed Central 

    Google Scholar
     

  • Axtell, M. J., Westholm, J. O. & Lai, E. C. Vive la différence: biogenesis and evolution of microRNAs in plants and animals. Genome Biol. 12, 221 (2011).

    Article 
    CAS 
    PubMed 
    PubMed Central 

    Google Scholar
     

  • Manavella, P. A., Koenig, D. & Weigel, D. Plant secondary siRNA production determined by microRNA-duplex structure. Proc. Natl Acad. Sci. USA 109, 2461–2466 (2012).

    Article 
    ADS 
    CAS 
    PubMed 
    PubMed Central 

    Google Scholar
     

  • Nelms, B. & Walbot, V. Gametophyte genome activation occurs at pollen mitosis I in maize. Science 375, 424–429 (2022).

    Article 
    ADS 
    CAS 
    PubMed 

    Google Scholar
     

  • Wongpalee, S. P. et al. CryoEM structures of Arabidopsis DDR complexes involved in RNA-directed DNA methylation. Nat. Commun. 10, 3916 (2019).

    Article 
    ADS 
    PubMed 
    PubMed Central 

    Google Scholar
     

  • Jauvion, V., Rivard, M., Bouteiller, N., Elmayan, T. & Vaucheret, H. RDR2 partially antagonizes the production of RDR6-dependent siRNA in sense transgene-mediated PTGS. PLoS ONE 7, e29785 (2012).

    Article 
    ADS 
    CAS 
    PubMed 
    PubMed Central 

    Google Scholar
     

  • Creasey, K. M. et al. miRNAs trigger widespread epigenetically activated siRNAs from transposons in Arabidopsis. Nature 508, 411–415 (2014).

    Article 
    ADS 
    CAS 
    PubMed 
    PubMed Central 

    Google Scholar
     

  • Romero Navarro, J. A. et al. A study of allelic diversity underlying flowering-time adaptation in maize landraces. Nat. Genet. 49, 476–480 (2017).

    Article 
    CAS 
    PubMed 

    Google Scholar
     

  • Chen, L. et al. Genome sequencing reveals evidence of adaptive variation in the genus Zea. Nat. Genet. 54, 1736–1745 (2022).

    Article 
    ADS 
    CAS 
    PubMed 

    Google Scholar
     

  • Lu, Y., Kermicle, J. L. & Evans, M. M. S. Genetic and cellular analysis of cross-incompatibility in Zea mays. Plant Reprod. 27, 19–29 (2014).

    Article 
    CAS 
    PubMed 

    Google Scholar
     

  • Hartl, D. L. Population dynamics of sperm and pollen killers. Theor. Appl. Genet. 42, 81–88 (1972).

    Article 
    CAS 
    PubMed 

    Google Scholar
     

  • Sweigart, A. L., Brandvain, Y. & Fishman, L. Making a murderer: the evolutionary framing of hybrid gamete-killers. Trends Genet. 35, 245–252 (2019).

    Article 
    CAS 
    PubMed 

    Google Scholar
     

  • Bravo Núñez, M. A., Lange, J. J. & Zanders, S. E. A suppressor of a wtf poison–antidote meiotic driver acts via mimicry of the driver’s antidote. PLoS Genet. 14, e1007836 (2018).

    Article 
    PubMed 
    PubMed Central 

    Google Scholar
     

  • Barnes, A. C. et al. An adaptive teosinte mexicana introgression modulates phosphatidylcholine levels and is associated with maize flowering time. Proc. Natl Acad. Sci. USA 119, e2100036119 (2022).

    Article 
    CAS 
    PubMed 
    PubMed Central 

    Google Scholar
     

  • McClintock, B., Kato Yamakake, T. A., Blumenschein, A. & Escuela Nacional de Agricultura (Mexico). Chromosome Constitution of Races of Maize: Its Significance in the Interpretation of Relationships between Races and Varieties in the Americas (Colegio de Postgraduados, 1981).

  • Borges, F. et al. Transposon-derived small RNAs triggered by miR845 mediate genome dosage response in Arabidopsis. Nat. Genet. 50, 186–192 (2018).

    Article 
    CAS 
    PubMed 
    PubMed Central 

    Google Scholar
     

  • Martinez, G. et al. Paternal easiRNAs regulate parental genome dosage in Arabidopsis. Nat. Genet. 50, 193–198 (2018).

    Article 
    CAS 
    PubMed 

    Google Scholar
     

  • Durand, E. et al. Dominance hierarchy arising from the evolution of a complex small RNA regulatory network. Science 346, 1200–1205 (2014).

    Article 
    ADS 
    CAS 
    PubMed 

    Google Scholar
     

  • Czech, B. et al. An endogenous small interfering RNA pathway in Drosophila. Nature 453, 798–802 (2008).

    Article 
    ADS 
    CAS 
    PubMed 
    PubMed Central 

    Google Scholar
     

  • Wen, J. et al. Adaptive regulation of testis gene expression and control of male fertility by the Drosophila hairpin RNA pathway. Mol. Cell 57, 165–178 (2015).

    Article 
    CAS 
    PubMed 

    Google Scholar
     

  • Tao, Y. et al. A sex-ratio meiotic drive system in Drosophila simulans. II: an X-linked distorter. PLoS Biol. 5, e293 (2007).

    Article 
    PubMed 
    PubMed Central 

    Google Scholar
     

  • Lin, C.-J. et al. The hpRNA/RNAi pathway is essential to resolve intragenomic conflict in the Drosophila male germline. Dev. Cell 46, 316–326.e5 (2018).

    Article 
    CAS 
    PubMed 
    PubMed Central 

    Google Scholar
     

  • Flemr, M. et al. A retrotransposon-driven Dicer isoform directs endogenous small interfering RNA production in mouse oocytes. Cell 155, 807–816 (2013).

    Article 
    CAS 
    PubMed 

    Google Scholar
     

  • Tam, O. H. et al. Pseudogene-derived small interfering RNAs regulate gene expression in mouse oocytes. Nature 453, 534–538 (2008).

    Article 
    ADS 
    CAS 
    PubMed 
    PubMed Central 

    Google Scholar
     

  • Su, R. et al. Global profiling of RNA-binding protein target sites by LACE-seq. Nat. Cell Biol. 23, 664–675 (2021).

    Article 
    CAS 
    PubMed 

    Google Scholar
     

  • Begcy, K. & Dresselhaus, T. Tracking maize pollen development by the leaf collar method. Plant Reprod. 30, 171–178 (2017).

    Article 
    CAS 
    PubMed 
    PubMed Central 

    Google Scholar
     

  • Bass, H. W. et al. A maize root tip system to study DNA replication programmes in somatic and endocycling nuclei during plant development. J. Exp. Bot. 65, 2747–2756 (2014).

    Article 
    CAS 
    PubMed 

    Google Scholar
     

  • Kalkar, S. A. & Neha, K. Evaluation of FDA staining technique in stored maize pollen. Middle East J. Sci. Res. 12, 560–562 (2012).

  • Nagar, R. & Schwessinger, B. DNA size selection (>3–4 kb) and purification of DNA using an improved homemade SPRIbeads solution. Protocols.io https://doi.org/10.17504/protocols.io.n7hdhj6 (2018).

  • Schalamun, M., Nagar, R. & Kainer, D. Harnessing the MinION: an example of how to establish long‐read sequencing in a laboratory using challenging plant tissue from Eucalyptus pauciflora. Mol. Ecol. https://doi.org/10.1111/1755-0998.12938 (2018).

  • Kolmogorov, M., Yuan, J., Lin, Y. & Pevzner, P. A. Assembly of long, error-prone reads using repeat graphs. Nat. Biotechnol. 37, 540–546 (2019).

    Article 
    CAS 
    PubMed 

    Google Scholar
     

  • Li, H. Minimap2: pairwise alignment for nucleotide sequences. Bioinformatics 34, 3094–3100 (2018).

    Article 
    CAS 
    PubMed 
    PubMed Central 

    Google Scholar
     

  • Shafin, K. et al. Haplotype-aware variant calling with PEPPER-Margin-DeepVariant enables high accuracy in nanopore long-reads. Nat. Methods 18, 1322–1332 (2021).

    Article 
    CAS 
    PubMed 
    PubMed Central 

    Google Scholar
     

  • Vasimuddin, M., Misra, S., Li, H. & Aluru, S. Efficient architecture-aware acceleration of BWA-MEM for multicore systems. In 2019 IEEE International Parallel and Distributed Processing Symposium (IPDPS) 314–324 (IEEE, 2019).

  • Hu, J., Fan, J., Sun, Z. & Liu, S. NextPolish: a fast and efficient genome polishing tool for long read assembly. Bioinformatics https://doi.org/10.1093/bioinformatics/btz891 (2019).

  • Aury, J.-M. & Istace, B. Hapo-G, haplotype-aware polishing of genome assemblies with accurate reads. NAR Genom. Bioinform. 3, lqab034 (2021).

    Article 
    PubMed 
    PubMed Central 

    Google Scholar
     

  • Durand, N. C. et al. Juicer provides a one-click system for analyzing loop-resolution Hi-C experiments. Cell Syst. 3, 95–98 (2016).

    Article 
    CAS 
    PubMed 
    PubMed Central 

    Google Scholar
     

  • Dudchenko, O. et al. De novo assembly of the Aedes aegypti genome using Hi-C yields chromosome-length scaffolds. Science 356, 92–95 (2017).

    Article 
    ADS 
    CAS 
    PubMed 
    PubMed Central 

    Google Scholar
     

  • Durand, N. C. et al. Juicebox provides a visualization system for Hi-C contact maps with unlimited zoom. Cell Syst. 3, 99–101 (2016).

    Article 
    CAS 
    PubMed 
    PubMed Central 

    Google Scholar
     

  • Springer, N. M. et al. The maize W22 genome provides a foundation for functional genomics and transposon biology. Nat. Genet. 50, 1282–1288 (2018).

    Article 
    CAS 
    PubMed 

    Google Scholar
     

  • Shumate, A. & Salzberg, S. L. Liftoff: accurate mapping of gene annotations. Bioinformatics 37, 1639–1643 (2021).

    Article 
    CAS 
    PubMed 
    PubMed Central 

    Google Scholar
     

  • Rhie, A., Walenz, B. P., Koren, S. & Phillippy, A. M. Merqury: reference-free quality, completeness, and phasing assessment for genome assemblies. Genome Biol. 21, 245 (2020).

    Article 
    CAS 
    PubMed 
    PubMed Central 

    Google Scholar
     

  • Manni, M., Berkeley, M. R., Seppey, M., Simão, F. A. & Zdobnov, E. M. BUSCO update: novel and streamlined workflows along with broader and deeper phylogenetic coverage for scoring of eukaryotic, prokaryotic, and viral genomes. Mol. Biol. Evol. 38, 4647–4654 (2021).

    Article 
    CAS 
    PubMed 
    PubMed Central 

    Google Scholar
     

  • Martin, M. Cutadapt removes adapter sequences from high-throughput sequencing reads. EMBnet.journal 17, 10 (2011).

    Article 

    Google Scholar
     

  • Li, H. Aligning sequence reads, clone sequences and assembly contigs with BWA-MEM. Preprint at https://arxiv.org/abs/1303.3997 (2013).

  • Li, H. et al. The Sequence Alignment/Map format and SAMtools. Bioinformatics 25, 2078–2079 (2009).

    Article 
    PubMed 
    PubMed Central 

    Google Scholar
     

  • Garrison, E. & Marth, G. Haplotype-based variant detection from short-read sequencing. Preprint at https://arxiv.org/abs/1207.3907 (2012).

  • Takagi, H. et al. QTL-seq: rapid mapping of quantitative trait loci in rice by whole genome resequencing of DNA from two bulked populations. Plant J. 74, 174–183 (2013).

    Article 
    CAS 
    PubMed 

    Google Scholar
     

  • Danecek, P. et al. Twelve years of SAMtools and BCFtools. Gigascience 10, giab008 (2021).

    Article 
    PubMed 
    PubMed Central 

    Google Scholar
     

  • Rubinacci, S., Ribeiro, D. M., Hofmeister, R. J. & Delaneau, O. Efficient phasing and imputation of low-coverage sequencing data using large reference panels. Nat. Genet. 53, 120–126 (2021).

    Article 
    CAS 
    PubMed 

    Google Scholar
     

  • Dobin, A. et al. STAR: ultrafast universal RNA-seq aligner. Bioinformatics 29, 15–21 (2013).

    Article 
    CAS 
    PubMed 

    Google Scholar
     

  • Liao, Y., Smyth, G. K. & Shi, W. featureCounts: an efficient general purpose program for assigning sequence reads to genomic features. Bioinformatics 30, 923–930 (2014).

    Article 
    CAS 
    PubMed 

    Google Scholar
     

  • Robinson, M. D., McCarthy, D. J. & Smyth, G. K. edgeR: a Bioconductor package for differential expression analysis of digital gene expression data. Bioinformatics 26, 139–140 (2009).

    Article 
    PubMed 
    PubMed Central 

    Google Scholar
     

  • Alexa, A. & Rahnenfuhrer, J. topGO: enrichment analysis for gene ontology. R package version 2.42.0 (2023).

  • Sayols, S. rrvgo: a Bioconductor package for interpreting lists of Gene Ontology terms. MicroPubl. Biol. https://doi.org/10.17912/micropub.biology.000811 (2023).

  • Ramirez, F. et al. deepTools2: a next generation web server for deep-sequencing data analysis. Nucleic Acids Res. 44, 160–165 (2016).

    Article 

    Google Scholar
     

  • Langmead, B. & Salzberg, S. L. Fast gapped-read alignment with Bowtie 2. Nat. Methods 9, 357–359 (2012).

    Article 
    CAS 
    PubMed 
    PubMed Central 

    Google Scholar
     

  • Axtell, M. J. ShortStack: comprehensive annotation and quantification of small RNA genes. RNA 19, 740–751 (2013).

    Article 
    CAS 
    PubMed 
    PubMed Central 

    Google Scholar
     

  • Gruber, A. R., Lorenz, R., Bernhart, S. H., Neuböck, R. & Hofacker, I. L. The Vienna RNA websuite. Nucleic Acids Res. 36, W70–W74 (2008).

    Article 
    CAS 
    PubMed 
    PubMed Central 

    Google Scholar
     

  • German, M. A., Luo, S., Schroth, G., Meyers, B. C. & Green, P. J. Construction of parallel analysis of RNA ends (PARE) libraries for the study of cleaved miRNA targets and the RNA degradome. Nat. Protoc. 4, 356–362 (2009).

    Article 
    CAS 
    PubMed 

    Google Scholar
     

  • Quinlan, A. R. & Hall, I. M. BEDTools: a flexible suite of utilities for comparing genomic features. Bioinformatics 26, 841–842 (2010).

    Article 
    CAS 
    PubMed 
    PubMed Central 

    Google Scholar
     

  • Dai, X., Zhuang, Z. & Zhao, P. X. psRNATarget: a plant small RNA target analysis server (2017 release). Nucleic Acids Res. 46, W49–W54 (2018).

    Article 
    CAS 
    PubMed 
    PubMed Central 

    Google Scholar
     

  • Szpiech, Z. A. selscan 2.0: scanning for sweeps in unphased data. Bioinformatics 40, btae006 (2024).

  • Danecek, P. et al. The variant call format and VCFtools. Bioinformatics 27, 2156–2158 (2011).

    Article 
    CAS 
    PubMed 
    PubMed Central 

    Google Scholar
     

  • Grzybowski, M. W. et al. A common resequencing-based genetic marker data set for global maize diversity. Plant J. 113, 1109–1121 (2023).

    Article 
    CAS 
    PubMed 

    Google Scholar
     

  • Yang, N. et al. Two teosintes made modern maize. Science 382, eadg8940 (2023).

  • Browning, B. L., Tian, X., Zhou, Y. & Browning, S. R. Fast two-stage phasing of large-scale sequence data. Am. J. Hum. Genet. 108, 1880–1890 (2021).

    Article 
    CAS 
    PubMed 
    PubMed Central 

    Google Scholar
     

  • Portwood, J. L. II et al. MaizeGDB 2018: the maize multi-genome genetics and genomics database. Nucleic Acids Res. 47, D1146–D1154 (2019).

    Article 
    PubMed 

    Google Scholar
     

  • Stitzer, M. C. & Ross-Ibarra, J. Maize domestication and gene interaction. New Phytol. 220, 395–408 (2018).

    Article 
    PubMed 

    Google Scholar
     

  • Walley, J. W. et al. Integration of omic networks in a developmental atlas of maize. Science 353, 814–818 (2016).

    Article 
    ADS 
    CAS 
    PubMed 
    PubMed Central 

    Google Scholar
     

  • Liu, L. & Li, J. Communications between the endoplasmic reticulum and other organelles during abiotic stress response in plants. Front. Plant Sci. 10, 749 (2019).

    Article 
    PubMed 
    PubMed Central 

    Google Scholar
     

  • Taurino, M. et al. SEIPIN proteins mediate lipid droplet biogenesis to promote pollen transmission and reduce seed dormancy. Plant Physiol. 176, 1531–1546 (2018).

    Article 
    CAS 
    PubMed 

    Google Scholar
     

  • Beissinger, T. M. et al. Recent demography drives changes in linked selection across the maize genome. Nat. Plants 2, 16084 (2016).

    Article 
    PubMed 

    Google Scholar
     

  • The complex polyploid genome architecture of sugarcane

    Genome sequencing

    Illumina libraries

    Illumina libraries for this manuscript were sequenced on a combination of Illumina X10, HiSeq and NovaSeq platforms. HipMer assembly and selfed progeny (Extended Data Fig. 1a): sequencing libraries were constructed using an Illumina TruSeq DNA PCR-free library kit using standard protocols. Libraries were sequenced on an Illumina X10 instrument using paired ends and a read length of 150 base pairs.

    Single flow-sorted chromosome libraries

    Sequencing libraries were constructed using an Illumina TruSeq DNA Nano library kit using standard protocols. Libraries were sequenced on either the Illumina HiSeq2500 or NovaSeq 6000 instrument using paired ends and a read length of 150 base pairs.

    Remaining Illumina libraries

    Illumina tight insert fragment, 400 bp: 2 µg of DNA was sheared to 400 bp using the Covaris LE220 and size selected using the Pippin (Sage Science). The fragments were treated with end-repair, A-tailing and ligation of Illumina compatible adaptors (IDT) using the KAPA-Illumina library creation kit (KAPA Biosystems). The prepared libraries were quantified using KAPA Biosystems’ next-generation sequencing library qPCR kit (Roche) and run on a Roche LightCycler 480 real-time PCR instrument. The quantified libraries were then prepared for sequencing on the Illumina HiSeq sequencing platform using a TruSeq Rapid paired-end cluster kit, v.2, with the HiSeq 2500 sequencer instrument to generate a clustered flowcell for sequencing. Sequencing of the flowcell was performed on the Illumina HiSeq 2500 sequencer using HiSeq Rapid SBS sequencing kits, v.2, following a 2 × 250 indexed run recipe.

    PacBio libraries

    For continuous long-read PacBio sequencing, sequencing primer was annealed to the SMRTbell template library and sequencing polymerase was bound to it using a Sequel Binding kit v.2.1. The prepared SMRTbell template libraries were then sequenced on a Pacific Biosciences Sequel sequencer using v.3 sequencing primer, 1 M v.2 single-molecule real-time cells and v.2.1 sequencing chemistry with 1 × 600 sequencing movie run times. PacBio HiFi sequencing was performed using circular consensus sequencing (CCS) mode on a PacBio Sequel II instrument. High molecular weight DNA was either needle-sheared or sheared using a Diagenode Megaruptor 3 instrument. Libraries were constructed using SMRTbell Template Prep Kit v.2.0 and tightly sized on a SAGE ELF instrument (1–18 kb). Sequencing was performed using a 30 h movie time with 2 h pre-extension and the resulting raw data were processed using the CCS4 algorithm.

    RNA-seq libraries

    Illumina RNA-seq with poly(A) selection: plate-based RNA sample preparation was performed on the PerkinElmer Sciclone NGS robotic liquid handling system using Illumina’s TruSeq Stranded mRNA HT sample prep kit with poly(A) selection of mRNA, following the protocol outlined by Illumina in their user guide (https://support.illumina.com/sequencing/sequencing_kits/truseq-stranded-mrna.html) and with the following conditions: total RNA starting material was 1 µg per sample and eight cycles of PCR were used for library amplification. The prepared libraries were quantified using KAPA Biosystems’ next-generation sequencing library qPCR kit and run on a Roche LightCycler 480 real-time PCR instrument. Sequencing of the flowcell was performed on the Illumina NovaSeq sequencer using NovaSeq XP v.1 reagent kits and an S4 flowcell, following a 2 × 150 bp indexed run recipe.

    Chromosome in situ hybridization

    Chromosome mitotic metaphase preparations and fluorescence in situ hybridization were performed as described in ref. 13. The S. spontaneum retrotransposon-specific oligo probe was designed by Arbor Biosciences using their proprietary software, based on the retrotransposon sequences described in ref. 50. Probes were labelled with either ATTO 488 or ATTO 550 fluorochromes.

    Single flow-sorted chromosome preparation

    Stems of adult plants were cut into single-bud segments, cleaned and soaked in 0.5% carbendazim solution for 24 h, placed in a plastic tray, covered with wet perlite and incubated at 32 °C in the dark, until the roots were approximately 1.5 cm long. For cell-cycle synchronization and accumulation of metaphases, the segments were washed in ddH2O, then transferred to a plastic tray filled with 150 ml 0.1 × Hoagland solution containing 3 mmol l−1 hydroxyurea and incubated at 25 or 32 °C for 18 h in the dark. After a 2 h recovery treatment, the roots were immersed in 2.5 µmol l−1 amiprophos-methyl solution and incubated for 3 h at 25 or 32 °C. Suspensions of intact chromosomes were prepared by mechanical homogenization of root tips fixed with 3% formaldehyde and 0.5% Triton X-100, and stained with 4′,6-diamidino-2-phenylindole dihydrochloride (DAPI)51. The instruments used for flow sorting were a FACSAria II SORP flow cytometer (BD Biosciences) and a Beckman Coulter MoFlo AstriosEQ cell sorter (Beckman Coulter). The software used was FACSDiva v.6.1.3 (BD Biosciences) and Summit v.6.2.2 (Beckman Coulter). For chromosome sorting, initial gating was set on dotplots of DAPI-A versus FSC-A and the final sorting gate was set on DAPI-A versus DAPI-W dotplots to exclude chromosome doublets (Supplementary Fig. 15). The identity of flow-sorted fractions was determined by fluorescence microscopy of chromosomes sorted onto microscope slides51. The analysis revealed that chromosomes could be separated into a few size fractions and that, although the sorted populations were 100% pure chromosomes, it was not possible to sort individual sugarcane chromosomes. To overcome this problem and prepare samples of chromosome-specific DNA for sequencing, single copies of chromosomes were sorted and their DNA amplified52. This strategy for preparing sugarcane chromosomes for flow cytometry was first described in ref. 51 and is a modification of the protocol described in ref. 53.

    Optical map construction

    Ultra-high molecular weight (uHMW) DNA was isolated from agarose-embedded nuclei as previously described in ref. 54 with some modifications. Approximately 2 g of young, healthy R570 leaves were collected and fast-frozen in a 50 ml conical tube, ground in a mortar with liquid nitrogen and briefly incubated in Bionano homogenization buffer (HB+; Bionano Plant DNA isolation Kit; Bionano Genomics). Cell debris was filtered out by sequentially passing the homogenate through 100 µm and 40 µm cell strainers. Nuclei in suspension were pelleted by centrifugation at 2,000g at 8 °C for 20 min, resuspended in 3 ml homogenization buffer HB+ and subjected to discontinuous density gradient centrifugation as described in the Plant Tissue DNA Isolation Base Protocol (Revision D; Bionano Genomics). The nuclei-enriched interphase layer was recovered, pelleted and embedded in low-melting-point agarose using a 90-µl CHEFgel electrophoresis plug mould (Bio-Rad). The resulting plug was incubated twice, for a total of 12 h at 50 °C, in Bionano Lysis buffer supplemented with 1.6 mg ml−1 Puregene Proteinase K, washed four times in Bionano Wash Buffer and five times in TE buffer. The uHMW nDNA was recovered by melting and digesting the plug with agarase at 43 °C, followed by drop dialysis. In total, approximately 9 µg uHMW DNA was recovered at a concentration of 136 ng µl−1 and used for subsequent genome mapping processes.

    Genome mapping was performed using the Bionano Genomics Direct Label and Stain chemistry in a Bionano Saphyr instrument, using the method described in ref. 55, with a few modifications. Approximately 800 ng of uHMW DNA was used per reaction and a total of eight flow cells were loaded to collect molecules with a total combined length of 3,499,160 Mbp. A subset of 1,650,737 molecules with a minimum length of 450 kb, and N50 of 547 kb were selected for assembly. The final total combined length of the filtered subset was 1,097,878,758 bp, with estimated effective coverage of assembly of ×101.2.

    Genome assembly was performed using the Bionano Genomics Access software platform (Bionano Tools v.1.3.8041.8044; Bionano Solve v.3.3_10252018), running the pipeline v.7981 and RefAligner v.7989. Two separate assemblies were performed using the optArguments_nonhaplotype_noES_BG_DLE1_saphyr.xml parameters. The initial assembly was performed without complex multi-path region (CMPR) cuts and produced 570 maps with an N50 length of 36.444 Mbp and a total map length of 7,654.039 Mbp. One additional assembly was performed using the CMPR cut option, which introduces map cuts at potential duplications to reduce potential homeolog and phase switching. The CMPR-cut-enabled assembly generated 1,512 maps with an N50 length of 9.546 Mbp and a total map length of 9,282.351 Mbp.
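
    The N50 values quoted for the optical maps follow the usual definition: the length at which maps of that size or longer account for at least half of the total map length. The short Python sketch below illustrates that definition only. It is not part of the Bionano pipeline, and the function name and toy lengths are hypothetical.

```python
def n50(lengths):
    """Return the N50 of a collection of lengths.

    N50 is the length L such that items of length >= L together
    cover at least half of the summed length.
    """
    ordered = sorted(lengths, reverse=True)
    if not ordered:
        raise ValueError("lengths must not be empty")
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


# Toy map lengths in Mbp (made-up values, not taken from the R570 maps).
toy_maps = [40, 30, 20, 5, 5]
print(n50(toy_maps))
```

    With the toy values the total is 100 Mbp, so the N50 is the length of the map at which the running sum first reaches 50 Mbp.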

    PacBio HiFi Bionano hybrid scaffolds were generated using the Bionano Genomics Access software (Tools v.1.3) and the DLE-1 configuration file hybridScaffold_DLE1_config.xml using auto-conflict resolution. In total, the genome was captured in 122 hybrid scaffolds (scaffold N50 of 78.823 Mbp and maximum scaffold size of 131.769 Mbp). The total scaffold length was 5,074 Mbp, with 4.9 Mbp of sequence remaining un-scaffolded.

    Genome assembly overview

    Complete representation of all sequences in the 10 Gb genome of R570 was impossible without artificially duplicating collapsed sequences, of which there are many. To scaffold the contigs into chromosomes, we applied five complementary techniques (Supplementary Data). First, we used the Bionano optical map to initially order contigs into long-range scaffolds. Second, scaffolds were clustered into homeologous groups based on 237 linkage groups constructed from approximately 1.8 million simplex markers that were assayed from 96 self-pollinated progeny. Third, additional clustering was performed using genetic markers derived from single flow-sorted chromosome libraries sequenced from R570 (refs. 52,53). After making initial joins, both simplex and single-chromosome genetic markers were re-aligned to putative chromosomes to investigate misjoins, which were broken and corrected. Fourth, we resolved overlapping scaffolds by checking for redundant collinear sets of Sorghum bicolor gene models mapped against the contigs using pblat56 with default parameters. Finally, we evaluated chromatin linkages from 558 Gb (approximately ×56) of Hi-C data to manually verify joins made between scaffolds during chromosome construction (Extended Data Fig. 1a). The highly contiguous primary assembly (5.04 Gb, 12.6 Mb contig N50; 67 chromosomes) also includes optical scaffolds (‘os’; n = 20) and unanchored scaffolds (n = 56). The primary assembly contains 0.1% gaps with an LTR assembly index21 (LAI; measure of intact LTR elements) of 22.82, indicating the assembly is high quality and complete. Where possible, the alternate assembly (3.73 Gb, 2.1 Mb contig N50; comprised of nearly identical haplotypes in the primary assembly; discussed in Supplementary Data) was physically anchored to the most similar chromosome in the primary assembly based on best unique alignments using minimap2 (v.2.20-r1061)57.
Contigs and scaffolds that did not have a single best unique alignment were left unanchored. It should be noted that this sequence similarity-based grouping does not suggest that contigs on alternative scaffolds with the same name (for example, Chr6E and Chr6E_alt) necessarily come from the same biological haplotype. Thus, we provide the alternate scaffolds to represent the complete population of sequences in R570, and not as a source for global comparisons against the primary or other reference genomes.

    Collapsed haplotypes

    To determine which regions of the genome were perfectly identical and collapsed into a single haplotype (in contrast to the alternate assembly that contains nearly identical haplotypes, which could be distinguished by the assembler but most often not by unique HiFi read placements), PacBio HiFi reads were re-aligned back to the assembly using minimap2 (ref. 57) (parameters: -M 0 --secondary=no --hard-mask-level -t 30 -x asm5). Read coverage was calculated relative to the median depth (37) per 10 kb window (script: combinePAFsAndCount.R), ignoring repetitive regions where the coverage relative to the median was greater than five (greater than ×185 raw coverage). Depth classifications (×0–4) were calculated from the median coverage ranges (×0 (0–0.25), ×1 (0.25–1.4), ×2 (1.4–2.3), ×3 (2.3–3.5), ×4 (3.5–5.0)), based on histogram peaks. Depth classifications per 10 kb window were converted to their run-length equivalent using the script convertCountsToRLEs.R. To ensure accurate representation of haplotypes, NucFreq54 was used to analyse regions where haplotypes were collapsed (×2–4 depth regions; approximately 1.2 Gb of primary genome sequence). In summary, HiFi reads were aligned to the combined primary and alternate assembly using pbmm2 (v.1.1.0; parameters: --log-level DEBUG --preset SUBREAD --min-length 5000 --sort). Samtools58 was then used to merge individual bam files (from each HiFi sequencing run) and exclude unmapped reads and supplementary alignments (samtools view -F 2308). The NucFreq output coverage bed (obed) file was converted to run-length equivalents (script: RLEruns.R), where alternate base calls were greater than 20% of the combined coverage. To ensure adequate coverage for analysis, regions with outlier depth ranges beyond the 10th and 90th percentiles were excluded.
Additionally, repetitive regions of the genome (95% repetitive, masked with a 24mer) and 10 kb regions where greater than 90% of bases were annotated as retrotransposons (from LAI analysis) were also excluded using BEDtools59 subtract. Of the approximately 1.2 Gb considered, approximately 4.8 Mb of sequence (0.4% of considered regions; 0.1% of bases within constructed primary chromosomes) appear to contain non-identically collapsed haplotypes, mainly driven by high depth collapsed regions (×2–3 depth regions = 0.3% of bases; ×4 depth regions = 1.5% of bases).
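
    The depth classification and run-length conversion described above can be illustrated with a minimal Python sketch. It is not a port of combinePAFsAndCount.R or convertCountsToRLEs.R, whose contents are not shown here. The function names, the half-open treatment of range boundaries and the toy depths are assumptions. The last helper decodes the samtools exclusion mask 2308 using the standard SAM flag bits, which shows that the mask also removes secondary alignments.

```python
from itertools import groupby

MEDIAN_DEPTH = 37  # median HiFi depth reported in the text

# (lower, upper, class) on coverage relative to the median, as quoted above.
# Treating each range as lower <= x < upper is an assumption of this sketch.
DEPTH_CLASSES = [
    (0.0, 0.25, 0),
    (0.25, 1.4, 1),
    (1.4, 2.3, 2),
    (2.3, 3.5, 3),
    (3.5, 5.0, 4),
]

# Standard SAM FLAG bits and their meanings.
SAM_FLAGS = {
    1: "paired", 2: "proper_pair", 4: "unmapped", 8: "mate_unmapped",
    16: "reverse", 32: "mate_reverse", 64: "read1", 128: "read2",
    256: "secondary", 512: "qc_fail", 1024: "duplicate",
    2048: "supplementary",
}


def classify_window(raw_depth, median_depth=MEDIAN_DEPTH):
    """Return depth class 0-4 for one window, or None if it is ignored
    as repetitive (more than five times the median depth)."""
    relative = raw_depth / median_depth
    if relative > 5.0:
        return None
    for low, high, depth_class in DEPTH_CLASSES:
        if low <= relative < high:
            return depth_class
    return 4  # relative coverage of exactly 5.0


def run_length_encode(classes, window_size=10_000):
    """Collapse consecutive windows of equal class into (start, end, class)."""
    runs = []
    position = 0
    for depth_class, group in groupby(classes):
        end = position + len(list(group)) * window_size
        runs.append((position, end, depth_class))
        position = end
    return runs


def decode_flag_mask(mask):
    """List the SAM flag names set in an integer mask."""
    return [name for bit, name in SAM_FLAGS.items() if mask & bit]


# Toy raw depths for seven consecutive 10 kb windows (made-up values).
toy_depths = [36, 40, 75, 80, 78, 5, 300]
classes = [classify_window(d) for d in toy_depths]
print(classes)
print(run_length_encode(classes))
print(decode_flag_mask(2308))
```

    In this toy input the last window exceeds five times the median depth and is therefore reported as None rather than assigned a depth class.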

    Genome annotation

    Gene models were annotated using our PERTRAN pipeline (described in detail in ref. 60) using approximately 3.7 B pairs of 2 × 150 stranded paired-end Illumina RNA-seq and 31 M PacBio Iso-Seq CCS reads. In short, PERTRAN conducts genome-guided transcriptome short read assembly via GSNAP (v.2013-09-30)61 and builds splice alignment graphs after alignment validation, realignment and correction. The resulting approximately 1.5 M putative full-length transcripts were corrected and collapsed by a genome-guided correction pipeline, which aligns CCS reads to the genome with GMAP61, with intron correction for small indels in splice junctions if any, and clusters alignments when all introns are the same or, for single-exon transcripts, when they overlap by 95%. Subsequently, 1,763,610 transcript assemblies were constructed using PASA (v.2.0.2)62 from the RNA-seq transcript assemblies above. Homology support was provided by alignments to 17 publicly available genomes and Swiss-Prot proteomes. Gene models were predicted by the homology-based predictors FGENESH+ (v.3.1.0)63, FGENESH_EST (similar to FGENESH+, but using expressed sequence tags (ESTs) to compute splice site and intron input instead of protein/translated open reading frames (ORFs)) and EXONERATE (v.2.4.0)64, from PASA assembly ORFs (in-house homology constrained ORF finder) and from AUGUSTUS (v.3.1.0)65 trained by the high confidence PASA assembly ORFs and with intron hints from short read alignments. We improved these preliminary annotations by comparing sequences and gene quality between R570 subgenomes by aligning high-quality gene models between subgenomes and forming gene models from intragenomic alignments. We compared scores between these intragenomic homology-based models and the PASA assemblies; higher-scoring homology-supported models that were not contradicted by transcriptome evidence were retained to replace existing partial copies.
The selected gene models were subject to Pfam analysis and gene models with greater than 30% Pfam TE domains were removed. We also removed (1) incomplete, (2) low-homology-supported without full transcriptome support and (3) short single exon (less than 300 BP CDS) without protein domain nor transcript support gene models. Repetitive sequences were defined using de novo by RepeatModeler (v.open1.0.11)66 and known repeat sequences in RepBase.
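
The filtering rules above can be read as a simple predicate over gene models. The sketch below is one possible reading, not the PERTRAN code: the `GeneModel` record, its field names and the interpretation of "greater than 30% Pfam TE domains" as a fraction between 0 and 1 are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class GeneModel:
    # All fields are hypothetical summaries of the evidence for one gene model.
    name: str
    te_domain_fraction: float        # share of Pfam domain hits that are TE-related
    is_complete: bool                # has both start and stop codon
    has_strong_homology: bool        # well supported by protein homology
    has_full_transcript_support: bool
    n_exons: int
    cds_length_bp: int
    has_protein_domain: bool
    has_transcript_support: bool     # any transcript evidence at all


def keep_gene_model(g: GeneModel) -> bool:
    """Apply the four removal rules described in the text; True means retain."""
    if g.te_domain_fraction > 0.30:
        return False  # mostly transposable-element domains
    if not g.is_complete:
        return False  # rule (1): incomplete model
    if not g.has_strong_homology and not g.has_full_transcript_support:
        return False  # rule (2): weak homology and no full transcript support
    short_single_exon = g.n_exons == 1 and g.cds_length_bp < 300
    if short_single_exon and not (g.has_protein_domain or g.has_transcript_support):
        return False  # rule (3): short single-exon model with no other evidence
    return True


# Hypothetical example records.
good = GeneModel("g1", 0.0, True, True, True, 4, 1200, True, True)
te_like = GeneModel("g2", 0.8, True, True, True, 2, 900, True, True)
short_orphan = GeneModel("g3", 0.0, True, True, False, 1, 240, False, False)
kept_names = [g.name for g in (good, te_like, short_orphan) if keep_gene_model(g)]
```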

    Comparative genomics

    Syntenic orthologs among the R570 primary annotation, S. bicolor (v.3.1)67, S. spontaneum (genotype AP85-441)32, Setaria viridis (v.2.1)68 and the R570 monoploid path16 were inferred via the GENESPACE (v.0.9.4)23 pipeline using default parameters (analysis script: genespaceCommands.R). In brief, GENESPACE compares protein similarity scores into syntenic blocks using MCScanX69 and uses Orthofinder (v.2.5.4)70 to search for orthologs/paralogs within synteny-constrained blocks. Syntenic blocks were used to query pairwise peptide differences among progenitor alleles, determine divergence among progenitor orthologs using S. bicolor syntenic anchors and search for progenitor-specific orthogroups (scripts, PID_calc.R; GENESPACE_orthogroupParsing.R; Jupyter Notebook: r570_orthogroupProgenitorAnalysis_forSupp.ipynb).

    Structural variants

    To identify the large structural rearrangements (inversions, translocations and inverted translocations) and local variations (insertions and deletions), each homeologous chromosome group (B, C, D, E, F, G) was aligned to chromosome A using minimap2 (v.2.20-r1061)57 with parameter setting ‘-ax asm5 -eqx’. The resulting alignments were used to identify structural variations with SyRI (v.1.6)71, and the gff3 annotation was used to obtain genes affected by variations between homeologous chromosomes.

    Orthogroup diversity

    Calculation of mean pairwise differences among progenitor specific homeologs was performed by first extracting all pairwise combinations of progenitor assigned alleles within orthogroups that were anchored by an S. bicolor ortholog. Among these, 25,000 peptide pairs per progenitor were randomly selected and pairwise aligned using R package Biostrings (v.2.70.2)72. Pairwise identity calculation was based on matches/alignment length (PID2; script PID_calc.R). Multiple sequence alignments among syntenic orthogroups for sugar transport gene candidates were performed using MAFFT (v.7.487)73 and were visualized using ggmsa74 (script MSAalignmentPlots.R). Fold scores for each peptide were calculated using ESMfold (v.2.0.1)75.
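
    As an illustration of the identity calculation, the sketch below computes percent identity from two already-aligned peptide strings. It is written in Python rather than R and is not the PID_calc.R script; the function name and the `count_gap_columns` switch are hypothetical. By default it divides matches by the number of aligned (non-gap) positions, which is our understanding of the Biostrings PID2 definition; the alternative divides by the full alignment length including gap columns.

```python
def percent_identity(aligned_a: str, aligned_b: str, count_gap_columns: bool = False) -> float:
    """Percent identity between two gapped sequences of equal length.

    Gaps are encoded as '-'. With count_gap_columns=False the denominator is
    the number of columns where both sequences have a residue; with True it
    is every column of the alignment.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    matches = aligned = gapped = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == "-" or b == "-":
            gapped += 1
        else:
            aligned += 1
            if a == b:
                matches += 1
    denominator = aligned + gapped if count_gap_columns else aligned
    if denominator == 0:
        raise ValueError("no columns to compare")
    return 100.0 * matches / denominator


# Hypothetical toy alignment: 4 matches, 1 mismatch, 1 gap column.
pid_aligned_only = percent_identity("MKT-LV", "MKSALV")
pid_with_gaps = percent_identity("MKT-LV", "MKSALV", count_gap_columns=True)
```

    The two denominators can give noticeably different values for gappy alignments, so the choice should be stated alongside any reported identity.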

    Resistance gene analogues

    RGAs were annotated on scaffolds larger than 10 megabases with NLR-Annotator (v.2)38 using default parameters. The 4,116 predicted RGAs (Supplementary Table 11) were assigned to progenitors by intersecting the location of each motif with progenitor assignment blocks (Supplementary Table 6).
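
    The intersection of RGA locations with progenitor assignment blocks can be sketched as an interval lookup. The Python example below is a minimal illustration under stated assumptions: the block tuples, chromosome names and labels are hypothetical, and resolving ties by largest overlap is our choice, not a rule given in the text.

```python
from typing import List, Optional, Tuple

# (chromosome, start, end, progenitor label); coordinates are half-open.
Block = Tuple[str, int, int, str]


def assign_progenitor(chrom: str, start: int, end: int, blocks: List[Block]) -> Optional[str]:
    """Return the label of the block overlapping [start, end) the most, or None."""
    best_label: Optional[str] = None
    best_overlap = 0
    for b_chrom, b_start, b_end, label in blocks:
        if b_chrom != chrom:
            continue
        overlap = min(end, b_end) - max(start, b_start)  # negative or zero if disjoint
        if overlap > best_overlap:
            best_label, best_overlap = label, overlap
    return best_label


# Hypothetical progenitor blocks and RGA positions.
blocks = [
    ("Chr1A", 0, 5_000, "S. officinarum"),
    ("Chr1A", 5_000, 9_000, "S. spontaneum"),
]
label_inside = assign_progenitor("Chr1A", 1_000, 1_500, blocks)
label_spanning = assign_progenitor("Chr1A", 4_900, 5_400, blocks)
label_missing = assign_progenitor("Chr2B", 100, 200, blocks)
```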

    Progenitor divergence

    To determine the neutral substitution rate between S. officinarum and S. spontaneum, 45,000 random ortholog pairs were extracted from all pairwise combinations of progenitor assigned alleles (n = 193,815) within S. bicolor anchored orthogroups. Peptide sequence pairs were aligned using MAFFT (v.7.487)73 and converted into coding sequence (CDS) using pal2nal (v.13)76. Pairwise synonymous mutation rates (Ks) among sequences were calculated using seqinr (v.4.2-16)77, finding a single synonymous (Ks) mutation peak at 0.012 (Supplementary Fig. 13). Assuming a neutral nuclear mutation rate of 0.383 × 10^−8 to 0.386 × 10^−8 (ref. 78), S. officinarum and S. spontaneum diverged approximately 1.55–1.56 million years ago.
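
    The divergence estimate can be reproduced approximately with a short calculation. The Python sketch below assumes the commonly used relation T = Ks / (2μ), with μ taken as substitutions per site per year; the text does not state the formula, so this relation and the function name are assumptions. With the rounded Ks peak of 0.012 it gives roughly 1.55 to 1.57 million years, close to the reported range; the small difference may reflect rounding of the Ks peak.

```python
def divergence_time_years(ks: float, mutation_rate: float) -> float:
    """Time since divergence, assuming T = Ks / (2 * mu).

    The factor of 2 accounts for substitutions accumulating independently
    along both lineages since their common ancestor.
    """
    if mutation_rate <= 0:
        raise ValueError("mutation rate must be positive")
    return ks / (2.0 * mutation_rate)


KS_PEAK = 0.012          # synonymous substitution peak reported in the text
MU_LOW = 0.383e-8        # lower bound of the neutral mutation rate (ref. 78)
MU_HIGH = 0.386e-8       # upper bound of the neutral mutation rate (ref. 78)

oldest = divergence_time_years(KS_PEAK, MU_LOW)     # slower rate, older split
youngest = divergence_time_years(KS_PEAK, MU_HIGH)  # faster rate, younger split
print(f"{youngest / 1e6:.2f} to {oldest / 1e6:.2f} million years")
```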

    Bru1 genetic and physical maps

    We developed a map-based cloning approach adapted to the high polyploid context of sugarcane to target the durable major rust resistance gene Bru1. Haplotype-specific chromosome walking was performed through fine genetic mapping exploiting 2,383 individuals from self-progenies of R570 and physical mapping exploiting two BAC libraries44,79. The high-resolution genetic map of the targeted region included flanking markers for Bru1 (at 0.14 and 0.28 cM) and 13 co-segregating markers, and the partial BAC physical map of the target haplotype included two gaps (ref. 44; Fig. 3b). To complete the physical map of the target Bru1 haplotype, we constructed a new BAC library (using enzyme BamHI) using a mix of DNA from four brown-rust-resistant individuals from the R570 S1 population. The BAC library contained 119,040 clones with an average insert size of 130 kb and provided 3.2-fold coverage of the target haplotype and 1.6-fold coverage of the total genome.

    BAC-ends and BAC subclones from the four BACs (CIR009O20, 022M06, CIR012E03 and 164H22) surrounding the two remaining gaps (‘left’ and ‘right’) in the physical map of the Bru1 haplotype were isolated and used for chromosome walking (as described in ref. 44). Two BACs (CIRB251D13 (150 kb) and CIRB286F09 (130 kb)) were identified and sequenced to fill the right gap. Five BACs (CIRB009N07 (100 kb), CIRB114G05 (100 kb), CIRB127D08 (125 kb), CIRB210D07 (105 kb) and CIRB236L05 (150 kb)) reduced the size of the left gap by 35 kb, but an unsized gap remained. The R570 genome assembly spanned the entirety of the Bru1 target haplotype region with one contig, closing the left gap (99,750 bp) and enabling all candidate genes in the region to be investigated (Fig. 3b).

    Bru1 candidate genes

    The target gap-filled haplotype that represented 0.42 cM and 309 kb was manually annotated, predicting a total of 13 genes (Fig. 3b and Supplementary Table 13). Nine of these genes were also present on all or some of the hom(e)ologous BACs/haplotypes in the R570 genome27. Three of the curated genes were present only in the insertion specific to the Bru1 haplotype. Other whole-genome annotated genes (SoffiXsponR570.03Dg024000; SoffiXsponR570.03Dg024100; SoffiXsponR570.03Dg024600; SoffiXsponR570.03Dg024700) in the region were short, mono-exonic peptides that either contained no protein domains or appeared to be annotated transposable elements, and thus were not supported in the curated candidate gene list (Supplementary Table 13). Among the 13 predicted genes, we searched for genes that presented high homology with genes already shown to be involved in resistance mechanisms. We identified five such genes: four genes encoding serine/threonine kinases (genes 1, 5, 7 and 8) and one gene encoding an endoglucanase (gene 13). Annotation of these genes was refined manually through phylogenetic analysis that included genes with high homology from other plants present in databases and a search for conserved functional protein domains.

    Gene 13, which encodes an endoglucanase, comprised three exons and two introns with a genomic size of 1.8 kb for a predicted transcript of 1.5 kb. Sequence alignment and phylogenetic analyses performed with beta-1-4 endoglucanases and beta-1-3 endoglucanases from monocots and dicots showed that gene 13 belongs to the beta-1-4 endoglucanases. This gene presents high homology (greater than 60%) with beta-1-4 endoglucanases from other plants and has the highest homology (88% identity, 100% coverage) with the orthologous Miscanthus gene (CAD6248271.1). Beta-1-4 endoglucanases are involved in cell development80, in particular in elongation of the cell wall81, but have not been reported as involved in disease resistance. This suggested that this gene is not a good candidate for being Bru1.

    Gene 1 is composed of eight exons and seven introns. Its genomic size is 4.3 kb and the CDS size is 882 bp. The protein encoded by the gene has 96.5% identity (100% coverage) with a kinase involved in cell division control in Sorghum (XP_002451427.1) and therefore did not appear to be a good candidate.

    Gene 5 is composed of six exons and five introns. Its genomic size is 1.1 kb and the predicted CDS size is 534 bp. Alignment of its amino acid sequence with the InterPro conserved protein domain database showed that only part of the protein (exons 4 to 6) has homology with subdomains VIb to XI of the serine/threonine kinases. This serine/threonine kinase was thus not complete, lacking some of the functional subdomains, and appeared to be a pseudogene. Therefore, it did not appear to be a good candidate.

    Gene 7 is composed of six exons and five introns, and gene 8 has four exons and three introns. Both present homology with receptor-like kinases. Annotation of conserved protein domains showed that gene 7 has all 12 kinase subdomains and thus could encode a functional protein, while gene 8 encompasses only part of these subdomains (I to VII) and could correspond to a pseudokinase. The classification with the ITAK database (http://itak.feilab.net/cgi-bin/itak/index.cgi) revealed that they both belong to the RLK-PELLE-DSLV family45, the same family as the barley stem rust resistance gene (RPG1 (ref. 46)) and the wheat yellow rust resistance gene (Yr15 (ref. 47)), shown to be a tandem kinase-pseudokinase (TKP). In addition, the third intron of gene 7 has a very large size of approximately 11 kb, including a large TE, a particular structure shared with the RPG1 and Yr15 TKPs. Bru1, like RPG1 and Yr15, is among the relatively rare resistance genes that confer durable fungal resistance. This tandem kinase-pseudokinase (TKP7 and TKP8) is therefore a solid candidate for Bru1.

    Reporting summary

    Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.
