Plant breeding – Carbon Chemist

Pan-genome bridges wheat structural variations with habitat and breeding


Structural variation in the pangenome of wild and domesticated barley

Plant growth and high-molecular-weight DNA isolation

Twenty-five seeds each from the selected accessions (Supplementary Tables 1 and 7) were sown in 16-cm-diameter pots with compost soil. Plants were grown under greenhouse conditions with sodium halogen artificial lighting, at 21 °C during the day (16 h) and 18 °C at night (8 h). Leaves (8 g) were collected from 7-day-old seedlings, ground with liquid nitrogen to a fine powder and stored at −80 °C.

High-molecular-weight (HMW) DNA was purified from the powder, essentially as described⁵⁶. In brief, nuclei were isolated, digested with proteinase K and lysed with SDS. Here, a standard watercolour brush with synthetic hair (size 8) was used to re-suspend the nuclei for digestion and lysis. HMW DNA was purified using phenol–chloroform extraction and precipitation with ethanol as described⁵⁶. Subsequently, the HMW DNA was dissolved in 50 ml of TE (pH 8.0) and precipitated by the addition of 5 ml of 3 M sodium acetate (pH 5.2) and 100 ml of ice-cold ethanol. The suspension was mixed by slow circular movements resulting in the formation of a white precipitate (HMW DNA), which was collected using a wide-bore 5 ml pipette tip and transferred for 30 s into a tube containing 5 ml of 75% ethanol. The washing was repeated twice. The HMW DNA was transferred into a 2 ml tube using a wide-bore tip, collected with a polystyrene spatula, air-dried in a fresh 2 ml tube and dissolved in 500 µl of 10 mM Tris-Cl (pH 8.0). For quantification, the Qubit dsDNA High Sensitivity Assay Kit (Thermo Fisher Scientific) was used. The DNA size-profile was recorded using the Femto Pulse system and the Genomic DNA 165 kb kit (Agilent). In typical experiments the peak of the size-profile of the HMW DNA for library preparation was around 165 kb.

DNA library preparation and PacBio HiFi sequencing

For fragmentation of the HMW DNA into 20 kb fragments, a Megaruptor 3 device (speed: 30) was used (Diagenode). A minimum of two HiFi SMRTbell libraries were prepared for each barley genotype following essentially the manufacturer’s instructions and the SMRTbell Express Template Prep Kit (Pacific Biosciences). The final HiFi libraries were size-selected (narrow-size range: 18–21 kb) using the SageELF system with a 0.75% Agarose Gel Cassette (Sage Sciences) according to standard manufacturer protocols.

HiFi circular consensus sequencing (CCS) reads were generated by operating the PacBio Sequel IIe instrument (Pacific Biosciences) following the manufacturer’s instructions. Per genotype, about four 8M SMRT cells (average yield: 24 gigabases HiFi CCS per 8M SMRT cell) were sequenced to obtain an approximately 20-fold haploid genome coverage. In typical experiments the concentration of the HiFi library on plate was 80–95 pM. We used 30 h movie time, 2 h pre-extension and sequencing chemistry v.2.0. The resulting raw data were processed using the CCS4 algorithm (https://github.com/PacificBiosciences/ccs).
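The coverage figure follows from simple arithmetic: total HiFi yield divided by haploid genome size. The sketch below reproduces that calculation. The genome size of 5 Gb is an assumption made only for illustration (the text does not state it), and the function name is hypothetical.

```python
def haploid_coverage(n_cells: int, yield_per_cell_gb: float, genome_size_gb: float) -> float:
    """Expected fold coverage = total sequenced bases / haploid genome size."""
    if genome_size_gb <= 0:
        raise ValueError("genome size must be positive")
    return n_cells * yield_per_cell_gb / genome_size_gb


# Values from the text: about four SMRT cells, 24 Gb HiFi yield per cell.
N_CELLS = 4
YIELD_PER_CELL_GB = 24.0
# ASSUMPTION: approximate barley haploid genome size; not given in the text.
ASSUMED_GENOME_SIZE_GB = 5.0

coverage = haploid_coverage(N_CELLS, YIELD_PER_CELL_GB, ASSUMED_GENOME_SIZE_GB)
print(f"approximate haploid coverage: {coverage:.1f}x")
```

Under this assumed genome size the result is close to the roughly 20-fold coverage reported above; a different genome size estimate shifts the figure proportionally.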

Hi-C library preparation and Illumina sequencing

In situ Hi-C libraries were prepared from 1-week-old barley seedlings on the basis of the previously published protocol¹³. Dovetail Omni-C data were generated for Bowman, Aizu6, Golden Melon and 10TJ18 as per the manufacturer’s instructions (https://dovetailgenomics.com/products/omni-c-product-page/). Sequencing and Hi-C raw data processing were performed as described before⁵⁷,⁵⁸.

Genome sequence assembly and validation

PacBio HiFi reads were assembled using hifiasm (v.0.11-r302)⁵⁹. Pseudomolecule construction was done with the TRITEX pipeline⁶⁰. Chimeric contigs and orientation errors were identified through manual inspection of Hi-C contact matrices. Genome completeness and consensus accuracy were evaluated using Merqury (v.1.3)⁶¹. Levels of duplication and heterozygosity were assessed with Merqury and FindGSE (v.1.94)⁶². Further, we estimated heterozygosity in the HiFi reads with a k-mer approach. We selected 35,202 bi-allelic SNPs from a genebank genomic study³. For each SNP, we extracted the flanking sequences (±15 bp) and placed either allele in the middle to obtain 31-mers for the reference and alternative alleles. The FASTA sequences of the k-mers are available from https://bitbucket.org/ipkdg/het_estimation. We counted the occurrence of these k-mers in the HiFi FASTQ files using BBDuk (https://jgi.doe.gov/data-and-tools/software-tools/bbtools/bb-tools-user-guide/bbduk-guide/) with the parameter ‘rpkm’. Genotype calling and the heterozygosity estimation were done in R. The full workflow is available from the same repository.
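The k-mer approach can be sketched as follows. This is a minimal Python illustration, not the authors' R workflow: it builds the two allele-specific 31-mers for a SNP and then derives a heterozygosity estimate from per-allele k-mer counts. The function names and the `min_count` threshold are hypothetical assumptions; the actual calling rules are in the repository cited above.

```python
def allele_kmers(reference: str, pos: int, ref: str, alt: str, flank: int = 15):
    """Return (ref_kmer, alt_kmer), each 2*flank+1 bases, centred on 0-based pos."""
    if pos < flank or pos + flank >= len(reference):
        raise ValueError("SNP too close to the sequence end for the requested flank")
    if reference[pos] != ref:
        raise ValueError("reference allele does not match the sequence")
    left = reference[pos - flank:pos]
    right = reference[pos + 1:pos + flank + 1]
    return left + ref + right, left + alt + right


def call_genotype(ref_count: int, alt_count: int, min_count: int = 2) -> str:
    """Call a genotype from k-mer counts (min_count is an assumed threshold)."""
    has_ref = ref_count >= min_count
    has_alt = alt_count >= min_count
    if has_ref and has_alt:
        return "het"
    if has_ref:
        return "hom_ref"
    if has_alt:
        return "hom_alt"
    return "missing"


def heterozygosity(calls) -> float:
    """Fraction of heterozygous calls among non-missing calls."""
    called = [c for c in calls if c != "missing"]
    if not called:
        return float("nan")
    return sum(c == "het" for c in called) / len(called)


# Toy example: a C/T SNP at position 15 of a 31-bp sequence.
seq = "A" * 15 + "C" + "G" * 15
ref_kmer, alt_kmer = allele_kmers(seq, 15, "C", "T")
calls = [call_genotype(r, a) for r, a in [(12, 0), (6, 7), (0, 9), (1, 0)]]
print(len(ref_kmer), len(alt_kmer), calls, heterozygosity(calls))
```

An inbred line is expected to yield a heterozygosity estimate near zero, so elevated values point to residual heterozygosity or sample contamination.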

Single-copy pangenome construction

The single-copy regions in each chromosome-level assembly were identified by filtering 31-mers occurring more than once in the genomic regions by BBDuk (BBMap_37.93, https://jgi.doe.gov/data-and-tools/software-tools/bbtools). BBMap was used to count k-mer occurrences in each genome with the parameter –mincount 2. Then, non-unique genomic regions (that is, those composed of k-mers occurring at least twice) were masked by BBDuk on the basis of k-mer counts. Single-copy regions were extracted in BED format (with the command ‘bedtools complement’) and their sequences were retrieved using BEDTools (v.2.29.2)⁶³. The single-copy sequences were clustered using MMseqs2 (Many-against-Many sequence searching)⁶⁴ with the parameter ‘--cluster-mode’ and a sequence identity threshold of over 95%. A representative from each cluster (the largest in a cluster) was selected to estimate the pangenome size.
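The masking step can be illustrated with a small pure-Python sketch. It is not a substitute for BBDuk or BEDTools: it counts k-mers on the forward strand only (ignoring reverse complements, a simplification), masks every base covered by a k-mer seen at least twice, and returns the complementary single-copy intervals in BED-style half-open coordinates. The function name is hypothetical.

```python
from collections import Counter


def single_copy_intervals(seq: str, k: int = 31):
    """Return half-open (start, end) intervals not covered by any repeated k-mer."""
    n_kmers = len(seq) - k + 1
    counts = Counter(seq[i:i + k] for i in range(n_kmers))

    # Mask every base that lies inside a k-mer occurring at least twice.
    masked = [False] * len(seq)
    for i in range(n_kmers):
        if counts[seq[i:i + k]] >= 2:
            for j in range(i, i + k):
                masked[j] = True

    # The complement of the masked bases gives the single-copy intervals.
    intervals = []
    start = None
    for i, is_masked in enumerate(masked):
        if not is_masked and start is None:
            start = i
        elif is_masked and start is not None:
            intervals.append((start, i))
            start = None
    if start is not None:
        intervals.append((start, len(seq)))
    return intervals


# Toy example with k=3: "AAA" occurs at both ends and is masked.
print(single_copy_intervals("AAACCCGGGTAAA", k=3))
```

On real assemblies the k-mer table does not fit this naive approach, which is why dedicated k-mer counters are used; the sketch only shows the logic of masking followed by interval complement.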

Illumina resequencing

A total of 1,000 plant genetic resources (PGRs) and 315 elite barley cultivars (Supplementary Table 6) were used for whole-genome resequencing. Illumina Nextera libraries were prepared and sequenced on an Illumina NovaSeq 6000 at IPK Gatersleben.

SNP and SV calling

Reciprocal genome alignment, in which each of the pangenome assemblies was aligned to the MorexV3 assembly with the latter acting either as alignment query or reference, was done with Minimap2 (v.2.20)⁶⁵. From the two resulting alignment tables, indels were called by Assemblytics (v.1.2.1)⁶⁶, and only the deletions from both alignments were selected and converted into presence/absence variants relative to the Morex reference genome. Further, balanced rearrangements (inversions, translocations) were scanned for with SyRI⁶⁷. To call SNPs, raw sequencing reads were trimmed using cutadapt (v.3.3)⁶⁸ and aligned to the MorexV3 reference genome using Minimap2 (v.2.20)⁶⁵. The resulting alignments were sorted with Novosort (v.3.09.01) (http://www.novocraft.com). BCFtools (v.1.9)⁶⁹ was used to call SNPs and short indels. A genome-wide association study was performed in GEMMA (v.0.98.1)⁷⁰ using default parameters with a mixed linear model and an estimated kinship matrix. Read depth was calculated at each complex locus in each accession. The raw HiFi reads were aligned to the respective genome using Minimap2 (ref. ⁷¹) and the median depth per locus was calculated using mosdepth (v.0.2.6)⁷².
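One way to read the reciprocal-alignment step is sketched below. The interpretation is an assumption: a deletion called with Morex as the alignment reference marks sequence present in Morex and absent from the accession, whereas a deletion called with the accession as reference marks sequence present in the accession and absent from Morex. The record types and function name are hypothetical and do not reflect the Assemblytics output format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Deletion:
    chrom: str
    start: int  # coordinates on the genome used as alignment reference
    end: int


@dataclass(frozen=True)
class PAV:
    accession: str
    coord_system: str  # genome whose coordinates are used
    chrom: str
    start: int
    end: int
    in_morex: bool
    in_accession: bool


def deletions_to_pavs(accession, morex_as_ref, accession_as_ref):
    """Convert deletions from both alignment directions into PAV records."""
    pavs = []
    for d in morex_as_ref:
        # Sequence exists in Morex but is missing from the accession.
        pavs.append(PAV(accession, "MorexV3", d.chrom, d.start, d.end, True, False))
    for d in accession_as_ref:
        # Sequence exists in the accession but is missing from Morex.
        pavs.append(PAV(accession, accession, d.chrom, d.start, d.end, False, True))
    return pavs


pavs = deletions_to_pavs(
    "Barke",
    morex_as_ref=[Deletion("chr1H", 1_000, 6_000)],
    accession_as_ref=[Deletion("chr1H", 20_000, 21_500)],
)
for p in pavs:
    print(p)
```

Restricting both directions to deletions means every variant is anchored to a sequence that actually exists in one of the two genomes, which gives each PAV unambiguous coordinates.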

Linkage disequilibrium in the Barke x HID055 population

Linkage disequilibrium between each pair of SNPs (both intrachromosomal and interchromosomal) was calculated as the squared Pearson product-moment correlation between the quantitative identity-by-descent (IBD) matrix scores presented in Additional File 1 of ref. ⁷³ (https://datadryad.org/stash/dataset/doi:10.5061/dryad.36rm1). The linkage disequilibrium plot was created with SAS PROC TEMPLATE and SGRENDER (SAS Institute) on the genetic map from ref. ¹⁸.
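The linkage disequilibrium measure described here is the squared Pearson correlation between two vectors of per-line IBD scores. The authors computed and plotted it in SAS; the following is only a minimal Python sketch of the same statistic, with hypothetical function names and toy scores.

```python
def pearson_r2(x, y) -> float:
    """Squared Pearson product-moment correlation between two score vectors."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two vectors of equal length >= 2")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return float("nan")  # a monomorphic marker has no defined correlation
    return (sxy * sxy) / (sxx * syy)


def ld_matrix(scores):
    """Pairwise r^2 for every marker pair; scores maps marker -> per-line IBD scores."""
    markers = sorted(scores)
    return {
        (m1, m2): pearson_r2(scores[m1], scores[m2])
        for i, m1 in enumerate(markers)
        for m2 in markers[i + 1:]
    }


# Toy IBD scores for three markers across four lines (invented values).
toy = {
    "snpA": [0.0, 1.0, 0.0, 1.0],
    "snpB": [0.0, 1.0, 0.0, 1.0],
    "snpC": [0.0, 0.0, 1.0, 1.0],
}
print(ld_matrix(toy))
```

Because the statistic is computed on marker pairs regardless of chromosome, the same function covers both intrachromosomal and interchromosomal comparisons.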

Preparation and Illumina sequencing of narrow-size whole-genome sequencing libraries for core50

First, 10 µg of DNA in 130 µl was sheared in tubes (Covaris microTUBE AFA Fiber Pre-Slit Snap Cap) to an average size of approximately 250 bp using a Covaris S220 focused-ultrasonicator (peak incident power: 175 W; duty factor: 10%; cycles per burst: 200; time: 180 s) according to standard manufacturer protocols (Covaris). The sheared DNA was size-selected using a BluePippin device and a 1.5% agarose cassette with internal R2 marker (Sage Sciences). A tight size setting at 260 bp was used for the purification of fragments in the narrow range of 200–300 bp (typical yield: 1–3 µg). The size-selected DNA was used for the preparation of PCR-free whole-genome sequencing (WGS) libraries using the Roche KAPA Hyper Prep kit according to the manufacturer’s protocols (Roche Diagnostics). A total of 10–12 libraries were provided with unique barcodes, pooled at equimolar concentrations and quantified by quantitative PCR using the KAPA Library Quantification Kit for Illumina Platforms according to standard protocols (Roche Diagnostics). The pools were sequenced (2 × 151 bp, paired-end) using four S4 XP flowcells and the Illumina NovaSeq 6000 system (Illumina) at IPK Gatersleben.

Contig assembly of core50 sequencing data

Raw reads were demultiplexed on the basis of index sequences and duplicate reads were removed from the sequencing data using Fastuniq⁷⁴. The read1 and read2 sequences were merged on the basis of the overlap using bbmerge.sh from bbmap (v.37.28)⁷⁵. The merged reads were error-corrected using BFC (v.181)⁷⁶. The error-corrected merged reads were used as an input for Minia3 (v.3.2.0)⁷⁷ to assemble reads into unitigs with the following parameters: -no-bulge-removal -no-tip-removal -no-ec-removal -out-compress 9 -debloom original. The Minia3 source was compiled to enable k-mer size up to 512 as described in the Minia3 manual. Iterative Minia3 runs with increasing k-mer sizes (100, 150, 200, 250 and 300) were used for assembly generation as provided in the GATB Minia pipeline (https://github.com/GATB/gatb-minia-pipeline). In the first iteration, k-mer size of 50 was used to assemble input reads into unitigs. In the next runs, the input reads as well as the assembly of the previous iteration were used as input for the Minia3 assembler. BUSCO analysis was conducted on the contig assemblies using BUSCO (v.3.0.2) with embryophyta_odb9 dataset¹⁴. In addition, high-confidence gene models from the Morex V3 reference⁹ were aligned to the contig assemblies to assess completeness, with the parameters of greater than or equal to 90% query coverage and greater than or equal to 97% identity.
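The gene-based completeness check reduces to a filter on alignment hits. The sketch below applies the two thresholds from the text (at least 90% query coverage and at least 97% identity). The `Hit` record, and the definitions of coverage (aligned bases over gene length) and identity (matching bases over aligned bases), are assumptions for illustration; the aligner and its output format are not specified in the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    gene: str          # gene model identifier
    query_len: int     # length of the gene model sequence
    aligned_len: int   # number of query bases aligned
    matches: int       # number of identical aligned bases


def passes(hit: Hit, min_cov: float = 0.90, min_ident: float = 0.97) -> bool:
    """True if the hit meets both the coverage and identity thresholds."""
    if hit.query_len <= 0 or hit.aligned_len <= 0:
        return False
    coverage = hit.aligned_len / hit.query_len
    identity = hit.matches / hit.aligned_len
    return coverage >= min_cov and identity >= min_ident


def completeness(hits, n_genes: int) -> float:
    """Fraction of gene models with at least one passing hit."""
    found = {h.gene for h in hits if passes(h)}
    return len(found) / n_genes


# Toy hits (invented): g1 passes, g2 fails on coverage, g3 fails on identity.
hits = [
    Hit("g1", 1000, 950, 940),
    Hit("g2", 1000, 800, 800),
    Hit("g3", 1000, 1000, 950),
]
print(completeness(hits, n_genes=4))
```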

Pangenome accessions in diversity space

Pseudo-FASTQ paired-end reads (tenfold coverage) were generated from the 76 pangenome assemblies with fastq_generator (https://github.com/johanzi/fastq_generator) and aligned to the MorexV3 reference genome sequence assembly⁹ using Minimap2 (v.2.24-r1122, ref. ⁶⁵). SNPs were called together with short-read data (Supplementary Table 6) using BCFtools⁷⁸ v.1.9 with the command ‘mpileup -q 20 -Q20 --excl-flags 3332’. To plot the diversity space of cultivated barley, the resultant variant matrix was merged with that of 19,778 domesticated barleys from ref. ³ (genotyping-by-sequencing (GBS) data). SNPs with more than 20% missing or more than 20% heterozygous calls were discarded. Principal component analysis was done with smartpca⁷⁹ v.7.2.1. To represent the diversity of wild barleys, we used published GBS and WGS data of 412 accessions of that taxon⁸,⁵⁴. Variant calling for GBS data was done with BCFtools⁷⁸ (v.1.9) using the command ‘mpileup -q 20 -Q20’. The resultant variant matrix was filtered as follows: (1) only bi-allelic SNP sites were kept; (2) homozygous genotype calls were retained if their read depth was greater than or equal to 2 and less than or equal to 50 and set to missing otherwise; (3) heterozygous genotype calls were retained if the read depth of both alleles was greater than or equal to 2 and set to missing otherwise. SNPs with more than 20% missing, more than 20% heterozygous calls or a minor allele frequency below 5% were discarded. Principal component analysis was done with smartpca⁷⁹ v.7.2.1. A matrix of pairwise genetic distances on the basis of identity-by-state (IBS) was computed with Plink2 (v.2.00a3.3LM, ref. ⁸⁰) and used to construct a neighbour-joining tree with Fneighbor (http://emboss.toulouse.inra.fr/cgi-bin/emboss/fneighbor) in the EMBOSS package⁸¹. The tree was visualized with Interactive Tree Of Life (iTOL)⁸².
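The genotype-level and site-level filters for the wild barley matrix can be sketched as two small functions. The thresholds are those given in the text. The genotype encoding, the function names, and the choice to compute the missing and heterozygous fractions over all samples are assumptions made for illustration.

```python
def filter_genotype(gt, ref_depth: int, alt_depth: int, min_dp: int = 2, max_dp: int = 50):
    """Return the genotype if it passes the depth rules, else None (missing).

    gt is "0/0", "1/1" or "0/1"; depths are per-allele read counts.
    """
    if gt in ("0/0", "1/1"):
        total = ref_depth + alt_depth
        return gt if min_dp <= total <= max_dp else None
    if gt == "0/1":
        return gt if ref_depth >= min_dp and alt_depth >= min_dp else None
    return None


def keep_site(genotypes, max_missing: float = 0.20, max_het: float = 0.20,
              min_maf: float = 0.05) -> bool:
    """Apply the site filters to a list of filtered genotypes for one bi-allelic SNP."""
    n = len(genotypes)
    if n == 0:
        return False
    n_missing = sum(g is None for g in genotypes)
    n_het = sum(g == "0/1" for g in genotypes)
    if n_missing / n > max_missing or n_het / n > max_het:
        return False
    n_called = n - n_missing
    if n_called == 0:
        return False
    alt_alleles = 2 * sum(g == "1/1" for g in genotypes) + n_het
    alt_freq = alt_alleles / (2 * n_called)
    return min(alt_freq, 1 - alt_freq) >= min_maf


# Toy site with ten samples (invented depths): one call is set to missing.
raw = [("0/0", 8, 0)] * 6 + [("1/1", 0, 10)] * 3 + [("0/1", 5, 1)]
filtered = [filter_genotype(gt, r, a) for gt, r, a in raw]
print(filtered, keep_site(filtered))
```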

Haplotype representation

Pangenome assemblies were mapped to MorexV3 as described above (‘Pangenome accessions in diversity space’). Read depth was calculated with SAMtools⁷⁸ v.1.16.1. Genotype calls were set to missing if they were supported by fewer than two reads. IBS was calculated with Plink2 (v.2.00a3.3LM, ref. ⁸⁰) in 1 Mb windows (shift: 0.5 Mb) using the command ‘--sample-diff counts-only counts-cols=ibs0,ibs1’. Windows in which either accession in the comparison had twofold coverage over less than 200 kb were set to missing. The number of differences (d) in a window was calculated as ibs0 + ibs1/2, where ibs0 is the number of homozygous differences and ibs1 that of heterozygous ones. This distance was normalized for coverage by the formula d/i × 1 Mb, where i is the size in bp of the region that had at least twofold coverage in both accessions in the comparison. In each window, we determined for each accession in the PGR and cultivar panels the closest pangenome accession according to the coverage-normalized IBS distance. Only accessions with fewer than 10% missing windows due to low coverage were considered, leaving 899 PGRs and 264 cultivars.
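As a worked illustration of the coverage-normalized distance, the sketch below computes d = ibs0 + ibs1/2 for one window, scales it to 1 Mb of jointly covered sequence and picks the closest pangenome accession. The function names and the dictionary input are hypothetical, and returning None for a window without jointly covered sequence is an added safeguard rather than something stated in the text.

```python
from typing import Dict, Optional

WINDOW_BP = 1_000_000     # window size (1 Mb)
MIN_COVERED_BP = 200_000  # minimum sequence with at least twofold coverage


def normalized_distance(ibs0: int, ibs1: int,
                        cov_a_bp: int, cov_b_bp: int,
                        shared_bp: int) -> Optional[float]:
    """Coverage-normalized IBS distance for one window, or None if missing.

    cov_a_bp / cov_b_bp: bp with >= 2x coverage in each accession.
    shared_bp: bp with >= 2x coverage in both accessions (i in the text).
    """
    if min(cov_a_bp, cov_b_bp) < MIN_COVERED_BP or shared_bp <= 0:
        return None
    d = ibs0 + ibs1 / 2  # heterozygous differences count half
    return d * WINDOW_BP / shared_bp


def closest_accession(distances: Dict[str, Optional[float]]) -> Optional[str]:
    """Name of the pangenome accession with the smallest non-missing distance."""
    valid = {name: d for name, d in distances.items() if d is not None}
    if not valid:
        return None
    return min(valid, key=valid.get)
```

For example, 10 homozygous and 4 heterozygous differences over 400 kb of jointly covered sequence give 12 / 400,000 × 1,000,000 = 30 differences per Mb.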

The distance to the closest pangenome accession was plotted with the R package ggplot2 to determine the threshold for similarity (Extended Data Fig. 2d).

Transcriptome sequencing for gene annotation

Data for transcript evidence-based genome annotation were provided by the International Barley Pan-Transcriptome Consortium, and a detailed description of sample preparation and sequencing is provided elsewhere⁸³. In brief, the 20 genotypes sequenced for the first version of the barley pangenome⁸ were used for transcriptome sequencing. Five separate tissues were sampled for each genotype. These were: embryo (including mesocotyl and seminal roots), seedling shoot, seedling root, inflorescence and caryopsis. Three biological replicates were sampled from each tissue type, amounting to 330 samples. Four samples failed quality control and were excluded.

Preparation of the strand-specific dUTP RNA-seq libraries and Illumina paired-end 150 bp sequencing were carried out by Novogene. In addition, PacBio Iso-Seq sequencing was carried out using a PacBio Sequel IIe sequencer at IPK Gatersleben. For this, a single sample per genotype was obtained by pooling equal amounts of RNA from a single replicate from all five tissues. Each sample was sequenced on an individual 8M SMRT cell.

De novo gene annotation

Structural gene annotation was done by combining de novo gene calling and homology-based approaches with RNA-seq, Iso-Seq and protein datasets (Extended Data Fig. 3a). Using evidence derived from expression data, RNA-seq data were first mapped using STAR⁸⁴ (v.2.7.8a) and subsequently assembled into transcripts by StringTie⁸⁵ (v.2.1.5, parameters -m 150 -t -f 0.3). Triticeae protein sequences from available public datasets (UniProt⁸⁶, https://www.uniprot.org, 10 May 2016) were aligned against the genome sequence using GenomeThreader⁸⁷ (v.1.7.1; arguments -startcodon -finalstopcodon -species rice -gcmincoverage 70 -prseedlength 7 -prhdist 4). Iso-Seq datasets were aligned to the genome assembly using GMAP⁸⁸ (v.2018-07-04). All assembled transcripts from RNA-seq, Iso-Seq and aligned protein sequences were combined using Cuffcompare⁸⁹ (v.2.2.1) and subsequently merged with StringTie (v.2.1.5, parameters --merge -m 150) into a pool of candidate transcripts. TransDecoder (v.5.5.0; http://transdecoder.github.io) was used to identify potential ORFs and to predict protein sequences within the candidate transcript set.

Ab initio annotation was initially done using Augustus⁹⁰ (v.3.3.3). GeneMark⁹¹ (v.4.35) was additionally used to further improve structural gene annotation. To avoid potential over-prediction, we generated guiding hints using the above-described RNA-seq, protein and Iso-Seq datasets as described before⁹². A specific Augustus model for barley was built by generating a set of gene models with full support from RNA-seq and Iso-Seq. Augustus was trained and optimized following a published protocol⁹². All structural gene annotations were joined using EVidenceModeler⁹³ (v.1.1.1), and weights were adjusted according to the input source: ab initio (Augustus: 5, GeneMark: 2), homology-based (10). Additionally, two rounds of PASA⁹⁴ (v.2.4.1) were run to identify untranslated regions and isoforms using the above-described Iso-Seq datasets.

We used BLASTP⁹⁵ (ncbi-blast-2.3.0+, parameters -max_target_seqs 1 -evalue 1e-05) to compare potential protein sequences with a trusted set of reference proteins (UniProt Magnoliophyta, reviewed/Swiss-Prot, downloaded on 3 August 2016; https://www.uniprot.org). This differentiated candidates into complete and valid genes, non-coding transcripts, pseudogenes and TEs. In addition, we used PTREP (release 19; http://botserv2.uzh.ch/kelldata/trep-db/index.html), a database of hypothetical proteins containing deduced amino acid sequences in which internal frameshifts have been removed in many cases. This step is particularly useful for the identification of divergent TEs with no significant similarity at the DNA level. Best hits were selected for each predicted protein from each of the three databases. Only hits with an e-value below 10 × 10⁻¹⁰ were considered. Furthermore, functional annotation of all predicted protein sequences was done using the AHRD pipeline (https://github.com/groupschoof/AHRD).

Proteins were further classified into two confidence classes: high and low. Hits with subject coverage (for protein references) or query coverage (transposon database) above 80% were considered significant and protein sequences were classified as high-confidence using the following criteria: protein sequence was complete and had a subject and query coverage above the threshold in the UniMag database or no BLAST hit in UniMag but in UniPoa and not PTREP; a low-confidence protein sequence was incomplete and had a hit in the UniMag or UniPoa database but not in PTREP. Alternatively, it had no hit in UniMag, UniPoa or PTREP, but the protein sequence was complete. In a second refinement step, low-confidence proteins with an AHRD score of 3* were promoted to high-confidence.

Gene projections

Gene contents of the remaining 56 barley genotypes were modelled by the projection of high-confidence genes on the basis of evidence-based gene annotations of the 20 barley genotypes described above. The approach was similar to and built upon a previously described method⁸. To reduce computational load, 760,078 high-confidence genes of the 20 barley annotations were clustered by cd-hit⁹⁶ requiring 100% protein sequence similarity and a maximal size difference of four amino acids. The resulting 223,182 source genes were subsequently used for all downstream projections as the non-redundant transcript set representative for the evidence-based annotations. For each source, its maximal attainable score was determined by global protein self-alignment using the Needleman–Wunsch algorithm as implemented in Biopython⁹⁷ v.1.8 and the blosum62 substitution matrix⁹⁸ with a gap open and extension penalty of 0.5 and 10.0, respectively.

Next, we surveyed each barley genome sequence using minimap2 (ref. ⁶⁵) with options ‘-ax splice:hq’ and ‘-uf’ for genomic matches of source transcripts. Each match was scored by its pairwise protein alignment with the source sequence that triggered the match. Only complete matches with start and stop codons and a score greater than or equal to 0.85 of the source self-score (see above) were retained. The source models were classified into four bins by decreasing confidence qualities: with or without Pfam domains, plastid- and transposon-related genes. Projections were performed stepwise for the four qualities, starting from the highest to the lowest. In each quality group, matches were then added into the projected annotation if their coding region did not overlap with any previously inserted model. Insertion order progressed from the top to the lowest scoring match. In addition, we tracked the number of insertions for each source by its identifier. For the two top quality categories, we performed two rounds of projections, first inserting each source at most once, followed by a round allowing a source to be inserted multiple times into the projected annotation. To consolidate the 20 evidence-based, initial annotations for any genes potentially missed, we used an identical approach but inserted any non-overlapping matches starting from the previous RNA-seq-based annotation. A detailed description of the projection workflow, parameters and code is provided at the GitHub repository (https://github.com/GeorgHaberer/gene_projection/tree/main/panhordeum). An overview of the projection scheme can be found in the parent directory of the repository. Because complex loci contain numerous pseudogenes, the loci were searched by BLASTN⁹⁹ for sequences homologous to annotated genes but not present in the set of annotated genes. Pseudogenes were accepted if they covered at least 80% of a gene homologue.
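The score-ordered, non-overlapping insertion described above can be illustrated with a short sketch. It is not the code from the cited repository: the Match class, its field names and the restriction to a single chromosome and a single quality bin are simplifying assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Match:
    source_id: str     # identifier of the source gene model
    start: int         # coding-region start on the target (inclusive)
    end: int           # coding-region end on the target (inclusive)
    score: float       # protein alignment score against the source
    self_score: float  # maximal attainable score (source self-alignment)
    complete: bool     # has both start and stop codon


def overlaps(a: Match, b: Match) -> bool:
    """True if the coding regions of two matches overlap."""
    return a.start <= b.end and b.start <= a.end


def project(matches: List[Match], min_ratio: float = 0.85,
            once_per_source: bool = True) -> List[Match]:
    """Greedy insertion of matches, from highest to lowest score."""
    candidates = [m for m in matches
                  if m.complete and m.score >= min_ratio * m.self_score]
    candidates.sort(key=lambda m: m.score, reverse=True)
    inserted: List[Match] = []
    used: Dict[str, int] = {}  # number of insertions per source identifier
    for m in candidates:
        if once_per_source and used.get(m.source_id, 0) >= 1:
            continue
        if any(overlaps(m, other) for other in inserted):
            continue
        inserted.append(m)
        used[m.source_id] = used.get(m.source_id, 0) + 1
    return inserted
```

Calling project first with once_per_source=True and then again with once_per_source=False on the remaining matches would mimic the two rounds described for the top quality categories.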

Definition of core, cloud and shell genes

Phylogenetic HOGs on the basis of the primary protein sequences from 76 annotated barley genotypes were calculated using Orthofinder¹⁰⁰ v.2.5.5 (standard parameters). The scripts for calculation of core/shell and cloud genes have been deposited in the repository https://github.com/PGSB-HMGU/BPGv2. Core HOGs contain at least one gene model from all 76 barley genotypes included in the comparison. Shell HOGs contain gene models from at least two barley genotypes and at most 75 barley genotypes. Genes not included in any HOG (‘singletons’), or clustered with genes only from the same genotype, were defined as cloud genes. GENESPACE¹⁰¹ was used to determine syntenic relationships between the chromosomes of all 76 genotypes.
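The core, shell and cloud definitions reduce to counting the distinct genotypes represented in each HOG. The sketch below illustrates this under the assumption that each HOG is given as a list of (genotype, gene identifier) pairs and that singletons are passed in as one-member groups; it is not the deposited script, and the function name is hypothetical.

```python
from typing import Iterable, Tuple

N_GENOTYPES = 76  # number of annotated barley genotypes in the comparison


def classify_hog(members: Iterable[Tuple[str, str]],
                 n_genotypes: int = N_GENOTYPES) -> str:
    """Classify a HOG from its (genotype, gene_id) members."""
    genotypes = {genotype for genotype, _gene in members}
    if len(genotypes) == n_genotypes:
        return "core"   # at least one gene model from every genotype
    if len(genotypes) >= 2:
        return "shell"  # 2 to n_genotypes - 1 genotypes
    return "cloud"      # singleton or genes from one genotype only
```

For instance, a HOG holding three paralogues that all come from the same genotype is classified as cloud, because only one genotype is represented.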

Annotation of TEs

The 20 barley accessions with expression data were softmasked for transposons before the de novo gene detection using the REdat_9.7_Triticeae section of the PGSB transposon library¹⁰². Vmatch (http://www.vmatch.de) was used as the matching tool with the following parameters: identity ≥ 70%, minimal hit length 75 bp, seed length 12 bp (vmatch -d -p -l 75 -identity 70 -seedlength 12 -exdrop 5 -qmaskmatch tolower). The percentage masked was around 84% and almost identical for all 20 accessions.

Full-length long terminal repeat retrotransposon candidate elements were detected de novo for each of the 76 barley accessions by their structural hallmarks with LTRharvest¹⁰³ followed by LTRdigest¹⁰⁴. Both programs are contained in genometools⁸⁷ (http://github.com/genometools/genometools, v.1.5.10). LTRharvest identifies within the specified parameters long terminal repeats and target site duplications whereas LTRdigest was used to determine polypurine tracts and primer binding sites. The transfer RNA library needed as input for the primer binding sites was beforehand created by running tRNAscan-SE-1.3 (ref. ¹⁰⁵) on each assembly. The parameter settings for LTRharvest were: ‘-overlaps best -seed 30 -minlenltr 100 -maxlenltr 2000 -mindistltr 3000 -maxdistltr 25000 -similar 85 -mintsd 4 -maxtsd 20 -motif tgca -motifmis 1 -vic 60 -xdrop 5 -mat 2 -mis -2 -ins -3 -del -3 -longoutput’; for LTRdigest: ‘-pptlen 8 30 -uboxlen 3 30 -pptradius 30 -pbsalilen 10 30 -pbsoffset 0 10 -pbstrnaoffset 0 30 -pbsmaxedist 1 -pbsradius 30’. The insertion age of each long terminal repeat retrotransposon instance was calculated from the divergence of its 5′ and 3′ long terminal repeat sequences using a random mutation rate of 1.3 × 10⁻⁸ (ref. ¹⁰⁶).
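The dating step can be illustrated with a small calculation. The text gives only the rate (1.3 × 10⁻⁸ substitutions per site per year); the sketch below additionally assumes the commonly used relation T = K / (2r), in which both LTRs of an element accumulate substitutions independently after insertion, and uses the uncorrected proportion of differing sites in a pre-aligned LTR pair as K. Both choices are assumptions for illustration and may differ from the distance model used by the authors.

```python
RATE = 1.3e-8  # substitutions per site per year (rate stated in the text)


def p_distance(ltr5: str, ltr3: str) -> float:
    """Proportion of differing sites between two pre-aligned sequences.

    Columns containing a gap ('-') in either sequence are ignored.
    """
    if len(ltr5) != len(ltr3):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    diffs = 0
    for x, y in zip(ltr5.upper(), ltr3.upper()):
        if x == "-" or y == "-":
            continue
        compared += 1
        if x != y:
            diffs += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    return diffs / compared


def insertion_age_years(ltr5: str, ltr3: str, rate: float = RATE) -> float:
    """Assumed relation T = K / (2 * rate); K is the LTR divergence."""
    return p_distance(ltr5, ltr3) / (2 * rate)
```

Under these assumptions, a pair of LTRs differing at 1% of sites would be dated to roughly 385,000 years.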

Whole-genome pangenome graphs

Genome graphs were constructed using Minigraph¹⁹ v.0.20-r559. Other graph construction tools (PGGB¹⁰⁷, Minigraph-Cactus¹⁰⁸) turned out to be computationally prohibitive for a genome of this size and complexity, combined with the large number of accessions used in this investigation. Minigraph does not support small variants (less than 50 bp), thus graph complexity is lower than with other tools. However, even with Minigraph, graph construction at the whole-genome level was computationally prohibitive and thus graphs had to be computed separately for each chromosome, precluding detection of interchromosomal translocations.

Graph construction was initiated using the Morex V3 assembly⁹ as a reference. The remaining assemblies were added into the graph sequentially, in order of descending dissimilarity to Morex. SVs were called after each iteration using gfatools bubble (v.0.5-r250-dirty, https://github.com/lh3/gfatools). Following graph construction, the input sequences of all accessions were mapped back to the graph using Minigraph with the ‘--call’ option enabled, which generates a path through the graph for each accession. The resulting BED format files were merged using Minigraph’s mgutils.js utility script to convert them to P lines and then combined with the primary output of Minigraph in the proprietary RGFA format (https://github.com/lh3/gfatools/blob/master/doc/rGFA.md). Graphs were then converted from RGFA format to GFA format (https://github.com/GFA-spec/GFA-spec/blob/master/GFA1.md) using the ‘convert’ command from the vg toolkit¹⁰⁹ v.1.46.0 ‘Altamura’. This step ensures that graphs are compatible with the wider universe of graph processing tools, most of which require GFA format as input. Chromosome-level graphs were then joined into a whole-genome graph using vg combine. The combined graph was indexed using vg index and vg gbwt, two components of the vg toolkit¹⁰⁹.

General statistics for the whole-genome graph were computed with vg stats. Graph growth was computed using the heaps command from the ODGI toolkit¹¹⁰ v.0.8.2-0-g8715c55, followed by plotting with its companion script heaps_fit.R. The latter also computes values for gamma, the slope coefficient of Heaps’ law, which allows the classification of pangenome graphs into open or closed pangenomes, that is, a prediction of whether the addition of further accessions would increase the size of the pangenome¹¹¹.

SV statistics were computed on the basis of the final BED file produced after the addition of the last line to the graph. A custom shell script was used to classify variants according to the Minigraph custom output format. This allows the extraction of simple, that is, non-nested, indels (relative to the MorexV3 graph backbone), as well as simple inversions. The remaining SVs fall into the ‘complex’ category, in which there can be multiple levels of nesting of different variant types, and this precluded further, more fine-grained classification. To compute overlap with the SVs from Assemblytics, a custom script was used to extract the variant coordinates from both sets, and bedtools intersect⁶³ was then used to compute their intersection on the basis of a spatial overlap of 70%.

To elucidate the effect of a graph-based reference on short-read mapping, we obtained WGS Illumina reads from five barley samples (Extended Data Fig. 4b) in the European Nucleotide Archive and mapped these onto the whole-genome graph using vg giraffe¹¹². For comparison with the standard approach of mapping reads to a linear single genome reference, we mapped the same reads to the MorexV3 reference genome sequence assembly⁹ with bwa mem¹¹³ v.0.7.17-r1188. Mapping statistics were computed with vg¹⁰⁹ stats and samtools⁷⁸ stats (v.1.9), respectively.

To elucidate tool bias as a confounding factor in the comparison between the mappings, we first produced a linearized version of the pangenome graph using gfatools gfa2fa (https://github.com/lh3/gfatools) and then mapped the WGS reads from all five accessions to this new reference sequence, using BWA mem as before for the cv. Morex V3 reference sequence. This allows a more appropriate comparison between the single cultivar reference sequence and the pangenome sequence without being affected by algorithmic differences between the tools used (BWA/giraffe). Mappings were filtered to retain only reads with zero mismatches, using sambamba¹¹⁴. For the graph mappings, the ‘Total perfect’ statistic from the vg stats output of the GAM files was used.

To investigate the srh1 paths in the pangenome graph, we first extracted all nodes from the graph into a FASTA file and then used the enhancer region identified in cv. Barke as associated with the long-haired srh1 phenotype (chr5H:496,182,748-496,187,020) as query in a BLAST search against the nodes. This recovered five nodes with an identity percentage value of greater than 98%. We then used vg find from the vg toolkit v.1.56.0 (ref. ¹⁰⁹) to extract a subgraph from the full graph (with a graph context of five steps either side) using the node identifiers. The subgraph was then plotted using odgi viz from the ODGI toolkit v.0.8.3-26-gbc7742ed (ref. ¹¹⁰).

To genotype samples from the core800 collection against the srh1 region of the graph, we first identified a small set of four samples each with either the short- or long-haired phenotype, picked at random from a group of core800 samples that all shared the same WGS read depth (5×). These samples were HOR_1102, HOR_17654, HOR_4065, HOR_1264, HOR_14704, HOR_7629, HOR_17678 and HOR_11406. We then mapped their Illumina WGS reads to the full pangenome graph using vg giraffe¹¹² and extracted a subgraph of the mappings with vg chunk¹⁰⁹. The subgraph was then genotyped using vg pack and vg call with cv. Barke as the reference accession, following the approach proposed in ref. ¹¹⁵. Variants in the resulting VCF files were identified using a simple grep command with the identifiers of the five nodes recovered with the Barke sequence as described above. Scripts used here are available at https://github.com/mb47/minigraph-barley/tree/main/scripts/srh1_analysis.

Analysis of the Mla locus

The coordinates and sequences of the 32 genes present at the Mla locus were extracted from the MorexV3 genome sequence assembly⁹. To find the corresponding position and copy number in each of the 76 genomes, we used BLAST⁹⁵ (-perc_identity: 90, -word_size: 11, all other parameters set as default). The expected BLAST result for a perfectly conserved allele is a long fragment (exon_1) of 2,015 bp followed by a gap of approximately 1,000 bp due to the intron and another fragment (exon_2) of 820 bp. To detect the number of copies, multiple BLAST results for a single gene were first merged if two BLAST segments were within 1.1 kb of each other. A merged hit was counted as a copy only if the total length of the input sequence was found. To analyse the structural variation across all 76 accessions, the non-filtered BLAST results were plotted in a region of −20,000 and +500,000 base pairs around the start of the BPM gene HORVU.MOREX.r3.1HG0004540, which was used as an anchor (present in all 76 lines; Supplementary Figs. 5 and 6). To detect the different Mla alleles, three different thresholds of -perc_identity were used for BLAST: 100, 99 and 98.
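The copy-counting rule (merge BLAST segments lying within 1.1 kb of each other, then require the full query length) might be sketched as follows. The tuple layout of the hits and the interpretation of ‘total length found’ as the summed aligned query length reaching the query length are assumptions made for this illustration.

```python
from typing import List, Tuple

# (subject_start, subject_end, aligned_query_length), same sequence and strand
Hit = Tuple[int, int, int]


def count_copies(hits: List[Hit], query_len: int, max_gap: int = 1100) -> int:
    """Count gene copies from BLAST hits of one query on one subject sequence."""
    groups = []  # each group: [start, end, summed aligned query length]
    for start, end, aligned in sorted(hits):
        if groups and start - groups[-1][1] <= max_gap:
            # Close enough to the previous segment: merge (e.g. across an intron).
            groups[-1][1] = max(groups[-1][1], end)
            groups[-1][2] += aligned
        else:
            groups.append([start, end, aligned])
    # Only merged hits recovering the full query length count as a copy.
    return sum(1 for _start, _end, aligned in groups if aligned >= query_len)
```

With the exon lengths given in the text (2,015 bp and 820 bp), two segments separated by an intron of about 1 kb merge into one copy, whereas an isolated exon_1 fragment does not count.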

Scan for structurally complex loci

We used a pipeline developed in ref. ²⁷ that performs sequence-agnostic identification of long-duplication-prone regions (henceforth, complex regions) in a reference genome, followed by identification of gene families with a statistical tendency to occur within complex regions. The pipeline assumes that a candidate long, duplication-prone region will contain an elevated concentration of locally repeated sequences in the kb-scale length range. We first aligned the MorexV3 genome sequence assembly⁹ against itself using lastz¹¹⁶ (v.1.04.03; arguments: ‘--notransition --step=500 --gapped’). For practicality purposes, this was done in 2 Mb blocks with a 200 kb overlap, and any overlapping complex regions identified in multiple windows were merged. For each window, we ignored the trivial end-to-end alignment, and, of the remaining alignments, retained only those longer than 5 kb and falling fully within 200 kb of one another. An alignment ‘density’ was calculated over the chromosome by calculating, at ‘interrogation points’ spaced equally at 1 kb intervals along the length of the chromosome, an alignment density score, which is simply the sum of the lengths of all filtered alignments spanning that interrogation point. A Gaussian kernel density (bandwidth 10 kb) was calculated over these interrogation points, weighted by their scores. To allow comparability between windows, the interrogation point densities were normalized by the sum of scores in the window. Runs of interrogation points at which the density surpassed a minimum density threshold were flagged as complex regions. A few minor adjustments to these regions (merging of overlapping regions, and trimming the end coordinates to ensure the stretches always begin and end in repeated sequence) yielded the final tabulated list of complex regions and their positions in the MorexV3 genome assembly (Supplementary Table 8). The method was implemented in R, making use of the package data.table.
Genes in each long, duplication-prone region were clustered with UCLUST¹¹⁷ (v.11, default parameters) using a protein clustering distance cutoff of 0.5 and for each cluster the most frequent functional description as per the MorexV3 gene annotation⁹ was assigned as the functional description of the cluster. Self-alignment for characterization of evolutionary variability (Supplementary Fig. 7) was performed using lastz¹¹⁶ (v.1.04.03; settings ‘--self --notransition --gapped --nochain --gfextend --step=50’).
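The density scan over interrogation points can be illustrated with a pure-Python sketch. The original method was implemented in R with data.table; the version below is a simplified re-expression in which the function names are hypothetical, the density threshold is left as a free parameter (the text does not state its value), and the post hoc merging and trimming of regions is omitted.

```python
import math
from typing import List, Tuple


def density_profile(alignments: List[Tuple[int, int]],
                    window_start: int, window_end: int,
                    step: int = 1000, bandwidth: int = 10_000):
    """Score-weighted Gaussian kernel density at equally spaced points.

    alignments: (start, end) of filtered self-alignments in the window.
    Returns (points, density), normalized by the sum of scores in the window.
    """
    points = list(range(window_start, window_end + 1, step))
    # Score of a point: summed length of all alignments spanning it.
    scores = [sum(e - s for s, e in alignments if s <= p <= e) for p in points]
    total = sum(scores)
    if total == 0:
        return points, [0.0] * len(points)
    norm = 1.0 / (bandwidth * math.sqrt(2 * math.pi))
    density = []
    for p in points:
        acc = 0.0
        for q, w in zip(points, scores):
            if w:
                acc += w * norm * math.exp(-0.5 * ((p - q) / bandwidth) ** 2)
        density.append(acc / total)
    return points, density


def flag_regions(points: List[int], density: List[float],
                 threshold: float) -> List[Tuple[int, int]]:
    """Runs of consecutive points whose density exceeds the threshold."""
    regions = []
    start = None
    last = None
    for p, d in zip(points, density):
        if d > threshold:
            if start is None:
                start = p
            last = p
        elif start is not None:
            regions.append((start, last))
            start = None
    if start is not None:
        regions.append((start, last))
    return regions
```

The quadratic loop is adequate for a 2 Mb window sampled at 1 kb intervals (about 2,000 points); a production implementation would more likely use a vectorized kernel density routine.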

Molecular dating of divergence times of duplicated genes in complex loci

For molecular dating of gene duplications, we used segments of up to 4 kb, starting 1 kb upstream of duplicated genes in complex loci. We thereby assumed that only intergenic sequences were used, which are free from selection pressure and thus evolve at a neutral rate of 1.3 × 10⁻⁸ substitutions per site per year¹⁰⁶. The upstream sequences of all duplicated genes of the respective complex locus were then aligned pairwise with the program Water from the EMBOSS package⁸¹ (obtained from Ubuntu repositories, https://ubuntu.com). This was done for all gene copies of all barley accessions for which multiple gene copies were found. Molecular dating of the pairwise alignments was done as previously described¹¹⁸ using the substitution rate of 1.3 × 10⁻⁸ substitutions per site per year¹⁰⁶.

Amy1_1 analysis in pangenome assemblies

The amy1_1 gene copy HORVU.MOREX.PROJ.6HG00545380 was used for BLAST against all 76 genome assemblies. Full-length sequences with identity over 95% were extracted and used for further analyses. Unique sequences were identified by clustering at 100% identity using CD-Hit⁹⁶ and were aligned using MAFFT¹¹⁹ v.7.490. Sequence variants among amy1_1 gene copies at genomic DNA, coding sequence (CDS) and respective protein level were collected and amy1_1 haplotypes (that is, the combinations of copies) in each genotype assembly were summarized using R¹²⁰ v.4.2.2. A Barke-specific SNP locus (GGCGCCAGGCATGATCGGGTGGTGGCCAGCCAAGGCGGTGACCTTCGTGGACAACCACGACACCGGCTCCACGCAGCACATGTGGCCCTTCCCTTCTGACA[A/G]GGTCATGCAGGGATATGCGTACATACTCACGCACCCAGGGACGCCATGCATCGTGAGTTCGTCGTACCAATACATCACATCTCAATTTTCTTTTCTTGTTTCGTTCATAA) for amy1_1 haplotype cluster ProtHap3 (Supplementary Table 21) was identified and used for KASP marker development (LGC Biosearch Technologies).

Comparative analysis of the amy1_1 locus structure

On the basis of the genome annotation of cv. Morex, 15 gene sequences on either side of amy1_1 gene copy HORVU.MOREX.PROJ.6HG00545440 were extracted. The 31 genes were compared against the 76 genome assemblies using NCBI-BLAST⁹⁵ (BLASTN, word_size of 11 and percent identity of 90, other parameters as default). Alignment plots were generated from the BLAST result coordinates by scaling on the basis of the mid-point between HORVU.MOREX.r3.6HG0617300/HORVU.MOREX.PROJ.6HG00545250 and HORVU.MOREX.r3.6HG0617710/HORVU.MOREX.PROJ.6HG00545670. All BLAST results in the region (±1 Mb) around this mid-point were plotted using R¹²⁰.

Amy1_1 PacBio amplicon sequencing

Genomic DNA from 1-week-old Morex seedling leaves was extracted with DNeasy Plant Mini Kit (QIAGEN). On the basis of the MorexV3 genome sequence assembly⁹, amy1_1 full-length copy-specific primers were designed using Primer3 (ref. ¹²¹) (https://primer3.ut.ee/): 6F: GTAGCAGTGCAGCGTGAAGTC; 80F: AGACATCGTTAACCACACATGC; 82F: GTTTCTCGTCCCTTTGCCTTAA; 33R: GATCTGGATCGAAGGAGGGC; 79R: TCATACATGGGACCAGATCGAG; 80R: ACGTCAAGTTAGTAGGTAGCCC. All forward primers were tagged with the bridge sequence [AmC6]gcagtcgaacatgtagctgactcaggtcac (denoted by a T preceding the primer name), whereas reverse primers were tagged with [AmC6]tggatcacttgtgcaagcatcacatcgtag to allow annealing to barcoding primers. These bridge sequence-tagged gene-specific primers were used in pairs with each other, targeting 1–2 copies of 3–6 kb amy1_1 genes, including upstream and downstream 500–1000 bp regions: T6F + T33R, T6F + T79R, T80F + T80R and T82F + T80R. A two-step PCR protocol was conducted. The first-step PCR reaction was prepared in a 25 μl volume using 2 μl of DMSO, 0.3 μl of Q5 polymerase (New England Biolabs), 1 μl of amy1_1-specific primer pair (10 μM each), 2 μl of gDNA, 0.5 μl of dNTPs (10 mM), 5 μl of Q5 buffer and H₂O. The PCR programme was as follows: initial denaturation at 98 °C/1 min followed by 25–28 cycles of 98 °C/30 s, 58 °C/30 s and 72 °C/3 min for extension, with a final extension step of 72 °C/2 min. The second PCR step (barcoding PCR) was prepared in the same way using 1 μl of the first PCR product as DNA template, barcoding primers (Pacific Biosciences) and the PCR programme reduced to 20 cycles. After a quality check on a 1% agarose gel, all barcoded PCR products were mixed and purified with AMPure PB (Pacific Biosciences). The SMRT bell library preparation and sequencing were carried out at BGI Tech Solutions. Sequencing data were analysed using SMRT Link v.10.2. To minimize PCR chimeric noise, CCSs were first constructed for each molecule.
Second, long amplicon analysis was carried out on the basis of subreads from 50 bp windows spanning peak positions of all CCS length. Final consensus sequences for each amy1_1 were determined with the aid of size estimation from agarose gel imaging.

Amy1_1 SNP haplotype analysis and k-mer-based copy number estimation

SNP haplotypes were analysed in 1,315 PGRs and elite cultivars in the extended amy1_1 cluster region (MorexV3 chr6H: 516,385,490–517,116,415 bp). SNPs with more than 20% missing data among the analysed lines and minor allele frequency less than 0.01 were removed from downstream analyses. The data were converted to 0, 1 and 2 format using VCFtools¹²² and samples were clustered using the pheatmap package (https://cran.r-project.org/web/packages/pheatmap/pheatmap.pdf) from R statistical environment⁵⁷. The sequential clustering approach was used to achieve the desired separation. At each step, two extreme clusters were selected and then samples from each cluster were clustered separately. The process was repeated until the desired separation was achieved on the basis of visual inspection.

K-mers (k = 21) were generated from the Morex amy1_1 gene family members’ conserved region using jellyfish¹²³ v.2.2.10. After removing k-mers with counts from regions other than amy1_1 in the Morex V3 genome assembly, k-mers were counted in the Illumina raw reads (Supplementary Table 6) using Seal (BBtools, https://jgi.doe.gov/data-and-tools/software-tools/bbtools/). All k-mer counts were normalized to counts per MorexV3 genome and amy1_1 copy number was estimated as the median count of all k-mers from each accession in R.
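The copy number estimate is a median of depth-normalized k-mer counts. A minimal sketch follows, assuming that ‘counts per MorexV3 genome’ means dividing each raw k-mer count by the sequencing depth of the sample expressed in genome equivalents (total sequenced bases divided by genome size); the function name and this normalization are illustrative assumptions, and the original analysis was done in R.

```python
from statistics import median
from typing import Sequence


def estimate_copy_number(kmer_counts: Sequence[int],
                         genome_coverage: float) -> float:
    """Median of k-mer counts normalized to counts per genome equivalent.

    kmer_counts: raw read counts of the amy1_1-specific k-mers in one accession.
    genome_coverage: sequencing depth in genome equivalents (assumed to be
    total sequenced bases divided by the genome size).
    """
    if genome_coverage <= 0:
        raise ValueError("genome_coverage must be positive")
    if not kmer_counts:
        raise ValueError("no k-mer counts given")
    normalized = [count / genome_coverage for count in kmer_counts]
    return median(normalized)
```

Using the median rather than the mean keeps the estimate robust against individual k-mers with inflated counts, for example k-mers that also occur in a diverged sequence elsewhere in the sample.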

Estimation ability was validated by comparing copy number from pangenome assemblies and short-read sequencing data (Extended Data Fig. 8c). For 1,000 PGRs, countries (with at least 10 accessions) were colour-shaded on the basis of their proportions of accessions with amy1_1 copy number greater than 5 on a world map using the R package maptools (https://cran.r-project.org/web/packages/maptools/index.html).

To construct a network from SNP haplotypes, all 371 amy1_1 copies (except ORF 89, 90 and 93; Supplementary Table 14) were aligned using MAFFT¹¹⁹ v.7.490. Median-joining haplotype networks were generated using PopART¹²⁴ with an epsilon value of 0.

Local pangenome graph for amy1_1

The coordinates of amy1_1 copies in 76 genome assemblies were obtained by BLAST searches with the Morex allele of HORVU.MOREX.PROJ.6HG00545380. The genomic intervals surrounding amy1_1 from 10 kb upstream of the first copy to 10 kb downstream of the last copy were extracted from corresponding assemblies and used for further analyses. We applied PGGB (v.0.4.0, https://github.com/pangenome/pggb) for 76 amy1_1 sequences with parameters ‘-n 76 -t 20 -p 90 -s 1000 -N’. The graph was visualized using Bandage¹²⁵ (v.0.8.1). ODGI (v.0.7.3, command ‘paths’)¹¹⁰ was used to get a sparse distance matrix for paths with the parameter ‘-d’. The resultant distance matrix was plotted with the R package pheatmap (https://cran.r-project.org/web/packages/pheatmap/pheatmap.pdf). Six representative sequences of amy1_1 were aligned against Morex by BLAST+ (v.2.13.0)⁹⁹.

AMY1_1 protein structure and protein folding simulation

The published protein structure of α-amylase AMY1_1 from accession Menuet, in complex with the pseudo-tetrasaccharide acarbose (PDB: 1BG9; ref. ⁴²), was used to simulate the structural context of the amino acid variants identified in barley accessions Morex, Barke and RGT Planet. The amino acid sequences of the crystalized AMY1_1 protein from Menuet and the Morex reference copy amy1_1 HORVU.MOREX.PROJ.6HG00545380 used in this study are identical. The protein was visualized using PyMol 2.5.5 (Schrödinger). The Dynamut2 webserver¹²⁶ was used to predict changes in protein stability and dynamics by introducing amino acid variants identified in the Morex, Barke and RGT Planet genome assemblies.

Development of diverse amy1_1 haplotype barley NILs

NILs with different amy1_1 haplotypes were derived from crosses between RGT Planet as recipient and Barke or Morex amy1_1 cluster donor parents (ProtHap3, ProtHap4 and ProtHap0, respectively; Supplementary Table 21), followed by two subsequent backcrosses to RGT Planet and one selfing step (BC₂S₁) to retrieve homozygous plants at the amy1_1 locus. A total of four amy1_1–Barke NILs (ProtHap3) and one amy1_1–Morex NIL (ProtHap0) were developed and tested against RGT Planet (ProtHap4) replicates. Plants were grown in a greenhouse at 18 °C under 16/8-h light/dark cycles. Foreground and background molecular markers were used in each generation to assist plant selection. Respective BC₂S₁ plants were genotyped with the Barley Illumina 15K array (SGS Institut Fresenius, TraitGenetics Section, Germany) and grown to maturity. Grains were collected and further propagated in field plots in consecutive years in various locations (Nørre Aaby, Denmark; Lincoln, New Zealand; Maule, France). Grains from field plots were collected and threshed using a Wintersteiger Elite plot combiner, and sorted by size (threshold, 2.5 mm) using a Pfeuffer SLN3 sample cleaner (Pfeuffer).

Micro-malting and α-amylase activity analysis

Non-dormant barley samples of RGT Planet and respective NILs with different amy1_1 haplotypes (50 g each, graded greater than 2.5 mm) were micro-malted in perforated stainless-steel boxes. The barley samples were steeped at 15 °C by submersion of the boxes in water. Steeping took place for 6 h on day one, 3 h on day two and 1 h on day three, followed by air rests, to reach 35%, 40% and 45% water content, respectively. The actual water uptake of individual samples was determined as the weight difference between initial water content, measured with a Foss 1241 NIT instrument, and the sample weight after surface water removal. During air rest, metal beakers were placed into a germination box at 15 °C. Following the last steep, the barley samples were germinated for 3 d at 15 °C. Finally, barley samples were kiln-dried in an MMK Curio kiln (Curio Group) using a two-step ramping profile. The first ramping step started at a set point of 27 °C with a linear ramping at 2 °C h⁻¹ to the breakpoint at 55 °C using 100% fresh air. The second linear ramping was at 4 °C h⁻¹, reaching a maximum at 85 °C. This temperature was kept constant for 90 min using 50% air recirculation. The kilned samples were then deculmed using a manual root removal system (Wissenschaftliche Station für Brauerei). α-Amylase activity was measured using the Ceralpha method (Ceralpha Method MR-CAAR4, Megazyme) modified for Gallery Plus Beermaster (Thermo Fisher Scientific).

Amy1_1 gene expression of RGT Planet and amy1_1–Barke NIL during micro-malting

Samples (50 g each, graded greater than 2.5 mm) were micro-malted as described in the previous section. During micro-malting, grains were sampled at 24 h, 48 h and 72 h. Grain samples were first freeze-dried at −80 °C and then milled at room temperature. Total RNA was isolated from 20–200 mg of flour using the Spectrum Plant Total RNA Kit (Sigma Aldrich) and cleaned using RNA Clean & Concentrator (ZYMO Research) following a published protocol¹²⁷. For RNA-seq analysis, libraries were prepared and single-end sequenced with a length of 75 bp as described in ref. ¹²⁷. Gene expression was quantified as transcripts per million (TPM) using kallisto¹²⁸ (v.0.48.0) with 100 bootstraps.

Rachilla hair ploidy measurements

Ploidy assessment was performed on rachillae collected from barley spikes at developmental stage¹²⁹ approximately Waddington 9.0. Once isolated, rachillae were fixed with 50% ethanol/10% acetic acid for 16 h after which they were stained with 1 µM DAPI in 50 mM phosphate buffer (pH 7.2) supplemented with 0.05% Triton X100. Probes were analysed with a Zeiss LSM780 confocal laser scanning microscope using a ×20 NA 0.8 objective, zoom ×4 and image size 512 × 512 pixels. DAPI was visualized with a 405 nm laser line in combination with a 405–475 nm bandpass filter. The pinhole was set to ensure the whole nucleus was measured in one scan. Size and fluorescence intensity of the nuclei were measured with ZEN black (ZEISS) software. For data normalization, small, round nuclei of the epidermal proper were used for 2C (diploid) calibration.

Scanning electron microscopy

Sample preparation and recording by scanning electron microscopy were essentially performed as described previously¹³⁰. In brief, samples were fixed overnight at 4 °C in 50 mM phosphate buffer (pH 7.2) containing 2% v/v glutaraldehyde and 2% v/v formaldehyde. After washing with distilled water and dehydration in an ascending ethanol series, samples were critical-point‐dried in a Bal‐Tec critical-point dryer (Leica Microsystems, https://www.leica-microsystems.com). Dried specimens were attached to carbon‐coated aluminium sample blocks and gold‐coated in an Edwards S150B sputter coater (Edwards High Vacuum, http://www.edwardsvacuum.com). Probes were examined in a Zeiss Gemini30 scanning electron microscope (Carl Zeiss, https://www.zeiss.de) at 5 kV acceleration voltage. Images were digitally recorded.

Linkage mapping of SHORT RACHILLA HAIR 1 (HvSRH1)

Initial linkage mapping was performed using GBS data of a large ‘Morex’ x ‘Barke’ F₈ recombinant inbred line (RIL) population⁴⁷ (European Nucleotide Archive project PRJEB14130). The GBS data of 163 RILs, phenotyped for rachilla hair in the F₁₁ generation, and the two parental genotypes were extracted from the variant matrix using VCFtools¹²² and filtered as described previously³ for a minimum depth of sequencing to accept heterozygous and homozygous calls of 4 and 6, respectively, a minimum mapping quality score of the SNPs of 30, a minimal fraction of homozygous calls of 30% and a maximum fraction of missing data of 25%. The linkage map was built with the R package ASMap¹³¹ using the MSTMap algorithm¹³² and the Kosambi mapping function, forcing the linkage group to split according to the physical chromosomes. The linkage mapping was done with R/qtl¹³³ using the binary model of the scanone function with the expectation maximization method¹³⁴. The significance threshold was calculated running 1,000 permutations and the interval was determined by a logarithm of the odds drop of 1. To confirm consistency between the F₈ RIL genotypes and F₁₁ RIL phenotypes, three PCR Allele Competitive Extension (PACE) markers were designed through the 3CR Bioscience free assay design service, using polymorphisms between the genome assemblies of the two parents (Supplementary Table 24), and PACE genotyping was performed as described earlier¹³⁵. To reduce the Srh1 interval, 22 recombinant F₈ RILs were sequenced by Illumina WGS, the sequencing reads were mapped to the MorexV3 reference genome sequence assembly⁹ and SNPs were called. The 100 bp region around the flanking SNPs of the Srh1 interval as well as the sequence of the candidate gene HORVU.MOREX.r3.5HG0492730 were compared with the pangenome assemblies using BLASTN⁹⁹ to identify the corresponding coordinates and extract the respective intervals for comparison. Gene sequences were aligned with Muscle5 (ref. ¹³⁶). Structural variation between intervals was assessed with LASTZ¹¹⁶ v.1.04.03. The motif search was carried out with the EMBOSS⁸¹ 6.5.7 tool fuzznuc.
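The Kosambi mapping function named above converts a recombination fraction r into a map distance that allows for crossover interference, d = ¼ ln[(1 + 2r)/(1 − 2r)] in Morgans. The following is a minimal sketch of that standard formula and its inverse; it is not the ASMap code, and the function names are illustrative.

```python
import math

def kosambi_cm(r):
    """Map distance in centimorgans for a recombination fraction 0 <= r < 0.5."""
    if not 0 <= r < 0.5:
        raise ValueError("recombination fraction must be in [0, 0.5)")
    return 25.0 * math.log((1 + 2 * r) / (1 - 2 * r))

def kosambi_r(d_cm):
    """Inverse: recombination fraction for a map distance in centimorgans."""
    return 0.5 * math.tanh(2 * d_cm / 100.0)

for r in (0.01, 0.10, 0.30):
    print(f"r = {r:.2f} -> {kosambi_cm(r):.2f} cM")
```

For small r the distance in centimorgans is close to 100 × r; the two diverge as r approaches 0.5.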

Cas9-mediated mutagenesis

Guide RNA (gRNA) target motifs in the ‘Golden Promise’ HvSrh1 candidate gene HORVU.GOLDEN_PROMISE.PROJ.5HG00440000.1 were selected by using the online tool WU-CRISPR¹³⁷ to induce translational frameshift mutations by insertion/deletion of nucleotides leading to loss-of-function of the gene. One pair of target motifs (gRNA1a: CCTCGCTGCCCGCCGACGC; gRNA1b: GACAAGACGAAGGCCGCGG) was selected within the HvSrh1 candidate gene on the basis of their position within the first half of the coding sequence and the two-dimensional minimum free energy structures of the cognate single-gRNAs (NNNNNNNNNNNNNNNNNNNNGUUUUAGAGCUAGAAAUAGCAAGUUAAAAUAAGGCUAGUCCGUUAUCAACUUGAAAAAGUGGCACCGAGUCGGUGCUUUU) as modelled by the RNAfold WebServer¹³⁸ and validated as suggested in ref. ¹³⁹. gRNA-containing transformation vectors were cloned using the modular CasCADE vector system (https://doi.org/10.15488/13200). gRNA-specific sequences were ordered as DNA oligonucleotides (Supplementary Table 25) with specific overhangs for BsaI-based cloning into the gRNA-module vectors carrying the gRNA scaffold, driven by the Triticum aestivum U6 promoter. Golden Gate assembly of gRNAs and the cas9 module, driven by the Zea mays Polyubiquitin 1 (ZmUbi1) promoter, was performed according to the CasCADE protocol to generate the intermediate vector pHP21. To generate the binary vector pHP22, the gRNA and cas9 expression units were cloned using SfiI into the generic vector¹⁴⁰ p6i-2x35S-TE9 which harbours an hpt gene under control of a double-enhanced CaMV35S promoter in its transfer-DNA for plant selection. Agrobacterium-mediated DNA transfer to immature embryos of the spring barley Golden Promise was performed as previously described¹⁴¹. In brief, immature embryos were excised from caryopses 12–14 d after pollination and co-cultivated with Agrobacterium strain AGL1 carrying pHP22 for 48 h. Then, the explants were cultivated for further callus formation under selective conditions using Timentin and hygromycin, which was followed by plant regeneration. The presence of T-DNA in regenerated plantlets was confirmed by hpt- and cas9-specific PCRs (primer sequences in Supplementary Table 25). Primary mutant plants (M₁ generation) were identified by PCR amplification of the target region (primer sequences in Supplementary Table 25) followed by Sanger sequencing at LGC Genomics. Double or multiple peaks in the sequence chromatogram starting around the Cas9 cleavage site upstream of the target’s protospacer-adjacent motif were considered as an indication for chimeric and/or heterozygous mutants. Mutant plants were grown in a glasshouse until the formation of mature grains. M₂ plants were grown in a climate chamber under speed breeding conditions (22 h light at 22 °C and 2 h dark at 19 °C, adapted from ref. ¹⁴²) and genotyped by Sanger sequencing of PCR amplicons as given above. M₂ grains were subjected to phenotyping.

FIND-IT library construction

We constructed a FIND-IT library in cv. ‘Etincel’ (6-row winter malting barley; SECOBRA Recherches) as described in ref. ⁵⁰. In short, we induced mutations by incubating 2.5 kg of ‘Etincel’ grain in water overnight at 8 °C following an incubation in 0.3 mM NaN₃ at pH 3.0 for 2 h at 20 °C with continuous application of oxygen. After thoroughly washing with water, the grains were air-dried in a fume hood for 48 h. Mutagenized grains were sown in fields in Nørre Aaby, Denmark, and collected in bulk using a Wintersteiger Elite plot combiner. In the following generation, 2.5 kg of grain was sown in fields in Lincoln, New Zealand, and 188 pools of approximately 300 plants each were hand-harvested and threshed. A representative sample, 25% of each pool, was milled (Retsch GM200), and DNA was extracted from 25 g of the flour by LGC Genomics.

FIND-IT screening

The FIND-IT ‘Etincel’ library was screened as described in ref. ⁵⁰ using a single assay for the isolation of the srh1^P63S variant (ID no. CB-FINDit-Hv-014): forward primer 5′ AATCCTGCAGTCCTTGG 3′; reverse primer 5′ GAGGAGAAGAAGGAGCC 3′; mutant probe 5′ 6-FAM/CGTGGACGT/ZEN/CGACG/IABkFQ 3′; wild-type probe 5′ SUN/ACGTGGGCG/ZEN/TCGA/IABkFQ 3′ (Integrated DNA Technologies).

4K SNP chip genotyping

Genotyping, including DNA extraction from freeze-dried leaf material, was conducted by TraitGenetics. The srh1^P63S mutant, the corresponding wild-type ‘Etincel’ and the srh1 pangenome accessions Morex, RGT Planet, HOR 13942, HOR 9043 and HOR 21599 were genotyped for background confirmation. Pairwise genetic distance between individuals was calculated as the average of their per-locus distances¹⁴³ using the R package stringdist¹⁴⁴ (v.0.9.8). Principal coordinate analysis was done with the R¹²⁰ (v.4.0.2) base function cmdscale on the basis of this genetic distance matrix. The first two principal coordinates were plotted with ggplot2 (https://ggplot2.tidyverse.org).
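The distance step can be illustrated with a short sketch. It assumes that genotype calls are encoded as one character per locus, that the per-locus distance is a simple mismatch (0 or 1) and that loci with a missing call in either sample are skipped; the text does not specify these details, and the sample names and calls below are made up. The study itself used the R package stringdist, not this code.

```python
from itertools import combinations

def mean_locus_distance(a, b, missing="-"):
    """Average per-locus mismatch over loci called in both samples."""
    if len(a) != len(b):
        raise ValueError("genotype strings must cover the same loci")
    shared = [(x, y) for x, y in zip(a, b) if x != missing and y != missing]
    if not shared:
        raise ValueError("no locus is called in both samples")
    return sum(x != y for x, y in shared) / len(shared)

def distance_matrix(genotypes):
    """Symmetric matrix (nested dict) of pairwise mean per-locus distances."""
    names = list(genotypes)
    dist = {n: {m: 0.0 for m in names} for n in names}
    for n, m in combinations(names, 2):
        dist[n][m] = dist[m][n] = mean_locus_distance(genotypes[n], genotypes[m])
    return dist

# Hypothetical calls at eight loci ("-" marks a missing call).
calls = {"mutant": "AABBA-AB", "wild_type": "AABBABAB", "other": "BBABAAAA"}
matrix = distance_matrix(calls)
print(matrix["mutant"]["wild_type"], matrix["wild_type"]["other"])
```

A matrix of this kind is what a classical scaling routine such as cmdscale takes as input.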

Sanger sequencing

gDNA of the srh1^P63S variant and ‘Etincel’ was extracted from 1-week-old seedling leaves (DNeasy Plant Mini Kit, Qiagen). Genomic DNA fragments for sequencing were amplified by PCR using gene-specific primers (forward primer 5′ TTGCACGATTCAAATGTGGT 3′, reverse primer 5′ TCACCGGGATCTCTCTGAAT 3′) and Taq DNA Polymerase (NEB), with an initial denaturation at 94 °C for 3 min, followed by 35 cycles of 94 °C for 45 s, 55 °C for 60 s and 72 °C for 60 s, and a final extension step at 72 °C for 10 min. PCR products were purified using the NucleoSpin Gel and PCR Clean-Up Kit (Macherey-Nagel) according to the manufacturer’s instructions. Sanger sequencing was done at Eurofins Genomics Germany using a gene-specific sequencing primer (5′ AGAACGGAGAGGAGAGAAAGAAG 3′).

RNA preparation, sequencing and data analysis

Rachilla tissues from two contrast groups, Morex (short) and Barke (long), and Bowman (long) and BW-NIL-srh1 (short), were used for RNA-seq. The rachilla tissues were collected from the central spikelets of the respective genotypes at rachilla hair initiation (Waddington 8.0) and elongation (Waddington 9.5) stages. Total RNA was extracted using TRIzol reagent (Invitrogen) followed by 2-propanol precipitation. Genomic DNA residues were removed with DNase I (NEB, M0303L). High-throughput paired-end sequencing was conducted at Novogene (Cambridge, UK) with the Illumina NovaSeq 6000 PE150 platform. RNA-seq reads were trimmed for adaptor sequences with Trimmomatic¹⁴⁵ (v.0.39) and the MorexV3 genome annotation was used as reference to estimate read abundance with Kallisto¹²⁸. The raw read counts were normalized to TPM expression levels.
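The TPM normalization mentioned above follows a standard definition: each count is divided by the transcript's effective length, and the resulting rates are rescaled to sum to one million. The sketch below shows that definition only; kallisto reports TPM values itself, and the transcript names, counts and lengths here are hypothetical.

```python
def counts_to_tpm(counts, effective_lengths):
    """Convert estimated counts to transcripts per million (TPM)."""
    rates = {tx: counts[tx] / effective_lengths[tx] for tx in counts}
    total = sum(rates.values())
    if total == 0:
        return {tx: 0.0 for tx in counts}
    return {tx: rate / total * 1e6 for tx, rate in rates.items()}

# Hypothetical estimated counts and effective lengths (bp).
counts = {"tx1": 100.0, "tx2": 300.0, "tx3": 0.0}
lengths = {"tx1": 1000.0, "tx2": 1500.0, "tx3": 500.0}
tpm = counts_to_tpm(counts, lengths)
print({tx: round(value, 1) for tx, value in tpm.items()})
```

Because the values always sum to one million within a sample, TPM allows the relative abundance of a transcript to be compared across libraries of different depth.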

Messenger RNA in situ hybridization

In situ hybridization was conducted in longitudinal sections and cross-sections derived from whole spikelet tissues of Bowman and Morex at rachilla hair elongation developmental stage (Waddington 9.5) with HvSRH1 sense and antisense probes (124 bp). The in situ hybridization was performed as described before¹⁴⁶ with few modifications.

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.

Releasing a sugar brake generates sweeter tomato without yield penalty

Plant materials

The 402 tomato accessions used in this study are listed in Supplementary Table 8. Cultivated tomato lines, MM and M82, were used for the transgenic experiments. All of the tomato materials used in this study were grown in a greenhouse. The experimental plant materials in Figs. 2 and 4 and Extended Data Fig. 4b,e,h,i were grown in a greenhouse at the Beijing experimental station (40° 13′ 58.66″ N, 116° 06′ 47.32″ E, Beijing, China) in the autumn of 2018, and the spring of 2020, 2022 and 2023. The experimental plant materials in Extended Data Fig. 4a,d,f,g and Extended Data Fig. 4c were grown in a greenhouse at the Shouguang experimental station (36° 54′ 21″ N, 118° 51′ 46″ E, Shandong, China) in the autumn of 2021 and spring of 2020, respectively. Tomato seedlings were grown in a commercial nursery for 30–40 days and then transplanted to the greenhouse. The tomato plants were provided special care with adequate supply of water and fertilizer, and diseased plants were removed as soon as they were found. If the inflorescence was abnormally developed, a maximum of eight fruits were retained for each inflorescence. For the analyses of fruit yield, we collected all of the fruits from six subsequent individual inflorescences, until they were ripe. The fruit weight was determined per inflorescence. Total yield was the sum of fruit weight from each inflorescence for each plant. Plants that were diseased or grown in guard rows were marked and excluded from the analyses.

For fruit developmental analysis, the harvested fruit was divided into five categories, according to their maturity stage: immature green, mature green, breaker, orange and ripe based on the tomato colour chart USDA Visual Aid TM-L-1 (ref. ⁵⁹). Only fruits that appeared developmentally equivalent were used for analysis. The pericarp of six fruits was excised, immediately frozen in liquid nitrogen, ground by means of a cryogenic mill and stored at −80 °C for further analysis. N. benthamiana used in this study was grown in a growth chamber with a light/dark photoperiod of 8/16 h at 25 °C.

Association mapping

We used the previously reported SNP dataset for the GWAS analysis⁷ with the EMMAX program⁶⁰. The matrix of pairwise genetic distances was used as the variance–covariance matrix for random effects, and the first ten principal components were included as fixed effects. The genome-wide significance threshold was set as P = 1/n, in which n is the effective number of independent SNPs. The effective number of independent SNPs was calculated using Genetic type 1 Error Calculator software⁶¹. The significant P value threshold was P = 1.0 × 10⁻⁶. The Haploview software was used to calculate linkage disequilibrium, with the following parameters: -maxdistance 2,000 -minMAF 0.05 -hwcutoff 0 (ref. ⁶²). Pairwise linkage disequilibrium between the SNPs in the 200-kb interval surrounding the leading SNP was evaluated.
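The threshold rule P = 1/n can be written out as a small sketch. The effective number of tests used below (one million) is only what the reported threshold of 1.0 × 10⁻⁶ would imply if it was not rounded; the actual value from the Genetic type 1 Error Calculator is not given in the text. Treating a P value equal to the threshold as significant is likewise an assumption.

```python
import math

def genome_wide_threshold(n_effective):
    """Significance threshold P = 1/n for n effective independent SNPs."""
    if n_effective <= 0:
        raise ValueError("n_effective must be positive")
    return 1.0 / n_effective

def is_significant(p_value, n_effective):
    """True if a SNP passes the 1/n threshold (boundary counted as significant)."""
    return p_value <= genome_wide_threshold(n_effective)

n_eff = 1_000_000  # assumed, implied by the reported 1.0e-6 threshold
threshold = genome_wide_threshold(n_eff)
print(threshold, -math.log10(threshold))
print(is_significant(3.2e-8, n_eff), is_significant(4.0e-5, n_eff))
```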

For association analysis of SlCDPK27 PCR amplification data with SSC, a total of 65 variants were generated. An SNP and a 3-bp deletion, which lead to nonsynonymous mutations, and a 12-bp insertion, which could be recognized by the RAV transcriptional repressor³⁰,³¹, were retained for further analysis. Information on the variants is listed in Supplementary Table 2. For allelic variation analysis of SlCDPK27, a dataset of the three key variants was obtained through PCR amplification. The SNP, 12-bp insertion and 3-bp deletion were used to assess haplotypes of SlCDPK27; only haplotypes with a total number of accessions of ≥3 were analysed. All primers used in this study are shown in Supplementary Table 9.

Transgenic functional validation

The single-guide RNAs for CRISPR–Cas9 constructs were designed using the CRISPRdirect tool (http://crispr.dbcls.jp/). The CRISPR–Cas9 binary vectors (pKSE402) were revised from the pKSE401 vector by replacing AtU6p with SlU6p (ref. ⁶³). The recombinant pKSE402 vectors were designed to produce mutagenesis within the coding sequence of SlCDPK27 and SlCDPK26, using single-guide RNAs in combination with the Cas9 endonuclease gene (Fig. 2a and Extended Data Fig. 3a for the single-guide RNAs used in this study). Vectors with the correct insertion were introduced into Agrobacterium tumefaciens strain AGL1 competent cells, and tomato transformation was performed as described previously⁶⁴. The transgenic lines were confirmed by PCR and sequencing. All experiments were performed using homozygous lines without T-DNA integration.

Physicochemical analysis

More than six red ripe fruits were collected from each line, and each fruit was measured for fruit weight and total SSC. The SSC was determined using a digital refractometer (PAL-1, ATAGO), adjusted and calibrated at 20 °C with distilled water and expressed as degrees Brix.

Content analysis of sugars

More than six red ripe fruits were collected from each line for sugar analysis. The mixed fruit pericarp was ground in liquid nitrogen, and then 200 mg of ground powder was diluted in 1.4 ml of extraction buffer, with internal standard (8 mg arabinose). After sonication for 10 min and centrifugation (13,000 r.p.m.) for 10 min, the supernatants were filtered through a 0.22-μm polyethersulfone ultrafiltration membrane, twice, and then added to a solution of 100 μl extraction buffer, 895 μl acetonitrile and 5 μl 20% ammonia water for analysis. The content was measured by ultra-performance liquid chromatography with MS/MS (ACQUITY UPLC I-Class-Xevo TQ-S Micro, Waters). The detection was performed as described previously⁶⁴.

For sugar analysis, an ACQUITY UPLC BEH Amide 1.7-μm column was used (2.1 × 100 mm; Waters). The mobile phase was composed of acetonitrile as solvent A, and 0.1% ammonia water as solvent B. The temperatures of the column and autosampler were 60 °C and 4 °C, respectively. Each sugar was separated by increasing solvent B from 10% to 20% in 2 min, keeping at 20% for 6 min, changing to 25% in 0.1 min, keeping at 25% for 1.9 min, then changing to 20% in 1 min and keeping at 20% for 2 min. The flow rate was 0.2 ml min⁻¹. Data analysis was performed using MassLynx V4.1 (Waters).

RNA isolation and qRT–PCR

Total RNA was extracted from the fruit pericarp harvested at the ripening stages, using the RNA extraction kit (catalogue no. 0416-50, HUAYUEYANG Biotechnology), and the RNA was reverse transcribed, using GoScript Reverse Transcriptase (catalogue no. A5003; Promega), according to the manufacturer’s instructions. qPCR was performed using GoScript qPCR Master Mix (catalogue no. A6001; Promega) and the Bio-Rad CFX-96 real-time PCR with CFX Maestro 1.1 software (Bio-Rad). The relative expression levels of each gene were calculated using the 2^−ΔCt method. Three technical replicates were used to calculate the C_T value, and three to five biological replicates were analysed. The tomato ACTIN gene (Solyc03g078400) was used as the internal reference.
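The 2^−ΔCt calculation can be sketched as follows, assuming that ΔCt is the mean C_T of the target gene minus the mean C_T of the ACTIN reference across technical replicates. The C_T values are hypothetical and the function name is illustrative.

```python
from statistics import mean

def relative_expression(target_ct, reference_ct):
    """Relative expression by the 2^-dCt method from technical-replicate Ct values."""
    delta_ct = mean(target_ct) - mean(reference_ct)
    return 2.0 ** (-delta_ct)

# Hypothetical Ct values for one biological replicate (three technical replicates each).
target = [24.1, 24.0, 23.9]
actin = [20.0, 20.1, 19.9]
expression = relative_expression(target, actin)
print(round(expression, 4))
```

A target that crosses the threshold four cycles later than the reference is therefore estimated at about one sixteenth of the reference level.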

Histochemical GUS staining

To examine the SlCDPK27 expression pattern by GUS staining, the 2,452-bp SlCDPK27 promoter region upstream of the ATG was amplified from genomic DNA. Then, the products were cloned into pENTR/D-TOPO to generate pENTR-SlCDPK27pro. SlCDPK27pro-GUS was generated by an LR reaction between pKGWFS7 and pENTR-SlCDPK27pro. SlCDPK27pro-GUS vector was then introduced into A. tumefaciens strain AGL1 competent cells, and tomato transformation was performed as described previously⁶⁴.

Different tomato tissues from the SlCDPK27pro-GUS transgenic lines were collected and incubated in GUS staining buffer containing 5-bromo-4-chloro-3-indolyl-β-D-glucuronide (X-gluc) as a substrate. Samples were incubated at 37 °C for 1 h. After incubation, the staining buffer was replaced with 70% ethanol for decolourizing.

Subcellular localization assay

To detect the subcellular localization of SlCDPK27 protein, full-length cDNAs of SlCDPK27 and SlCDPK27-CR1 were amplified from MM and the MM-CDPK27-CR1 mutant. The amplified fragments were cloned into pDONR/Zeo (Invitrogen) to generate pENTR-SlCDPK27 or pENTR-SlCDPK27-CR1, respectively. The SlCDPK27–GFP and SlCDPK27-CR1–GFP constructs were generated by LR reactions between pK7FWG2 and pENTR-SlCDPK27 or pENTR-SlCDPK27-CR1, respectively. Then, SlCDPK27–GFP and SlCDPK27-CR1–GFP were transformed into A. tumefaciens strain GV3101, and the agrobacteria harbouring the constructs were infiltrated into N. benthamiana leaves. The plants were then grown in the dark for 24 h, followed by 48 h in a greenhouse under normal conditions. The transient GFP fluorescence in N. benthamiana leaf cells was observed under a Leica SP8 confocal microscope.

Sensory evaluation of sweetness

The sensory evaluation panel was organized twice (once in Shenzhen in March 2022 and once in Beijing in July 2022). Approximately 100 participants (aged 20–59 years) were selected for each of the sensory tests. These participants were required to be healthy and without any known oral diseases. The sensory test followed the Declaration of Helsinki, and the experimental protocol was approved by the Ethical Committee of Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences. All participants were informed about the sources of the genome-edited tomato materials and the sensory procedure beforehand, and signed informed consent forms before the sensory tests.

The two-alternative forced-choice test was performed to compare the difference in the sensory properties of the sweetness between fruits harvested from SlCDPK-knockout mutants and wild-type MM plants. A paired sample, consisting of a wild-type MM (control sample) and a SlCDPK-knockout mutant (test sample) fruit, was evaluated, according to ISO 5495:2005 (ref. ⁶⁵). A total of three test samples were evaluated in this study, including MM-CDPK27-CR1, MM-CDPK27-CR2 and MM-CDPK27-CR2/MM-CDPK26-CR1 double-mutant plants. Each test sample was evaluated three times, and each time fruit was picked from three different plots. Each sample was placed in a transparent tasting cup labelled with random codes, and the paired samples were presented to the sensory assessors, for sensory evaluation, in a balanced order.

For each panel, about 100 volunteer assessors were asked to perform all of the tests. They were asked to evaluate pairs of samples and were required to indicate the sweeter sample of the paired tomato fruits. A 20–30 s break was provided between the different samples, and the assessors were requested to thoroughly rinse their mouths with purified water. Binomial distribution was used for the statistical analysis of paired comparison tests (ISO 5495:2005)⁶⁵. If the number of correct responses was greater than or equal to the minimum number of correct responses required, at a specified significance level (in this study, α = 0.05), it can be concluded that the SlCDPK-knockout mutant sample has a higher sweetness than the wild-type MM sample. Otherwise, the difference was not significant. After the sensory evaluation, every six remaining fruits of each type were mixed into one sample for further glucose and fructose content analysis. Owing to COVID-19, we did not collect fruits for the sensory evaluation panel organized in Shenzhen in March 2022.
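The decision rule described above can be sketched with an exact one-sided binomial test under the null hypothesis that either sample is equally likely to be chosen (p = 0.5). The panel size of 100 is an assumption based on the approximate figure in the text, and the function names are illustrative; ISO 5495:2005 tabulates the corresponding minimum counts.

```python
from math import comb

def upper_tail(n, k, p=0.5):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))

def min_correct_responses(n, alpha=0.05, p=0.5):
    """Smallest count k whose one-sided tail probability is at most alpha."""
    for k in range(n + 1):
        if upper_tail(n, k, p) <= alpha:
            return k
    return None

n_assessors = 100  # assumed panel size ("about 100" in the text)
needed = min_correct_responses(n_assessors)
print(needed, round(upper_tail(n_assessors, needed), 4))
```

If at least this many assessors pick the mutant fruit as the sweeter one, the difference is declared significant at α = 0.05.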

SlCDPK27 antibody production

A peptide containing 15 amino acids from the 6th to the 20th amino acids of SlCDPK27 (⁶GTPGNSSENKKKKNK²⁰) was synthesized and used as an antigen to produce polyclonal antibodies in rabbits. The antibodies were purified by an antigen‐specific affinity approach by Shanghai Youke Biotechnology Co., Ltd.

Immunoprecipitation coupled with MS

For identification of the interaction proteins of SlCDPK27 by immunoprecipitation coupled with MS, fruits of MM at the red ripe stage were collected and ground in liquid nitrogen with an IKA A11 basic analytical mill (A11BS025). Total proteins were extracted with extraction buffer (300 mM Tris-HCl pH 8.0, 600 mM NaCl, 4 mM MgCl₂, 0.5% Triton X-100, complete EDTA-free protease inhibitor cocktail (Roche, cOmplete) and phosphatase inhibitor cocktail (Roche, PhosSTOP)). The extracts were centrifuged at 12,000 r.p.m. for 20 min, and the supernatants were incubated with anti-SlCDPK27 (1:100 dilution) at 4 °C overnight, with the supernatant incubated with IgG used as the control, and then Pierce protein A/G magnetic beads (Thermo Fisher Scientific, 88802) were used to immunoprecipitate the SlCDPK27 protein. The immunocomplex was washed three times and resuspended with 100 µl extraction buffer. A 10 µl volume of sample was used for western blot analysis with SlCDPK27 antibodies (1:2,000 dilution), and horseradish peroxidase-conjugated goat anti-rabbit IgG (H + L) (ZSGB-BIO, catalogue no. ZB-2301, 1:10,000 dilution) was used as the secondary detection antibody. The remaining magnetic beads were separated on 10% SDS–PAGE gels, and then sent to PTM BIO Co., Ltd (Hangzhou, China) for analysis by liquid chromatography coupled with MS/MS.

Yeast two-hybrid assay

The yeast two-hybrid assays were performed using DUAL membrane starter kits (P01401-P01429, Shanghai OE Biotech) according to the manufacturer’s instructions. The ubiquitin moiety was split into two halves, and the N-terminal half with I13 substituted to glycine was used to prevent nonspecific binding to the C-terminal half of ubiquitin. The full-length cDNA of SlCDPK27 was cloned into the pBT3-STE bait vector to generate NubG–SlCDPK27 and the full-length cDNA of SlSUS3 was cloned into the pPR3-N prey vector to generate Cub–SlSUS3, by gateway cloning. The fusion plasmids were co-transformed into yeast strain NMY51 and the yeast transformants were screened on SD/−Leu/−Trp and SD/−Leu/−Trp/−His selective medium.

Firefly LUC complementation imaging assay

The full-length cDNAs of SlCDPK27, SlCDPK27-CR1, SlCDPK27-CR2 and SlCDPK26 were cloned into the pCAMBIA-nLUC-GW vector to generate SlCDPK27–nLUC, SlCDPK27-CR1–nLUC, SlCDPK27-CR2–nLUC and SlCDPK26–nLUC, respectively. Then the full-length cDNA of SlSUS3 was cloned into the pCAMBIA-cLUC-GW vector to generate cLUC–SlSUS3, by gateway cloning⁶⁶. The plasmids were transformed into A. tumefaciens strain GV3101. Different combinations shown in Figs. 5b,c and 6a,b were co-infiltrated into N. benthamiana leaves. The plants were placed in the dark for 24 h, followed by 48 h in a growth chamber under long-day conditions (16 h light and 8 h dark). Then, the infiltrated N. benthamiana leaves were sprayed with 1 mM luciferin, in 0.01% Triton X-100 solution, and kept in darkness for 5 min to quench the fluorescence. A deep cooling CCD imaging apparatus (LB985 Night SHADE) was used to capture the fluorescence image in Fig. 5b, and the Tanon-5200 image system (Tanon, Shanghai, China) was used to capture the fluorescence images in Figs. 5c and 6a,b.

Recombinant protein production and purification

The full-length cDNAs of SlSUS3 were cloned into the pCold-TF vector to express His–TF-tagged recombinant protein. The site-specific mutation in SlSUS3^S11A was introduced by PCR, with the primers listed in Supplementary Table 9. The mutated fragments, confirmed by Sanger sequencing, were cloned into the pCold-TF vector to express His–TF-tagged recombinant protein. The full-length cDNAs of SlCDPK27, SlCDPK27-CR1, SlCDPK26, SlSUS3 and SlSUS3^S11A were cloned into the pET28GW vector to express His-tagged recombinant protein, by gateway cloning. The full-length cDNA of SlCDPK27-CR1 was also cloned into the pGEX-4T-1 vector to express GST-tagged recombinant protein for the in vitro kinase competition experiment. His–SlCDPK27, His–SlCDPK27-CR1, GST–SlCDPK27-CR1, His–SlCDPK26, His–SlSUS3, His–SUS3(S11A), His–TF–SlSUS3, His–TF–SlSUS3(S11A) and GST protein were expressed in Escherichia coli Rosetta (DE3) (catalogue no. EC1010, Shanghai Weidi Biotechnology), following induction with 1 mM isopropyl β-D-1-thiogalactopyranoside (IPTG) at 16 °C for 16 h, and then were purified using Ni-NTA resin (GE-Healthcare) and glutathione Sepharose 4B (GE-Healthcare) according to the manufacturer’s instructions.

In vitro phosphorylation assays

For the in vitro kinase assays, purified recombinant kinases (His–SlCDPK27, His–SlCDPK27-CR1 or His–SlCDPK26) and substrate (His–TF–SlSUS3 or His–TF–SlSUS3(S11A)) were incubated at 30 °C in a kinase reaction buffer (50 mM Tris-HCl, pH 7.5, 10 mM MgCl₂, 1 mM dithiothreitol (DTT), 10 µCi [γ-³²P]ATP) for 30 min. The reactions were then stopped by adding 5× SDS loading buffer (250 mM Tris-HCl, pH 6.8, 10% (w/v) SDS, 0.5% bromophenol blue, 50 mM DTT and 50% glycerol) and boiled for 5 min. The samples were then separated on 10% SDS–PAGE gels. After electrophoresis, the gels were stained with Coomassie brilliant blue as a loading control, and the phosphorylated proteins were visualized by autoradiography.

For the in vitro kinase competition experiment, GST–SlCDPK27-CR1 and GST proteins with various concentration gradients were first incubated with His–TF–SlSUS3 at 30 °C for 1 h. Then, His–SlCDPK26 was added to the kinase reaction buffer for another 30 min. The reactions were stopped by adding 5× SDS loading buffer and boiled for 5 min. The samples were separated on 10% SDS–PAGE gels and visualized by autoradiography.

Identification of phosphorylation sites of SlSUS3 by SlCDPK27

For analysis of phosphorylation sites of SlSUS3 in vitro, recombinant His–SlCDPK27 and His–TF–SlSUS3 proteins were incubated in kinase reaction buffer (50 mM Tris-HCl, pH 7.5, 10 mM MgCl₂, 1 mM DTT, 0.25 mM ATP) at 30 °C for 30 min. Next, the proteins were separated on 10% SDS–PAGE gel, and the bands of His–TF–SlSUS3 were cut out. Then, the His–TF–SlSUS3 protein was digested with trypsin and subjected to analysis by liquid chromatography with MS/MS by the biological MS laboratory at the College of Biological Sciences at China Agricultural University.

Cell-free protein degradation assay

The cell-free protein degradation assay for SlSUS3 and SlSUS3(S11A) was performed as described previously⁶⁷,⁶⁸. Briefly, total proteins were extracted from full-red ripe fruits of wild-type, MM-CDPK27-CR1, MM-CDPK27-CR2, MM-CDPK26-CR1 and MM-CDPK27-CR2/MM-CDPK26-CR1 double mutant plants, in degradation buffer (300 mM Tris-HCl at pH 8.0, 600 mM NaCl, 4 mM MgCl₂ and 20 mM DTT). Recombinant His–SlSUS3 and His–SUS3(S11A) protein and 5 mM ATP were added to the extracts. The mixtures were incubated at room temperature (25 °C) for 1, 2 and 4 h. The reactions were stopped by the addition of SDS sample buffer. His–SlSUS3, His–SUS3(S11A) and actin proteins were detected by immunoblotting with anti-His (MBL, catalogue no. D291-3, 1:3,000 dilution) and anti-actin (Sigma, catalogue no. A0480, 1:10,000 dilution) antibodies, and horseradish peroxidase-conjugated goat anti-mouse IgG (H + L) (ZSGB-BIO, catalogue no. ZB-2305, 1:10,000 dilution) was used as the secondary detection antibody. Protein gel blot images were scanned, and the intensity of the images was quantified by ImageJ (National Institutes of Health).

Gas exchange and ¹³CO₂ labelling of tomato

In vivo isotopic labelling with ¹³CO₂ and flux estimation throughout leaf photosynthetic metabolism were performed as described previously⁶⁹. A 30-l positive-pressure environmental chamber set at 28 °C, 50% humidity and around 200 μmol m⁻² s⁻¹ light intensity was applied for labelling. After displacing CO₂ with premixed gas with a N₂/O₂ ratio of 78/22, 4-week-old plants were labelled using premixed gas containing ¹³CO₂ (Cambridge Isotope Laboratories) at a ¹³CO₂/N₂/O₂ ratio of 0.33:78:21.967. Then, samples labelled for 0, 5, 10 and 20 min were collected and immediately frozen with liquid nitrogen, and then stored at −80 °C for further analysis.

Analysis of metabolite labelling

A 100 ± 3 mg quantity of ground powder of each sample was used for analysis of labelled metabolites. A 1.2 ml volume of extraction buffer (dichloromethane/methanol 2:1) was added to 100 ± 3 mg ground powder of each sample, and 300 μl water was added after vortexing five times. Then, 600 μl supernatant was transferred to a new tube and dried under nitrogen after centrifugation at 12,000g for 10 min. After resuspension with 80 μl water, the samples were incubated in a refrigerator at 4 °C for 10 min, and the supernatants were collected for analysis by liquid chromatography with MS after 10-min centrifugation twice at 12,000g to remove the precipitates.

A Dionex Ultimate 3000 UPLC system coupled with a TSQ Quantiva Ultra triple-quadrupole mass spectrometer (Thermo Fisher) equipped with a heated electrospray ionization probe was used for detection. Extracts were separated on a Synergi Hydro-RP column (2.0 × 100 mm, 2.5 μm, Phenomenex). A binary solvent system was used, with mobile phase A consisting of 10 mM tributylamine adjusted with 15 mM acetic acid in water, and mobile phase B consisting of pure methanol, using a 25-min gradient in which mobile phase B was gradually increased from 5% to 90%. Data were acquired in selected reaction monitoring mode for metabolites in positive-negative ion switching mode (Supplementary Table 10). The resolutions for Q1 and Q3 were both 0.7 full-width at half-maximum. The source voltage was 3,500 V for positive and 2,500 V for negative ion mode. The source parameters were as follows: capillary temperature, 350 °C; heater temperature, 300 °C; sheath gas flow rate, 35; auxiliary gas flow rate, 10. Tracefinder 3.2 (Thermo) was used for metabolite identification and peak integration.

Genetic statistics and estimation of inbreeding

The variation map was constructed using previously published data⁷. Subsequently, a phylogeny was inferred from the genome-wide SNPs using the IQTREE program (version 2.1.4)⁷⁰ and used to examine the population structure. The 12-bp insertion and 1,406-bp deletion were combined to investigate the frequency of haplotypes of SSC11.1 and fw11.3 among PIM and BIG populations. To clarify the selective pattern of SSC11.1 (ch11_51.180–51.205 Mb; SL2.50) and fw11.3 (ch11_55.250–55.290 Mb; SL2.50), independent phylogenies were constructed for each locus with the ‘GTR + I + G’ model using IQTREE⁷⁰.

The genetic statistics were combined to examine the selection in the large region containing SSC11.1 and three fruit weight (fw) loci (ch11_48.0–56.3 Mb; SL2.50). Tajima’s D test compares the observed distribution of pairwise nucleotide differences to the expected distribution under a neutral model of evolution. A negative value suggests an excess of low-frequency mutations, which could be indicative of directional selection, population contraction or genetic hitchhiking⁷¹. The pairwise differentiation (F_ST), known as the fixation index, is a measure of genetic differentiation between populations. High F_ST values typically suggest high divergence between populations⁷¹. Tajima’s D value was analysed by VCFtools (version 0.1.16), and the F_ST value was calculated using the Python script popgenWindows.py as described previously^72,73. The recombination rate could be estimated on the basis of demographic history and the variation map, and SMC++ was used to infer the demographic history of BIG and PIM populations⁷⁴. In addition, the variation map of PIM and BIG was integrated in Pyrho to evaluate the genome-wide recombination rate⁷⁵.
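As an illustration of the statistic described above, the following is a minimal Python sketch of Tajima's D in its standard textbook form, which contrasts the mean number of pairwise differences (π) with the number of segregating sites (S) scaled by a harmonic sum. It is not the VCFtools implementation used in the analysis. The function name `tajimas_d` and the input encoding (one string of '0'/'1' alleles per sampled haplotype) are assumptions made for this example.

```python
from math import sqrt


def tajimas_d(haplotypes):
    """Tajima's D for biallelic sites.

    haplotypes: list of equal-length strings of '0'/'1', one per sampled
    chromosome (hypothetical encoding chosen for this sketch).
    """
    n = len(haplotypes)
    if n < 4:
        # The variance term is zero for n = 3, so D is undefined there.
        raise ValueError("need at least 4 haplotypes")
    n_sites = len(haplotypes[0])
    if any(len(h) != n_sites for h in haplotypes):
        raise ValueError("haplotypes must have equal length")

    seg_sites = 0  # S: number of segregating sites
    pi = 0.0       # mean number of pairwise differences
    for j in range(n_sites):
        k = sum(1 for h in haplotypes if h[j] == "1")
        if 0 < k < n:
            seg_sites += 1
            # k * (n - k) differing pairs out of n * (n - 1) / 2 pairs
            pi += 2.0 * k * (n - k) / (n * (n - 1))
    if seg_sites == 0:
        raise ValueError("no segregating sites")

    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / (i * i) for i in range(1, n))
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n * n + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / (a1 * a1)
    e1 = c1 / a1
    e2 = c2 / (a1 * a1 + a2)

    variance = e1 * seg_sites + e2 * seg_sites * (seg_sites - 1)
    return (pi - seg_sites / a1) / sqrt(variance)


# Only singletons: excess of low-frequency variants, so D is negative.
print(tajimas_d(["100", "010", "001", "000"]))
# Only intermediate-frequency variants, so D is positive.
print(tajimas_d(["10", "10", "01", "01"]))
```

In practice the statistic is computed in windows along the chromosome; this sketch handles a single window only.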

Runs of homozygosity (ROHs) are extended stretches of homozygous segments within genomes that reveal population history and trait architecture⁷⁶. These regions indicate a lack of genetic variation and are often associated with inbreeding or recent ancestry. We performed the genome-wide ROH analysis on the basis of the variation map using the PLINK program (version 1.90b6.21)⁷⁷ with the following parameters: --homozyg-kb 1000 --homozyg-snp 10 --homozyg-window-het 3. Increased inbreeding levels are associated with both longer and more numerous ROHs. Furthermore, recent inbreeding events tend to result in longer ROH segments. To visualize these patterns, we used ggplot2 (version 3.4.4)⁷⁸ to plot ROH length and number distributions across wild (PIM) and cultivated (BIG) populations.
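The per-sample ROH number and length that feed such plots can be tallied with a short script. The sketch below is illustrative only and is not taken from the original analysis. It assumes a whitespace-delimited table with a header row containing `IID` (sample identifier) and `KB` (segment length in kb), as in a PLINK `.hom` segment file. The function name `summarize_roh` and the sample identifiers are hypothetical.

```python
def summarize_roh(hom_text):
    """Return {sample: {"n_roh", "total_kb", "max_kb"}} from ROH segments.

    hom_text: whitespace-delimited text whose header includes the columns
    'IID' and 'KB' (assumed layout; check it against your own file).
    """
    rows = [line.split() for line in hom_text.splitlines() if line.strip()]
    header, segments = rows[0], rows[1:]
    iid_col = header.index("IID")
    kb_col = header.index("KB")

    summary = {}
    for fields in segments:
        sample = fields[iid_col]
        length_kb = float(fields[kb_col])
        stats = summary.setdefault(
            sample, {"n_roh": 0, "total_kb": 0.0, "max_kb": 0.0}
        )
        stats["n_roh"] += 1
        stats["total_kb"] += length_kb
        stats["max_kb"] = max(stats["max_kb"], length_kb)
    return summary


# Toy input with made-up samples and segment lengths.
example = """\
FID IID CHR POS1 POS2 KB NSNP
f1 PIM_01 11 100000 1300000 1200.0 15
f2 BIG_01 11 200000 3200000 3000.0 40
f2 BIG_01 11 5000000 6500000 1500.0 22
"""
for sample, stats in summarize_roh(example).items():
    print(sample, stats)
```

Longer and more numerous segments in a sample, as for the made-up `BIG_01` here, are the pattern the text associates with higher inbreeding.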

Phylogenetic analysis of SlCDPK27 and SlCDPK26

Conserved domains and their corresponding Pfam-formatted HMM models (PF13499 and PF00069) were identified for SlCDPK26 and SlCDPK27 proteins using the InterPro database. Subsequently, genome sequence and annotation files from 12 diverse genomes were used for phylogenetic and sequence analyses, including a fern (Azolla filiculoides; https://fernbase.org/ftp/), a gymnosperm (G. biloba; https://ginkgo.zju.edu.cn/genome/ftp/; version-2021) and 10 angiosperm crops: S. lycopersicum (https://solgenomics.sgn.cornell.edu/organism/Solanum_lycopersicum/genome; SL2.50), Solanum melongena (https://solgenomics.sgn.cornell.edu/organism/Solanum_melongena/genome; HQ-1315), Solanum tuberosum (https://genome.jgi.doe.gov/portal/; DMv6.1), Arabidopsis thaliana (https://genome.jgi.doe.gov/portal/; TAIR10), Capsicum annuum (http://www.pepperbase.site/node/3; CaT2T), Malus domestica (https://genome.jgi.doe.gov/portal/; v1.1), Manihot esculenta (https://genome.jgi.doe.gov/portal/; v8.1), Oryza sativa (https://genome.jgi.doe.gov/portal/; v7.0), Citrus sinensis (http://citrus.hzau.edu.cn/download.php; v3.0) and Citrullus lanatus (http://cucurbitgenomics.org/organism/21; 97103, v2). For consistency, protein annotations were assigned abbreviations derived from the scientific names of the species. Homologous proteins were annotated using the hmmer program (version 3.4)⁷⁹ and the previously identified Pfam-formatted HMM models. The annotated protein sequences were subjected to multiple sequence alignment using MAFFT (version 7.525)⁸⁰, and a phylogenetic tree was constructed using the neighbour-joining method in the MEGA program (version 11.0.10)⁸¹. Sequence alignment of CDPK proteins in clade III was visualized using the ggmsa (version 3.19)⁸² package in R.
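One plausible way to turn HMM search results into a candidate list is to keep proteins that match both Pfam models. The Python sketch below illustrates that filtering step only. The text does not state that both domains were required or which E-value cut-off was applied, so the two-domain rule, the `max_evalue` default, the function name `proteins_with_all_domains` and the protein identifiers are all assumptions for illustration. The hits are assumed to be already parsed into tuples.

```python
def proteins_with_all_domains(hits, required, max_evalue=1e-5):
    """Return sorted protein IDs that hit every required Pfam model.

    hits: iterable of (protein_id, pfam_accession, evalue) tuples
          (hypothetical pre-parsed search results).
    required: set of Pfam accessions without version suffix.
    max_evalue: illustrative threshold, not taken from the source.
    """
    domains_by_protein = {}
    for protein_id, accession, evalue in hits:
        if evalue > max_evalue:
            continue
        # Strip a version suffix such as 'PF00069.28' -> 'PF00069'.
        base_accession = accession.split(".")[0]
        domains_by_protein.setdefault(protein_id, set()).add(base_accession)
    return sorted(
        protein_id
        for protein_id, found in domains_by_protein.items()
        if required <= found
    )


# Made-up hits: only 'Sly_A' matches both models below the threshold.
example_hits = [
    ("Sly_A", "PF00069.28", 1e-60),
    ("Sly_A", "PF13499.9", 1e-12),
    ("Sly_B", "PF00069.28", 1e-55),
    ("Sly_C", "PF00069.28", 1e-40),
    ("Sly_C", "PF13499.9", 0.5),
]
print(proteins_with_all_domains(example_hits, {"PF13499", "PF00069"}))
```

The retained sequences would then go on to alignment and tree building as described above.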

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.



Gas exchange and ¹³CO₂ labelling of tomato