Clinical cohorts
All patients in the M0/M1 case–control cohort were diagnosed with colorectal adenocarcinoma at the Institute of Pathology, Faculty of Medicine, LMU Munich, and were subsequently identified through a systematic database search in collaboration with the Munich Cancer Registry. Patients in the M1 group were histologically or radiologically diagnosed with local recurrence or distant metastasis to any site within a 5-year observation period after resection of a colorectal adenocarcinoma. To build the M0 group, a patient who did not progress within the same time frame was pair-matched to each M1 patient according to sex, age, disease stage, tumour grade and primary tumour site. Available patient and tumour characteristics, as well as survival data, were collected. Patients with secondary malignancies were excluded. The availability of sufficient analysable tumour tissue limited patient numbers to 82 per group. Clinicopathological details are listed in Supplementary Table 1. For the UICC stage II cohort, anonymized CRC specimens from patients who underwent surgical resection at LMU Munich between 1994 and 2017 were obtained from the archives of the Institute of Pathology LMU. Follow-up data were recorded prospectively by the Munich Cancer Registry. Specimens were anonymized. Clinicopathological details are listed in Supplementary Table 2. The studies were approved by the Institutional Ethics Committee of the Medical Faculty of the LMU (approval no. 18-105-UE) and carried out according to the Declaration of Helsinki. For both cohorts, TROP2 expression was detected on 4-µm-thick formalin-fixed paraffin-embedded sections using anti-TROP2 antibody (Abcam; ab227691; clone SP295) at a 1:100 dilution on a Ventana BenchMark ULTRA autostainer instrument. Normal human tonsil epithelia were included as positive controls in each staining run.
PDOs and tissues were established from tumours or normal mucosa obtained at the University Hospital Heidelberg (ethical approval nos. S-136/2021 and S-871/2021), in accordance with ethical guidelines and regulations. All patients provided written informed consent. Clinical features and genetic driver mutations (identified by whole-exome sequencing) are provided in Supplementary Table 4.
Kaplan–Meier plots of disease-free survival and relapse-free survival hazard ratios in 1,336 patients with CRC from 16 publicly available datasets were generated using the Kaplan–Meier Plotter tool (https://kmplot.com/analysis/). The automatic best cut-off was used, incorporating data from a previous study53. Results shown in Extended Data Fig. 1h,i were generated using SubtypeExploreR (https://subtypeexplorer.qub.ac.uk/).
Animal experiments
All mouse experiments were approved by the local authorities of Regierungspraesidium Karlsruhe under animal protocols G-148-20, G-27-22, G-159-22 and G-164-22. Tumour growth was monitored regularly, and mice were euthanized when tumours reached the maximum permitted tumour size or when humane end point criteria were met. These limits were defined in accordance with the approved animal protocols and were not exceeded in any of the experiments. The mice were housed at the DKFZ animal facilities in accordance with the local and latest standards with a 12 h dark and light cycle, a constant temperature (20–24 °C) and humidity (45–65%). They were provided with a rodent-specific diet and water ad libitum.
Tacstd2
CreERT2 transgene generation
The TAG stop codon of the Tacstd2 gene was replaced with a P2A-CreERT2-FRT-NeoR-FRT cassette. This CreER knock-in allele was generated by homologous recombination of a bacterial artificial chromosome clone in embryonic stem cells. Embryonic stem cells were selected with neomycin for 12 days. Insertion of the donor sequence was verified by PCR with the KAPA Long Range PCR Kit (Roche; KK3502) according to the manufacturer’s guidelines. The PCR primers are detailed in Supplementary Table 5. The positive embryonic stem clone was injected into C57BL/6N blastocysts, followed by embryo transfer into RjOrl:SWISS recipient female mice. The neomycin resistance cassette was removed by crossing the founder mouse with the FLPe mouse strain.
Virus production
Lenti-X 293T cells (Takara Bio; 632180) were cultured in Dulbecco’s Modified Eagle Medium (DMEM) GlutaMAX (Thermo Fisher Scientific; 31966021) with 10% fetal bovine serum(Thermo Fisher Scientific; A5256701) and 1% penicillin–streptomycin (Sigma; P4458-100ML) up to a confluency of 70%. The Lenti-X cells were transfected using polyethyleneimine (PolyScience; 23966-1) with plasmid of interest (Supplementary Table 7), pMD2.G and psPAX2 plasmids in a molar ratio of 1:1:1. Supernatant was changed 12–24 h after transfection. After centrifugation, the supernatant was removed, the lentiviral pellets were resuspended in ice-cold phosphate buffered saline (PBS) and aliquots were stored at −80 °C.
Organoid transduction and selection
PDOs were dissociated with TrypLE (Gibco; 12605010) and resuspended in 500 µl of complete medium containing 10 µg ml−1 of polybrene (Sigma; TR-1003). Subsequently, 50 µl of the virus was added to the organoid suspension, which was spun down at 32 °C for 30 min and then incubated at 37 °C for 1 h. Transduced PDOs were plated in 70% basement membrane extract (BME). Four to five days after transduction, transduced organoids were selected based on the marker present in the plasmid of interest: FACS or blasticidin (10 µg ml−1) (Santa Cruz Biotechnology; sc-495389).
Mouse lines
Male C57BL/6J mice (7–10 weeks old) were purchased from Janvier Labs. NOD scid IL2Rγ null (NSG) mice were provided by the animal facility at DKFZ. Lgr5EGFP–IRES–CreERT2 (RRID:IMSR_JAX:008875), Rosa26LSL–tdTomato (B6.Cg-Gt(ROSA)26Sortm14(CAG–tdTomato)Hze/J; RRID:IMSR_JAX:007914), FLPe (B6.Cg-Tg(ACTFLPe)9205Dym/J; RRID:IMSR_JAX:005703) and the Tacstd2CreERT2 allele was generated as described above. All GEMMs were bred on a C57BL/6J background. Lgr5eGFP–IRES–CreERT2 and Tacstd2CreERT2 alleles were kept in heterozygosity, whereas Rosa26LSL–tdTomato and the FLPe alleles were used in either homozygosity or heterozygosity. The Tacstd2CreERT2 allele was used only in experiments in which the neomycin resistance cassette had been removed.
Subcutaneous injection and tumour monitoring
Mice were anaesthetized with isoflurane, and single cells in a final volume of 100 µl (50-µl PBS and 50-µl BME) were injected into the left flank of the mice. Tumour growth was monitored three times a week. The experimental end point was defined by tumour size less than 1.5 cm, tumour ulceration, animal distress or end of specific treatment.
To assess the tumour-initiating capacity of tumour cell populations, a defined number of viable single cells from dissociated allografts or organoids were sorted and transplanted subcutaneously into recipient mice. Experiments were ended after 7–9 weeks for PDOs or once tumours reached the end point for MDOs.
Tumour-initiating capacity was assessed by limiting dilution analysis using the web tool ELDA54. Active stem cell frequency was estimated based on the dilution series under a single-hit Poisson model using a generalized linear model and compared across conditions using a likelihood-ratio test.
Intra-splenic injection
Single cells were injected into the spleen of mice in a final volume of 50-µl PBS. An incision was made on the left flank of the mouse to access the spleen. Metamizole (200 mg kg−1) was subcutaneously administered before surgery. Lidocaine (10 mg kg−1) and bupivacaine (3 mg kg−1) were administered subcutaneously close to the incision site before surgery. Metamizole was administered in the drinking water for 3 days after surgery. The experimental end point for each experiment is stated in the figure legends.
Tissue processing and histological analysis
Human and murine tissues were fixed overnight in 10% formalin at 4 °C. Paraffin blocks were cut into sections with a thickness of 3–5 μm. Sections were deparaffinized by two consecutive washes with xylene for 3 min each, followed by rehydration.
Metastatic burden was evaluated on haematoxylin-and-eosin-stained sections. The area of each individual metastatic nodule was delineated and annotated. Metastatic burden was calculated as the percentage of liver area occupied by metastatic nodules.
Immunofluorescence
After tissue processing, antigen retrieval was performed in a steamer using boiling target retrieval solution (Dako; S169984) for 30 min. Next, slides were washed for 5 min in Tris-buffered saline with Tween-20 (TBS-T). Samples were blocked in Tris–NaCl blocking buffer (0.1 M Tris–HCl (pH 7.5) and 0.15 M NaCl with 0.5% w/v blocking reagent (PerkinElmer; FP1020)) for 1–2 h at room temperature and incubated with specific primary antibodies (Supplementary Table 7) either overnight at 4 °C or for 1.5 h at room temperature. After incubation, the slides were washed twice with TBS-T and incubated with the specific secondary antibody (Supplementary Table 7) and DAPI (1:1,000) for 30 min at room temperature. Images were acquired using a ZEISS LSM 710 microscope and analysed with Fiji (ImageJ) or QuPath v.0.4.3.
In situ hybridization
In situ hybridization was performed using the manual RNAscope 2.5 HD Duplex Detection Kit (Advanced Cell Diagnostics; 322500) strictly according to the manufacturer’s instructions. Staining was performed on formalin-fixed paraffin sections (3–5 μm), which were cut and placed in a 60 °C oven for 2 h before staining. To ensure the quality and integrity of RNA in tissues, a positive control probe (PPIB) and a negative control probe (DAPB) were used. The following probes were used: Mm-Lgr5 (312171), Mm-Tacstd2-C2 (471751-C2), Hs-Tacstd2-C2 (405471), Hs-Lgr5 (311021), Mm-PPIB-C1/POLR2A-C2 (321651) and Hs-PPIB-C1/POLR2A-C2 (321641).
Western blot
Cell pellets from PDOs were washed with ice-cold PBS, lysed in ice-cold radioimmunoprecipitation assay buffer (Cell Signaling Technology; 9806S) supplemented with EDTA-free protease inhibitor cocktail (Roche; 4693132001) and phosphatase inhibitor cocktail (Roche; 4906837001) and finally incubated for 30 min on ice. Protein concentration was determined with the Pierce BCA Protein Assay Kit (Thermo Fisher Scientific; 23225), using BSA (Thermo Fisher Scientific; 23209) as the standard reference. For protein denaturalization, protein lysates were then boiled for 10 min at 70–95 °C with NuPAGE LDS buffer (Invitrogen; NP0007) and NuPAGE Sample Reducing Agent (Invitrogen; NP0009). Denaturalized proteins were loaded in a precast SDS–PAGE gel (Bio-Rad; 5671084). Next, proteins were transferred to a nitrocellulose (Bio-Rad; 1704159) or a polyvinylidene difluoride (Bio-Rad; 1704157) membrane using a semi-dry transfer method with the Trans-Blot Turbo Transfer System (Bio-Rad; 1704150) following the ‘1.5 MM Gel protocol’. Primary and secondary antibodies are listed in Supplementary Table 7. Finally, membranes were developed by chemiluminescence using horseradish peroxidase substrate (Bio-Rad; 1705060 or 1705062) and imaged with the ChemiDoc Imaging System (Bio-Rad).
Immunohistochemistry
Following tissue processing, antigen retrieval was performed using Target Retrieval Solution (Dako; S1699). Slides were cooled and incubated in 3% hydrogen peroxide (Sigma; 95321). Sections were incubated in normal swine serum (BIOZOL; LIN-ENP9010-10) at room temperature. Primary antibodies (Supplementary Table 7) were applied either overnight at 4 °C or for 1 h at room temperature. After incubation, the slides were rinsed with 0.1% TBS-T and incubated for 30 min to 1 h at room temperature together with the appropriate biotin-conjugated secondary antibody (Supplementary Table 7). The slides were washed with 0.1% TBS-T and incubated with the VECTASTAIN ABC Kit (Vector Laboratories; PK-6100) for 30 min at room temperature. The slides were washed again with 0.1% TBS-T. Signal detection was conducted with DAB Chromogen (Agilent Dako; GV82511-2) following the manufacturer’s instructions.
Quantitative real-time PCR
RNA from a minimum of 40,000 FACS-sorted cells was isolated using the Arcturus PicoPure RNA Isolation Kit (Applied Biosystems; KIT0204) following the manufacturer’s guidelines. Tissue samples or organoids preserved in RNAlater (Invitrogen; AM7021) were homogenized with the gentleMACS homogenizer with RLT buffer in gentleMACS M Tubes (Miltenyi Biotec; 130-096-335). RNA from cell pellets or homogenized tissue was extracted using QIAGEN RNeasy Kit (74104) according to the manufacturer’s instructions.
Complementary DNA was synthesized by reverse transcription of the isolated RNA using the High-Capacity cDNA Reverse Transcription Kit (Applied Biosystems; 4374966) or the SuperScript VILO Master Mix (Thermo Fisher Scientific; 11755050) according to the manufacturer’s guidelines. Quantitative real-time PCR was performed using the PowerUp SYBR Green Master Mix (Thermo Fisher Scientific; A25778) following the manufacturer’s guidelines. Primer sequences for targeted genes are detailed in Supplementary Table 8.
Tumour dissociation
For tumour dissociation into single cells, tumour fragments were thawed and enzymatically digested with the Tumour Dissociation Kit (Miltenyi Biotec; 130-095-929) following the manufacturer’s instructions. In brief, 5 ml of DMEM GlutaMAX (Thermo Fisher Scientific; 31966021) was supplemented with 500 µl of the enzyme mix and placed in the gentleMACS C Tube (Miltenyi Biotec; 130-096-334) together with the tumour fragments. The tubes were placed in the gentleMACS Octo Dissociator with Heaters (Miltenyi Biotec; 130-096-427) and dissociated using the default programs 37C_m_TDK_1 (for murine tissue) and 37C_h_TDK_1 (for human tissue). Single cells were stained with different specific antibodies.
FACS and flow cytometry analysis
Single-cell samples from human or mouse organoids and tumours were generated as previously described. Single cells were stained with fluorescent antibody conjugates (Supplementary Table 7) for 30 min at 4 °C in the dark. Next, samples were washed with 5-ml FACS buffer (PBS with 1% fetal calf serum) and resuspended in 200-µl FACS buffer with DAPI (BioLegend; 422801) or Zombie NIR (BioLegend; 423105), which were used as death exclusion markers. The samples were analysed on a BD LSRFortessa and further analysed on FlowJo v.10.7.1. For FACS, the Fusion Cell Sorter was used.
In vitro drug treatment
FOLFIRI chemotherapy treatment was performed by treating MDOs or PDOs with 5 µM 5-FU (Selleckchem; S1209) and 5 µM irinotecan (Selleckchem; S2217) for 96 h. FOLFOX chemotherapy treatment was performed by treating PDOs with 5 µM 5-FU and 200 nM oxaliplatin (Selleckchem/BIOZOL; S1224) for 96 h. 5-FU treatment was performed by treating PDOs with 5 µM 5-FU for 96 h. Oxaliplatin treatment was performed by treating PDOs with 200 nM oxaliplatin for 96 h. Irinotecan treatment was performed by treating PDOs with 5 µM irinotecan for 96 h. SN-38 treatment was performed by treating PDOs with 1 µM SN-38 (TargetMol; T1703) for 96 h. SG treatment was performed by treating PDOs with 5 µM SG (MedChemExpress; HY-132254) for 3 h or 96 h. Sac-TMT treatment was performed by treating PDOs with 5 µM Sac-TMT (MedChemExpress; HY-164789) for 3 h or 96 h. IgG1–SN-38 treatment was performed by treating PDOs with 5 µM ADC control human IgG1–SN-38 (MedChemExpress; HY-164668) for 3 h or 96 h. Subsequently, cells were collected to assess the effect of the treatment by flow cytometry or quantitative real-time PCR.
In vivo drug treatment
FOLFIRI chemotherapy treatment was performed by treating mice with 10 mg kg−1 of 5-FU (InvivoGen; sud-5fu-4) in 0.9% NaCl 5 days a week and 45 mg kg−1 of irinotecan (Merck; PHR2717) in 0.9% NaCl with 5% DMSO twice a week through intraperitoneal injections. FOLFOX chemotherapy treatment was performed by treating the mice with 10 mg kg−1 of 5-FU in 0.9% NaCl 5 days a week and 2 mg kg−1 of oxaliplatin (Selleckchem/BIOZOL; S1224) in 0.9% NaCl once a week through intraperitoneal injections. 5-FU treatment was performed by treating the mice with 10 mg kg−1 of 5-FU in 0.9% NaCl 5 days a week through intraperitoneal injections. SG (MedChemExpress; HY-132254) treatment was performed by treating the mice with 25 mg kg−1 of SG in 0.9% NaCl twice a week through intraperitoneal injections. Sac-TMT (MedChemExpress; HY-164789) treatment was performed by treating the mice with 5 mg kg−1 of Sac-TMT in 0.9% NaCl twice a week through intraperitoneal injections. IgG1–SN-38 (MedChemExpress; HY-164668) treatment was performed by treating the mice with 25 mg kg−1 of ADC control human IgG1–SN-38 in 0.9% NaCl once through intraperitoneal injection. The starting point and duration of the treatment are indicated for each experiment in the figure legends.
Lineage tracing
To induce in vitro recombination of the Rosa26LSL–tdTomato allele, organoids were treated with 10 µM of 4-OHT for 24 h. 4-OHT was removed from the medium, and cells were analysed by flow cytometry to evaluate tdTomato signals at the time points specified in the figure. For lineage tracing of TROP2+ cells after chemotherapy treatment, organoids were treated with 10 µM of 4-OHT for 24 h followed by 96 h of FOLFIRI treatment. Cells were analysed by flow cytometry to evaluate tdTomato signals at specified time points. To induce in vivo recombination of the Rosa26LSL–tdTomato allele, TcT–AKP organoids were dissociated as described, and single cells were used for intra-splenic injections as described above. Mice were treated once with 5 mg kg−1 of tamoxifen 2 weeks after MDO injection. Two days after the tamoxifen treatment, the mice were treated with either FOLFIRI (as described before) or vehicle for 2 weeks. To assess the presence of tdTomato clones in the liver metastases of the mice, livers were sampled as indicated in the figure.
Clonogenicity assays
MDOs or PDOs were dissociated into single cells as previously described. Subsequently, 1,000 MDO cells were seeded in 20-µl BME (R&D Systems; 3433-005-01) in 24-well plates and incubated for 5 days until organoids were formed, and 5,000 PDO cells were seeded in 20-µl BME (R&D Systems; 3433-005-01) in 24-well plates and incubated for 10 days until organoids were formed. The number of organoids per well was quantified under a light microscope, and organoid size was quantified for one representative image per well using the Fiji software. The seeding efficiency of both MDOs and PDOs was calculated by dividing the number of the resulting organoids by the initial number of seeded cells.
CRISPR–Cas9 knock-in vector generation
A single guide RNA (sgRNA) was designed using the CHOPCHOP web tool (https://chopchop.cbu.uib.no/). SgRNA sequences are detailed in Supplementary Table 9. Cloning of the sgRNA sequence into the pSpCas9(BB)–2A-GFP (PX458) plasmid (Addgene; 48138) was performed as previously described55. Insertion of the sgRNA of interest in the backbone was verified by Sanger sequencing (Supplementary Table 9).
To introduce the tdTomato cassette into the Lgr5 locus, the HR180–LGR5 vector (Addgene; 129094) was used. To that end, the 5′ and 3′ homology arms for the human LGR5 locus in the HR180–LGR5 original vector were replaced for the homology arm specific for the mouse Lgr5 locus. The homology arms were amplified from C57BL/6J genomic DNA (gDNA) with primers containing restriction-site-specific overhangs (Supplementary Table 9). The 5′ homology arm and vector were digested with NdeI (Thermo Fisher Scientific; ER0582) and SacI-HF (New England Biolabs; R3156S) (for Lgr5) or with NdeI and NsiI-HF for 1 h at 37 °C. The digested backbone was gel-purified, and the digested 5′ homology arm was PCR-purified for a sequential ligation using the T4 ligase as previously described. Next, the 3′ homology arm and vector were digested with AflII and SphI restriction enzymes for 1 h at 37 °C. The digested backbone was gel-purified (QIAGEN; 28706), and the digested 3′ homology arm was PCR-purified (QIAGEN; 28106) for a sequential ligation using the T4 ligase as previously described. The correct insertion of homology arms was verified by Sanger sequencing (Supplementary Table 9).
CRISPR–Cas9 knock-in generation and selection
VKPN and VAKPS MDOs were electroporated with 5 µg of the plasmid pSpCas9(BB)–2A-GFP (PX458) (Addgene; 48138) and 5 µg of the donor plasmid. After 72 h, electroporated organoids were selected with puromycin (10 µg ml−1) (Sigma-Aldrich; P9620) for 4–5 days, and Ruby+ clones were isolated with a pipette under a fluorescent microscope and expanded. To validate integration of the donor vector, gDNA was extracted using the DNeasy Blood & Tissue Kit (QIAGEN; 69506) according to the manufacturer’s instructions. The region of interest was PCR-amplified with DreamTaq Green PCR Master Mix (Thermo Fisher Scientific; K1082) following the manufacturer’s instructions. For the 5′-specific integration, a forward primer located upstream of the 5′ homology arm sequence and a reverse primer located in the integrated cassette were used. For the 3′-specific integration, a forward primer located in the integrated cassette and a reverse primer located downstream of the 3′ homology arm sequence were used (primer sequences are detailed in Supplementary Table 9). For the removal of the Ruby–puromycin resistance cassette, the clones were electroporated with the AAV–CAG–GFP–Cre vector56. After 72 h, electroporated organoids were dissociated into single cells and isolated for GFP+. GFP+ cells were plated and incubated for 1 week in complete medium supplemented with 10 µM Y-27632 (Hölzel Biotech; M1817). Once single-cell-derived organoids were formed, isolation was performed with a pipette under a light microscope. To confirm the floxing of the Ruby–puromycin resistance cassette, gDNA was extracted and the locus was PCR-amplified (primer sequences are detailed in Supplementary Table 9).
CRISPR–Cas9 knockout generation and selection
For the generation of Apc or Trp53 knockout, murine small intestine-derived organoids were electroporated with the pSpCas9(BB)–sgRNA–2A-GFP vector. To select Apc mutant KPNLT cells, GFP+ cells were sorted 48–72 h after electroporation with the pSpCas9(BB)–sgRNA(Apc)–2A-GFP vector. Single clones were isolated and expanded. To select TcT small intestine organoids with mutations in Apc, R-spondin (AVSBio; R001-1) was withdrawn from the medium. To select organoids with mutations in Trp53, organoids were treated with 5 ng ml−1 of nutlin-3 (MedChemExpress; HY-10029) for 1 week. To validate the knockout efficiency, gDNA was extracted. Respective loci with matched sgRNA sequence annealing were PCR-amplified (primer sequences are detailed in Supplementary Table 9). The amplified region was analysed by Sanger sequencing, and the genome editing efficiency was determined with Synthego (https://ice.synthego.com/#/).
Mouse-derived organoids
MDOs were generated from intestinal tumours of GEMMs. MDOs were cultured in Cultrex reduced growth factor basement membrane (BME; R&D Systems; 3433-005-01) and grown in Advanced DMEM/F-12 (Gibco; 12634028) supplemented with 1% l-glutamine (Thermo Fisher Scientific; 25030024), 1% HEPES (Sigma; h0887-100), B-27 (Gibco; 12585010), N-2 (Gibco; 17502048) and 1% penicillin–streptomycin (Sigma; P4458-100ML). Depending on the mutational background of the MDOs, the basic medium was additionally supplemented with 50 ng ml−1 of epidermal growth factor (EGF) (PeproTech; AF-100-15-1000), 100 ng ml−1 of Noggin (PeproTech; 250-38-100) or 100 ng ml−1 of R-spondin conditioned medium (AVSBio; R001-1) (Supplementary Table 10).
Small intestine organoids
Organoids derived from the small intestine were generated as previously described57. Small intestinal organoids were maintained in basic medium supplemented with 50 ng µl−1 of EGF (PeproTech; AF-100-15-1000), 100 ng ml−1 of Noggin (PeproTech; 250-38-100) and 1 µg µl−1 of recombinant R-spondin conditioned medium (AVSBio; R001-1).
Patient-derived organoids
CRC and normal colon PDOs were obtained from tumours surgically removed at the University Hospital Heidelberg. Clinical features and genetic driver mutations (identified by whole-exome sequencing) are described in Supplementary Table 4. Tumour-derived cells were grown as organoids in 70% BME and cultured in 50% Advanced DMEM/F-12 (Gibco; 12634028) supplemented with 1% l-glutamine (Thermo Fisher Scientific; 25030024), 1% HEPES (Sigma; h0887-100), 1% penicillin–streptomycin (Sigma; P4458), 50% WRN conditioned medium (generated as previously described58), B-27 (Thermo Fisher Scientific; 12587010), N-2 (Thermo Fisher Scientific; 17502048), 1 mM N-acetyl-l-cysteine (Sigma; A7250), 10 nM gastrin I (Sigma; G9145), 50 ng µl−1 EGF (PeproTech; AF-100-15), 10 µM Y-27632 (Hölzel Biotech; M1817), 0.5 µM A83-01 (Sigma; SML0788), 100 ng µl−1 of IGF-1 (BioLegend; 590908), 50 ng ml−1 of basic fibroblast growth factor (PeproTech; 100-18b), 4 mM nicotinamide (Sigma; N0636) and 2 µl ml−1 Primocin (InvivoGen; ant-pm-2). The medium was changed every other day. PDOs were split once a week using TrypLE (Gibco; 12605010). Normal colon cells were grown as organoids in 70% BME and cultured in 50% Advanced DMEM/F-12 (Gibco; 12634028), supplemented with 1% GlutaMAX Supplement (Thermo Fisher Scientific; 35050061), 1 M HEPES (Thermo Fisher Scientific; 15630080), 1% penicillin–streptomycin (Sigma; #P4458), LRSPO3-Fc fusion protein conditioned medium (AVSBio; R001-1), LNoggin-Fc fusion protein conditioned medium (AVSBio; N002-1), 1× B-27 without vitamin A (Thermo Fisher Scientific; 12587010), 1 mM N-acetyl-L-cysteine (Sigma; A9165), 10 mM nicotinamide (Sigma; N0636), 50 ng ml−1 of EGF (PeproTech; AF-100-15), 0.5 mM A83-01 (Tocris; 2939), 10 µM SB202190 (Tocris; 1264), 1 µM PGE2 (Tocris; 2296), 0.5 nM WNT Surrogate-Fc Fusion Protein (AVSBio; N001-1) and 2 µl ml−1 of Primocin (InvivoGen; ant-pm-2).
Whole-exome sequencing
Whole-exome sequencing libraries of paired tumour and healthy organoids were generated using the SureSelectXT Automation Reagent Kit and SureSelectXT Human All Exon V7 Capture Library (Agilent Technologies) following the manufacturer’s instructions. In brief, 200 ng of gDNA was fragmented to approximately 150 bp using a Covaris LE220 ultrasonicator. Subsequently, library preparation was performed on a Bravo Automated Liquid Handler (Agilent Technologies), including end-repair, A-tailing, adaptor ligation and amplification. The concentration of amplified, adaptor-ligated DNA library was determined using the TapeStation (Agilent Technologies). In the subsequent steps, 750 ng of amplified, adaptor-ligated DNA library was used for the hybridization reaction with the SureSelectXT All Exon V7 bait set. The DNA library/bait hybrids were captured using streptavidin-coated magnetic beads (Dynabeads MyOne Streptavidin T1; Thermo Fisher Scientific). Index tags were added in the course of PCR amplification of the captured libraries. The final libraries were validated using Qubit (Invitrogen) and TapeStation (Agilent Technologies). Paired-end sequencing (2×100 bp) was performed on an Illumina NovaSeq 6000 according to the manufacturer’s protocol.
The resulting datasets were aligned and called for single-nucleotide variants and indels as described before59. In brief, alignment was performed by BWA-MEM. Single-nucleotide variants were called using Samtools, and indels were identified with Platypus using public workflows60 (https://github.com/DKFZ-ODCF/IndelCallingWorkflow and https://github.com/DKFZ-ODCF/SNVCallingWorkflow). PDOs were paired against normal tissues from the same donor for somatic mutation calling. Variant allele frequencies were calculated, and variants were classified as somatic, somatic_functional, somatic_functional_ncRNA or germline_functional. Variants in the genes APC, TP53, KRAS, PIK3CA, FBXW7, SMAD4, TCF7L2, BRAF, NRAS, ARID1A, ARID2, ERBB3, AXIN2, SMAD2, CTNNB1, POLE, MSH2, PTEN, PTPRT, ATRX and ATM (in analogy to a previous study61), classified as somatic or somatic_functional, were extracted and displayed as the most interesting candidates.
10× scRNA-seq
Tumour samples were dissociated into single cells as previously described. Samples were multiplexed using hashing antibodies (TotalSeq-B; BioLegend) according to the manufacturer’s protocol. In brief, single cells were resuspended in Fc receptor blocking solution, and the antibodies were added at the recommended concentration. Patient and murine CRC samples were sorted for live cells, whereas in PDOX samples epithelial cells were extracted. Multiplexed samples were pooled and sequenced following the Chromium Next GEM Single Cell 3′ Reagent Kit 3.1 (dual index) protocol.
Aligning and demultiplexing single-cell RNA-seq reads
Sequencing reads were mapped to the respective reference genome (human and mouse genome assemblies refdata-gex-GRCh38-2020-A and refdata-gex-mm10-2020-A, respectively) using 10x Genomics Cell Ranger62 (v.7.1.0) with standard settings. The Cell Ranger multi pipeline, with an assignment confidence of 0.9, was used to demultiplex the reads into their original samples unless indicated otherwise.
The Cell Ranger count pipeline (v.6.1.0) was used to align data from the following experiments while providing hashtag oligo (HTO) barcodes as a feature library: VAKPS–VPKN and PDO FOLFIRI time course. For those datasets, demultiplexing was performed in R using Seurat’s implementation of HTODemux with default settings on the centred log-ratio normalized HTO counts63,64. Cells were classified using HTO_max labels, and singlets that could be assigned to a sample were kept for downstream analysis.
Preprocessing of data
Sample processing and analysis were performed in Python (v.3.10.9) using the scanpy package65 (v.1.9.3) unless indicated otherwise. Quality control was performed at the sample level, filtering cells based on their gene count (lower cut-off between 200 and 750; upper cut-off between the 0.95 and 0.995 percentiles of distribution) and mitochondrial content (lower than 15–30%). Genes were filtered based on their expression abundance across cells (minimum expression in three cells). Putative doublets were identified using scrublet (v.0.2.3) and subsequently removed (cut-off at 0.15–0.2). Count matrices were merged, and highly variable genes were identified in a batch-aware manner using scanpy’s highly_variable_genes function with flavour = seurat, choosing the top 2,000 genes that are highly varying across most samples. The MDO data on VAKPS and VKPN were demultiplexed in R, transferred to Python and subsequently preprocessed at the library level.
For the following datasets, sample preprocessing was performed in an analogous manner in R using Seurat: PDO FOLFIRI time course. Preprocessing was done at the library level. Cells with mitochondrial content greater than 20%, feature counts below 2,000 or total counts below 20,000 were removed from the analysis. Cell cycle inference was performed using Seurat’s built-in function CellCycleScoring and subsequently regressed out during the scaling step ScaleData. Gene set scoring was done for gene sets with 5–200 genes detected in the dataset, keeping genes expressed in at least 30% of all cells and using the function AddmoduleScore_Ucell (Extended Data Fig. 10l).
Dimension reduction, clustering and visualization
Principal component analysis was performed on scaled normalized counts of highly variable genes, followed by computation of a k-nearest-neighbour graph through scanpy’s built-in function with default settings. Cell clusters were defined using the Leiden algorithm (v.0.10.1), and data were visualized after running UMAP. As indicated, missing values were imputed using MAGIC (v.3.0.0; ref. 66).
Data integration
To minimize the influence of batch effects on the analysis, data integration was performed where indicated as described below. The resulting latent space was subsequently used for neighbour detection and clustering as described above.
The following datasets were integrated using harmonypy (v.0.0.9), with the sample ID as batch variable and a maximum of 30 iterations: biobank PDOX data (related to Fig. 1f), biobank tissue data (related to Supplementary Fig. 1) and SG treatment experiment (related to Fig. 4i). On the biobank PDOX data, harmonypy was run twice: first at tissue level for the identification of tumour cells and second on the subset of all epithelial cells.
Cell type and cell state annotation
Annotations were based on gene set and marker gene expression patterns. For this, differentially expressed genes were calculated for each cluster compared with all others using scanpy’s rank_genes_group function while using a Wilcoxon rank-sum test with Benjamini–Hochberg correction for multiple testing. The cell cycle state was inferred using marker genes for S and G2/M phases as previously described67. For datasets containing non-epithelial cells, cell type annotation was additionally guided by label predictions obtained through CellTypist68 (v.1.6.3, models (mouse data, Adult_Mouse_Gut.pkl; human data, Adult_Human_Intestine.pkl and Cells_Intestinal_Tract.pkl)). Based on this annotation and the expression pattern of epithelial cell adhesion molecule, the datasets were subsetted to tumour cells for further analysis. Furthermore, highly variable genes were redefined in the epithelial subsets (N = 2,000–3,000), followed by dimension reduction, data integration and clustering as described above.
Composition analysis of cancer subpopulations
Cells were assigned to a certain subpopulation if the respective marker was expressed (counts greater than 0). The relative abundance of subpopulations across all tumour cells was represented as a bar graph.
Differential expression analysis
Differential expression analysis was performed after generating pseudobulks from the single-cell data, considering the sample origin and the desired contrast conditions for the grouping. Pseudobulks were generated as a sum of counts using Decoupler69 (v.1.8.0) while removing genes expressed in less than ten cells or with less than 15 total counts. Contrasting was performed using Wald tests with a significance threshold of 0.05 (Benjamini–Hochberg false discovery rate) in PyDESeq2 (v.0.4.10; ref. 70).
Gene sets, scores and enrichment analysis
Gene set scores were computed at the single-cell level using the scanpy.tl.score_genes function. Conversely, gene set enrichment analysis was performed at the pseudobulk level on the statistics of differential gene expression results using the Decoupler implementation decoupler.get_gsea_df.
Public gene sets registered in the Molecular Signatures Database71 were queried through the previous knowledge database OmniPath72 (v.1.0.8). In addition, gene sets for the following attributes were taken from the literature as provided in the respective study (Supplementary Table 11): fetal33, ISC/LGR5 (ref. 52), high relapse (coreHR/epiHR/allHR/tmeHR) and 5-gene LGR5 (ref. 22), label retaining cell50, crypt proliferation73, proliferation signatures49,74, Wnt-On and POLR1A51, YAP48, YAP_22 (ref. 75), LGR5Hi MEX3AHi/-Low19, squamous and neuroendocrine24, revival SCs SSC2c76, immature enterocytes77 and basal-like pancreatic cancer78. The YAP signature48 was extracted as a set of genes that responds both to YAP overexpression and inversely to YAP knockout with a minimum average log2 fold change of 1.2.
Where necessary, gene homologues of the target species were derived in R (v.4.3.1) as follows: gene symbol to ENTREZ ID mapping was performed using the organism databases org.Mm.eg.db (v.3.17.0) and org.Hs.eg.db (3.17.0), whereas orthologue mapping was executed through Orthology.eg.db (v.3.17.0).
Treatment response at cluster level
The effect of treatment on cluster composition was assessed as follows: the relative number of cells was calculated for each condition (treatment or time point) and samples across all clusters. Then, for each cluster, the mean across samples was computed. This value was compared across conditions. In the SG treatment data of subcutaneous tumours, clusters that increase (or decrease) in size by 30% were considered responsive and labelled ‘up’ (or ‘down’). For the in vivo SG treatment time course of liver metastasis, the following cut-offs on the log2-transformed fold change were selected: |log2FC | > log2(2) for the 0–48 h dataset and |log2FC | > log2(1.5) for the 48–120 h dataset. The enrichment of cells from specific time points in clusters was tested using Fisher’s exact test, as implemented in the SciPy package (v.1.11.4; ref. 79).
Trajectory inference
Trajectory inference was performed using Palantir (v.1.4.0; ref. 80). To prevent clusters of negligible size or low quality from driving the diffusion map, data subsetting was performed in the following case: SG in vivo treatment time course (to clusters 0–6). The initial state was set within a cluster with significant enrichment at the starting time point based on the outcome of the Fisher’s exact test. The individual cell within this cluster was chosen based on the diffusion map eigenvectors reflecting the temporal resolution as the cell with minimum or maximum value (depending on directionality). Unless indicated otherwise, the terminal states were automatically derived by Palantir. For the following dataset, terminal state selection was guided based on the experimental time points: SG in vivo treatment time course.
Public data
The transcriptome data and clinical metadata of the TCGA-COAD cohort were loaded through the TCGAbiolinks package81,82,83 (v.2.25.3) in R (v.4.2.0). Gene expression differences across stages were analysed through pairwise comparisons using a Wilcoxon rank-sum test with Holm P value correction for multiple testing using the ggpubr::compare_means function (v.0.6.0).
For the comparison of all fetal genes, the fetal gene set33, converted to human homologues, served as a starting point. Genes with significant positive correlation between expression value and clinical stage in at least two out of six comparisons were considered putative progression markers.
The epithelial expression data of the SMC and KUL data cohorts84 were normalized using the Seurat package (v.5.0.1). Subsequent analyses were performed as described above (related to Extended Data Fig. 1m).
Statistics
Statistical analysis related to the analysis of the scRNA-seq data was performed as described in Methods using the respective packages. For further statistical analysis, GraphPad Prism software v.9 was used unless indicated otherwise. Limiting dilution analysis was performed using the web tool ELDA (https://bioinf.wehi.edu.au/software/elda/)54 and visualized in GraphPad Prism. Statistical tests and number of replicates are specified in the figure legends. Data are presented as mean ± s.e.m. Statistical significance is indicated as follows: *P < 0.05, **P < 0.01, ***P < 0.001 and ****P < 0.0001.
Reporting summary
Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.