Tag: Leukaemia

  • Single-cell multi-omics map of human fetal blood in Down syndrome

    [ad_1]

    Ethics and tissue acquisition

    Human fetal bone and liver samples were obtained from 15 fetuses with Ts21 12–20 post-conception weeks (PCW) of age and 5 disomic fetuses 11–19 PCW of age, following termination of pregnancy and informed written consent. The human fetal material was provided by the Joint MRC/Wellcome Trust (grant MR/R006237/1) Human Developmental Biology Resource (http://www.hdbr.org), with maternal informed consent, in accordance with ethical approval by the National Health Service (NHS) Research Health Authority, REC Ref: 18/LO/0822. HDBR is regulated by the UK Human Tissue Authority (HTA; www.hta.gov.uk) and operates in accordance with the relevant HTA Codes of Practice. Sample size was not predetermined but based on sample availability and limited by the time period. Randomization and blinding were not applicable because samples were collected based on their karyotype, and data were analysed following automated computational pipelines.

    Dissociation of fetal tissues

    Fetal livers and femurs were received in L15 media and processed within 3 h from dissection. Livers were cut in smaller pieces with a scalpel and transferred to a tube containing prewarmed digestion media: RPMI (Gibco) supplemented with 10% FBS (Gibco), penicillin–streptomycin (10 U ml−1 penicillin and 100 ng ml−1 streptomycin, Sigma-Aldrich), 2 mM l-glutamine (Thermo Scientific), 1× MEM NEAA (Gibco), 1 mM sodium pyruvate (Gibco) and 1.6 mg ml−1 collagenase IV (Sigma-Aldrich). The tube was vortexed for 10 s, then incubated at 37 °C for 30 min, and vortexed for 10 s every 15 min. The digested tissue was filtered through a 100-μm filter and diluted in cold D-PBS (Gibco). Cells were centrifuged at 300g for 5 min, then aliquoted and cryopreserved in KnockOut Serum Replacement (Gibco) + 5% DMSO (Sigma-Aldrich). For femurs, adherent material was removed, then the epiphyses were removed with a scalpel and the bone marrow flushed with D-PBS. The remaining bone was cut in small pieces, then ground with a mortar and pestle using digestion media and incubated at 37 °C for 30 min, vortexing every 15 min. The digested material and the bone marrow flush were mixed and filtered through a 100-μm filter. Cells were centrifuged at 300g for 5 min, then aliquoted and cryopreserved in KnockOut Serum Replacement (Gibco) + 5% DMSO (Sigma-Aldrich). Cells were stored in liquid nitrogen until further analysis.

    The karyotype for each sample used in this study was determined by quantitative fluorescent PCR (QF-PCR). The analysis was performed by the tissue bank from which the samples were obtained. QF-PCR was performed using chromosome-specific microsatellite markers. The analysis showed normal results with an apparently normal diploid complement for chromosomes 13, 15, 16, 18, 21 and 22 and the sex chromosomes in disomic samples and trisomy for chromosome 21 in Ts21 samples. No mosaicisms were detected.

    FACS sorting for scRNA-seq

    On the day of FACS sorting, cells were rapidly thawed at 37 °C and transferred to complete RPMI media (RPMI (Gibco) supplemented with 10% FBS (Gibco), penicillin–streptomycin (10 U ml−1 penicillin and 100 ng ml−1 streptomycin, Sigma-Aldrich), 2 mM l-glutamine (Thermo Scientific), 1× MEM NEAA (Gibco) and 1 mM sodium pyruvate (Gibco)). Live-cell enrichment was performed using MACS Dead Cell Removal Kit (130-090-101, Miltenyi Biotec) following the manufacturer’s instructions. When depleting for CD235a+ cells, a magnetic negative selection was performed using CD235a Microbeads (130-050-501, Miltenyi Biotec) and MACS LS columns (130-042-401, Miltenyi Biotec) following the manufacturer’s instructions.

    For FACS sorting, cells were stained with Zombie Aqua (Thermo Fisher) to exclude dead cells and the cocktail of antibodies (Supplementary Table 21, sc-sorting panel) for 30 min at 4 °C. Cells were centrifuged for 5 min at 300g at 4 °C, resuspended in a final volume of 500 μl of 5% FBS in PBS, subsequently filtered into polypropylene FACS tubes (352063, Thermo Fisher) and sorted on a BD FACSAria Fusion.

    scRNA-seq

    Each cell suspension was submitted for 3′ scRNA-seq using Single Cell G Chip Kit, chemistry v3.1 (10X Genomics), following the manufacturer’s instructions. Libraries were sequenced on an Illumina NovaSeq S4 targeting 50,000 reads per cell, and mapped to the GRCh38 human reference genome using the Cell Ranger toolkit (v3.0.0).

    Nuclei preparation and multiome sequencing

    Nuclei preparation was performed following 10X genomics recommendations. Live, CD45+-sorted cells were centrifuged at 300g for 10 min at 4 °C. Pellets were resuspended in 45 µl chilled lysis buffer (10 mM Tris-HCl (pH 7.4), 10 mM NaCl, 3 mM MgCl2, 0.1% Tween-20, 0.1% Nonidet P40 substitute, 0.01% digitonin, 1% BSA, 1 mM dithiothreitol and 1 U ml−1 RNase inhibitor in nuclease-free water) and incubated on ice for 5 min. Of chilled wash buffer (10 mM Tris-HCl (pH 7.4), 10 mM NaCl, 3 mM MgCl2, 1% BSA, 0.1% Tween-20, 1 mM dithiothreitol and 1 U ml−1 RNase inhibitor in nuclease-free water), 50 µl was added and nuclei were centrifuged at 500g for 7 min at 4 °C. After removing the supernatant, the nuclear pellets were washed with 45 µl chilled diluted nuclei buffer (1× nuclei buffer, 1 mM dithiothreitol and 1 U ml−1 RNase inhibitor in nuclease-free water) without pipetting. Nuclei were centrifuged at 500g for 10 min at 4 °C. Nuclei were resuspended in 7 µl of chilled diluted nuclei buffer and counted. Fifteen thousand nuclei were targeted for library preparation. Each nuclei suspension was submitted for library preparation using the Chromium Next GEM Chip J Single Cell Kit (10X Genomics), following the manufacturer’s instructions. Libraries were sequenced on an Illumina NovaSeq S4 targeting 50,000 reads per nucleus, and mapped to the GRCh38 human reference genome using the Cell Ranger Arc toolkit (v1.0.1).

    Processing of tissues for 10X Visium spatial transcriptomics

    Tissues were frozen in dry-ice-cooled isopentane and stored in air-tight tissue cryovials at −80 °C. Before undertaking any spatial transcriptomics protocol, the tissues were embedded in an optimal cutting temperature compound (OCT) and tested for RNA quality with RNA integrity number (RIN). Tissues with RIN values > 7 were cryosectioned in a pre-cooled cryostat at 10 μm thickness. Two consecutive sections were cryosectioned at 10 μm thickness in a pre-cooled cryostat and transferred to the four 6.5 mm × 6.5 mm capture areas of the gene expression slide. Slides were fixed in methanol for 30 min before staining with haematoxylin and eosin and then imaged using the Nanozoomer slide scanner. The tissues underwent permeabilization for 6 min. Reverse transcription and second-strand synthesis was performed on the slide with cDNA quantification using quantitative PCR with reverse transcription (qRT–PCR) using the KAPA SYBR FAST-qPCR kit (KAPA Biosystems) and analysed on the QuantStudio (Thermo Fisher). Following library construction, these were quantified and pooled at 2.25 nM concentration. Pooled libraries from each slide were sequenced on NovaSeq SP (Illumina) using 150-bp paired-end dual-indexed setup to obtain a sequencing depth of approximately 50,000 reads as per the 10X Genomics recommendations.

    Single-cell in vitro culture

    Single-cell colony-forming unit (sc-CFU) was performed on fetal HSCs, as previously described51. Single, live, Lin, CD34+, CD38, CD62L+, CD52+ cells isolated from the fetal liver of three different fetuses with Ts21 (median of 13 PCW) and disomy (median of 12 PCW) were index-sorted into 96-well plates (Supplementary Table 21, HSC sc-CFU panel) containing StemSpan SFEM (Stemcell Technologies) supplemented with penicillin–streptomycin (10 U ml−1 penicillin and 100 ng ml−1 streptomycin, Sigma-Aldrich), 2 mM l-glutamine (Thermo Scientific), 20 ng ml−1 G-CSF (Peprotech), 20 ng ml−1 SCF (Peprotech), 20 ng ml−1 Flt3-L (Peprotech), 50 ng ml−1 TPO (Peprotech), 20 ng ml−1 IL-3 (Peprotech), 20 ng ml−1 IL-6 (Peprotech), 20 ng ml−1 IL-5 (Peprotech), 20 ng ml−1 M-CSF (Peprotech), 20 ng ml−1 GM-CSF (Peprotech) and 20 U ml−1 EPO (RnD). Cells were cultured for 15 days at 37 °C at 5% CO2. At the end of the culture, colonies were assessed for their lineage output by the expression of CD41a (megakaryocytic), CD235a (erythroid), CD3/CD56 (lymphoid) and CD11b (myeloid) by flow cytometry using a BD LSR-Fortessa analyser (Supplementary Table 21, sc-CFU lineage panel). Colonies were considered positive for a lineage if 30 or more cells were detected in the relative gate. The total number of cells in the colony was determined by Trypan blue exclusion using a Countess II cell counter (Thermo Fisher). To assess differences in the colony output between Ts21 and disomy, we performed a chi-squared test using Ts21 as the observed distribution and disomy as the expected distribution.

    To compare Ts21 erythroid lineage output to disomic erythroid lineage output in the largest colonies, we first subset the Ts21 colonies with a cell count higher than 95% of all disomic colonies (equivalent to 20% of all Ts21 colonies), and the top 20% size colonies in disomic colonies. The output of multi-lineage colonies was binarized to the lineage with the highest number of cells in the relative gate. We then performed a binomial test with n = 17 observed Ts21 erythroid lineages, k = 23 total Ts21 lineages, and P = 0.5 (the proportion of the disomic lineages that are erythroid).

    Fetal liver phenotyping by flow cytometry

    Cells were rapidly thawed in complete RPMI media at 37 °C, then centrifuged at 300g for 5 min and washed again with D-PBS. Cells were then resuspended in D-PBS and LIVE/DEAD blue was added at a 1:800 final concentration. Cells were incubated for 15 min in the dark at room temperature, then washed with D-PBS. Cells were then stained for 30 min in the dark at room temperature with the antibody cocktail (Supplementary Table 21, phenotype panel), in the presence of BD Horizon Brilliant Stain Buffer (final dilution 1:4) and Miltenyi FcR blocking reagent (final dilution 1:5) in a final volume of 200 µl. Cells were washed with D-PBS and immediately acquired on a Cytek Aurora (five lasers setup). Data were analysed on FlowJo (v10.8.2).

    MitoTracker and MitoSOX staining

    MitoSOX is a specific fluorogenic dye for live-cell mitochondria, producing bright green fluorescence upon oxidation by mitochondrial superoxide. MitoTracker green FM accumulates in mitochondria independent of membrane potential and oxidative stress, serving as a reliable tool for mitochondrial mass measurement38. Considering dye efflux bias in HSCs and progenitor cells39, we used verapamil treatment to block xenobiotic efflux pumps and mitigate preferential dye efflux (Supplementary Fig. 16c,e), ensuring a more accurate representation of mitochondrial mass and mtROS levels.

    Cells were rapidly thawed in complete RPMI media at 37 °C, then centrifuged at 300g for 5 min and washed again with D-PBS. Immediately before the incubation with the dyes, MitoTracker Green FM reagent was dissolved in DMSO, and MitoSOX green was dissolved in anhydrous N,N-dimethylformamide at a concentration of 1 mM. Cells were resuspended in 1 ml D-PBS and incubated with MitoTracker green FM (final dilution 1:1,000) or 2 µM MitoSOX green for 30 min at 37 °C in the presence of 50 µM verapamil (diluted from an aqueous 10 mM solution). Cells were then washed with D-PBS and stained for 30 min in the dark on ice with the antibody cocktail (Supplementary Table 21, mito panel) in the presence of BD Horizon Brilliant Stain Buffer (final dilution 1:4), Miltenyi FcR blocking reagent (final dilution 1:5) and 50 µM verapamil, in a final volume of 100 µl. Cells were washed again in D-PBS and immediately acquired on a Cytek Aurora (five lasers setup). Data were analysed on FlowJo (v10.8.2). The populations of interest (Lin+, CD38+, CD38, HSCs) were exported as a fsc file and imported in R via flowCore 2.2 to obtain the fluorescence data of each mitochondrial probe for each cell.

    MitoTracker and MitoSOX data analysis

    To test whether Ts21 cells have significantly different mitochondrial mass or mtROS from disomic cells, we fit a Gaussian generalized linear mixed model. At single-cell resolution, we transformed the MitoSOX and mitochondrial mass values using a rank inverse normal transformation and used the transformed values as the response variable. We accounted for age as a fixed effect and sample as a random intercept. We tested for the effect of disease status (a fixed effect) within our model and determined significance using FDR across the eight fitted models (four cell types by two response variables).

    TFR2 overexpression in HUDEP-2 cells

    For the preparation of lentiviruses, HEK293T cells were co-transfected with lentiviral TFR2 expression plasmid or an empty vector (purchased from vectorbuilder) together with psPAX2 packing plasmid and pMD2.G envelope plasmid using Lipofectamine 3000 (Thermo Fisher). The viral supernatants were harvested at 48 h post-transfection and 72 h post-transfection, pooled and concentrated by ultracentrifugation (for 90 min at 90,000g). The pellet was resuspended in StemSpan SFEM II medium (Stemcell Technologies), aliquoted and stored at −80 °C. Functionality and titre of the lentiviruses were determined by serial dilutions on 293T cells and assessing GFP/TFR2 expression 48 h after transduction. HUDEP-2 cells were cultured and differentiated as previously described52. In brief, the cultivation medium of HUDEP-2 is based on StemSpan SFEM II medium (Stemcell Technologies) supplemented with 2% penicillin–streptomycin (20 U ml−1 penicillin and 200 ng ml−1 streptomycin, Sigma-Aldrich), 2 mM l-glutamine (Thermo Scientific), 50 ng ml−1 SCF (Peprotech), 3 U ml−1 EPO (R&D), 1 µM dexamethasone (Sigma) and 1 μg ml−1 doxycycline (Sigma). HUDEP-2 cells were incubated with concentrated lentiviruses (multiplicity of infection of 1) for 6 h, spun down and resuspended in fresh cultivation medium. After 2 days, cells were spun down and resuspended in differentiation medium, which is composed of IMDM (Thermo Fisher), supplemented with 2% penicillin–streptomycin (20 U ml−1 penicillin and 200 ng ml−1 streptomycin, Sigma-Aldrich), 2 mM l-glutamine (Thermo Scientific), 50 ng ml−1 SCF (Peprotech), 3 U ml−1 EPO (R&D), 5% human AB serum (Sigma), 10 μg ml−1 insulin (Sigma), 330 μg ml−1 holo-transferrin (Sigma), 2 U ml−1 heparin (Sigma) and 1 μg ml−1 doxycycline (Sigma). Cells were cultured for 4 days in differentiation medium at 37 °C at 5% CO2. Cells were assessed for TFR2 and erythroid lineage marker expression (CD235a, CD71 and CD36) by flow cytometry using a BD LSR-Fortessa analyzer (Supplementary Table 21, HUDEP panel). HUDEP-2 and HEK293T cells were gifted by the Grønbæk and Issazadeh-Navikas laboratories, respectively (BRIC, University of Copenhagen), and routinely tested for mycoplasma contamination.

    Analysis of the scRNA-seq data

    For each CellRanger output corresponding to a specific technical and biological replicate, we identified low-quality cells or empty droplets by applying the barcodeRanks and emptyDrops functions using the R package DropletUtils53. We then merged all CellRanger outputs into a single Scanpy object54. Following per-sample droplets removal, quality control was applied based on three parameters: the total unique molecular identifier count (lower-upper threshold (750, 110,000)), the number of detected genes (lower-upper threshold (250, 8,500)), and the proportion of mitochondrial gene count per cell (an upper bound of 20%). We further applied Scrublet55 to remove potential doublets.

    Next, we subsetted to samples of the same organ and Ts21 status and merged into a single dataset (for example, a dataset containing only Ts21 liver samples). We reasoned that Ts21 and disomic cell transcriptomes would be influenced heavily by an extra copy of chromosome 21 (for example, 50% higher expression within Ts21 cells that would not be present in disomic cells). As a result, the highly variable genes will be impacted by disomic versus Ts21 differences and insufficiently capture the genes relevant to liver-residing or femur-residing cells. Therefore, we chose to create disomic liver, Ts21 liver, disomic femur and Ts21 femur datasets to most accurately annotate the population of the individual cells in the data. Within each of the four merged datasets, we applied log-normalization, using the scaling factor 10,000 to correct for between-sample differences in library size, and calculated highly variable genes, using the Seurat (v5.0.3) implementation56. We performed principal component analysis (PCA) on the highly variable genes for dimensionality reduction, retaining the top 15 components using the Scree plot elbow rule. Data were batch-corrected using Harmony16 to account for additional technical variations arising between samples that are non-biological in origin.

    We then performed an iterative clustering procedure to identify clusters in the single-cell data. Broadly, our iterative clustering procedure first found initial clusters using the Leiden algorithm, then merged clusters from seemingly identical cell populations, and finally subclustered into further refined populations using K-means clustering. Thus, the iterative clustering allowed us to further refine initial clustering, such that initial clusters containing multiple cell types can be further split into lower-level cell types. This is particularly useful for broad cell types such as erythroid cells that were further split into early and late erythroid cells. Following the between-sample batch correction above, we computed a neighbourhood graph using the uniform manifold approximation and projection (UMAP) approach implemented in Scanpy and subsequently clustered with the Leiden algorithm. For visualization purposes, we used UMAP manifold embedding to capture the global features in two and three dimensions. We identified marker genes for each cluster by performing a Wilcoxon signed-rank test with FDR correction, and we annotated clusters using these marker genes and canonical marker genes. We performed further clustering, by manually choosing clusters to subcluster using K-means clustering, merging clusters of the same cell type and performing marker gene detection. With this approach, we generated four separate annotated scRNA-seq datasets, together with associated marker genes, for the Ts21 (liver and femur) and disomic (liver and femur) datasets.

    Contrasting cell-type abundances between Ts21 and disomic samples

    To compare cell-type abundances, we calculated the proportion of each major cell type group in each sample. We contrasted cell-type proportions between developmental stage-matched Ts21 and disomic samples with the same sorting strategy using a Mann–Whitney U-test. Finally, we corrected for multiple testing using FDR and assessed significance at FDR < 0.1.

    Ligand–receptor analysis using CellPhoneDB

    We inferred statistically significant ligand–receptors and their corresponding cell types using CellPhoneDB on a subsampled Ts21 liver dataset, such that the proportion of cells in the reduced sample recapitulated the proportion in the full Ts21 dataset and corresponded to the number of cells in the disomic dataset. We repeated the same analysis on the full disomic dataset (which now has an identical cell size). We kept any pairs that did not involve HLA or a protein complex, and kept only those that involved a single receptor. Among the significant ligand–receptors (P < 0.001), we selected ligands or receptors identified in HSC/MPPs and used to communicate with vascular endothelial cells, and performed gene set enrichment analysis (GSEA) on those using EnrichR.

    Analysing differential trajectory of the osteo-lineage

    For the disomic and Ts21 femur, we computed partition-based graph abstraction (PAGA) using all annotated stromal cells (with a PAGA threshold of 0.05). We also computed a force-directed diffusion graph using Pegasus57, and overlaid the Pegasus and PAGA outputs.

    Next, we focused on two different cases of osteo-linage transitioning: (1) within CAR cells, LepR+ CAR cells, osteoprogenitors and osteoblasts, and (2) within arterial endothelial cells, transitioning endothelial cells and osteoblasts. We computed pseudotime using Scanpy, and used the pseudotimes as input into PseudotimeKernel in CellRank58 (without usage of RNA velocity information) to obtain generalized Perron cluster cluster analysis (GPCCA) estimators for identifying macrostates and computing transition probabilities among them. We set the terminal state number according to the shape of the force-directed graph. For case (1), we chose two states in disomic cells based on the observation that there are two clear branches splitting between CAR cells and osteoblasts, and we chose one state in Ts21 because we observed a single branch leading to osteoprogenitors. For case (2), we chose two states considering osteoblasts as one end and some transitioning endothelial cells as the other. Next, we plotted the STREAM plot using the scVelo package59 to visualize the cell-type transition matrix. Finally, we correlated gene expression with estimated absorption probabilities (Pearson correlation, as implemented in the CellRank package). We identified the positively or negatively correlated genes at a significance level of FDR < 0.05 separately in Ts21 and the disomic femur. We checked the Gene Ontology terms of the top 500 genes that were most positively or negatively correlated to absorption probabilities using the clusterProfiler R package60.

    Performing differential expression analysis in scRNA-seq

    Within each cell type, four distinct differential expression analyses were performed to identify differentially expressed genes (DEGs) due to disease status (Ts21 or disomic) or the microenvironment (liver or femur).

    1. (1)

      Liver versus femur in disomic samples

    2. (2)

      Liver versus femur in Ts21 samples

    3. (3)

      Ts21 versus disomic in liver samples

    4. (4)

      Ts21 versus disomic in femur samples

    Previous literature has shown that pseudobulk differential expression methods have improved FDRs compared with single-cell differential expression methods61. As a result, our analyses were performed by first computing cell-type-specific pseudobulk profiles for each sample and then analysing pseudobulk RNA-seq profiles using limma62.

    To calculate sample-level pseudobulk profiles, we aggregated the read counts across cells of the same type. We kept samples for analysis that contained at least ten cells, and we used the filterByExpr() function in the edgeR package with default settings to retain genes for differential expression analysis and reduce the burden of multiple test correction, by removing genes with low expression across samples63.

    Next, limma-voom was used to perform a statistical analysis for differential expression. In brief, sample-level weights were calculated by computing normalization factors for transforming count data into log2 counts per million and deriving weights based on a mean-variance relationship (using the calcNormFactors() function in edgeR and the voom() function in limma in R). Log fold changes for each gene were estimated using a linear model with sorting strategy as a covariate. P values were estimated after empirical Bayes shrinkage (lmFit and eBayes() functions in limma). A Benjamini–Hochberg FDR correction was applied across all gene P values, and significance was assessed at FDR < 0.05.

    Analysis of Ts21 versus disomic DEGs

    As we observed an exponential cross-dependency between the proportion of DEGs on chromosome 21 and other chromosomes, we investigated additional factors that could be relevant. We first tested whether cell-type-specific overexpression of a particular gene on chromosome 21 can lead to greater dysregulation on either chromosome 21 or on other chromosomes. As the probability of a gene being differentially expressed is linked to the number of cells tested in the differentially expressed analysis, we tested this using log fold change values. For each chromosome 21 gene and across cell types, we tested the Pearson correlation between log2 fold change of gene expression (from Ts21 versus disomic samples) and (1) chromosome 21 DEG (%) or (2) non-chromosome 21 DEG (%), and determined the significance of the correlation using FDR < 0.1. Second, we reasoned that overall expression of an important chromosome 21 gene could lead to greater dysregulation, as a highly expressed chromatin modifier or transcription factor might have a consistent 50% overexpression in Ts21 across cell types, but the overexpression might matter more in the cell types where the gene is expressed. We tested this for each chromosome 21 gene. Across cell types, we tested the Pearson correlation between average cell-type-specific gene expression (from Ts21 and disomic samples) and (1) chromosome 21 DEG (%) or (2) non-chromosome 21 DEG (%).

    Identification of context-specific DEGs in HSC/MPPs

    It is difficult to ascertain whether a gene is commonly or uniquely upregulated in single-cell data (for example, a gene upregulated in Ts21 liver HSCs compared with Ts21 femur HSCs, but not disomic liver HSCs compared with disomic femur HSCs). The presence of a DEG in one cell type and the absence in another may be a result of differences in population size, and thus purely statistical.

    As there are sample and cell count differences between datasets, we could not directly take the Ts21 liver versus femur DEGs as the Ts21 population was much larger than the disomic datasets. Instead, we identified DEGs specific to disease status (in liver versus femur analyses) and the microenvironment (in Ts21 versus disomic analyses) in HSC/MPPs by using a subsampling procedure. Downsampling allows the ability to compare two analyses from distinct datasets that are confounded by differences in size. We did not repeat the same procedure across additional cell populations to conclude whether genes are differentially expressed in specific cell populations, as this would require to downsample to the smallest population sizes. This would erode statistical power and be computationally expensive.

    Within liver versus femur analyses, we downsampled the Ts21 liver and Ts21 femur dataset to have the same number of fetuses contributing the same number of samples with the same number of HSC/MPPs as the disomic liver and disomic femur data. As a result, the Ts21 data matched the disomic data in terms of fetus sample cell counts. Similarly, within Ts21 versus disomic analyses, we downsampled the Ts21 and disomic liver data based on fetus sample cell counts in the Ts21 and disomic femur data. As an additional restriction in our downsampling, we ensured that fetuses present in both liver and femur data, with equal or greater number of cells and samples in the liver than in the femur, would still be selected in the downsample. The downsampling routine was repeated 100 times, such that 100 new datasets were created that match the smaller dataset. Differential expression analysis was performed identically to the full data using sample-level pseudobulks and limma-voom. The median nominal P value for each DEG was calculated across 100 iterations. We verified the robustness of this choice of 100 iterations by visualizing the variability of the median P value across iterations to assess its stability.

    Next, we used differential expression analyses in the full data and in the downsampled data to categorize the context dependence of DEGs. In the liver versus femur analysis, we implicate environment-driven DEGs, Ts21-induced DEGs and Ts21-reverted DEGs.

    1. (1)

      Environment-driven DEGs are genes with an adjusted FDR < 0.05 in either Ts21 or disomic samples, and nominal P < 0.05 in the other dataset.

    2. (2)

      Ts21-induced DEGs were discovered by identifying genes with adjusted FDR < 0.05 and median P value across 100 subsamples of P < 0.05 in the Ts21 data and a nominal P > 0.05 in the disomic data.

    3. (3)

      Ts21-reverted DEGs are those discovered in only the disomic dataset: an adjusted FDR < 0.05 in disomic samples but nominal P > 0.05 in the Ts21 dataset.

    4. (4)

      Environment-driven or Ts21-induced DEGs are those genes that have FDR < 0.05 in the full Ts21 dataset, but have a median nominal P value across 100 Ts21 subsamples of P > 0.05 and FDR > 0.05 in the disomic data. For these genes, we do not have sufficient evidence to claim context dependence nor reject context dependence, as the discovery of these DEGs might be highly dependent on sample size.

    To visualize gene–environment interactions, we examined the expression of environment-driven DEGs and Ts21-induced DEGs across Ts21 and disomic liver and femur HSC/MPPs. We scaled expression across cells for each DEG to mean = 0 and variance = 1. We averaged the scaled expression across genes within the environment-driven and Ts21-induced gene sets, such that each cell has its own value for each gene set. We visualized the mean and standard error of these values across cells in Fig. 2c.

    GSEA of upregulated Ts21-induced genes was performed by inputting the list of genes into EnrichR. Scatterplots show the top Gene Ontology terms (molecular function and biological process) or ENCODE and ChEA transcription factors.

    Identifying differences in cell cycling across cell types, environment and Ts21

    We assigned a cell cycle using the score_genes_cell_cycle() function in Scanpy with the standard list of cycling genes from Tirosh et al.64, as applied to all cells from samples of the same environment and disease type. We determined cycling by the predicted cycling phase being equal to ‘G1’ or not (either ‘G2M’ or ‘S’). We compared the proportions of cycling cells using a Mann–Whitney U-test.

    Evaluating the regional distribution of catalogued somatic mutations

    Using publicly available data, we evaluated the hypothesis that the regulatory landscape is affected in Down syndrome to influence where somatic mutations occur. First, we downloaded somatic mutation data from Hasaart et al. in fetal Ts21 and disomic HSCs4, and converted the mutation positions from hg19 to hg38 using liftOver. Second, we identified genes expressed in Ts21 HSCs. We used the filtered set from the Ts21 cycling versus less-cycling HSC differential expression analyses, which were identified by the filterByExpr() function. Next, we narrowed down the Ts21 and disomic sets of mutations within the HSC-expressed genes to the set of non-coding intronic mutations. Finally, we downloaded candidate cis-regulatory elements (cCREs) in hg38 from ENCODE; for Ts21 and disomic mutation sets, we calculated the proportion of intronic somatics in Ts21 HSC-expressed genes that overlap with ENCODE cCREs. To determine significant differences, we bootstrapped the disomic intronic mutation set for 1,000 times and compared the observed Ts21 proportion to the disomic distribution. We calculated P values as the proportion of bootstrapped disomic mutation sets with larger values than the Ts21 value.

    Analysis of the 10X Visium data

    For disomic and Ts21 liver datasets, we used our annotated cell types from the disomic and Ts21 large scRNA-seq datasets as our input reference data for Cell2location65. Next, we merged all SpaceRanger outputs of tissue sections for the disomic liver and then separately for the Ts21 liver to create two Scanpy objects56. We removed mitochondrial genes and spots with the total expressed gene count of less than 800 (the remaining spots were of sufficient good quality for downstream analysis).

    We then estimated cell-type abundances for each spatial spot. Using Cell2location, we trained a negative binomial regression model on the input reference data. We applied our model to the Scanpy formatted data, considering tissue section as a covariate to account for distinct batches. We used the estimated posterior mean value of each cell type (from Cell2location) as the local abundances. For each section, we computed the section-level relative abundance of each cell type as the proportion of its estimated abundance across all spots over the total estimated abundance of all cell types across all spots. We compared relative abundances between disomic and Ts21 using a Wilcoxon rank-sum test, and corrected P values by using Benjamini–Hochberg.

    To evaluate cell-type colocalization, we computed spot-level relative abundance of each cell type, dividing the abundance of each cell type on an individual spot by the total abundance of all cell types on the same spot. We then computed a Pearson distance matrix among cell types, based on these spot-level relative abundances across sufficient-quality spots of tissue sections in the same disease status, respectively, for the Ts21 liver and disomic liver. We next performed hierarchical clustering, with inter-cluster distance estimated by the Ward variance minimization algorithm.

    Processing the multiome data

    We performed the initial processing of multiome data using Seurat (v5.0.3) and Signac (v1.13)66. After Cellranger processing, we identified high-quality multiome cells for downstream analysis if they satisfied the following criteria: more than 750 RNA unique molecular identifiers, more than 250 expressed genes, less than 40% mitochondrial read fraction, transcription start site (TSS) enrichment score of more than 3 and more than 1,000 ATAC fragments in peaks. We next identified and annotated transcriptionally distinct clusters within Ts21 and disomic samples using Seurat. We created Ts21-specific and disomic-specific expression matrices by merging the matrices across Ts21 or disomic samples, respectively. Within the separate Ts21 and disomic expression datasets, we log-normalized with a scaling factor of 10,000, identified 2,000 highly variable genes, scaled and centred the data, performed PCA, and used Harmony with lambda = 1 to batch correct for sample-specific variation. We then constructed a k-nearest neighbours graph based on the Euclidean distance in PCA space (using the first 30 components), and identified transcriptionally distinct clusters using the Leiden algorithm. We nominated marker genes for each cluster by performing a Wilcoxon signed-rank test that compares cells within one cluster to all other cells. We performed further clustering by performing K-means clustering and we merged clusters of the same cell type. With this approach, we annotated each cell in two separate multiome datasets (Ts21 and disomic). The Harmony-corrected datasets were visualized in two dimensions using UMAP.

    Our overarching process for creating the chromatin accessibility matrix was to call peaks within each sample before calling a final set of peaks from cell-type-specific ATAC profiles. We first called peaks within each sample using macs2, as implemented with default parameters in the callPeaks() function in Signac. We created a unified set of peaks across all samples by combining any intersecting peaks into a single peak, and removing the combined peaks that were less than 20 bp or more than 10 kb wide. This set of peaks was used to compute a cell × peaks matrix for each sample from the ATAC fragments file. Using all peaks present in at least ten cells, we ran latent semantic indexing (a two-step procedure of first using term frequency-inverse document frequency normalization and then singular-value decomposition) to project the ATAC matrix into a reduced dimension representation. We performed batch correction over all samples using Harmony before constructing a k-nearest neighbours graph across the first 30 components (except omitting the first component, which correlates with sequencing depth). This graph was used to perform clustering using the Leiden algorithm with resolution = 1; within each cluster, a new set of peaks were called using macs2. This set of peaks was combined into a unified set of peaks, which was used to form the final cell × peaks matrix, which contained all final Ts21 and disomic cells.

    For downstream analyses involving a single combined dataset, the two datasets were merged into a single matrix. Log-normalization, scaling, PCA, Harmony batch correction and UMAP were applied to the combined dataset.

    Myeloid trajectory analysis of snRNA using CellRank

    Processed snRNA data from multiome was subset to cells of the myeloid lineage. Both disomic and Ts21 cohorts were downsampled to 15,000 cells. A trajectory graph was calculated using the force-directed layout (FLE) function in Pegasus. Instead of the UMAP space, the cell trajectories were plotted in the FLE space. Trajectory analysis of the RNA expression data closely followed the CellRank tutorial ‘CellRank beyond RNA velocity’. Moments of connectivity were calculated using scVelo with 30 principal components and 15 neighbours. The root cell was manually selected according to the diffusion map of HSCs, selecting the cell with the greatest Euclidean distance in FLE space from the centre of the cluster, indicating a cell with a divergent transcriptome. The pseudotimeKernel was used to calculate pseudotime with default parameters. The CytoTRACEKernel was used to compute the transition matrix with the parameters used in the tutorial (threshold_scheme = ‘soft’, nu = 0.5). To compute terminal states and the probability of each cell differentiating towards each terminal state, the GPCCA estimator was utilized with default parameters. Schur decomposition was performed, and five terminal states were automatically selected according to an eigengap in the real part of the eigenvalues. Terminal states were labelled according to the cell type with the closest association (late erythroid, monocytes, mast cells, pDCs and megakaryocytes) and absorption probability was calculated. Significant differences in predicted terminal states between Ts21 and disomic HSCs were calculated using a binomial test that used disomic proportions as the background probabilities.

    Topic modelling of the HSC multiome using MIRA

    We used MIRA to perform topic modelling of HSCs in the 10X multiome dataset42. Only HSCs were included for all downstream MIRA analysis. Out of the 6,215 HSCs, 3,784 were Ts21 and 2,431 were disomic. All MIRA analysis closely followed the online tutorials.

    To generate the latent topics for RNA, the variational autoencoder (VAE) framework uses raw expression counts as input. Rare genes were removed by filtering genes only expressed in 15 or fewer cells. Exogenous genes (n = 7,905) were selected using the highly variable gene function in Scanpy, selecting for all genes with a minimum mean dispersion of 0.1. Exogenous genes are genes that will be captured in topics but will not be used as VAE features. Endogenous genes (n = 4,359) were selected by filtering the exogenous genes for those with a normalized dispersion greater than 0.5. Endogenous genes were used as features for the VAE network. The ExpressionTopicModel was instantiated with the default parameters. The learning rate bounds were manually tuned to cover the portion of the learning rate versus loss curve with the steepest slope. The model was then tuned using TopicModelTuner with default iterations, a minimum number of topics set to 2, a maximum number of topics set to 15, a batch size of 32, threefold cross-validation and a training size of 0.8.

    To generate the latent topics for ATAC, the VAE framework used binarized peak counts. Peaks were filtered according to the epiScanpy tutorial. Using the same process for RNA, 72,541 exogenous peaks and 45,095 endogenous peaks were selected according to a minimum mean dispersion of 0.05 and a normalized dispersion greater than 0.5. The AccessibilityTopicModel was instantiated with the default parameters and ‘dataset_loader_workers’ set to 3. The learning rate bounds were manually tuned to cover the portion of the learning rate versus loss curve with the steepest slope. The model was then tuned using TopicModelTuner with default iterations, a minimum number of topics set to 2, a maximum number of topics set to 15, a batch size of 8, onefold cross-validation and a training size of 0.8.

    GSEA was performed on the expression topics using wrapper of enrichr in MIRA and the top 200 genes associated with each topic. For the accessibility topics, transcription factor-binding sites were annotated in peaks using the motif scanning in MIRA and the hg38 reference from the UCSC repository. Each peak was scanned according to the JASPAR 2020 vertebrate collection of transcription factor-binding motifs. Transcription factors that were not expressed in the RNA data were removed. For comparing RNA and ATAC topics, the joint data were split into a disomic and Ts21 dataset and the ‘get_topic_cross_correlation’ function in MIRA was performed.

    Trajectory analysis of the HSC multiome using MIRA

    First, a joint embedding space was calculated using the ‘make_joint_representation’ function in MIRA to combine both modalities. A neighbourhood graph and UMAP embedding were performed on the joint representation (15 neighbours and a minimum distance of 0.1). The datasets were then batch corrected using Harmony on the joint UMAP features. The neighbourhood graph and UMAP embedding were re-calculated on the Harmony-adjusted feature space.

    We next calculated several different HSC branches. First, the diffusion map was calculated using Scanpy ‘diffmap’ with default parameters and then normalized by MIRA to regularize distortions in magnitude of the eigenvectors. Schur decomposition was performed and the eigengap heuristic was used to automatically select the proper number of diffusion components within the data (3). The data were subsetted to three diffusion components and the neighbourhood graph was calculated on the diffusion map embedding and components were connected. Pseudotime was calculated using the ‘get_transport_map’ function in MIRA, which defines a transport map using a Markov chain model of forward differentiation. A root cell was selected as the maximum value of the third diffusion component based on the suggestion in the tutorial. This root cell was located in the centre of the highest density of HSCs. Terminal cells were identified using the ‘find_terminal_cells’ function in MIRA with 8 iterations and a threshold of 0.01. There were three distinct clusters of terminal cells, the cell in each cluster farthest away from the root cell was selected, and the probability of each cell differentiating towards those three branches was calculated. The lineage probabilities were parsed into a bifurcating tree structure using ‘get_tree_structure’ with the threshold set to 1. Expression and accessibility dynamics across time were plotted using a streamgraph that depicts this tree structure.

    Regulatory potential modelling of the HSC multiome using MIRA

    LITE modelling in MIRA was performed to link gene expression to nearby cis-regulatory elements. For every gene, MIRA learns a regulatory window describing a range in which changes in local accessibility appear to influence gene expression. The regulatory window decays exponentially both upstream and downstream according to the TSS of each gene. Consequently, each gene is associated with a unique TSS using non-redundant human TSS annotations (hg38 GENCODE VM39). The model was trained on the union of genes that were highly variable and were the top 5% most activated in each of the 10 expression topics (n = 5,367). The genes that did not have an annotated TSS were removed leaving 4,454 genes. The LITE model was instantiated with default parameters and the raw expression and accessibility data were then fitted (4 out of 4,454 genes failed to fit). Now that each gene contained a trained regulatory potential model, the expression of each gene was estimated by calculating the maximum a posteriori prediction given the accessibility state of each gene in each cell.

    Next, we identified transcription factors that regulate the expression of genes specific to each of the three HSC branches using the probabilistic in silico depletion method in MIRA. MIRA simulates ‘computational knockouts’ of each transcription factor. MIRA uses regulatory potential modelling to predict gene expression based on local chromatin accessibility, and then masks cis-regulatory elements with specific motifs to define transcription factors where motif accessibility is important to gene expression prediction. This is measured by the changes in performance of the regulatory potential model of the gene to predict expression after computationally masking the binding sites of every transcription factor. In this manner, transcription factors that strongly regulate the expression of a gene will be prioritized because masking of their binding site will significantly decrease the accuracy of the LITE model prediction. The function ‘probabilistic_isd’ was run with default parameters across all modelled genes. To counteract the inefficiency of noisy transcription factor-binding site predictions, co-varying genes associated with individual topics were queried for a shared association across many transcription factors. The ‘driver_TF_test’ function was utilized to identify potential transcription factors regulating the expression of branch-specific topics. The top 150 topic-specific genes were included, and a Wilcoxon rank-sum test was performed over the association scores. After identifying transcription factors regulating topic expression, the top genes being regulated by these transcription factors were queried using ‘fetch_ISD_matrix’ and selected according to the ranked association score. We additionally performed this procedure across all RNA topics, which we reported in Supplementary Table 16.

    Divergence in accessibility and expression of the HSC multiome using MIRA

    To identify genes in which expression cannot be accurately predicted by local chromatin accessibility alone, MIRA NITE modelling was performed. In addition to cis-regulation, the NITE model expands on the scope of the LITE model by incorporating accessibility topics, which are genome wide. The NITE model was initialized using the same parameters, topics and genes as the LITE model using ‘spawn_NITE_model’. The model was fit and expression was predicted using default parameters. The difference between LITE and NITE model performance for every gene was calculated using ‘get_chromatin_differential’. The ‘get_NITE_score_genes’ function was used to calculate a cumulative metric per gene describing the divergence of local accessibility and expression across all cells. Genes with a high cumulative NITE score indicate genes that are regulated in part by non-local mechanisms. The top 500 genes with the highest NITE score were incorporated into the GSEA analysis in MIRA. To ascertain whether Ts21 HSCs expressed genes enriched for non-local regulation, pseudobulk differential expression was performed (DESeq2) between disomic and Ts21 HSCs. The distribution of cumulative gene NITE scores for the top 300 condition-specific genes (log2FC) was compared using Wilcoxon rank-sum test with Benjamini–Hochberg correction.

    Defining differentially accessible promoters using the EPD

    We downloaded a set of curated promoter annotations from the EPD43 available through EPDnew for build hg38 (https://epd.expasy.org/epd/human/human_database.php?db=human). We then intersected these annotations with peaks that were differentially accessible in Ts21 HSCs (as compared with disomy HSCs) using the genomicRanges R package. Differentially accessible promoters (FDR < 0.1) were then tested to determine whether they were more likely to be linked to a significantly DEG (FDR < 0.1) using a Fisher’s exact test.

    Defining functional enhancers in ATAC data using ABC

    We defined functional enhancers as those with a high potential to regulate gene expression. Following this definition, we used the ABC model to construct genome-wide maps of enhancer–gene connections45. The ABC scores generated for each enhancer reflect the chromatin activity and chromosome interaction between the enhancer and surrounding genes. We utilized merged ATAC-seq mapping results from disomy HSCs and Ts21 HSCs to identify potential enhancer locations and quantify their chromatin activity. We applied the averaged Hi-C contact data provided by the ABC authors to map contact interactions between chromosomal regions at 5-kb resolution. The default parameters and thresholds were then used to run the ABC model pipeline available on GitHub for implicating a list of relevant enhancers. For motif analyses, non-promoter enhancers were kept (those ABC enhancers that did not overlap with an EPD promoter).

    Motif enrichment analysis in differentially accessible peaks using monaLisa

    To identify transcription factor motifs that are responsible for differential accessibility, we utilized the JASPAR2022 (v0.99.7)67 and monaLisa (v0.63.0)68 R packages. First, liver HSC peaks with FDR < 0.1 in a Ts21 versus disomy HSC analysis were identified and stratified into separate bins based on the direction of differential accessibility. Next, to account for any size differences between groups, we resized all peaks to the median peak length across all groups. After accounting for differences in peak length, we used the ‘calcBinnedMotifEnrR’ function from monaLisa, which internally corrects for GC content and identifies motifs that are significantly enriched in either Ts21-biased or disomy-biased peaks. The motif enrichment analysis above was repeated using all differentially accessible peaks, only promoters and only enhancers.

    Next, to assess differential accessibility of promoters, we downloaded a set of curated promoter annotations from the EPDnew database for hg38 (https://epd.expasy.org/epd/human/human_database.php?db=human). We then intersected these annotations with peaks that were differentially accessible using the genomicRanges R package. Differentially accessible promoters were then tested to determine whether they were more likely to be linked to a significantly DEG (FDR < 0.1) using a Fisher’s exact test. The motif enrichment analysis above was repeated using the EPD promoters.

    To assess utilization of the AP-1 motif between Ts21 and disomy, we specifically used motifs for AP-1 identified in Isakova et al.44. We first identified which motifs belonged to the TRA-response element (TRE) or CRE family by calculating the similarity between motifs as the Pearson correlation between the position frequency matrices using the ‘MotifSimilarity’ function from monaLisa. We then identified two distinct clusters of motifs using hclust() and cutree() R functions with K = 2. These clusters corresponded to the TRE and the CRE motif families. All TRE and CRE motifs were then used in the pipeline described above to identify differential usage of AP-1 motifs. Gene Ontology terms were then assessed using enrichR.

    Finally, we quantified the contribution of each motif to differential accessibility in Ts21 HSCs. We utilized motifmatchR (v1.20.0) to identify ABC enhancers or EPD promoters that contain a significant match to each motif. Then, for each motif, we fit a linear regression model logFC ~ match + peak_type + match:peak_type, where match indicates whether the peak contains a significant match of the motif, peak_type indicates whether the peak is an enhancer or promoter, and logFC represents the log fold change values of the peak. This model allows each motif to have a different effect on differential accessibility between enhancers and promoters. To quantify the contribution of the motif to overall differential accessibility, we computed the multiple R2 from the regression model.

    Assessing enrichment of GWAS SNPs in single cells using SCAVENGE

    Trait-relevant individual cells were calculated using SCAVENGE69, which combines network propagation and SNP enrichment analysis to map causal variants to their cellular context. First, variants with posterior inclusion probability (PIP) > 0.001 were downloaded in the format used by Yu et al.69, which were originally processed and described within Vuckovic et al.70. Second, the full multiome ATAC dataset (including all cell populations) was used as input to downstream tasks. A mutual k-nearest neighbour graph was computed to represent the relationship between neighbouring single cells, using k = 30. Next, g-chromVAR71 was used to calculate bias-corrected z scores for each tested trait and each single cell to estimate cell-trait relevance. The top 5% ranked cells served as seed cells for the SCAVENGE network propagation, which were further scaled and normalized to calculate the final SCAVENGE trait relevance score (TRS).

    SCAVENGE TRS was contrasted between different lineage branches using a pseudobulk approach similar to the approach that we used for differential expression analysis. On a per-sample-and-branch level, we pseudobulked the SCAVENGE scores by summing SCAVENGE scores across all cells of the same branch and of the same sample. We then applied three linear models to assess the effect of each of the three branches on the SCAVENGE TRS. In each model, we accounted for the number of cells within the pseudobulk as a covariate. We determined significance by multiple test corrections at FDR < 0.1 across the 9 analyses (3 branches (1, 2, 3) and 3 traits (RBC, white blood cell and lymphocyte counts)).

    Identification and analysis of peak–gene links using SCENT

    SCENT72 was used to identify peak–gene links in disomic HSCs and Ts21 HSCs, by correlating peak accessibility (binarized) and gene expression (raw) counts. SCENT is a Poisson regression model that recomputes the standard errors in the model coefficients by using bootstrapping, which helps to maintain false-positive rates. We first identified all peak–gene combinations within 500 kb of each other. Separately within disomic and Ts21 HSCs, we retained all peak–gene combinations where more than 5% of cells were accessible and expressed the peak and the gene, retaining a total of 38,478 peak–gene combinations. Owing to biological differences between disomy and Ts21, this was an overlapping but not identical peak–gene set tested in disomic and Ts21 HSCs.

    Next, we applied a SCENT model using binarized peak accessibility, percent mitochondrial reads, log(number of UMIs) and sample as the covariates, and expression counts as the dependent variable to each peak–gene combination. On the basis of the SCENT paper, we set up an iterative bootstrapping scheme to balance runtime and P value accuracy based on the Poisson regression model P values, where P > 0.1 consisted of 100 bootstraps, P < 0.1 consisted of 1,000 bootstraps and P < 0.01 consisted of 10,000 bootstraps. SCENT was performed on a computing cluster using chunks of 100–500 peak–gene sets. We corrected for multiple testing using FDR, and determined significant peak–gene links at FDR < 0.2. Finally, we repeated the Ts21 analysis after downsampling the 3,784 Ts21 HSCs to match the sample size of 2,431 disomic HSCs to assess the impact of cell population size.

    Furthermore, we used SCENT to test whether the effect of peak accessibility on gene expression is modified by trisomy. To do so, we included an interaction term between ATAC peak accessibility and Ts21 status in the SCENT model to address whether the effect of accessibility on expression depends on trisomy. We applied the new SCENT interaction model to a combined Ts21 and disomic HSC dataset. The analysis was performed on all significant peak–gene links in the disomic-only or Ts21-only analyses, and significant interaction terms were assessed at FDR < 0.2.

    One reason why Ts21-only peak–gene links were only identified in Ts21 yet have no significant interaction term would be because of a lack of gene expression or peak accessibility in disomic cells. To test whether Ts21-only peak–gene links without a significant interaction term were more accessible and expressed in Ts21 HSCs than in disomic HSCs, we used differential accessibility (from the 10X multiome ATAC) and differential expression results (from the large scRNA-seq analysis). To calculate differential accessibility in the 10X multiome ATAC, we computed pseudobulk profiles and performed limma-voom with trisomy status as the covariate of interest, as similarly described in the large scRNA-seq analysis. We filtered peaks for differential analysis using the filterByExpr() function in edgeR. We performed two separate binomial tests to assess whether the list of (1) Ts21-only peaks or (2) Ts21-only genes were upregulated in Ts21 compared with all other peaks or genes tested in differential analyses. We defined upregulated in Ts21 as nominal P < 0.05 and logFC > 0 in the limma-voom results.

    We performed two enrichment tests with regard to our Ts21 peak–gene links and RBC GWAS. We defined SCENT peaks and SCENT genes according to three significance thresholds from the SCENT peak–gene analysis: FDR < 0.1, FDR < 0.2 and nominal P < 0.05, and identified GWAS-related peaks and genes using fine-mapped RBC GWAS SNPs at PIP > 0.2. In the first analysis, we assessed the enrichment of fine-mapped RBC GWAS SNPs (PIP > 0.2) within Ts21 SCENT peaks (also known as active cis-regulatory regions) using a Fisher’s exact test. Here, we are comparing whether peaks with a GWAS variant are more likely to be associated with gene expression than with a background set of peaks without a GWAS variant, which include peaks accessible (in more than 5% of cells) and close to a gene (less than 500 kb) that is expressed (in more than 5% of cells). In the second analysis, we assessed the enrichment of differential expression at target GWAS genes defined by Ts21 SCENT peak–gene links. We defined differential expression as FDR < 0.1 in the differential expression analysis of HSCs using the large scRNA-seq data. We subsetted to significant peak–gene links and calculated a 2 × 2 contingency table reflecting (1) whether the peak contained a fine-mapped variant, and (2) whether the target gene is differentially expressed. A Fisher’s exact test assessed the hypothesis that important enhancers (containing fine-mapped GWAS SNPs) have a role in differential expression within HSCs (affecting their target genes).

    Within the results, we report the enrichment P values within the text when using the set of peak–gene links with nominal P < 0.05 (from SCENT). In addition, when reporting peak–gene examples within the results, we use the nominal P values for the SCENT peak–gene test, the differential expression test or the differentially accessibility test. We include the full summary statistics within the Supplementary tables.

    Reporting summary

    Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.

    [ad_2]

    Source link

  • Selective haematological cancer eradication with preserved haematopoiesis

    [ad_1]

    Structural dataset and computational analysis

    The experimentally determined 3D structure of the CD45 extracellular domain was retrieved from the PDB (5FMV)22. The per-residue relative solvent accessibility area was computed using a previously published algorithm45 implemented in FreeSASA46 using default parameters. Prediction of B-cell epitopes was based on BepiPred-2.0 (ref. 47) using the default threshold (0.5) for epitope residues nomination. The EV mutation sequence analysis framework48 was used to search for the CD45 sequence with the non-redundant UniProtKB database49. A multiple-sequence alignment was built using five iterations of the jackhammer HMM search algorithm50 with default significance score for the inclusion of homologous sequences.

    Plasmid cloning

    For sgRNA cloning into the px458 host vector (a gift from F. Zhang) (Supplementary Table 1), forward and reverse primers containing complementary CRISPR RNA (crRNA) sequences flanked by BbsI restriction sites were used (Supplementary Table 4). The px458 plasmid was double digested with AgeI-HF (NEB, R3552S) and EcoRI-HF (NEB, R3101S) to eliminate the regions coding for GFP and Cas9. The px458 vector was then digested using BbsI (ThermoFisher Scientific, ER1012), gel purified and ligated with the phosphorylated and annealed crRNA oligonucleotides (called sgRNA plasmid once cloned).

    To transiently overexpress CD45 mutants, we introduced each variant of interest in a plasmid expressing WT CD45RO. Briefly, we digested the pCD45RABC plasmid (Sino Biological) (Supplementary Table 1) with HindIII-HF (NEB, R3104S) and XcmI (NEB, R0533L) to remove the alternative spliced exons A, B and C. The point mutations of the human CD45 variants were then introduced into the plasmid expressing CD45RO using PCR (Supplementary Table 4).

    Plasmids

    All ligations were transformed in JM109-competent bacteria (Promega, P9751). BE plasmids were from Addgene (SPACE-NG, ABE8e-NG, ABEmax-SpRY, ABEmax-SpG, CBE4max-NG, CBE4max-SpG and xCas9(3.7)-BE4) (Supplementary Table 1).

    BE mRNA and sgRNAs

    ABE8e-NG mRNA (capped (cap 1) using CleanCap AG; fully substituted with 5-methoxy-U; 120A polyA tail) and ABE8e(TadA-8e V106W)-SpRY mRNA (capped (cap 1) using CleanCap AG 3′-O-methylation; fully substituted with N1-methyl-pseudo-U; 80A polyA tail) were from Trilink Biotechnologies and Tebu-bio. We used 100-base lyophilized chemically modified sgRNAs from Synthego using their CRISPRevolution sgRNA EZ Kit service and resuspended at 100 µM (3.2 µg µl−1) in 1× TE buffer from Synthego (10 nM Tris, 1 mM EDTA, pH 8.0; chemical modifications include 2′-O-methylation of the three first and last bases and 3′ phosphorothioate bonds between the first three and last two bases of each sgRNA).

    Genomic DNA extraction, PCR and Sanger sequencing

    Cells from the BE plasmid screening were lysed in tail lysis buffer (100 mM Tris pH 8.5, 5 mM Na-EDTA, 0.2% SDS, 200 mM NaCl) containing proteinase K (Sigma-Aldrich) at 56 °C (1,000 rpm) for 1 h. The DNA was precipitated with isopropanol (1:1 volume ratio) and washed in 70% ethanol. The DNA was then resuspended in H2O and the genomic DNA concentration was measured with a NanoDrop device (Thermo Fisher).

    For samples containing few cells, genomic DNA was extracted using QuickExtract (Lucigen, QE09050). Cell pellets were resuspended in 30 µl QuickExtract, incubated at 60 °C for 6 min, vortexed for 1 min and subsequently re-incubated at 98 °C for 10 min.

    PCR was performed using GoTaq G2 Green Master Mix (Promega, M782B). The gDNA of samples analysed by NGS was extracted using QuickExtract (Lucigen, QE09050) or the Quick-DNA 96 Plus kit (Zymo, D4070) and the genomic DNA concentration was measured with a Qubit device (Thermo Fisher).

    For Sanger sequencing, different PCR primers were used depending on the CD45 exon targeted by the sgRNA and the sequencing technology (Supplementary Table 4). Sequencing of PCR amplicons was done at Microsynth and sequencing chromatograms were analysed using the EditR R package51 to quantify BE efficiencies.

    Next-generation amplicon sequencing

    For NGS, targeted amplicon libraries were generated using a three-step PCR protocol. In brief, nested PCRs were done on genomic DNA samples using KAPA HiFi HotStart polymerase (Roche) (Supplementary Table 4). After Illumina barcoding (Nextera indices, Illumina) using KAPA HiFi HotStart polymerase (Roche), PCR samples were pooled, purified using AMPure XP beads (Beckman Coulter) and quantified using Qubit dsDNA HS assay kit (Thermo Fisher). Libraries were paired-end sequenced on an Illumina Miniseq instrument using the Illumina Miniseq Mid output kit (300 cycles) with 50% PhiX spike-in (Illumina). After demultiplexing, each sample was assessed for quality using FastQC52 and processed using the CRISPResso2 tool53. For each of the samples, we provided the reference amplicon sequence (hg38) and the guide RNA sequence (reverse complement) and defined the quantification window centre to −10, the quantification window size to 15 and the plot window size to 30. We applied minimum paired end reads overlap between 10 and 200 and provided the following Trimmomatic sentence: ILLUMINACLIP:NexteraPE-PE:2:30:10 LEADING:3 TRAILING:3 SLIDINGWINDOW:4:15 MINLEN:36. Finally, we used a custom R script (https://gitlab.com/JekerLab/cd45_shielding) to count and translate into amino acid each allele from the CRISPResso2 output file Alleles_frequency_table.txt in the quantification window. Alleles with less than 0.8% frequency were considered as ‘other’.

    Genetic determination of human chimerism

    Genetic discrimination between PDX- and HSPC-derived cells was performed using the Devyser Chimerism NGS kit (Devyser) according to the manufacturer’s recommendations. In brief, sequencing libraries were prepared from genomic DNA targeting 24 polymorphic insertion–deletion markers distributed on 16 different chromosomes. The libraries were sequenced on a MiniSeq (Illumina) instrument using the high-throughput flow cell, generating 74-bp pair-end reads. An informative marker set was defined for the PDX donor/HSPC donor pair. It consisted of 10 markers that reliably discriminated between DNA from PDX and HSPC donor cells. The average proportion of leukaemia donor-specific reads to total reads was calculated to determine the proportion of PDX cells in each sample. It was confirmed that the genomic DNA of the mouse host did not interfere with the analysis.

    CHANGE-seq-BE

    Genomic DNA was extracted from human peripheral blood mononuclear cells (PBMCs) using the Puregene tissue kit (Qiagen, 158063) according to the manufacturer’s instructions including the proteinase-K and RNase steps (Qiagen, 158143 and 158153). CHANGE-seq-BE was adapted and modified from the original CHANGE-seq method54 to validate genome-wide activity for ABEs28. Similar to CHANGE-seq, purified genomic DNA tagmented with a custom Tn5-transposome to generate an average length of 650 bp and followed by gap repair with Kapa HiFi HotStart Uracil + DNA Polymerase (KAPA Biosystems, KK2802) and Taq DNA ligase (NEB, M0208L). Gap-repaired DNA was treated with USER enzyme (NEB, M5505L) and T4 polynucleotide kinase (NEB, M0201L). Intramolecular circularization of the DNA was performed with T4 DNA ligase (NEB, M0202L) and residual linear DNA was degraded by a cocktail of exonucleases containing plasmid-safe ATP-dependent DNase (Lucigen, E3110K), lambda exonuclease (NEB, M0262L), exonuclease I (NEB, M0293L) and exonuclease III (NEB, M0206L). The circularized DNA was then treated with Quick CIP (NEB, M025L) to dephosphorylate 5′ and 3′ ends of any residual linear DNA. Circularized genomic DNA (125 ng) was treated with ABE8e–SpRY:sgRNA-49.3 complexes in vitro in a 50 μl reaction for 24 h at 37 °C. ABE RNP complexes nicked the targeted DNA strand and deaminated adenine bases to inosine in the non-targeted stranded DNA of both on- and off-target sites. Further enzymatic steps were included with ABE treatment in the CHANGE-seq-BE method to generate double-strand breaks. Nicked DNA circles were treated with endonuclease V in 10× NEB buffer 4 (NEB, M0305S). Endo V cleaved DNA adjacent to inosines to generate linear DNA with 5′ overhangs. Gaps were filled with klenow fragments (3′ > 5′ exo) and deoxyribonucleotide triphosphates (dNTPs) (NEB, M0212L) in NEB buffer 2. End-repaired DNA products were A-tailed and further ligated with a hairpin adapter using an HTP library preparation kit (Kapa, KK8235), USER treated and amplified by PCR-barcoded universal primers with NEBNext multiplex oligonucleotides for illumina (NEB, E7600S), using Kapa HiFi HotStart uracil master mix. PCR libraries were quantified by quantitative PCR (KAPA Biosystems, KK4824) and sequenced with 151-8-8151 cycles on an Illumina NextSeq 2000 instrument. CHANGE-seq-BE data analyses were performed using open-source software: https://github.com/tsailabSJ/changeseq/tree/dev.

    rhAmpSeq

    Validation of off-target sites was performed using the rhAmpSeq system from IDT. rhAmpSeq primer panels for targeted amplification were generated using the rhAmpSeq design tool defining the insert size between 150 and 250 bp. Applied primer sequences are listed in Supplementary Table 4. A rhAmpSeq CRISPR library was prepared according to the manufacturer’s instructions and sequenced on an Illumina MiniSeq instrument (MiniSeq high output kit, 300 cycles). Custom python code and open-source bioinformatic tools were used to analyse rhAmpSeq data. First, we generated FASTQ format files by demultiplexing high-throughput-sequencing BCL data files. Next, the reads were processed using CRISPRessoPooled (v.2.0.41) with quantification_window_size 10, quantification_window_center −10, base_editor_output, conversion_nuc_from A, conversion_nuc_to G. The allele frequency table from the output files was used to calculate the A•T-to-G•C editing frequency. Specifically, the editing frequency for each on- or off-target site was defined as the ratio between the number of reads containing the edited base (that is, G) in a window from positions 4 to 10 of each protospacer (where the GAA PAM is positions 21–23) and the total number of reads. To calculate the statistical significance of off-target editing, we applied a method previously described55. In brief, a 2-by-2 contingency table was constructed using the number of edited reads and the number of unedited reads in the treated sample and its corresponding control sample. Next, a χ2 test was done. The FDR was calculated using the Benjamini–Hochberg method. Significant off-targets were defined on the basis of: first, FDR ≤ 0.05 and second, the difference in editing frequency between treated and control (≥1%).

    Cell line culture conditions

    All cancer cell lines (listed in Supplementary Table 5) were cultured in RPMI-1640 (Sigma-Aldrich, R8758) supplemented with 10% heat-inactivated FCS (Gibco Life Technologies) and 2 mM GlutaMAX (ThermoFisher Scientific, 35050061) at 37 °C.

    All cell lines were retrovirally transduced with MI-Luciferase-IRES-mCherry (gift from X. Sun; Supplementary Table 1). Cells were then FACS-sorted on the basis of mCherry expression. After expansion, MOLM-14 and OCI–AML2 were profiled for short tandem repeats and tested negative for mycoplasma before being frozen until further use. Jurkat and NALM-6 were purchased from ATCC and were therefore not profiled for short tandem repeats.

    DF-1 cells were cultured in DMEM high-glucose medium (Sigma-Aldrich, D5796) supplemented with 10% non-heat-inactivated FCS and 2 mM GlutaMAX at 39 °C (Supplementary Table 5).

    Human primary T cell culture and activation conditions

    Leukocyte buffy coats from anonymous healthy human donors were purchased from the blood-donation centre at Basel (Blutspendezentrum SRK beider Basel, BSZ). PBMCs were isolated by density centrifugation using SepMate tubes (StemCell Technologies, 85450) and the density gradient medium Ficoll-Paque (GE Healthcare) according to the manufacturer’s protocol. Frozen PBMCs were thawed and human primary T cells were then isolated using an EasySep human T cell isolation kit (Stemcell Technologies, 17951) following the manufacturer’s protocol. T cells were cultured overnight without stimulation at a density of 1.5 × 106 cells per ml in RPMI-1640 medium supplemented with 10% heat-inactivated human serum (AB+, male; purchased from BSZ), 10 mM HEPES (Sigma-Aldrich), 2 mM GlutaMAX, 1 mM sodium pyruvate, 0.05 mM 2-mercaptoethanol and 1% MEM non-essential amino acids (all from Gibco Life Technologies). The next day, the human primary T cells were activated with interleukin-2 (IL-2) (150 U ml−1, proleukin, University Hospital Basel), IL-7 (5 ng ml−1, R&D Systems), IL-15 (5 ng m−1, R&D Systems) and Dynabead Human T-Activator CD3/CD28 (1:1 beads:cells ratio) (Gibco, 11132D). The activated cells were de-beaded before electroporation.

    hCD34+ HSPC isolation and culture conditions until electroporation

    Leukopaks were purchased from CytoCare and hCD34+ HSPCs were isolated by the LP-34 process using CliniMACS Prodigy (Miltenyi). Isolated hCD34+ HSPCs were thawed and grown in HSPC medium for two days until electroporation (StemSpan SFEM II (StemCell, 09655) supplemented with 100 ng ml−1 human stem cell factor (hSCF) (Miltenyi, 130-096-695), 100 ng ml−1 human FMS-like tyrosine kinase ligand (hFlt3)-ligand (Miltenyi, 130-096-479), 100 ng ml−1 human thrombopoietin (hTPO) (Miltenyi, 130-095-752) and 60 ng ml−1 hIL-3 (Miltenyi, 130-095-069).

    Electroporation conditions

    K562 cells (2 × 106) were resuspended in buffer T and mixed with 5 µg BE plasmid (Supplementary Table 1) and 1.5 µg sgRNA plasmid for co-electroporation using a Neon transfection system (ThermoFisher, MPK10096; 1,450 V, 10 ms, 3 pulses). To monitor the electroporation efficiency, GFP expression was evaluated 24 h after electroporation using an optical microscope. In Extended Data Fig. 1f, BE results are displayed using a custom BE score: log((sum of editing frequencies per condition/number of edited positions per condition)+1).

    De-beaded human activated T cells (1 × 106) were resuspended in 100 µl Lonza supplemented P3 electroporation buffer with 7.5 µg BE mRNA (1 µg µl−1) and 7.5 µg sgRNA (3.2 µg µl−1) (Supplementary Table 2) and electroporated using the 4D-Nucleofector system (Lonza) with program EH-115. Immediately after electroporation, 900 µl pre-warmed human T-cells medium was added directly in the cuvettes and incubated for 20 min at 37 °C for the T cells to recover. Cells were then transferred in 48-well flat-bottomed plates (Corning, 3548) and the medium was supplemented with 500 U ml−1 IL-2. The medium was renewed every two days.

    hCD34+ HSPCs (1 × 106) were electroporated 48 h after thawing with 7.5 µg BE mRNA (1 µg µl−1) and 13.6 µg sgRNA (3.2 µg µl−1) (Supplementary Table 2) with a 1:100 BE:sgRNA molar ratio, following the same protocol as for human T cells but with program CA-137. Electroporated hCD34+ HSPCs were kept in culture at 0.5 × 106 cells per ml in a six-well flat-bottomed plate (Corning, 3516) in HSPC medium supplemented with 100 ng ml−1 hSCF, 100 ng ml−1 hFlt3–ligand and 100 ng ml−1 hTPO for in vivo applications and with the addition of 60 ng ml–1 hIL-3 for in vitro assays. The medium was renewed every five days. Edited hCD34+ HSPCs prepared for in vivo injection were frozen two days after electroporation in cryo-preservation CryoStor CS10 medium (Stem Cell Technologies, 07930) at a density of 10 × 106 cells ml−1.

    CFU assays

    The CFU assay was started 72 h after gene editing. For each condition, 1.1 ml semi-solid methylcellulose medium (StemCell Technologies) containing 200 cells was plated in a well of a SmartDish (StemCell Technologies, 27370) in duplicates. The cells were incubated at 37 °C, for 14 days. The resulting progenitor colonies were counted and scored using STEMVision Analysis (StemCell Technologies) according to the manufacturer’s instructions. The mean of the total number of colonies in the NTC samples for each experiment was set as 1.

    DF-1 cell-transfection conditions

    Plasmid (6.5 µg) encoding WT hCD45RO or its variants was mixed with 200 µl serum-free DMEM medium and 19.5 µl polyethylenimine (1 mg ml−1; Chemie Brunschwig, POL23966-100). The transfection mix was added dropwise to 1 × 106 DF-1 cells plated the day before in a six-well plate. Cells were analysed 48 h later by flow cytometry.

    In vitro ADC-mediated killing assays

    For in vitro ADC killing assays, 5,000 base-edited human activated T cells or 25,000 base-edited hCD34+ HSPCs were plated five days after electroporation in 96-well plates (flat-bottomed for T cells and round-bottomed for HSPCs; Corning 3596 and 3799, respectively) in 100 µl of corresponding medium (supplemented with only 50 U ml−1 of IL-2 for human T cells). For ADC killing assays involving saporin, a 100 nM stock was prepared by incubating the biotinylated antibody (BC8 or MIRG451 mAbs) and saporin–streptavidin (ATS-Bio, IT-27-1000) at a 1:1 molar ratio for 30 min at room temperature.

    For in vitro ADC killing of co-cultures, 12,500 Jurkat cells were stained for 20 min with CTV (Invitrogen, C34557A) at 37 °C and then seeded at a 1:1 cell ratio with 12,500 base-edited hCD34+ HSPCs five days post-electroporation in 96-well round-bottom plates in 100 µl HSPC medium with corresponding concentrations of CIM053–SG3376 (ADC Therapeutics). Cells were incubated for 72 h at 37 °C, stained for flow cytometry or cell sorting and analysed using a BD LSRFortessa. Genomic DNA was extracted for sequencing.

    For in vitro ADC killing of mCherry–luciferase-marked tumour cell lines (Jurkat, NALM-6, OCI-AML-2 and MOLM-14), 2,000 cells were plated in 384-well plates in medium with or without 30 min pre-incubation at 37 °C with 50 µg ml−1 (333.33 nM) naked CIM053 antibody (40 µl final total volume per well). Following a 72 h incubation period, 5 μl firefly d-luciferin (0.75 mg ml−1 resuspended in medium (Biosynth, L-8220)) was added to each well and incubated for 5 min at room temperature. Luminescence readouts were recorded using a BioTek Synergy H1 plate reader.

    Expression and purification of soluble CD45wt and CD45 variants

    For precise antibody–protein affinity measurements and biophysical characterization, CD45wt and variants containing only D1 and D2 of the ECD were produced. The protein sequence (residues 225–394) was histidine tagged at the carboxy terminus and contains few N- and C-terminal added amino acids (full WT sequence: ETGIEGRKPTCDEKYANITVDYLYNKETKLFTAKLNVNENVECGNNTCTNNEVHNLTECKNASVSISHNSCTAPDKTLILDVPPGVEKFQLHDCTQVEKADTTICLKWKNIETFTCDTQNITYRFQCGNMIFDNKEIKLENLEPEHEYKCDSEILYNNHKFTNASKIIKTDFGSPGEGTKHHHHHH, SEQ ID 57, Uniprot ID P08575). Expi293F GnTI cells (Thermo Fisher, A39240) that lack N-acetylglucosaminyltransferase I (GnTI) activity and therefore lack complex N-glycans were used for protein expression. After collection, the protein was purified using Ni-NTA chromatography followed by digestion of high-mannose glycans with endoglycosidase H (EndoHf; New England BioLabs, P0703S) at 37 °C overnight. EndoHf was removed from the protein solution with amylose resin and the CD45 protein was further purified by size-exclusion chromatography in buffer comprising 20 mM HEPES, pH 7.4, 150 mM NaCl. Peak monomer and dimer fractions (where needed) were concentrated using a 10 kDa cut-off Amicon centrifugal filter (UFC8010) and protein aliquots were flash-frozen in liquid nitrogen before storage at −150 °C. Variant CD45 proteins were produced using the same experimental procedure. The monomer content percentage for each protein was taken from the size-exclusion chromatogram.

    Binding analysis of soluble CD45 proteins by BLI

    Analysis of MIRG451 and BC8 binding to the selected variants was performed on an Octet system RED96e (Sartorius) or R8 (Sartorius) at 25 °C with shaking at 1,000 rpm using 1× kinetic buffer (Sartorius, 18-1105). The selected variants were screened for their ability to bind to MIRG451 and BC8 using different concentrations of CD45 (WT or variant). MIRG451 was captured by an anti-human Fc-capture biosensor (AHC) (Sartorius, 18-5060) for 300 s at 0.5–1 µg ml−1. As an analyte, human CD45wt and variants, containing only domains 1 and 2, were titrated at seven different concentrations, from 1,000 nM to 15.6 nM or from 50 nM to 0.78 nM with a 1:2 dilution series. Association of the analyte to MIRG451 was monitored for 600 s and dissociation of the analyte from MIRG451 was monitored for 1,800 s. Reference subtraction was performed against buffer-only wells. AHC tips were regenerated using 10 mM Gly-HCl, pH 1.7. Data were analysed using the Octet data analysis software HT 12.0. Data were fitted to a 1:1 binding model. Kinetic rates Ka and Kd were globally fitted.

    To analyse binding to BC8, streptavidin biosensors (Sartorius, 18-5020) were first coated with CaptureSelect biotin anti-LC-κ (murine) conjugate (Thermo Scientific, 7103152100) for 600 s at 1 µg ml−1. BC8 was then captured by the coated streptavidin biosensors for 300 s at 0.5–1.0 µg ml−1. Analyte titration was performed as for MIRG451. Association of the analyte to BC8 was monitored for 300 s and dissociation of the analyte from BC8 was monitored for 900 s. Reference subtraction, regeneration and data analysis were performed as for MIRG451.

    Characterization of CD45 variants by nanoDSF

    The thermostability of CD45 D1–D2 variants was analysed by differential scanning fluorimetry and monitoring tryptophane fluorescence using Nanotemper Prometheus NT.48 NanoDSF or a Nanotemper Prometheus Panta (NanoTemper Technologies)56,57,58. Thermal denaturation was monitored by tryptophane/tyrosine fluorescence at 350 and 330 nm and an excitation wavelength of 280 nm was used. CD45wt and variants were prepared at 0.25–1.0 mg ml−1 in 20 mM HEPES, 150 mM NaCl, pH 7.4. Then 10 μl was put into the capillaries and placed into the sample holder. Each protein was measured in triplicates per experiment and the CD45wt was measured in four different experiments. The temperature was increased from 20 °C to 90 °C or 95 °C. The analysis was performed using the ratio of the fluorescent intensities at 350 and 330 nm. The software of the instrument was used to calculate Tonset and TM as well as the mean and s.d. of the triplicates. The melting temperature was determined as the inflexion point of the sigmoidal curve and compared with that of CD45wt.

    Flow cytometry analysis and sorting

    Flow cytometry was done on BD LSRFortessa instruments with BD FACSDiva software. Data were analysed with FlowJo software. Antibodies used for flow cytometry are listed in Supplementary Table 6. Cells were sorted with BD FACSAria or BD FACSMelody cell sorter instruments. Sorted cells were resuspended in 30 µl QuickExtract. PCRs were performed and sent for Sanger sequencing.

    CD45 expression of AML samples

    The CD45 expression of 27 people diagnosed with AML at University Hospital Basel was assessed using routinely acquired flow cytometry data as part of the diagnostic work-up. Gating for AML blasts, lymphocytes and erythrocytes was performed manually using FlowJo 10.10.0. Owing to the experimental set-up (threshold for SSC-A and FSC-A to exclude debris), a distinct erythrocyte population could not be distinguished in all samples (23 of 27). Data were analysed with GraphPad Prism 10 and statistical significance was calculated using mixed-effects analysis. All patients gave written informed consent to the analysis of clinical data for research purposes and the study was approved by the local ethics committee (BASEC-Nr 2023-01372).

    Animal experiments

    All animal work was done in accordance with the federal and cantonal laws of Switzerland. Protocols were approved by the Animal Research Commission of the Canton of Basel-Stadt, Switzerland. All mice were housed in a specific pathogen-free condition in accordance with institutional guidelines and ethical regulations. NBSGW (stock 026622) female mice were purchased from Jackson Laboratories. HSPCs were edited as described above. Two days after electroporation, cells were collected and frozen in CryoStor CS10 medium. Cells were thawed on the day of injection, washed and resuspended in PBS. Recipient NBSGW female mice (4 weeks old) were injected intravenously into the tail vein with HSPCs (the number of cells injected varied between 0.6 and 1.1 million and is indicated in each figure). Chimerism was analysed by flow cytometry in blood after ten weeks. Mice were treated with saline or CIM053–SG3376 at the dose(s) and intervals indicated in each figure. For tumour experiments in humanized mice, 1 × 106 MOLM-14–mCherry–luc cells were injected into the tail vein. Then, 10 or 12 days after tumour inoculation, the mice were treated with saline or 1 mg per kg CIM053–SG3376. The mice received a second antibody dose of 0.5 mg per kg CIM053–SG3376 10 or 25 days after the first dose. Mice were euthanized 43 or 45 days after tumour inoculation or when reaching the maximum allowed clinical score. To monitor tumour growth, mice were injected intraperitoneally with 100 μl d-luciferin (BioSynth, L-8220) and were subjected to Newton7.0 imaging (Vilber).

    For secondary transplant, NSG–SGM3 female mice (stock 013062) were purchased from Jackson Laboratories. Recipient mice were irradiated the day before the BM transplant with 200 cGy. Primary transplant mice were euthanized, the BM was isolated and 40% of it was re-injected into the new host. Mice from secondary transplants were euthanized 8 weeks after humanization.

    MOLM-14–mCherry–luc (1 × 106), OCI-AML-2–mCherry–luc (2 × 106), Jurkat–mCherry–luc (5 × 106) or NALM-6–mCherry–luc (0.5 × 106) cells were injected into the tail vein of NBSGW mice. After tumour inoculation, mice were monitored regularly (for behaviour, weight and imaging). Mice were treated with saline, control-SG3376 or CIM053–SG3376 at the dose and intervals indicated in the relevant figure and euthanized 21 days after tumour inoculation or when reaching the maximum allowed clinical score.

    Deidentified patient-derived AML samples were obtained from the PDX repository59,60 (Cancer Research Center of Toulouse, France). Signed written informed consent for research use in accordance with the Declaration of Helsinki was obtained from patients and approved by the Geneva Health Department Ethic Committee. PDX cells (0.6 × 106) were injected into the tail vein of humanized NBSGW mice (8 weeks after HSPC injection). The weight of the mice was monitored regularly. Mice were treated with saline or CIM053–SG3376 at the doses and intervals indicated in the figure and the mice were euthanized 54 days after tumour inoculation. Some control mice were euthanized 3 days before antibody treatment.

    Tissue collection and processing

    After the mice were euthanized, 0.2 ml blood, both hind legs (femur and tibia) and the spleen were collected from each mouse. Cell suspensions were generated, red blood cells were lysed using ACK lysis buffer and then the cell suspensions were filtered. For tumour experiments, organs were collected on the day of euthanasia, single cell suspensions were generated and frozen in cryo medium. Samples from all mice were thawed and stained on the same day to minimize experimental variability. Cells were stained for different antigens and 30 μl Accucheck counting beads (1,066 microspheres per μl; Invitrogen, PCB100) were added to each sample and the results were analysed by FACS using a BD LSRFortessa instrument.

    Statistics and reproducibility

    Statistical analyses were done using GraphPad Prism 9 and 10 software. In all figure legends, n refers to the number of experimental replicates. For multiple comparisons, two-way ANOVA tests were used with significance levels indicated. Data are presented as mean ± standard deviation. Survival curves were analysed using the log-rank Mantel–Cox test. rhAmpSeq was analysed using a χ2 test. The FDR was calculated using the Benjamini–Hochberg method.

    Some data points of the in vivo experiments were excluded after visual inspection of samples if the FACS time gate showed irregularities. One mouse that did not engraft HSPCs was excluded from Fig. 5 and Extended Data Fig. 10. Cell numbers in the sgNTC group treated with CIM053–SG3376 were so low that analysis of some assays became unreliable (NGS, genetic chimerism analysis). We therefore excluded this group from NGS.

    The number of biological replicates is specified for each experiment in the relevant figure legend. Several key experiments were performed by different people at times in different laboratories, and reagents were shared. For instance, identification of variants, characterization of recombinant variants and FACS validation were performed by different people. Some experiments were performed in the academic lab and validated in Cimeio labs and vice versa. To avoid unconscious bias when assigning mice to saline or the CIM053–SG3376 groups, we always assigned the mice with the largest tumour mass to the ADC group.

    The investigator who determined genetic chimerism (NGS and analysis) was blinded and provided the results to the investigator in charge of supervising in vivo experiments. The people who performed CHANGE-Seq_BE, rhAMPSeq and analysed the data were blinded and provided the results to the investigator in charge of supervising the in vivo experiments.

    Availability of materials

    Non-proprietary materials are freely available on reasonable request. Restrictions apply to proprietary, commercial material.

    Reporting summary

    Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.

    [ad_2]

    Source link