Multiscale topology classifies cells in subcellular spatial transcriptomics

TopACT

TopACT operates on subcellular-resolution spatial transcriptomics data. These data take the form of a grid of spots such that each spot in the grid has an associated expression vector in $\mathbb{R}^D$. The output of our method is a cell type annotation for each spot in the grid. First, we use an annotated sc/snRNA-seq reference dataset to construct an automatic single-cell classifier, called a local classifier. This classifier could in principle use any supervised learning approach, for example a neural network or random forest, but a simple and effective choice is the support vector machine (SVM)³⁵,³⁶,³⁷. The only restriction is that the classifier must take as input an expression vector and output a probability vector over all cell types. We achieve this by using Platt scaling³⁸, which converts SVM decision scores into probabilities.

Given a local classifier, TopACT then proceeds as follows. Fix a spot s in the grid. For a given radius r ≥ 0, consider a ball of radius r centred on s. Let X_r be the sum of the expression vectors of all spots in this ball. Feeding X_r as input to the local classifier produces a probability vector v_r over the cell types. For a list r₁ ≤ r₂ ≤ … ≤ r_k of radii, these vectors can be combined as columns into a multiscale cell type confidence matrix $A = [\mathbf{v}_{r_1}, \mathbf{v}_{r_2}, \ldots, \mathbf{v}_{r_k}]$. Now, pick a confidence threshold θ ∈ (0, 1). For a spot s, let j be minimal such that the most likely cell type i at scale r_j has confidence A_ij ≥ θ. We set the cell type of s to be i if such a j exists. In other words, the cell type assigned to the spot s is the cell type predicted by the local classifier at the smallest possible scale at which a confident prediction can be made. For a full technical description of the TopACT method, see Supplementary Methods 1A.
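The per-spot procedure can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: `ball_sum`, `classify_spot` and the `predict_proba` callable are hypothetical names, the ball is taken in Euclidean distance measured in spot units, and any function mapping an expression vector to a probability vector can stand in for the local classifier.

```python
import numpy as np

def ball_sum(expr, row, col, r):
    """Sum the expression vectors of all spots within distance r of (row, col).

    expr has shape (H, W, D): one D-dimensional count vector per grid spot.
    Distances are Euclidean, in units of spots (an assumption of this sketch).
    """
    H, W, _ = expr.shape
    k = int(np.floor(r))
    r0, r1 = max(row - k, 0), min(row + k, H - 1)
    c0, c1 = max(col - k, 0), min(col + k, W - 1)
    rr, cc = np.mgrid[r0:r1 + 1, c0:c1 + 1]
    in_ball = (rr - row) ** 2 + (cc - col) ** 2 <= r ** 2
    return expr[r0:r1 + 1, c0:c1 + 1][in_ball].sum(axis=0)

def classify_spot(expr, row, col, radii, predict_proba, theta):
    """Return (cell type index, radius) at the smallest confident scale.

    predict_proba is any callable mapping a length-D vector to a probability
    vector over cell types (hypothetical stand-in for the local classifier).
    Returns (None, None) when no scale reaches the threshold theta.
    """
    for r in sorted(radii):
        v_r = predict_proba(ball_sum(expr, row, col, r))  # column of A at scale r
        i = int(np.argmax(v_r))                            # most likely type at this scale
        if v_r[i] >= theta:
            return i, r
    return None, None
```

Scanning the radii in increasing order and stopping at the first confident prediction is equivalent to building the full matrix A and taking the minimal j, but it avoids evaluating the classifier at larger scales than needed.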

Synthetic data generation

We use a two-stage process to generate synthetic benchmark data, first generating a synthetic cell type map and then imputing gene expression. In the first stage, a synthetic grid of spots with cell type annotations is produced. We sample 625 points uniformly at random from the unit square [0, 1] × [0, 1], taking these points to be cell centres. We draw a Voronoi diagram (computed using the implementation in SciPy³⁹ based on Qhull⁴⁰) on the basis of these points to simulate cell boundaries. Cell types are then assigned at random to each Voronoi region, in proportion to the cell type abundances in the snRNA-seq data. These cell types are then applied to a 500 × 500 grid of spots overlaid on the unit square. The end result is a grid of spots, each annotated with a cell type.

In the second stage, we impute the gene expression at each spot using a Poisson process with parameters inferred from the mouse kidney snRNA-seq data described below. This process is based on a simplified version of the model described in ref. ²⁴. In detail, for a cell type T and gene g let λ_Tg denote the mean expression of gene g over all cells in the snRNA-seq dataset with cell type T. If a spot s is assigned the cell type T, we then model the expression v_sg of gene g at s as following a Poisson distribution

$$\mathbf{v}_{sg} \sim \mathrm{Poisson}(\alpha \lambda_{Tg})$$

where α = exp(−7.3) is a fixed parameter determining the transcriptional abundance. To model zero-inflation, we then select 20% of spots uniformly at random to be assigned zero reads, regardless of the Poisson-modelled expression.
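The two-stage generation can be sketched as below. This is an illustrative sketch with hypothetical function names, not the authors' code. It assigns each spot to its nearest cell centre with a k-d tree, which yields the same regions as a Voronoi diagram without constructing the diagram explicitly, and it samples cell types independently with probabilities equal to the reference abundances, which is one reading of "in proportion to".

```python
import numpy as np
from scipy.spatial import cKDTree

def synthetic_type_map(n_cells, grid, type_probs, rng):
    """Stage 1: cell type label for each spot of a grid x grid array."""
    centres = rng.random((n_cells, 2))                 # cell centres in the unit square
    cell_types = rng.choice(len(type_probs), size=n_cells, p=type_probs)
    ticks = (np.arange(grid) + 0.5) / grid             # spot centres along one axis
    xx, yy = np.meshgrid(ticks, ticks, indexing="ij")
    spots = np.column_stack([xx.ravel(), yy.ravel()])
    _, nearest = cKDTree(centres).query(spots)         # nearest centre = Voronoi region
    return cell_types[nearest].reshape(grid, grid)

def impute_expression(type_map, lam, alpha, zero_frac, rng):
    """Stage 2: Poisson counts with rate alpha * lam[T, g], then zero-inflation.

    lam has shape (n_types, n_genes) and holds the mean expression lambda_Tg.
    """
    counts = rng.poisson(alpha * lam[type_map])        # shape (grid, grid, n_genes)
    n_spots = type_map.size
    flat = counts.reshape(n_spots, -1)
    n_zero = int(round(zero_frac * n_spots))
    zero_idx = rng.choice(n_spots, size=n_zero, replace=False)
    flat[zero_idx] = 0                                 # these spots receive zero reads
    return flat.reshape(type_map.shape + (-1,))
```

With the values quoted in the text, the calls would use `n_cells=625`, `grid=500`, `alpha=np.exp(-7.3)` and `zero_frac=0.2`.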

For molecular diffusion experiments, we modelled the diffusion effect separately for each synthetic transcript. Given a transcript a, we sampled a diffusion magnitude D_a from an exponential distribution with mean λ_diff, the mean diffusion for a single transcript. Independently, a diffusion direction θ_a is sampled from a uniform distribution over [0, 2π). This yields coordinate-wise displacements

$$d_a^x = D_a \cos\theta_a, \qquad d_a^y = D_a \sin\theta_a$$

The original spot coordinates x_a, y_a for the transcript can then be revised accordingly to displaced coordinates

$$x_a^{\mathrm{diff}} = x_a + \lfloor d_a^x / d_{\mathrm{spot}} \rfloor, \qquad y_a^{\mathrm{diff}} = y_a + \lfloor d_a^y / d_{\mathrm{spot}} \rfloor$$

The rescaling by d_spot = 0.715 accounts for the inter-spot distance of 0.715 μm in Stereo-seq mouse kidney experiments.
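A short sketch of the displacement step follows. It assumes λ_diff is expressed in μm and that transcript coordinates are integer spot indices. The name `diffuse` is hypothetical, and the handling of transcripts displaced beyond the grid boundary is not specified in the text, so it is left to the caller.

```python
import numpy as np

D_SPOT = 0.715  # inter-spot distance in micrometres, as quoted in the text

def diffuse(x, y, mean_diffusion, rng, d_spot=D_SPOT):
    """Displace integer spot coordinates (x, y) of each transcript."""
    x = np.asarray(x)
    y = np.asarray(y)
    # NumPy parameterizes the exponential by its mean (scale), matching lambda_diff.
    magnitude = rng.exponential(scale=mean_diffusion, size=x.shape)
    theta = rng.uniform(0.0, 2.0 * np.pi, size=x.shape)
    dx = magnitude * np.cos(theta)
    dy = magnitude * np.sin(theta)
    # Convert micrometres to whole spots with the floor, as in the equations above.
    x_diff = x + np.floor(dx / d_spot).astype(int)
    y_diff = y + np.floor(dy / d_spot).astype(int)
    return x_diff, y_diff
```

Because the floor rounds towards negative infinity, a small negative displacement moves a transcript one spot backwards while an equally small positive displacement leaves it in place. This follows from the stated formula.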

Synthetic data analysis

We ran TopACT directly on synthetic data, with an SVM local classifier trained on the same snRNA-seq reference dataset used for generation. For fixed-window bin 20 analysis, we split the 500 × 500 synthetic grid into square bins, each covering a 20 × 20 region of spots. Bin 20 was chosen so that each bin matches the mean area of a synthetic cell (500 × 500 spots shared among 625 cells gives 400 spots per cell). Moreover, at bin 20 the resulting grid approximates 10 μm resolution, which is considered the ‘sweet spot’ for single-cell analysis¹. We then summed the expression over all spots in each region. RCTD²⁴ was run in doublet mode with default settings, using the same snRNA-seq reference dataset, and we assigned each bin the RCTD-predicted ‘first type’. For the modal cell type classification, we assigned to each bin its most frequent ground truth cell type. For analysis of rare cell type identification, we took rare cell types to be those making up less than 5% of the total samples in the snRNA-seq reference data. To extract cell loci, we first computed a binary image corresponding to each rare cell type. We performed a binary dilation to merge any cells that were split into several adjacent components and then took the centre of each connected component to be a cell centre.
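The fixed-window binning and the modal classification can be illustrated with NumPy reshapes. This is a sketch with hypothetical helper names; it assumes the grid dimensions are exact multiples of the bin width, as is the case for a 500 × 500 grid at bin 20.

```python
import numpy as np

def bin_sum(expr, b):
    """Sum expression over non-overlapping b x b blocks of spots.

    expr has shape (H, W, D); the result has shape (H // b, W // b, D).
    """
    H, W, D = expr.shape
    if H % b or W % b:
        raise ValueError("grid dimensions must be multiples of the bin width")
    return expr.reshape(H // b, b, W // b, b, D).sum(axis=(1, 3))

def bin_mode(type_map, b, n_types):
    """Most frequent ground truth cell type in each b x b block."""
    H, W = type_map.shape
    if H % b or W % b:
        raise ValueError("grid dimensions must be multiples of the bin width")
    blocks = (type_map.reshape(H // b, b, W // b, b)
                      .transpose(0, 2, 1, 3)
                      .reshape(H // b, W // b, b * b))
    votes = np.stack([(blocks == t).sum(axis=-1) for t in range(n_types)], axis=-1)
    return votes.argmax(axis=-1)  # ties resolve to the lowest type index
```

For the synthetic benchmark, `bin_sum(expr, 20)` would turn the 500 × 500 grid into a 25 × 25 grid of pseudo-cells.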

MPH landscapes

MPH tracks how the topological features (here, loops) of a shape evolve as certain parameters are varied. Given an input point cloud, we record the first persistent homology (H₁) of its associated Rips-codensity bifiltration. This information is summarized in a sequence, called an MPH landscape⁹, of functions λ_k: ℝ × ℝ → ℝ for k = 1, 2, …. Given a radius parameter s and a codensity parameter t, the value λ_k(s, t) ∈ ℝ roughly describes the significance of the kth most significant topological feature in the bifiltration at those parameter values. Here, we focus on λ₁: ℝ × ℝ → ℝ, which describes the significance of the most significant such feature. For a full introduction to MPH, see Supplementary Methods 1B. Here, we computed average control and treated MPH landscapes for TopACT-predicted immune cell point clouds (Supplementary Methods 2B). MPH was computed with RIVET (https://github.com/rivetTDA/rivet/) and converted to MPH landscapes using the code from ref. ⁷ (https://github.com/MultiparameterTDAHistology/SpatialPatterningOfImmuneCells).

Statistical tests and reproducibility

Statistical tests were performed using the Python packages SciPy (https://www.scipy.org/) and statannotations (https://github.com/trevismd/statannotations). All tests are one-sided Welch’s t-tests.
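For reference, a one-sided Welch's t-test can be run in SciPy as sketched below. The wrapper name `welch_one_sided` is hypothetical, and the direction of the alternative is an assumption that must be chosen per comparison. The `alternative` argument requires SciPy 1.6 or later.

```python
from scipy import stats

def welch_one_sided(a, b, alternative="greater"):
    """Welch's t-test (unequal variances) of mean(a) against mean(b).

    alternative="greater" tests mean(a) > mean(b); use "less" for the reverse.
    """
    res = stats.ttest_ind(a, b, equal_var=False, alternative=alternative)
    return res.statistic, res.pvalue
```

Setting `equal_var=False` is what distinguishes Welch's test from Student's t-test, which pools the variances of the two groups.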

The snRNA-seq (one control and two lupus nephritis mice, four healthy human kidney samples) and spatial experiments (one control and two lupus nephritis mice, one IgAN patient kidney sample) were not replicated. Mouse histological images are representative of images from six lupus and six control animals, and multiplex immunofluorescence images are representative of images from three control and three lupus mice.

Animals

Female BALB/cOlaHsd mice were purchased from Envigo at 5 weeks of age. Animals were housed in specific pathogen-free individually ventilated cages, at 20–24 °C and 45–65% humidity, kept on a 12 h light/dark cycle from 08:00 to 20:00, with food and water freely available. All animal experiments were performed under project licence P84582234, with UK Home Office approval and local approval by the Oxford University Clinical Medicine Animal Welfare and Ethical Review Body. They were carried out in compliance with UK Home Office Guidelines and the Animals (Scientific Procedures) Act 1986 (amended 2013) and are reported in line with the ARRIVE guidelines. Mice were treated topically with either 5% Aldara (Imiquimod) cream (Meda Pharmaceuticals) or Vaseline (Unilever) control on both ears, three times weekly for 8 weeks.

Mouse kidney processing

Snap-frozen mouse kidneys were obtained from 12 female mice randomized to TLR7 agonist treatment or control (unblinded). Treated mice develop lupus-like renal disease with glomerular endocapillary proliferation, which includes proliferation of circulating immune cells that have migrated to the capillary tuft. The 10 μm cryosections from three mice (four slices from a control sample and two and four slices, respectively, from two treated samples) were successfully processed for spatial transcriptomics with Stereo-seq⁶. Remaining kidney tissue from the above samples was dissociated to single nuclei, partitioned and sequenced to generate snRNA-seq data⁴¹, yielding a matched single-nucleus and subcellular spatial transcriptomics dataset. We clustered cells in the snRNA-seq dataset using Seurat¹⁹ and annotated cell types according to top marker genes. Initial clustering identified 30 populations, shown in Extended Data Fig. 2, alongside key cell type markers. The spatial data for each sample consist of gene expression readings measured on a grid of 220 nm DNA nanoball spots with a centre-to-centre inter-spot distance of 715 nm. These data are represented by a D-dimensional expression vector (D ≈ 25,000) at each spot. Spatial analysis was restricted to a boundary region defined as the convex hull of high expression spots (Supplementary Methods 2 and Supplementary Fig. 1).

Human tissue

We used the Xenium platform (10x Genomics)¹⁷ to generate spatial data from a human kidney IgA nephropathy biopsy core obtained with the assistance of the Oxford Centre for Histopathology Research as an approved project under the Oxford Radcliffe Biobank research tissue bank ethical approval (South Central—Oxford C Research Ethics Committee: 19/C/0193). For snRNA-seq, healthy control kidney tissue was obtained from pre-implantation biopsies in four living donor kidneys through an approved project in the Oxford Transplant Biobank (South Central—Oxford C Research Ethics Committee: 19/SC/0529). All human samples were obtained with informed patient consent for research. Human tissue experiments were carried out under the University of Oxford Human Tissue Act license 12217.

Mouse brain model

We examined publicly available adult mouse spatial data profiled with Stereo-seq⁶. To ensure an unbiased comparison, for scRNA-seq integration we used the same reference mouse brain atlas⁴² as in the original spatial study and trained the TopACT local classifier to identify the PVM1 subcluster. Similar to mouse kidney, we restricted analysis to a boundary region defined as the convex hull of high expression spots. From TopACT spot classifications, we performed a binary dilation and then called PVM cells as connected components of size at least 60 spots. To validate these called cells, we examined the expression level of PVM marker genes (as defined in the same atlas⁴²) in TopACT-predicted PVM cells compared to uniformly sampled background cells.
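The cell-calling step can be sketched with `scipy.ndimage`. This is an illustration rather than the authors' code: the function name `call_cells` is hypothetical, and the structuring element, the number of dilation iterations and the connectivity used for labelling are not stated in the text, so SciPy's defaults (a cross-shaped element, one iteration, 4-connectivity) are assumed. Component size is measured after dilation, following the order of operations described above.

```python
import numpy as np
from scipy import ndimage

def call_cells(mask, min_size=60, iterations=1):
    """Call cells from a boolean image of spots classified as the target type.

    Returns a list of (row, col) centres, one per connected component of the
    dilated mask that contains at least min_size spots.
    """
    dilated = ndimage.binary_dilation(mask, iterations=iterations)
    labels, n = ndimage.label(dilated)
    if n == 0:
        return []
    sizes = np.bincount(labels.ravel())[1:]   # spots per component (label 0 is background)
    keep = np.arange(1, n + 1)[sizes >= min_size]
    if keep.size == 0:
        return []
    return ndimage.center_of_mass(dilated.astype(float), labels, keep)
```

The dilation merges fragments of one cell that are separated by a narrow gap of unclassified spots, and the size threshold discards isolated misclassified spots.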

Spatial RNA-seq

Stereo-seq was performed as previously described⁶. Capture chips were loaded with DNA nanoballs (DNB) generated by rolling circle amplification of random 25 base pair (bp) oligonucleotides. Single-end sequencing (MGI DNBSEQ-Tx) was performed to determine the DNB coordinate identity at each spatial location on the chip, followed by ligation of 22 bp polyT and 10 bp molecular identity oligos to the DNB. The 10 μm kidney tissue sections were cryosectioned from optimal cutting temperature (OCT) embedded frozen blocks and adhered to the chip surface, fixed in methanol, stained with nucleic acid dye (Thermo Fisher Scientific, Q10212) for imaging and incubated at 37 °C with 0.1% pepsin (Sigma, P7000) for 12 min to permeabilize. After permeabilization, we performed reverse transcription and cDNA amplification, and the Agilent 2100 was used to check the range of cDNA fragments. The cDNA was fragmented by in-house Tn5 transposase and amplified, and the fragments were double-selected. After screening, libraries were subjected to Agilent 2100 quality inspection. Finally, the double-selection libraries were constructed into libraries suitable for the MGI DNBSEQ-Tx sequencing platform through circularization steps and were sequenced to collect data (50 bp for read 1 and 100 bp for read 2).

For Xenium (10X Genomics) on formalin fixed paraffin embedded human renal biopsy blocks, tissue was sectioned and 5 μm slides prepared following the manufacturer-recommended protocol (CG000580) onto Xenium slides using the predesigned 377 gene 10X Human Multi-Tissue and Cancer Xenium Pre-Designed Gene Expression Panel (10x, 1000626). Padlock probes were incubated on the tissue overnight before rolling circle amplification and chemical autofluorescence quenching. Slides were imaged on a Xenium Analyzer machine. Cell segmentation, gene transcript by cell and transcript by tissue location data matrices were generated by the Xenium Onboard Analysis pipeline. Supervised cell clustering was performed in Seurat¹⁹ by finding a set of anchors between the healthy human snRNA-seq dataset and the per cell Xenium data and using these to transfer labels to the Xenium segmentation defined cells.

For TopACT, transcripts with Q-score of 20 or more were selected and binned to 1 μm resolution before processing with the standard TopACT pipeline with a minimum radius of 2 μm, a maximum radius of 5 μm and a confidence level of 0.9. The local classifier was trained from the paired snRNA-seq dataset using the procedure detailed in Supplementary Methods 2A.
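The preprocessing of Xenium transcripts into a spot grid can be sketched as below. The function and argument names are hypothetical and do not correspond to a specific Xenium file schema. Coordinates are assumed to be non-negative and in μm, and genes are assumed to be pre-encoded as integer indices.

```python
import numpy as np

def bin_transcripts(x_um, y_um, gene_idx, qv, n_genes, bin_um=1.0, min_qv=20):
    """Keep transcripts with Q-score >= min_qv and count them per bin and gene.

    Returns an integer array of shape (n_rows, n_cols, n_genes).
    """
    x_um, y_um, gene_idx, qv = map(np.asarray, (x_um, y_um, gene_idx, qv))
    keep = qv >= min_qv
    if not keep.any():
        raise ValueError("no transcripts pass the Q-score filter")
    col = np.floor(x_um[keep] / bin_um).astype(int)
    row = np.floor(y_um[keep] / bin_um).astype(int)
    grid = np.zeros((row.max() + 1, col.max() + 1, n_genes), dtype=np.int64)
    # np.add.at accumulates correctly when several transcripts share a bin and gene.
    np.add.at(grid, (row, col, gene_idx[keep]), 1)
    return grid
```

At 1 μm bins, the minimum and maximum radii of 2 μm and 5 μm quoted above correspond to radii of 2 and 5 bins on the resulting grid.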

snRNA-seq

Tissue from the same frozen organs used for mouse kidney spatial transcriptomics was used to perform complementary snRNA-seq. Single nuclei were isolated as previously described with minor modifications⁴³. In brief, kidney tissues were placed into a 2 ml Dounce homogenizer (Sigma) with 2 ml of prechilled homogenization buffer (10 mM Tris pH 8.0 (Thermo Fisher), 250 mM sucrose (Sigma), 1% BSA (Sangon Biotech), 5 mM MgCl₂ (Thermo Fisher), 25 mM KCl (Thermo Fisher), 0.1 mM dithiothreitol (Thermo Fisher), 1X protease inhibitor cocktail (Roche), 0.4 U μl⁻¹ of RNase inhibitor (MGI), 0.1% NP40 (Roche)). After incubation on ice for 10 min, tissues were homogenized by ten strokes of the loose pestle A and filtered with a 70 μm cell strainer (Falcon). The homogenate was further homogenized with ten strokes of the tight pestle B, filtered using a 30 μm cell strainer (Sysmex) into a 15 ml conical tube and centrifuged at 500g for 5 min at 4 °C. The pellet was resuspended in 1 ml of blocking buffer (PBS (Thermo Fisher), 1% BSA, 0.2 U μl⁻¹ of RNase inhibitor) and centrifuged at 500g for 5 min; this step was repeated once. The pellet was resuspended using cell resuspension buffer (MGI) at a concentration of 1,000 nuclei per μl for further library preparation. The snRNA-seq libraries were prepared using DNBelab C Series Single-Cell Library Prep Set (MGI, no. 1000021082)⁴⁴. Droplets were generated from a single nuclei suspension, followed by emulsion breakage, bead collection, reverse transcription and cDNA amplification to generate barcoded libraries. Indexed libraries were constructed following the manufacturer’s protocol, quantified using Qubit ssDNA Assay Kit (Thermo Fisher Scientific, Q10212) and sequenced using DNBSEQ-T1 at the China National GeneBank (Shenzhen, China) with read length 41 bp for read 1, 100 bp for read 2 and 10 bp for sample index.

For human snRNA-seq, single nuclei were isolated from fresh or liquid nitrogen flash-frozen renal biopsies using the 10X Genomics Chromium Nuclei Isolation Kit with RNase Inhibitor 16 rxns (PN-1000494), following the kit protocol with the following modifications: 0.2 U μl⁻¹ of supplemental RNase inhibitor was added to the lysis buffer and debris removal buffer, polypropylene 1.5 ml Eppendorf collection tubes were coated with 10% BSA (MACS BSA Stock Solution, Miltenyi Biotec, 130-091-376) overnight before use and the final nuclei suspension was filtered through a 40 μm FLOWMI cell strainer (SP Bel-Art, 136800040). Samples with more than 1 million nuclei were then flow cytometry sorted to clean up the sample using Sytox-7AAD Live dead staining (Invitrogen S10349). Samples with fewer than 1 million nuclei after washing were filtered a second time with the 40 μm FLOWMI cell strainer. Gene libraries from isolated human renal single nuclei were constructed with droplet-based scRNA-seq using the Chromium Next GEM Single Cell 3′ GEM, Library & Gel Bead Kit v.3.1, 4 rxns (PN-1000128). Single nuclei were loaded at 1,000 nuclei per μl for a targeted yield of 10,000 nuclei per sample. Libraries were sequenced on a NovaSeq 6000 (Illumina) at the Oxford Genomics Centre or with Novogene. Runs were demultiplexed and the resulting fastq files processed through the 10X Genomics Cellranger pipeline. Filtered gene matrix data were then analysed in R using the Seurat package.

Single-nucleus clustering

The snRNA-seq data were analysed using Seurat v.4.0.2 (mouse) or 4.4.0 (human)¹⁹. Mouse nuclei were filtered on gene count less than 500 or greater than 3,500 and mitochondrial percentage greater than 5. Human nuclei were filtered and selected for gene features greater than 500 and less than 6,000, gene count less than 25,000 and mitochondrial percentage less than 25. Data were log normalized, variable features identified and linear transformation scaling performed. Principal component analysis dimensionality reduction was run before the human snRNA-seq data were Harmony⁴⁵ integrated to remove batch effects. The first 30 principal components were selected and clusters identified using the ‘FindClusters’ method in Seurat with a resolution of 0.6 (mouse) or 0.5 (human). The ‘FindAllMarkers’ function was used to identify genes that characterized each cluster and differential expression of genes was tested between clusters. Cluster annotation was performed manually on the basis of the top markers, applying knowledge of renal physiology with reference to the literature. Visualizations of the annotated snRNA-seq dataset in the form of UMAP and violin plots are available in Extended Data Figs. 1 (human) and 2 (mouse).
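The quality-control thresholds can be written as boolean masks, as in the sketch below. The analysis itself was run in Seurat (R); this Python version only illustrates the arithmetic. It rests on interpretive assumptions: for mouse, "gene count" is read as the number of detected genes and the stated ranges are read as the nuclei removed, and the handling of values exactly at a threshold is a guess.

```python
import numpy as np

def keep_mouse(n_genes, pct_mito):
    """Mouse nuclei retained: 500 to 3,500 detected genes and at most 5% mitochondrial reads."""
    n_genes = np.asarray(n_genes)
    pct_mito = np.asarray(pct_mito)
    return (n_genes >= 500) & (n_genes <= 3500) & (pct_mito <= 5)

def keep_human(n_genes, n_counts, pct_mito):
    """Human nuclei retained: 500 < genes < 6,000, counts < 25,000, mitochondrial % < 25."""
    n_genes = np.asarray(n_genes)
    n_counts = np.asarray(n_counts)
    pct_mito = np.asarray(pct_mito)
    return (n_genes > 500) & (n_genes < 6000) & (n_counts < 25000) & (pct_mito < 25)
```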

Mouse kidney immune subclustering

To investigate the subpopulations of TopACT-identified immune cells in mouse kidney, we performed a supervised annotation of these cells based on snRNA-seq subclusters using Sonar⁴⁶. Annotation resolved subpopulations of B cells, dendritic cells, macrophages and T cells. Examining the proportions of these subpopulations across samples (Extended Data Fig. 5a) shows expansion of T cell, dendritic cell and macrophage populations in IMQ-treated kidneys. Annotating each immune cell according to its proximity to glomeruli further shows acquisition of T cells, dendritic cells and macrophage cells within the glomeruli themselves (Extended Data Fig. 5b). Extended Data Fig. 5c shows the expression of cell type-specific markers by immune subset and condition (IMQ-treated or control). We visualize the spatial distribution of these subpopulations for representative samples in Extended Data Fig. 6. This visualization shows enrichment of T cells (top row) and macrophages (middle row) but not B cells (bottom row), in glomeruli, consistent with multiplex immunofluorescence results in Fig. 5.

Comparison to ssDNA-based segmentation

We used ssDNA imaging of the mouse kidney Stereo-seq samples to compute cell segmentation based on cell nucleus locations, as previously described⁶. For further validation, we compared TopACT-called cell loci for immune and podocyte cells with these cell bins on a single representative IMQ-treated sample. A TopACT-predicted cell was assigned to an ssDNA bin if its centre was in the bin itself or if its centre-to-centre distance to the bin was sufficiently small (within ten spots, 7.15 μm), the latter case dealing with the scenario of two cell bins with overlapping boundaries but non-coincident cell centres. We found that 110 out of 137 (80%) TopACT-predicted immune cells and 46 out of 50 (92%) TopACT-predicted podocyte cells coincided with an ssDNA bin. Extended Data Fig. 3 shows the cell bins annotated according to the assigned TopACT cell type. Only three ssDNA bins were found to coincide with more than one TopACT-predicted cell, providing further evidence that TopACT predictions correspond to ground truth cells. Visual inspection of these three examples is suggestive of the underlying ssDNA bins being doublets.
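The assignment rule can be sketched as follows. The data layout is an assumption made for illustration: ssDNA bins are given as an integer label image with 0 for background, bin k has its centre at `bin_centres[k - 1]`, and all coordinates are in spot units. The function name `match_cells` is hypothetical.

```python
import numpy as np
from scipy.spatial import cKDTree

def match_cells(cell_centres, bin_labels, bin_centres, max_dist=10.0):
    """Assign each predicted cell to an ssDNA bin label, or 0 if unmatched.

    A cell is matched if its centre lies inside a bin, or otherwise if the
    nearest bin centre is within max_dist spots (10 spots = 7.15 micrometres).
    """
    cell_centres = np.asarray(cell_centres, dtype=float)
    rc = np.rint(cell_centres).astype(int)
    inside = bin_labels[rc[:, 0], rc[:, 1]]               # label under each cell centre
    dist, nearest = cKDTree(np.asarray(bin_centres, dtype=float)).query(cell_centres)
    near_label = np.where(dist <= max_dist, nearest + 1, 0)
    return np.where(inside > 0, inside, near_label)
```

The fraction of matched cells is then `(labels > 0).mean()`, which is the kind of quantity reported as 110/137 and 46/50 above.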

Multiplex immunofluorescence

Multiplex immunofluorescence staining was performed on 4-μm-thick formalin fixed paraffin embedded sections according to the OPAL protocol (Akoya Biosciences) on the Leica BOND RXm auto stainer (Leica Microsystems). Six staining cycles were carried out using the following primary antibody–Opal fluorophore pairs; thereafter the sections were counterstained with diamidino-2-phenylindole (DAPI; FP1490, Akoya Biosciences): Ly6G/Ly6C (Gr-1) 1:400 (MAB1037-SP; R&D Systems)–Opal 480 1:150; CD4 1:500 (ab183685; Abcam)–Opal 520 1:150; CD8 1:800 (98941; Cell Signaling)–Opal 520 1:150; CD68 1:1,200 (ab125212; Abcam)–Opal 570 1:150; CD19 1:600 (90176; Cell Signaling)–Opal 620 1:150; CD11b 1:80,000 (ab133357; Abcam)–Opal 690 1:150; and E-cadherin 1:500 (3195; Cell Signaling)–Opal 780 1:25.

Samples were deparaffinized and rehydrated according to the Leica BOND Rx protocol. Antigen retrieval was carried out at 100 °C for 20 min; tissue sections were incubated in either Epitope Retrieval solution 1 or 2 before the application of each primary antibody. Sections were incubated with primary antibody for 1 h; thereafter, Opal fluorophores were substituted for 3,3′‐diaminobenzidine (DAB) and added using the BOND Polymer Refine Detection System, with a 10 min incubation time. After completion of all cycles of staining, sections were counterstained with DAPI and mounted with VECTASHIELD Vibrance Antifade Mounting Medium (H-1700-10; Vector Laboratories). Slides were scanned and multispectral images of tissue sections obtained using the Akoya Bioscience Vectra Polaris™. Analysis of the images was carried out using Zeiss Arivis v.4.1.1 (Zeiss). Mean intensity was measured across glomerular regions, manually defined on the basis of E-cadherin and nuclei, and non-glomerular regions, comprising the whole field of view with the glomerular regions subtracted. Mean intensities for each channel were summed per glomerular and non-glomerular region to calculate the total ‘immune’ intensity per glomerular region. Ratios were calculated per glomerulus using the mean intensity, which was calculated by subtracting the mean intensity of a central glomerular region from the mean intensity of the whole glomerular region.

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.

June 19, 2024
Single-cell and spatial atlases of spinal cord injury in the Tabulae Paralytica


Article
CAS

Google Scholar
James, N. D. et al. Chondroitinase gene therapy improves upper limb function following cervical contusion injury. Exp. Neurol. 271, 131–135 (2015).

Article
CAS

Google Scholar
Muir, E. M. et al. Modification of N-glycosylation sites allows secretion of bacterial chondroitinase ABC from mammalian cells. J. Biotechnol. 145, 103–110 (2010).

Article
CAS

Google Scholar
Bartus, K. et al. Large-scale chondroitin sulfate proteoglycan digestion with chondroitinase gene therapy leads to reduced pathology and modulates macrophage phenotype following spinal cord contusion injury. J. Neurosci. 34, 4822–4836 (2014).

Article

Google Scholar
Esposito, M. S., Capelli, P. & Arber, S. Brainstem nucleus MdV mediates skilled forelimb motor tasks. Nature 508, 351–356 (2014).

Article
ADS
CAS

Google Scholar
Kathe, C. et al. Wireless closed-loop optogenetics across the entire spinal cord in ecological environments. Nat. Biotechnol. 40, 198–208 (2022).

Article
CAS

Google Scholar
Takeoka, A., Vollenweider, I., Courtine, G. & Arber, S. Muscle spindle feedback directs locomotor recovery and circuit reorganization after spinal cord injury. Cell 159, 1626–1639 (2014).

Article
CAS

Google Scholar
Mathis, A. et al. DeepLabCut: markerless pose estimation of user-defined body parts with deep learning. Nat. Neurosci. 21, 1281–1289 (2018).

Article
CAS

Google Scholar
Chung, K. et al. Structural and molecular interrogation of intact biological systems. Nature 497, 332–337 (2013).

Article
ADS
CAS

Google Scholar
Tomer, R., Ye, L., Hsueh, B. & Deisseroth, K. Advanced CLARITY for rapid and high-resolution imaging of intact tissues. Nat. Protoc. 9, 1682–1697 (2014).

Article
CAS

Google Scholar
Voigt, F. F. et al. The mesoSPIM initiative: open-source light-sheet microscopes for imaging cleared tissue. Nat. Methods 16, 1105–1108 (2019).

Article
CAS

Google Scholar
Zheng, G. X. Y. et al. Massively parallel digital transcriptional profiling of single cells. Nat. Commun. 8, 14049 (2017).

Article
ADS
CAS

Google Scholar
Caglayan, E., Liu, Y. & Konopka, G. Neuronal ambient RNA contamination causes misinterpreted and masked cell types in brain single-nuclei datasets. Neuron 110, 4043–4056.e5 (2022).

Article
CAS

Google Scholar
Fleming, S. J. et al. Unsupervised removal of systematic background noise from droplet-based single-cell experiments using CellBender. Nat. Methods 20, 1323–1335 (2023).
Stuart, T. et al. Comprehensive integration of single-cell data. Cell 177, 1888–1902.e21 (2019).

Article
CAS

Google Scholar
McGinnis, C. S., Murrow, L. M. & Gartner, Z. J. DoubletFinder: doublet detection in single-cell RNA sequencing data using artificial nearest neighbors. Cell Syst 8, 329–337.e4 (2019).

Article
CAS

Google Scholar
Germain, P.-L., Lun, A., Meixide, C. G., Macnair, W. & Robinson, M. D. Doublet identification in single-cell sequencing data using scDblFinder. F1000Res. 10, 979 (2021).

Article

Google Scholar
Bais, A. S. & Kostka, D. scds: computational annotation of doublets in single-cell RNA sequencing data. Bioinformatics 36, 1150–1158 (2019).

Article

Google Scholar
Xi, N. M. & Li, J. J. Benchmarking computational doublet-detection methods for single-cell RNA sequencing data. Cell Syst. 12, 176–194.e6 (2021).

Article
CAS

Google Scholar
Korsunsky, I. et al. Fast, sensitive and accurate integration of single-cell data with Harmony. Nat. Methods 16, 1289–1296 (2019).

Article
CAS

Google Scholar
Osseward, P. J. et al. Conserved genetic signatures parcellate cardinal spinal neuron classes into local and projection subsets. Science 372, 385–393 (2021).

Article
ADS
CAS

Google Scholar
Häring, M. et al. Neuronal atlas of the dorsal horn defines its architecture and links sensory input to transcriptional cell types. Nat. Neurosci. 21, 869–880 (2018).

Article

Google Scholar
Alkaslasi, M. R. et al. Single nucleus RNA-sequencing defines unexpected diversity of cholinergic neuron types in the adult mouse spinal cord. Nat. Commun. 12, 2471 (2021).

Article
ADS
CAS

Google Scholar
Blum, J. A. et al. Single-cell transcriptomic analysis of the adult mouse spinal cord reveals molecular diversity of autonomic and skeletal motor neurons. Nat. Neurosci. 24, 572–583 (2021).

Article
CAS

Google Scholar
Delile, J. et al. Single cell transcriptomics reveals spatial and temporal dynamics of gene expression in the developing mouse spinal cord. Development 146, dev173807 (2019).
Hamel, R. et al. Time-resolved single-cell RNAseq profiling identifies a novel Fabp5-expressing subpopulation of inflammatory myeloid cells in chronic spinal cord injury. Preprint at bioRxiv https://doi.org/10.1101/2020.10.21.346635 (2020).
Hayashi, M. et al. Graded arrays of spinal and supraspinal v2a interneuron subtypes underlie forelimb and hindlimb motor control. Neuron 97, 869–884.e5 (2018).

Article
CAS

Google Scholar
Rosenberg, A. B. et al. Single-cell profiling of the developing mouse brain and spinal cord with split-pool barcoding. Science 360, 176–182 (2018).

Article
ADS
CAS

Google Scholar
Sathyamurthy, A. et al. Massively parallel single nucleus transcriptional profiling defines spinal cord neurons and their activity during behavior. Cell Rep. 22, 2216–2225 (2018).

Article
CAS

Google Scholar
Wahane, S. et al. Diversified transcriptional responses of myeloid and glial cells in spinal cord injury shaped by HDAC3 activity. Sci. Adv. 7, eabd8811 (2021).

Article
ADS
CAS

Google Scholar
Baek, M., Menon, V., Jessell, T. M., Hantman, A. W. & Dasen, J. S. Molecular logic of spinocerebellar tract neuron diversity and connectivity. Cell Rep. 27, 2620–2635.e4 (2019).

Article
CAS

Google Scholar
Vanlandewijck, M. et al. A molecular atlas of cell types and zonation in the brain vasculature. Nature 554, 475–480 (2018).

Article
ADS
CAS

Google Scholar
Marques, S. et al. Oligodendrocyte heterogeneity in the mouse juvenile and adult central nervous system. Science 352, 1326–1329 (2016).

Article
ADS
CAS

Google Scholar
O′Shea, T. M. et al. Border-forming wound repair astrocytes. Preprint at bioRxiv, https://doi.org/10.1101/2023.08.25.554857 (2023).
Liddelow, S. A. et al. Neurotoxic reactive astrocytes are induced by activated microglia. Nature 541, 481–487 (2017).
Tirosh, I. et al. Dissecting the multicellular ecosystem of metastatic melanoma by single-cell RNA-seq. Science 352, 189–196 (2016).
Hou, W. & Ji, Z. Palo: spatially aware color palette optimization for single-cell and spatial data. Bioinformatics 38, 3654–3656 (2022).
Koopmans, F. et al. SynGO: an evidence-based, expert-curated knowledge base for the synapse. Neuron 103, 217–234.e4 (2019).

Article
CAS

Google Scholar
Büttner, M., Ostner, J., Müller, C. L., Theis, F. J. & Schubert, B. scCODA is a Bayesian model for compositional single-cell data analysis. Nat. Commun. 12, 6876 (2021).

Article
ADS

Google Scholar
Phipson, B. et al. propeller: testing for differences in cell type proportions in single cell data. Bioinformatics 38, 4720–4726 (2022).

Article
CAS

Google Scholar
Simmons, S. Cell type composition analysis: comparison of statistical methods. Preprint at bioRxiv https://doi.org/10.1101/2022.02.04.479123 (2022).
Zimmerman, K. D., Espeland, M. A. & Langefeld, C. D. A practical solution to pseudoreplication bias in single-cell studies. Nat. Commun. 12, 738 (2021).

Article
ADS
CAS

Google Scholar
Robinson, M. D., McCarthy, D. J. & Smyth, G. K. edgeR: a Bioconductor package for differential expression analysis of digital gene expression data. Bioinformatics 26, 139–140 (2010).

Article
CAS

Google Scholar
Love, M. I., Huber, W. & Anders, S. Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biol. 15, 550 (2014).

Article

Google Scholar
Zhang, Y. et al. Model-based analysis of ChIP-seq (MACS). Genome Biol. 9, R137 (2008).

Article

Google Scholar
Fornes, O. et al. JASPAR 2020: update of the open-access database of transcription factor binding profiles. Nucleic Acids Res. 48, D87–D92 (2020).

CAS

Google Scholar
Cable, D. M. et al. Robust decomposition of cell type mixtures in spatial transcriptomics. Nat. Biotechnol. https://doi.org/10.1038/s41587-021-00830-w (2021).
McKenzie, A. T., Katsyv, I., Song, W.-M., Wang, M. & Zhang, B. DGCA: a comprehensive R package for differential gene correlation analysis. BMC Syst. Biol. 10, 106 (2016).

Article

Google Scholar
Biancalani, T. et al. Deep learning and alignment of spatially resolved single-cell transcriptomes with Tangram. Nat. Methods 18, 1352–1362 (2021).

Article

Google Scholar

scGRO–seq conceptualization

Capturing nascent RNA with sufficient efficiency from single cells for meaningful analysis was deemed challenging. However, recognizing the potential insights into transcription mechanisms that single-cell nascent RNA sequencing could offer, we set out to develop a single-cell version of the GRO–seq method a decade after its use in cell populations. Our efforts were met with two significant challenges: selectively capturing a small fraction of nascent RNA among various RNA species within a cell and accurately distinguishing nascent RNAs from individual cells.

The primary limitation we encountered was capture efficiency. The quantity of nascent RNA from transcribing RNA polymerases in an individual cell, mainly due to the intermittent nature of transcription with short bursts and long latency periods, is significantly lower than the mRNA copies that accumulate over time. Traditional nascent RNA capture methods yield only a meagre number of nascent RNAs from single cells. Miniaturizing GRO–seq using strategies derived from scRNA-seq was not feasible because nascent RNA lacks the consensus polyadenylation sequence used in RNA-seq. Instead, GRO–seq and related methods selectively label nascent RNA in bulk cells using modified nucleotides and use single-stranded RNA–RNA ligation with PCR handles on both ends. This ligation process proved unsuitable for scGRO–seq owing to its low efficiency and the need for nascent RNA purification before ligation, which risks depleting the already scarce nascent RNA from single cells.

To overcome these challenges, we devised a strategy that involved labelling nascent RNA in cells and attaching single-cell barcodes to the labelled nascent RNA without requiring purification from other cellular RNA. After exploring several approaches without success, we turned to click chemistry, specifically CuAAC. We speculated that by sourcing or synthesizing CuAAC-compatible chain-terminating nucleotide triphosphate analogues and performing nuclear run-on with the modified nucleotides to selectively label nascent RNA, we could label nascent RNA from individual cells with 5′-AzScBc DNA with a PCR handle. Then, we could pool the barcoded nascent RNA from multiple cells for selective RT in the presence of a TSO and subsequent PCR amplification for sequencing.

To successfully implement this strategy, we identified three important biochemical hurdles to address. First, we needed to demonstrate the ability of native RNA polymerase to incorporate 3′-(O-propargyl)-NTPs during nuclear run-on reactions. Second, preserving the intactness of nuclei during the run-on reaction was essential to enable the separation of individual nuclei for single-cell barcoding. Finally, we had to confirm the ability of reverse transcriptase to traverse the triazole ring junction formed during CuAAC. Successful resolution of the first and third hurdles would pave the way for CuAAC-based nascent RNA sequencing in cell populations, whereas overcoming the second hurdle would establish the foundation for scGRO–seq.

Development of AGTuC

To develop a nascent RNA tagging method suitable for capturing a small fraction of RNA from single cells, we initiated our approach by focusing on a cell-population-based strategy. We aimed to develop an enhanced nascent RNA tagging method that optimally integrates selective labelling and single-cell barcode tagging, bypassing the need for RNA purification. Among the tested methods, we identified click chemistry as the most suitable option because of its high selectivity, efficiency, robustness in diverse experimental conditions, cost-effectiveness and speed. Our goal was to selectively label nascent RNA through a nuclear run-on reaction, conjugate a single-stranded DNA PCR handle (that can accommodate a single-cell barcode for future use in single-cell analysis), reverse transcribe the RNA–DNA conjugate and prepare an NGS library.

To achieve single-nucleotide resolution of transcribing polymerases and efficient RT, we identified two click-chemistry-compatible, chain-terminating nucleotides with a relatively small functional group: 3′-(O-propargyl)-ATP and 3′-azido-3′-dATP (Extended Data Fig. 1a). Nascent RNA labelled with 3′-(O-propargyl)-NTPs forms a 1,4-disubstituted 1,2,3-triazole junction with azide-labelled DNA through CuAAC, as shown in Click-Code-Seq⁵⁰, whereas nascent RNA labelled with 3′-azido-3′-dNTPs forms a slightly bulkier junction with dibenzocyclooctyne labelled DNA through strain-promoted alkyne-azide cycloadditions (Extended Data Fig. 1b). Nuclear run-on with 3′-(O-propargyl)-ATP and CuAAC showed superior efficiency compared with 3′-azido-3′-dATP and strain-promoted alkyne-azide cycloadditions (Extended Data Fig. 1c).

To convert the clicked RNA–DNA conjugate to cDNA, we tested eight different reverse transcriptase enzymes, varied the temperature and duration of RT and evaluated three TSOs (Extended Data Fig. 1d–f, some results not shown). Our optimized method, which we call AGTuC, was then performed in 5 million mouse ES cell nuclei. AGTuC nascent RNA profiles closely resembled PRO–seq profiles (Extended Data Fig. 2a) and exhibited strong correlations at both gene and enhancer levels (Extended Data Fig. 2b,c). Notably, the AGTuC library protocol involved significantly fewer steps than PRO–seq and could be completed in a single day (Extended Data Fig. 2d). AGTuC is a simpler, faster and cheaper alternative to GRO–seq and PRO–seq for nascent RNA sequencing from cell populations.

Development of inAGTuC

To adapt CuAAC-mediated nascent RNA sequencing to single cells, we explored the feasibility of performing AGTuC in single cells. Implementing AGTuC at the single-cell level presented challenges because the nuclear run-on reaction with 0.5% sarkosyl disrupts the nuclear membrane before cell barcodes can be attached during the post-run-on CuAAC step, which leads to unintended mixing of nascent RNA from different cells. One potential solution was to perform AGTuC in single tubes, which would prevent nascent RNA mixing. However, this approach requires RNA purification after the run-on reaction, and purification would further deplete the exceedingly low amounts of nascent RNA in single cells. Alternatively, omitting RNA purification would leave behind the 3′-(O-propargyl)-NTPs supplied in excess during the run-on reaction, which could outcompete nascent RNA for 5′-AzScBc DNA during CuAAC.

To address this challenge, we developed inAGTuC, a new strategy that enables labelling nascent RNA with 3′-(O-propargyl)-NTPs while preserving nuclear integrity. This approach overcomes the issues associated with nascent RNA mixing before single-cell barcoding. We proposed that performing the run-on reaction without disrupting the nuclear membrane would facilitate the easy removal of excess nucleotides through a few centrifugation and resuspension steps while retaining propargyl-labelled nascent RNA within the nuclei. This approach would produce clean nuclei with labelled nascent RNA, free from excess reactive nucleotides, which could be compartmentalized with 5′-AzScBc DNA for CuAAC. We could minimize further RNA loss by pooling and processing the single-cell-barcoded nascent RNA from multiple cells.

To achieve an efficient run-on reaction, PRO–seq and AGTuC disrupt the polymerase complex with 0.5% sarkosyl detergent, of which nuclear membrane lysis is collateral damage. We sought to identify the lowest sarkosyl concentration that maintains nuclear membrane integrity while maximizing run-on efficiency and found that a 20× reduction in sarkosyl concentration preserved nuclear intactness, with only a 20% reduction in run-on efficiency (Extended Data Fig. 3a,b). To maximize the capture efficiency of nascent RNA, we optimized the molecular crowding effect of PEG 8000 and the ratio of Cu(I) to the CuAAC accelerating ligand BTTAA (Extended Data Fig. 3c). Although a low sarkosyl concentration preserves nuclear integrity, it also leaves the RNA polymerase complex intact, thereby shielding the propargyl-labelled 3′ end of nascent RNA from reacting with 5′-AzScBc DNA. We investigated nascent RNA release from the RNA polymerase complex using common denaturants and found that 6 M urea and TRIzol were both efficient (Extended Data Fig. 3d). However, the denaturant in TRIzol hindered the CuAAC reaction (Extended Data Fig. 3e). Notably, urea also offered the added benefit of retaining the RNA–DNA conjugate in the aqueous phase during TRIzol clean-up to remove PEG 8000 from the CuAAC reaction (Extended Data Fig. 3f). For reaction clean-up, we assessed various methods, finding cellulose membrane to be effective in removing CuAAC reagents (Extended Data Fig. 3g), whereas silica matrix columns performed well in retaining RNA and ssDNA (Extended Data Fig. 3h). Subsequently, we evaluated DNA polymerases for library preparation and DNA size-selection methods (Extended Data Fig. 3i,j).

Considering the goal of working with single cells, we performed inAGTuC with cell numbers between 5 million used in AGTuC and 1 cell planned for scGRO–seq. Specifically, we placed 100 to 1,000 intact nuclei in each well of a 96-well plate containing urea. Nascent RNA in each well was barcoded with a unique 5′-AzScBc DNA by CuAAC and pooled from the 96 wells, and a sequencing library was prepared as in AGTuC. The inAGTuC libraries exhibited similar profiles in gene bodies compared with PRO–seq and AGTuC. However, they could not capture the paused peaks at the 5′ end of genes and enhancers (Extended Data Fig. 4a–c). This observation is consistent with the need for a higher sarkosyl concentration for efficient run-on of paused polymerase complexes. The four inAGTuC libraries correlated well with each other (Extended Data Fig. 4d), with the potential to discover more insights with deeper sequencing (Extended Data Fig. 4e,f). Despite only partially capturing nascent RNA from a paused complex, the inAGTuC libraries correlated well with those from AGTuC and PRO–seq (Extended Data Fig. 4g).

To systematically characterize the compatibility of inAGTuC with even fewer cells, we prepared four inAGTuC libraries, each in a 96-well plate. Three plates contained 12, 120 and 1,200 cells per well (c.p.w.), which is roughly equivalent to 1,000, 10,000 and 100,000 nuclei per plate, respectively. The fourth plate contained 1,200 c.p.w. but omitted Cu(I) as a negative control. Despite lower coverage, the inAGTuC library with 12 c.p.w. (total of about 1,000 cells) successfully captured the overall nascent RNA profile. It exhibited a good correlation with 120 c.p.w. (total of about 10,000 cells) and 1,200 c.p.w. (total of around 100,000 cells) (Extended Data Fig. 5a–c).
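The approximate totals quoted above follow from multiplying the cells per well by the 96 wells of a plate. A minimal sketch of that arithmetic is given here; the function and constant names are illustrative and are not part of the study's code.

```python
WELLS_PER_PLATE = 96  # standard 96-well plate, as used in the text


def total_cells(cells_per_well: int, wells: int = WELLS_PER_PLATE) -> int:
    """Total number of cells pooled from one plate at a given loading."""
    return cells_per_well * wells


# Loadings described in the text: 12, 120 and 1,200 cells per well.
for cpw in (12, 120, 1200):
    print(f"{cpw} c.p.w. -> {total_cells(cpw)} cells per plate")
```

The exact products (1,152, 11,520 and 115,200) are what the text rounds to about 1,000, 10,000 and 100,000.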

3′-(O-propargyl)-nucleotide synthesis

For this study, several CuAAC-compatible nucleotide analogues modified with azide or alkyne functionalities were evaluated. Ultimately, 3′-(O-propargyl)-NTPs were selected for three main reasons: (1) these analogues lack 3′ hydroxyl groups, making them chain-terminating and enabling single-nucleotide resolution of the 3′ end of nascent RNA; (2) the CuAAC reaction produces a compact junction due to the presence of a single carbon bond between the sugar group of the nucleotide and the propargyl group at the 3′ end position; and (3) they are relatively cost-effective compared with biotin-modified nucleotides commonly used in PRO–seq.

3′-(O-Propargyl)-ATP (NU-945) was offered by Jena Biosciences. To complete the set, custom synthesis requests were made for 3′-(O-propargyl)-CTP (NU-947), 3′-(O-propargyl)-GTP (NU-946) and 3′-(O-propargyl)-UTP (NU-948), all of which are now available for purchase from Jena Biosciences.

Single-cell barcoded DNA adaptors

During scGRO–seq development, 3 sets of 96 5′-AzScBc DNA were synthesized by GeneLink. Each design encompassed four components: a 5′ azide positioned at the 5′ terminus, a 10–12 nucleotide sequence for the single-cell barcode, a 4–6 nucleotide sequence for the UMI and a PCR handle. The 5′ azide modification was obtained following a previously described method⁵¹. Specifically, an oligonucleotide containing 5′ iodo-dT was synthesized through solid-support phosphoramidite oligonucleotide synthesis, and subsequent replacement of the iodo group with an azide group was achieved through a reaction with sodium azide at 60 °C for 1 h. The sequences of three different 5′-AzScBc DNA are available in Supplementary Table 7.

The hairpin structure of the 86-nucleotide 5′-AzScBc DNA (Supplementary Fig. 3a) is formed through self-folding. RT is initiated from the 3′ end of the oligonucleotide, which serves as a built-in primer. This design ensures a 1:1 stoichiometry between the PCR handle and the RT primer, minimizing mispriming and nonspecific amplification during RT. The folded hairpin structure also generates a restriction site for the EagI enzyme, which is cleaved before PCR amplification.

Undesired extension by reverse transcriptase is effectively prevented by a three-carbon spacer at the 3′ end of the 43-nucleotide 5′-AzScBc DNA⁵². This version of the azide adaptor harbours a 5-nucleotide ACAGG sequence after the azide-dT at its 5′ end (Supplementary Fig. 3b). During RT, the extension of primers annealing to unclicked 5′-AzScBc DNA, the addition of non-templated CCC and the incorporation of TSO result in undesired cDNAs that are preferred substrates for PCR amplification. If unaddressed, these amplicons can overwhelm the sequencing library. The ACAGG sequence plays a crucial role in depleting these PCR amplicons.

A previously described method named DASH uses recombinant Cas9 protein and gRNA complex to digest and deplete undesired dsDNA⁵³. The ACAGG sequence is necessary to generate a gRNA target sequence in the undesired PCR amplicons (underlined sequence). In PCR amplicons formed between nascent RNA and 5′-AzScBc DNA, the complementation of gRNA is interrupted by the presence of a nascent RNA sequence, which makes the desired products incompatible with DASH. AGG serves as the protospacer adjacent motif.

Cell line

The V6.5 mouse ES cells used in this study were established by the Jaenisch Laboratory (Whitehead Institute, Massachusetts Institute of Technology) from the inner cell mass of a 3.5-day-old mouse embryo from a C57BL/6(F) × 129/sv(M) cross.

Cell culture

Mouse ES cells were cultured in Dulbecco’s modified Eagle medium (Gibco, 11995), plus 10% fetal bovine serum (HyClone, SH30070.03), supplemented with 1× penicillin–streptomycin (Gibco, 15140), 1× non-essential amino acids (Gibco, 1140), 1× l-glutamine (Gibco, 25030), 1× β-mercaptoethanol (Sigma, M6250) and 1,000 U ml^–1 leukaemia inhibitory factor (Sigma, ESG1107) on tissue-culture-treated 10 cm plates (Corning, CLS430167) pre-coated with 0.2% gelatin (Sigma, G1890) prepared in PBS (Fisher, MT21031CV). Cells were grown at 37 °C and 5% CO₂ and passaged with HEPES buffered saline solution (Lonza, CC-5024) and 0.25% trypsin-EDTA (Gibco, 25200) when 70% confluency was reached (every 2 days).

Sample preparation

Tissue culture cells were prepared for nuclear run-on reaction by either nuclei isolation or cell permeabilization as described below. All centrifugation steps were performed at 1,000g for 5 min. Cells were collected by removing the tissue culture medium, rinsing with PBS and placing the plates on ice. Cells were scraped while still on ice. The cells were collected into a 15 ml conical tube and centrifuged at 1,000g for 5 min.

For nuclei isolation, the pellet was resuspended in ice-cold douncing buffer (10 mM Tris-Cl pH 7.4, 300 mM sucrose, 3 mM CaCl₂, 2 mM MgCl₂, 0.1% Triton X-100, 0.5 mM DTT, 0.1× Halt protease inhibitor and 0.02 U µl^–1 RNase inhibitor) and transferred to a 7 ml dounce homogenizer (Wheaton, 357542). After incubation on ice for 5 min, the cells were dounced 25 times with a tight pestle, transferred back to the 15 ml conical tube and centrifuged to pellet the nuclei. The pellet was washed twice in douncing buffer.

For cell permeabilization, the pellet was resuspended in ice-cold permeabilization buffer (10 mM Tris-Cl pH 7.4, 300 mM sucrose, 10 mM KCl, 5 mM MgCl₂, 1 mM EGTA, 0.05% Tween-20, 0.1% NP-40, 0.5 mM DTT, 0.1× Halt protease inhibitor and 0.02 U µl^–1 RNase inhibitor). After incubation on ice for 5 min, the cells were centrifuged to pellet the nuclei. The pellet was washed twice in the permeabilization buffer.

The washed pellet was resuspended in storage buffer (10 mM Tris-Cl pH 8.0, 5% glycerol, 5 mM MgCl₂, 0.1 mM EDTA, 5 mM DTT, 1× Halt protease inhibitor and 0.2 U µl^–1 RNase inhibitor) at a concentration of 5 × 10⁶ nuclei per 50 µl of storage buffer, flash-frozen in liquid nitrogen and stored at −80 °C. The nuclei and permeabilized cells in the storage buffer can be stored for up to 5 years at −80 °C, making them readily available for nuclear run-on experiments.

Nuclear run-on with 3′-(O-propargyl)-nucleotides

A volume of 50 µl of 2× nuclear run-on buffer (20 mM Tris-Cl pH 8.0, 10 mM MgCl₂, 400 mM KCl, 50 µM 3′-(O-propargyl)-ATP, 50 µM 3′-(O-propargyl)-CTP, 50 µM 3′-(O-propargyl)-GTP, 50 µM 3′-(O-propargyl)-UTP, 0.05% sarkosyl, 1 mM DTT, 2× Halt protease inhibitor and 0.4 U µl^–1 RNase inhibitor) was prepared per sample and heated to 37 °C. Once thawed from −80 °C, permeabilized cells or nuclei were added to the heated tube containing nuclear run-on buffer and incubated for 5 min at 37 °C with gentle tapping at the incubation midpoint. Permeabilized cells or nuclei were centrifuged at 500g for 2 min at 4 °C, and the supernatant was aspirated off. The pellet was washed 3 times in 150 µl resuspension buffer (5 mM Tris-Cl pH 8.0, 2.5% glycerol, 2.5 mM MgAc₂, 0.05 mM EDTA, 1.25 mM MgCl₂, 60 mM KCl, 3 mM DTT, 0.2× Halt protease inhibitor and 0.2 U µl^–1 RNase inhibitor). After the final wash, the permeabilized cells or nuclei were resuspended in 2 ml of resuspension buffer and passed through a 35 µm nylon mesh (Falcon, 352235).
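As a rough check on the working concentrations, the 2× run-on buffer can be diluted numerically. The sketch below assumes that the 50 µl of 2× buffer is mixed with an equal 50 µl volume of nuclei in storage buffer (the storage concentration of 5 × 10⁶ nuclei per 50 µl suggests this, but the text does not state the sample volume explicitly). Only components that are absent from the storage buffer are included, so simple dilution applies; the dictionary keys and function name are illustrative.

```python
# Components of the 2x run-on buffer that do not occur in the storage buffer.
RUN_ON_2X = {
    "KCl_mM": 400.0,
    "propargyl_NTP_each_uM": 50.0,
    "sarkosyl_percent": 0.05,
}


def dilute(components: dict, buffer_vol_ul: float, sample_vol_ul: float) -> dict:
    """Final concentrations after mixing buffer with a sample lacking these components."""
    factor = buffer_vol_ul / (buffer_vol_ul + sample_vol_ul)
    return {name: conc * factor for name, conc in components.items()}


# ASSUMPTION: equal volumes (50 ul buffer + 50 ul nuclei suspension).
final = dilute(RUN_ON_2X, buffer_vol_ul=50.0, sample_vol_ul=50.0)
for name, conc in final.items():
    print(name, round(conc, 4))
```

Under this assumption the final sarkosyl concentration is 0.025%, which agrees with the 20× reduction from the 0.5% used in PRO–seq and AGTuC described earlier.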

Single-cell sorting and nuclei sorting

For single-cell and nuclei sorting, 96-well plates with 2.5 µl of 8 M urea per well were prepared using a multichannel or 96-well pipettor (Avidien MicroPro 300, 30835029). Single-cell and nuclei populations characterized by forward and side scattering were sorted by FACS into the 96-well plate containing urea. The sorted plates can be used in CuAAC directly or sealed with aluminium foil or a plastic seal and stored at −80 °C.

CuAAC

A 96-well plate containing 5′-AzScBc DNA with a unique cell barcode in each well previously synthesized and aliquoted was thawed from −80 °C. Sodium ascorbate, PEG 8000, CuSO₄ and accelerating ligand BTTAA were prepared and dispensed into each well of the 96-well plate containing 5′-AzScBc DNA. The CuAAC reaction mix was dispensed into individual wells containing single cells in urea using a multichannel or 96-well pipette. The final concentration of CuAAC reaction in each well was 30 nM 5′-AzScBc DNA, 800 mM sodium ascorbate, 15% PEG 8000, 1 mM CuSO₄, 5 mM BTTAA and 2.66 M urea in a 7.5 µl volume. The 96-well plates were sealed, vortexed for 10 s in an orbital vortexer and centrifuged for 1 min at 500g before incubation for 2 h at 50 °C.
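The final urea concentration can be cross-checked against the sorting step, where each well received 2.5 µl of 8 M urea. The sketch below applies the standard dilution relation C₁V₁ = C₂V₂ and assumes that the urea in the well is the only source of urea in the 7.5 µl reaction; the function name is illustrative.

```python
def final_concentration(stock_conc: float, stock_vol: float, final_vol: float) -> float:
    """Concentration after diluting stock_vol of stock into final_vol (same volume units)."""
    return stock_conc * stock_vol / final_vol


# 2.5 ul of 8 M urea per well, brought to a 7.5 ul CuAAC reaction.
urea_molar = final_concentration(stock_conc=8.0, stock_vol=2.5, final_vol=7.5)
print(f"{urea_molar:.2f} M urea")
```

The result is about 2.67 M, consistent with the 2.66 M stated in the text (which appears to be truncated rather than rounded).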

After incubation, the CuAAC reaction was quenched with 5 mM EDTA and pooled from 96 wells into a 1.5 ml Eppendorf tube. PEG 8000 was removed using TRIzol. The remaining CuAAC reagents (sodium ascorbate, CuSO₄ and BTTAA) were removed with a centrifugal filter with 3 kDa cellulose membrane (Amicon, 2020-04). The purified RNA was fragmented with 10 mM ZnCl₂ for 5 min at 65 °C.

RT through the triazole link and pre-amplification

RT of the clicked RNA–DNA conjugate was performed with highly processive Moloney murine leukaemia virus (M-MuLV) reverse transcriptase lacking RNase H activity but capable of RNA-dependent and DNA-dependent polymerase activity, non-templated addition and template switching (Thermo Fisher, EP0751). The RT reaction (1× RT buffer, 0.5 mM dNTPs, 0.8 U µl^–1 RNase inhibitor, 16% PEG 8000, 1 µM RT primer (except for hairpin-forming 5′-AzScBc DNA), and 1 µM TSO) was incubated with the RNA–DNA conjugate for 2 h at 50 °C. The cDNA was size-selected in 10% denaturing PAGE away from the unclicked 5′-AzScBc DNA and empty cDNA formed between the 5′-AzScBc DNA and TSO.

The purified cDNA was PCR amplified for 6 cycles to generate dsDNA with NEBNext Ultra II Q5 High-Fidelity 2× master mix (NEB, M0544) and 0.5 µM PCR primers with unique dual index using the PCR cycles presented in Supplementary Table 8.

Removal of empty adaptors using DASH

The dsDNA from the pre-amplification of cDNA was subjected to DASH to remove the undesired amplicons formed by RT of unclicked 5′-AzScBc DNA and TSO, as described above. Cas9–gRNA complex (6.6 µM Streptococcus pyogenes Cas9 nuclease (NEB, M0386T), 20 µM gRNA, 1× NEBuffer r3.1 and nuclease-free duplex buffer (IDT, 11-05-01-04)) was prepared by incubation for 15 min at 25 °C. The incubated complex was added to the cleaned PCR reaction and incubated for 1 h at 37 °C.

PCR amplification and NGS

The DASHed library was PCR amplified with NEBNext Ultra II Q5 High-Fidelity 2× master mix (NEB, M0544) and 0.5 µM PCR primers with a unique dual index using the two-step PCR cycles presented in Supplementary Table 9.

The NGS library was sequenced on Illumina NovaSeq SP100 flow cells with 64 nucleotides forward read, 43 nucleotides reverse read, 8 nucleotides index 1 and 8 nucleotides index 2.

Alignment and pre-processing

Adaptor sequences were removed from paired-end fastq files using Cutadapt⁵⁴. In brief, the read 1 sequence CCCCTGTCTCTTATACACAT and the read 2 sequence AGATCGGAAGAGCGTCGTGT were trimmed with a maximum error rate of 0.15, requiring a minimum overlap of 12 nucleotides between the read and adaptor. The resulting adaptor-trimmed reads were demultiplexed using Flexbar⁵⁵. Cell barcodes and UMIs were extracted from the 5′ end of read 1, applying a barcode error rate of 0.15 and retaining reads of at least 14 nucleotides in length. The adaptor-clipped and demultiplexed reads were first mapped to the mouse ribosomal genome using bowtie2 (ref. ⁵⁶) in --very-sensitive mode. The reads unmapped to the ribosomal genome were mapped to the mouse genome (mm10 build) in --very-sensitive mode. After mapping, duplicate reads were identified and removed using UMIs and mapping coordinates with UMI-tools⁵⁷.

Filtering experimental batches and cells

The scGRO–seq batches with r² values of at least 0.6 against at least 60% of all batches were selected for further analysis. Cells were required to contain a minimum of 1,000 UMIs and 750 features for further analysis. Our study involved 17 batches of scGRO–seq experiments across 39 96-well plates, encompassing a total of 3,744 cells. Of these, 36 plates (each containing a minimum of 24 high-quality cells) and 2,635 cells met the threshold.
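The two filters above can be sketched in a few lines of Python. This is an illustration only: the nested dictionary of pairwise r² values, the function names, and the choice to count agreement against the other batches (excluding a batch's comparison with itself) are assumptions, not details given in the text.

```python
def select_batches(r2, min_r2=0.6, min_fraction=0.6):
    """Keep batches whose r² is >= min_r2 against >= min_fraction of the other batches.

    r2 is a hypothetical nested dict: r2[batch_a][batch_b] -> r² value.
    """
    kept = []
    for batch, row in r2.items():
        # Assumption: a batch is not compared with itself.
        others = [value for other, value in row.items() if other != batch]
        if not others:
            continue
        agreeing = sum(value >= min_r2 for value in others)
        if agreeing / len(others) >= min_fraction:
            kept.append(batch)
    return kept


def passes_cell_filter(n_umis, n_features, min_umis=1000, min_features=750):
    """Cell-level thresholds quoted in the text: >= 1,000 UMIs and >= 750 features."""
    return n_umis >= min_umis and n_features >= min_features


# Toy example with three batches.
r2 = {
    "a": {"b": 0.9, "c": 0.7},
    "b": {"a": 0.9, "c": 0.2},
    "c": {"a": 0.7, "b": 0.2},
}
good_batches = select_batches(r2)
```

In this toy input only batch "a" agrees with at least 60% of the other batches, so it is the only one retained.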

Estimation of capture efficiency

The average capture efficiency of scGRO–seq was estimated to be approximately 10%. We used data from the intron seqFISH study¹⁷, which quantified the abundance of 34 introns by single-molecule fluorescent in-situ hybridization (smFISH). Based on the slope of the line of best fit between data from smFISH and intron seqFISH, the detection efficiency of intron seqFISH was estimated to be 44%. When scGRO–seq was compared with intron seqFISH, the detection efficiency of scGRO–seq was 26% of that of intron seqFISH. Based on these two detection efficiencies, the estimated capture efficiency of scGRO–seq is about 10% (26% of 44% is approximately 10%). This estimate assumes a median time of 8 min for an intron to be spliced out once it is transcribed, a time that ranges from 5 to 10 min according to several studies using diverse methods^{58,59,60,61,62,63,64}. Thus, the capture efficiency of 10% is an average approximation and can vary among cells and batches.
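The arithmetic behind this estimate is a product of the two relative detection efficiencies. A minimal worked example follows; the constant and function names are illustrative.

```python
# Detection efficiencies quoted in the text.
SEQFISH_VS_SMFISH = 0.44  # intron seqFISH relative to smFISH
SCGRO_VS_SEQFISH = 0.26   # scGRO-seq relative to intron seqFISH


def capture_efficiency(relative_to_seqfish, seqfish_efficiency):
    """Chain two relative efficiencies into one absolute estimate."""
    return relative_to_seqfish * seqfish_efficiency


estimate = capture_efficiency(SCGRO_VS_SEQFISH, SEQFISH_VS_SMFISH)
print(f"{estimate:.3f}")  # 0.114, rounded in the text to about 10%
```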

Enhancer annotation

Active transcription regulatory elements (TREs) in mouse ES cells were identified with PRO–seq data using dREG⁶⁵. Further filtering of the dREG results, carried out to eliminate TREs within or proximal to 1,500 bp of the RefSeq annotated genes (n = 23,980), identified 68,299 high-confidence TREs. The remaining TREs within 500 bp of each other were combined, which resulted in the final list of 12,542 enhancers. To capture nascent RNA derived from elongating RNA polymerases at these enhancers, the TREs were extended at least 1,500 bp from the TSS in both directions. The overlapping enhancers were stitched together after extension.
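Combining TREs that lie within 500 bp of each other is an interval-merging step. The sketch below shows one way to do it for intervals on a single chromosome; the function name, the (start, end) tuple representation and the inclusive treatment of a gap of exactly 500 bp are assumptions.

```python
def merge_nearby(intervals, max_gap=500):
    """Merge (start, end) intervals whose gap to the previous interval is <= max_gap.

    Assumes all intervals are on the same chromosome.
    """
    merged = []
    for start, end in sorted(intervals):
        if merged and start - merged[-1][1] <= max_gap:
            # Close enough to the previous element: extend it.
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [tuple(interval) for interval in merged]


tres = [(100, 200), (600, 700), (1300, 1400)]
enhancers = merge_nearby(tres)
```

Here the first two elements are 400 bp apart and are combined, whereas the third is 600 bp from the merged element and stays separate. The same function with `max_gap=0` would stitch overlapping intervals after extension.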

Transcription unit calling

groHMM (https://www.bioconductor.org/packages/release/bioc/vignettes/groHMM/inst/doc/groHMM.pdf) was used to call de novo transcription units on PRO–seq data. All combinations of tuning parameters (−50, −100, −200, and −400 for LP and 5, 10, and 15 for UTS) were tested. LP represents the ‘log-transformed transition probability of switching from transcribed state to non-transcribed state’, and UTS represents ‘the variance of the emission probability for reads in the non-transcribed state’. In our test, −50 LP and 10 UTS gave the best transcription unit calls.

Evidence of bursting

Transcriptional bursting was examined de novo using scGRO–seq data by measuring two parameters: the multiplicity of RNA polymerases and the distance between the RNA polymerases. The bursting model suggests that transcription occurs in short bursts punctuated by long silent periods, which results in on and off states. The alternative model is the relatively uniform transcription initiation by primarily solitary RNA polymerase. We expected two observations under the bursting model.

First, we expected a higher incidence of more than one RNA polymerase per burst and a concurrent depletion of single RNA polymerases. To test the evidence of bursting, we selected genes longer than 11 kb (n = 13,564) and trimmed 0.5 kb regions from the 5′ and 3′ ends of the gene that are known to harbour paused polymerases. With an average transcription rate of 2.5 kb min^–1, the remaining 10 kb region resulted in an observation window of 4 min. Based on the evidence of monoallelic transcription described in the main text and a short observation window of 4 min, we assigned all signals for a gene in individual cells to one allele. We quantified the observed incidence of zero, one (singlets) and more than one RNA polymerase (multiplets) per allele. The majority of alleles had zero polymerase. To calculate the expected incidences of RNA polymerases under the non-bursting model, we permuted the cell identity of scGRO–seq reads 200 times without changing the read positions. The permutation maintains the number of UMIs per cell, breaks the bursting-mediated association between RNA polymerases, and mimics the RNA polymerases distribution under the non-bursting model. We quantified the permuted incidences of zero, singlets and multiplets.
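The permutation and the counting of zero, singlet and multiplet incidences can be sketched as follows. The read representation, a list of (cell, gene, position) tuples with one read per captured polymerase, and the function names are assumptions made for illustration.

```python
import random
from collections import Counter


def incidence(reads, n_cells, n_genes):
    """Count alleles (cell, gene pairs) with zero, one or more than one read."""
    per_allele = Counter((cell, gene) for cell, gene, _pos in reads)
    singlets = sum(1 for count in per_allele.values() if count == 1)
    multiplets = sum(1 for count in per_allele.values() if count > 1)
    zeros = n_cells * n_genes - len(per_allele)
    return {"zero": zeros, "singlet": singlets, "multiplet": multiplets}


def permute_cells(reads, rng):
    """Shuffle cell identities across reads, leaving gene and position untouched.

    Each cell keeps its number of reads, but the link between reads from the
    same burst is broken.
    """
    cells = [cell for cell, _gene, _pos in reads]
    rng.shuffle(cells)
    return [(cell, gene, pos) for cell, (_old, gene, pos) in zip(cells, reads)]


reads = [("c1", "g1", 100), ("c1", "g1", 300), ("c2", "g2", 50)]
observed = incidence(reads, n_cells=2, n_genes=2)
rng = random.Random(0)
permuted = [incidence(permute_cells(reads, rng), 2, 2) for _ in range(200)]
```

Comparing `observed` with the distribution of the 200 entries in `permuted` gives the enrichment of multiplets and depletion of singlets relative to the non-bursting expectation.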

Second, if more than one RNA polymerase is observed in the burst window, either due to transcriptional bursting or random chance, we expected the transcriptional bursting model to result in more closely spaced molecules than expected by random chance. We took all multiplets in the observed or permuted data and calculated the distance between RNA polymerase molecules within each pair. We binned the distances in 50 bp bins and calculated the ratio of RNA polymerase pairs between the observed and permuted data.
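A minimal sketch of the distance analysis: pairwise distances within each multiplet are assigned to 50 bp bins, and the observed count in each bin is divided by the permuted count. The input format (one list of polymerase positions per multiplet) and the function names are assumptions; with many permutations, the permuted counts would first be averaged.

```python
from collections import Counter
from itertools import combinations


def binned_pair_distances(multiplets, bin_size=50):
    """Count polymerase pairs per distance bin.

    multiplets: iterable of position lists, one list per allele with >1 read.
    """
    bins = Counter()
    for positions in multiplets:
        for first, second in combinations(sorted(positions), 2):
            bins[(second - first) // bin_size] += 1
    return bins


def observed_over_permuted(observed, permuted):
    """Ratio of pair counts per bin, skipping bins that are empty in the permuted data."""
    return {b: observed[b] / permuted[b] for b in observed if permuted.get(b)}


obs_bins = binned_pair_distances([[100, 130], [0, 260, 275]])
ratios = observed_over_permuted(obs_bins, {0: 1, 5: 4})
```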

Burst kinetics

Genes over 11 kb (n = 13,564) were selected for studying transcriptional bursting kinetics, and 500-nucleotide regions at both ends, which are known to harbour paused polymerases, were removed. Genes that still exceeded 10 kb after trimming were shortened to 10 kb, starting from the initiation site of the gene. With an average transcription rate of 2.5 kb min^–1, this 10 kb burst window corresponds to an average burst duration of 4 min. The calculation of burst size and burst frequency proceeded as described below.
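The window arithmetic can be written out explicitly. The helper below is a sketch with an illustrative name; it trims both ends, caps the window at 10 kb and converts the window length into minutes at the stated transcription rate.

```python
def burst_window(gene_length, trim=500, max_window=10_000, rate_per_min=2_500):
    """Return (window length in bp, time to traverse it in minutes)."""
    usable = gene_length - 2 * trim      # drop both paused-polymerase regions
    window = min(usable, max_window)     # cap at 10 kb from the 5' side
    return window, window / rate_per_min


window_bp, window_min = burst_window(50_000)
```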

Burst size

For each gene, the number of cells with at least one read within the 10 kb burst window (number of bursts) was identified, and then the average UMIs per burst was computed. If a consistent single read per burst was observed, the burst size of that gene was set to 1. However, if the average burst size was 1.2, the residual burst above 1 indicated a higher burst size. Accounting for the 10% capture efficiency, wherein the likelihood of capturing paired reads within a burst window is 1%, the residual burst was proportionally adjusted by the capture efficiency. The equation for the burst size is shown in Supplementary Fig. 4 (top).

Burst frequency

For each gene, the burst frequency was determined as the number of bursts per allele (two alleles in autosomal and one in sex chromosomes) per transcription time. The transcription time was calculated as the duration needed to traverse the 10 kb burst window with a uniform transcription rate of 2.5 kb min^–1, translating to 4 min. The calculated burst frequency was normalized by the capture efficiency, taking the burst size into account. Although burst events with a larger burst size, such as ten, would be consistently detected even with 10% capture efficiency, normalization was applied for cases in which a smaller burst size, such as four, would result in a 60% false negative rate, that is, no burst being detected despite active bursting. Thus, burst frequency normalization was scaled by burst size to ensure accurate quantification. The equation for the burst frequency is shown in Supplementary Fig. 4 (bottom).
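To see why the false negative rate depends on burst size, consider a simplified model in which each polymerase in a burst is captured independently with probability equal to the capture efficiency. This independence assumption and the function name are ours and are not taken from the supplementary equation; under the assumption, a burst is missed only if every polymerase in it is missed.

```python
def miss_probability(burst_size, efficiency=0.10):
    """Probability that none of the polymerases in a burst is captured,
    assuming independent capture of each polymerase."""
    return (1 - efficiency) ** burst_size


p_miss_four = miss_probability(4)
p_miss_ten = miss_probability(10)
```

Under this simplification a burst of four is missed roughly two times in three, close to the figure of about 60% quoted above, and a burst of ten is missed about one time in three.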

Genes with core promoter elements like TATA and Initiator sequences were retrieved from the Eukaryotic Promoter Database (http://epd.vital-it.ch)⁶⁶. Genes containing a pause button, a sequence associated with promoter–proximal paused RNA polymerase, were recovered from the CoPRO dataset⁶⁷.

Simulation of idealized burst kinetics

We simulated read counts for populations of single cells to evaluate the performance of our estimators for burst rate and size. In the first simulation, we randomly generated the true burst size (T_size) for all human genes from a normal distribution (mean = 2, standard deviation = 3). Similarly, we generated true burst rates (T_rate) for all human genes from a normal distribution (mean = 1, standard deviation = 1). T_size less than 1 was corrected to 1, and T_rate less than 0.1 burst per hour was corrected to 0.1. These parameters were used to simulate UMIs per gene per cell as follows:

1. For each cell and each gene, draw a sample from a Poisson distribution with rate parameter λ = T_rate.

2. Scale the sampled burst by T_size and round to the nearest integer.

3. After generating molecule counts for all genes and all cells, randomly subsample to a specified level (for example, 10% sampling efficiency) without replacement.
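The three steps above can be sketched with the Python standard library. This is an illustration under stated assumptions, not the code used in the study: the function names are hypothetical, the Poisson sampler is Knuth's multiplication method (adequate only for small rates), and each simulated molecule is stored as a (cell, gene) label so that subsampling without replacement is explicit.

```python
import math
import random
from collections import Counter


def poisson(lam, rng):
    """Draw one Poisson(lam) sample using Knuth's method (small lam only)."""
    threshold = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1


def simulate(t_rate, t_size, n_cells, efficiency=0.10, seed=0):
    """Return (total molecules generated, Counter of captured UMIs per (cell, gene))."""
    rng = random.Random(seed)
    molecules = []
    for cell in range(n_cells):
        for gene, (rate, size) in enumerate(zip(t_rate, t_size)):
            bursts = poisson(rate, rng)        # step 1: number of bursts
            count = round(bursts * size)       # step 2: scale by burst size
            molecules.extend([(cell, gene)] * count)
    n_keep = round(efficiency * len(molecules))
    captured = rng.sample(molecules, n_keep)   # step 3: subsample, no replacement
    return len(molecules), Counter(captured)


# Two toy genes, already corrected to T_size >= 1 and T_rate >= 0.1.
total, umis = simulate(t_rate=[1.0, 0.5], t_size=[2.0, 3.0], n_cells=200, seed=1)
```

Storing one label per molecule keeps the sketch simple but uses memory in proportion to the total molecule count, so a genome-wide run would need a count-based sampler instead.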

In the second simulation, T_size and T_rate were taken from our genome-wide estimates described in Fig. 2, and UMIs per gene per cell were similarly generated. Simulations were performed ten times to ensure consistent results.

Cell cycle analysis

Three sets of transcriptionally characterized genes were used to characterize the cell cycle phase in individual cells. Transcription of 68 replication-dependent histone genes on chromosome 3, chromosome 6, chromosome 11 and chromosome 13 was used collectively to determine the S phase. Transcription of four genes (Orc1, Ccne1, Ccne2 and Mcm6) was used to assign G1/S phase, and that of six genes (Wee1, Cdk1, Ccnf, Nusap1, Aurka and Ccna2) was used to assign G2/M phase. Cells with more than one read in one of the genes or reads in more than one gene were hierarchically clustered, which revealed three major clusters of the cell-cycle-phase-specific transcription pattern. The other three smaller clusters without distinct transcription patterns were not considered for downstream analyses. Differentially expressed genes among G1/S, S and G2/M phases of the cell cycle were identified using the ‘FindAllMarkers’ function of Seurat⁶⁸ (single-cell analysis package).

Gene–gene co-transcription

The co-transcription of genes was determined using two criteria: correlation and permutation. scGRO–seq reads were collected from up to the first 10 kb of genes after 500 bp regions at both ends were trimmed (n = 15,666). The genes by cells expression matrix was binarized. For the correlation approach, pairwise correlation was performed for all gene pairs, and the P value was calculated using the chi-square test. It was adjusted for multiple hypothesis tests using the Benjamini–Hochberg correction method.
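A minimal sketch of the correlation approach for one gene pair, followed by Benjamini–Hochberg adjustment across pairs. Several details are assumptions: the correlation is computed as the phi coefficient of the two binarized vectors, the chi-square statistic is taken without continuity correction (so it equals n·φ² with one degree of freedom), and the function names are illustrative.

```python
import math


def binarize(counts):
    """Turn read counts per cell into 0/1 transcription calls."""
    return [1 if value > 0 else 0 for value in counts]


def phi_and_p(x, y):
    """Phi coefficient and chi-square P value (1 d.f., no continuity correction)."""
    n = len(x)
    both = sum(a and b for a, b in zip(x, y))
    on_x, on_y = sum(x), sum(y)
    denom = math.sqrt(on_x * (n - on_x) * on_y * (n - on_y))
    if denom == 0:
        return 0.0, 1.0  # a gene that is always on or always off is uninformative
    phi = (n * both - on_x * on_y) / denom
    chi2 = n * phi * phi
    # Survival function of chi-square with 1 d.f.: erfc(sqrt(chi2 / 2)).
    return phi, math.erfc(math.sqrt(chi2 / 2))


def benjamini_hochberg(pvalues):
    """Return Benjamini–Hochberg adjusted P values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


phi, p_value = phi_and_p(binarize([3, 1, 0, 0]), binarize([2, 5, 0, 0]))
adjusted = benjamini_hochberg([0.01, 0.04, 0.03])
```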

Permutation was performed by shuffling the cell identifiers of reads while maintaining their gene assignments. The permutation method accounts for several unknown and known biases and, more importantly, maintains the number of reads in each cell. The observed and permuted co-transcription frequencies of gene pairs were calculated. The empirical P value for a gene pair was determined by counting the incidence of equal or higher co-transcription frequency in 1,000 permutations compared with the observed co-transcription frequency.
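The permutation test for a single gene pair can be sketched as below. The (cell, gene) read representation and the function names are assumptions; the empirical P value follows the description above, as the fraction of permutations whose co-transcription count is equal to or higher than the observed count.

```python
import random


def cotranscription(reads, gene_a, gene_b):
    """Number of cells with at least one read in both genes."""
    cells_a = {cell for cell, gene in reads if gene == gene_a}
    cells_b = {cell for cell, gene in reads if gene == gene_b}
    return len(cells_a & cells_b)


def empirical_p(reads, gene_a, gene_b, n_perm=1000, seed=0):
    """Shuffle cell identifiers across reads, keeping gene assignments fixed."""
    rng = random.Random(seed)
    observed = cotranscription(reads, gene_a, gene_b)
    cells = [cell for cell, _gene in reads]
    genes = [gene for _cell, gene in reads]
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(cells)  # each cell keeps its total number of reads
        if cotranscription(list(zip(cells, genes)), gene_a, gene_b) >= observed:
            hits += 1
    return hits / n_perm


reads = [("c1", "A"), ("c1", "B"), ("c2", "A"), ("c3", "C"), ("c4", "B")]
p_emp = empirical_p(reads, "A", "B", n_perm=200)
```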

Gene pairs with correlation coefficients of greater than 0.1 and multiple hypothesis corrected P values of less than 0.05 from the correlation approach and an empirical P value of less than 0.05 from the permutation approach were considered co-transcribed. A network of pairwise co-transcribed genes was created using the Leiden algorithm, and the modules were selected for gene ontology analyses using the clusterProfiler R package.

Enhancer–gene co-transcription

Enhancer–gene co-transcription was determined following the logic of gene–gene co-transcription, substituting genes on one arm with enhancers. scGRO–seq reads were collected from up to the first 10 kb of genes after 500 bp regions at both ends were trimmed, and from at least a 3 kb region around enhancers (1,500 bp sense and 1,500 bp antisense) after a 500 bp region around the TSS was removed to avoid paused polymerases. Strand-specific reads on either side of the enhancer TSS were combined to determine enhancer expression. The features (genes + enhancers) by cell expression matrix was binarized, and the co-transcribed enhancer–gene pairs were determined using the correlation and permutation tests, similar to the approach used in the gene–gene co-transcription calculation. The UMIs per cell are maintained in each permutation. Enhancer–gene pairs only from the same chromosomes were retained for downstream analyses. We also included non-overlapping SEs identified in mouse ES cells.

Enhancers of pluripotency factors

Validated enhancers associated with pluripotency transcription factors OCT4 (also known as POU5F1), SOX2, Nanog and KLF4 were collected from studies referenced in the main text. To define time bins within genes, genes were divided into 5 kb bins (2-min bins calculated using the 2.5 kb min^–1 constant transcription rate of elongating RNA polymerases) in the sense and antisense direction until the end of the transcription wave called by groHMM⁶⁹, or they overlapped bins from other genes. For enhancers, the TSS was first determined based on the strongest OCT4, SOX2 and Nanog chromatin immunoprecipitation and sequencing (ChIP–seq) peaks. The precise position was determined by evaluating the divergent transcription around them. The reads from corresponding bins in sense and antisense directions were combined.

CRISPR-validated SEs

A set of validated SEs and their target genes were used from a previously published study referenced in the main text. SEs in gene introns or associated with miRNA were excluded due to the ambiguity in assigning reads and short gene length, respectively. For the time bin analyses, genes and SEs were divided into four 5 kb bins (2-min bins with the 2.5 kb min^–1 constant transcription rate of elongating polymerases) in the sense and antisense direction, limiting the analyses to the first 20 kb. The TSS was first determined based on the strongest OCT4, SOX2 and Nanog ChIP–seq peaks, and the precise position was determined by evaluating the divergent transcription around them. The reads from corresponding bins in sense and antisense directions were combined. The scrambled random pairs in SE–gene time bin analysis represent the co-transcribed bins between SEs and genes that are not the verified pairs.
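Assigning a read to a time bin amounts to integer division of its distance from the TSS by the bin size. The helper below is a sketch with an illustrative name; it returns the bin index and the corresponding time interval in minutes, and returns None for reads beyond the analysed region.

```python
def time_bin(distance_from_tss, bin_size=5_000, rate_per_min=2_500, max_bins=None):
    """Map a distance from the TSS (bp) to (bin index, (start min, end min))."""
    index = distance_from_tss // bin_size
    if max_bins is not None and index >= max_bins:
        return None  # outside the region covered by the bins
    start_min = index * bin_size / rate_per_min
    return index, (start_min, start_min + bin_size / rate_per_min)


example = time_bin(7_200)                        # second bin, 2 to 4 min
last_bin = time_bin(19_999, max_bins=4)          # fourth bin, 6 to 8 min
outside = time_bin(20_000, max_bins=4)           # beyond the first 20 kb
```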

External data

Various data types were analysed, compared and benchmarked against this study. PRO–seq data (GSE169044), ChIP data for p300 (GSM2360934), ATAC–seq (GSE169044), CDK9 (GSM1082347), RNA PolII (GSM318444), H3K4me1 (GSM281695), H3K4me3 (GSM1082344), H3K27Ac (GSM594579), OCT4 (GSM1082340), SOX2 (GSM1082341) and Nanog (GSM1082342) were downloaded from the Gene Expression Omnibus database. PRO–seq libraries were prepared using the same cells used for scGRO–seq under identical conditions⁷⁰. Intron seqFISH data on mouse ES cells were downloaded from table S1 of ref. ¹⁷. The genes-by-cells intron seqFISH matrix was binarized, and burst frequency was calculated assuming the signal in each gene comes from a burst equivalent to the 10 kb region used in scGRO–seq, given the probes were designed against the introns at the 5′ regions of genes. Mouse ES cell scRNA-seq was used from a previous study⁷, and the burst kinetics was downloaded from 41586_2018_836_MOESM5_ESM.xlsx file associated with this study.

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.

Software tools identify forgotten genes

Illustration of a magnifying glass looking at a strand of DNA isolated on a dark blue background — Credit: Adapted from Getty

The human genome contains some 40,000 protein-coding and non-coding genes. They’re not all studied equally, however. Although scientists can now survey thousands of genes at a time to find those associated with a given trait, they still tend to focus on the same genes that were popular even before the Human Genome Project was completed, more than 20 years ago.

A pair of tools aims to flag interesting but neglected human genes to researchers who might be looking for genetic diamonds in the rough.

One tool, called Find My Understudied Genes (FMUG), emerged from a study published in March¹, which first explores why interesting, but relatively under-researched, genes are not highlighted in genetic surveys, and then offers FMUG as a remedy.

The second tool is the Unknome database, created by a team led by Matthew Freeman at the University of Oxford, UK, and Sean Munro at the MRC Laboratory of Molecular Biology, Cambridge, UK, that was described² in 2023.

“We are in the lucky position to know what we don’t know,” says Thomas Stoeger, a biologist at Northwestern University in Chicago, Illinois, and co-author of the FMUG study.

Given a set of genes, the Unknome database identifies orthologs — genes with common ancestry — in other species, then counts the number of published findings on each gene and its relatives, weighted by the strength of the evidence behind the finding. Users can rank genes by how studied they are.

FMUG helps people to narrow a list of human genes — such as possible targets from genome-scale sequencing studies — using various filters, including the gene’s popularity in the published literature. Stoeger’s lab used a prototype of the software to focus its efforts on an ageing-associated gene called SFPQ. Subsequent work allowed the team to document that decreased expression of SFPQ caused effects similar to ageing³. “There’s a lot of excitement in the ageing community about that finding,” Stoeger says.

Pattern of neglect

There are plenty of reasons that some genes are studied more than others. One obvious possibility is that some sequences simply have weaker links to diseases and are therefore less ‘interesting’ to researchers and funding bodies. But studies have found little correlation between the strength of evidence for a gene and the number of papers that are published about it. Stoeger and his colleagues found¹, for example, that 44% of the genes that the US National Institutes of Health had identified as promising targets for Alzheimer’s disease hadn’t been mentioned in the titles or abstracts of any papers on Alzheimer’s.

The team considered three other explanations for the lack of studies: that understudied genes don’t turn up in the genomic surveys as ‘hits’ related to traits or diseases; that study authors fail to highlight the understudied genes; or that authors of follow-up papers fail to study understudied genes that genomic papers have highlighted. By analysing 909 genome-wide surveys collected from 4 study databases and thousands of papers on specific genes that cited those surveys, the authors found support for the second hypothesis: of 18,295 genetic hits identified in 148 surveys of gene-expression data included in the analysis, just 161 were mentioned in the study title or abstract. Those that were mentioned tended to be already well studied in the literature.

The authors then asked why authors of genomics studies might choose to highlight certain genes over others. One reason, they found, was the availability of research reagents specific to those genes. Another was the number of existing papers on the genes. It’s a self-reinforcing loop, says Reese Richardson, a biologist at Northwestern University and a co-author of the paper. If not many people study a gene, others might not develop the tools to study it, so it remains unstudied.

Some genes might also be understudied for sociological reasons, says Freeman. “There’s this tension between wanting to be pioneers and explorers, and on the other hand, feeling safe in little social groups where you get recognition and kudos, and you’re confident that your paper will be reviewed by a buddy of yours who works on something similar,” he says. Intriguingly, the Northwestern researchers found that papers on understudied genes receive more citations than other papers.

Users of FMUG — which is available for Windows, macOS and iOS — import a list of genes and then can apply any of about 300 filters to highlight, say, genes with fewer than a certain number of associated papers. Other filters include the existence of certain tools for studying the genes, or the availability of a mouse homologue. “With a few minutes of work, you might find something that you are able to study that others are not,” Stoeger says.

Researchers at the University of Copenhagen used FMUG to show that extremely compact ‘intrinsically disordered’ proteins tend to be understudied relative to those with more common characteristics⁴. “I found the tool useful simply because it enables us to highlight these kinds of disparities,” Giulio Tesei, one of the paper’s co-authors, says.

And this month, researchers posted a paper⁵ on the bioRxiv preprint server highlighting understudied genes in the roundworm, Caenorhabditis elegans. The authors compiled 432 tables from 112 papers listing genes that silence RNA in the roundworm, then listed the genes from those tables that didn’t appear often (fewer than 10 times) in the main text of the 112 articles or other papers.

Freeman says that beyond their practical benefits, such tools are valuable because they raise awareness of blind spots in the published literature. There’s a high cost to such neglect, he says: understudied genes can have key roles in fundamental biology, disease aetiology and drug discovery. And for budding researchers, there’s another benefit, he adds. In establishing a research group, scientists need to identify a topic to call their own. “The big challenge is, how do I find my own niche?”

May 24, 2024

Cells cope with altered chromosome numbers by enhancing protein breakdown

Nature, Published online: 22 May 2024; doi:10.1038/d41586-024-01360-6

When chromosomes are lost or gained, massive changes in gene expression disrupt the delicate balance of proteins in a cell. Yeasts with incorrect chromosome numbers counteract this by degrading excess proteins.

May 22, 2024

what three human recipients have taught scientists

Surgeons transplant a pig kidney into 62-year-old Richard Slayman, who died this month of causes unrelated to the procedure. Credit: Massachusetts General Hospital

Last week, the first living person to receive a kidney from a pig died, just under two months after his transplant — sooner than his doctors had expected. But the timing is in keeping with that of the first people to receive pig hearts, both of whom died around two months after their transplants.

The relatively short survival time for all three recipients demonstrates that these pioneering cross-species transplants “have not had as great success as would have been predicted from the primate studies”, says Robert Montgomery, a transplant surgeon at New York University (NYU) in New York City.

But the three procedures offered hope to desperately ill people who had run out of options. And researchers say that they have learnt valuable lessons from the first pig-organ transplants into humans, on topics ranging from the types of medication that recipients need to the amount of testing that pig organs must undergo. “This is not an insolvable problem,” Montgomery says. “I’m encouraged that we’re as far along as we are.”

Nature spoke to xenotransplant surgeons about what they’ve learnt so far, and how they see the field moving forwards.

Relieving a shortage

The use of organs from other species in humans, called xenotransplantation, has long been a dream of surgeons because of the chronic shortage of suitable human organs. Researchers have homed in on pigs as a donor species, in part because their organs’ size and anatomy resemble those of humans.

Data from non-human primates that have received pig organs are promising: a study¹ published in 2023 reported that five monkeys each survived for more than one year after receiving transplanted pig kidneys.

The first xenotransplant into a living person was in 2022, when 57-year-old David Bennett received a pig heart and survived for 60 days after the procedure. A second man, Lawrence Faucette, received a pig heart in 2023 and survived for 40 days.

Muhammad Mohiuddin, a surgeon at the University of Maryland School of Medicine in Baltimore who was on the care team for both pig-heart transplants, cites several possible explanations for Bennett’s death. In the weeks before he died, Bennett had an infection, so physicians gave him an immune-boosting therapy made up of pooled antibodies from thousands of donors. Scientists later found that some of the antibodies had reacted to the pig organ², meaning that the treatment could have exacerbated Bennett’s condition. Since then, Mohiuddin has worked with local blood banks to develop ways to screen for reactive antibodies.

Another possible explanation for Bennett’s limited survival is a latent infection of the transplanted heart with a pathogen called porcine cytomegalovirus, which might have been activated and then harmed the heart. The virus was found in the organ after Bennett’s death but was missed by tests before the transplant, signalling that more sensitive tests must be used to screen organs, Mohiuddin says.

Compassionate use

All the xenotransplants into living people have received ‘compassionate use’ approval from the US Food and Drug Administration (FDA), granted in rare cases in which a person’s life is at risk and there are no other treatments available. People treated on such grounds tend to be much sicker than the average person on the transplant waiting list, making it difficult to work out whether an unfavourable outcome is the result of the procedure or the recipient’s poor health, Mohiuddin says. That’s why some researchers have been pushing for the FDA to begin clinical trials of the procedure, which would allow for systematic evaluation of its performance.

It’s possible, for example, that poor underlying health contributed to the death on 7 May of Richard Slayman, the first living recipient of a pig kidney. Tatsuo Kawai, one of the surgeons who conducted the transplant at Massachusetts General Hospital in Boston, tells Nature that Slayman’s kidney was functioning well the day before his death and that he died for reasons unrelated to his transplant. In the year before the procedure, Slayman had developed congestive heart failure.

Pig-kidney recipient Richard Slayman (seated) with his partner and doctors. Credit: Michelle Rose/Massachusetts General Hospital

Researchers are also experimenting with what can be done before the transplant to best prevent organ rejection. One technique is genetically modifying the donor pigs, but the number of genetic edits necessary to stave off rejection is far from settled, Montgomery says.

eGenesis, a biotechnology firm in Cambridge, Massachusetts, that bred the pig used in Slayman’s surgery, has produced pigs with a record 69 edits, both to avoid rejection and to reduce the risk that a virus lurking in the organ could infect the recipient. Meanwhile, Revivicor, a firm in Blacksburg, Virginia, has opted for about ten genetic edits.

In the fourth and latest xenotransplant in a live person, Montgomery and his team tried a new approach using the thymus, an immune-related organ that could help teach the recipient’s immune system to recognize the pig organ. They grafted the source pig’s thymus to the kidney and then transplanted both into 54-year-old Lisa Pisano on 12 April. They used a pig with only a single genetic modification, which could make scaling up the production of pig organs easier, Montgomery says. Pisano remains in stable condition in hospital, he adds.

There is still much more to be learnt, he says. In a forthcoming study and in one published today in Nature Medicine³, Montgomery and his colleagues analysed tissue samples from two people who had been declared legally dead before receiving a pig heart and found that at the cellular level, rejection of xenotransplanted organs looks “very different” from that of organs transplanted from a human donor, Montgomery says. He adds that these findings could help researchers to anticipate rejection and develop tailored immunosuppressant regimens for future surgery.

May 17, 2024

85 million cells — and counting — at your fingertips

When it comes to single-cell gene-expression data, biologists face an embarrassment of riches. There are thousands of data sets to choose from. Unfortunately, those data sets have not all been processed in the same way; they might use different names for similar or identical cells or tissues; and they are scattered across the Internet — or available only on request.

Using any one data set is relatively straightforward. But collecting, curating and integrating the data to draw conclusions across experiments is, in the words of bioinformatician Timothy Triche Jr at the Van Andel Institute in Grand Rapids, Michigan, “a huge pain in the butt”.

In one 2023 study¹, for instance, computational biologist Christina Theodoris at Gladstone Institutes in San Francisco, California, described a deep-learning model called Geneformer. Building on some 30 million single-cell transcriptomes that Theodoris manually aggregated in 2021, Geneformer allows researchers to predict the impact of gene perturbations in cell types or genes it has never seen. But because the data were scattered across 18 public databases and multiple independent laboratories, she says, “it took me two months to collect all that data and process it”.

A vast resource

Today, the same effort would take only minutes, she says, thanks to a new resource from the Chan Zuckerberg Initiative (CZI) in Redwood City, California. Chan Zuckerberg CELL by GENE Discover (CZ CELLxGENE) is a collection of free and open-source tools for finding, querying, analysing, downloading and publishing single-cell data. As of April, it includes some 85 million single cells and 1,317 data sets covering 844 cell types, curated and uniformly processed by a team of 25 or so engineers, data curators and other staff, according to Patricia Brennan, vice-president of science technology at CZI. Most of the data represent single-cell RNA sequencing information from healthy human tissues, but non-human and cell-line data, as well as molecular-profiling data obtained using spatial transcriptomic methods, are also available. All of these data are stored in a common format, using a standard set of cell types and metadata.

The CZ CELLxGENE tool helps researchers to visualize gene-expression data. Credit: Chan Zuckerberg Initiative

Users can find and explore the non-spatial data through the CZ CELLxGENE data portal, or access it using the R or Python programming languages through an application-programming interface called Census. (Spatial data should be added later this year, a spokesperson for CZI says.) Meera Prasad, a graduate student at the California Institute of Technology in Pasadena, is using CZ CELLxGENE to characterize the microenvironment across 9 million healthy and cancerous mammary cells representing some 150 cell types. By integrating those data with her lab’s spatial data, Prasad hopes to better replicate the tumour microenvironment, and also to identify genes that are related to the structural changes associated with cancer.

CZ CELLxGENE enables two key applications, says Jonah Cool, a science programme officer at CZI. Most obviously, researchers can ask questions across a vast amount of data that they and others have collected. Triche, for instance, has plumbed some 12 million mouse cells to study the influence of sex chromosomes on the biology of immune cells. “That’s approximately 11-and-a-half million more cells than we would typically run in a single-cell experiment,” he says. Repeating those analyses in-house would be a waste of money, but leveraging data that others have processed can be tedious. By ‘harmonizing’ these data sets and putting them in one place, CZ CELLxGENE removes many of what Triche calls “schlep steps”. “People underestimate the degree to which the impact of this data is amplified by making it usable for anybody who wants to,” he says.

The other application is in artificial intelligence. Researchers can use CZ CELLxGENE to build and train computational models that can predict, for instance, the identity of a cell or the impact of specific perturbations.

Model modularity

Users can select any of five such models, including Geneformer, and refine or apply them to their own data. They can also download ‘embeddings’ — compressed numerical representations of transcriptional data — from any of them, allowing users to ‘project’ their data and CZ CELLxGENE data into a common space. That, says Cool, means researchers can ask questions such as what cells are similar to a researcher’s cells, or which conditions induce changes in those cells.
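The idea of asking which cells are similar once everything sits in a common embedding space can be illustrated with a minimal sketch. It assumes each cell is already represented by a short embedding vector and uses cosine similarity as the measure of closeness; the vectors, the cell labels and the helper names (`cosine_similarity`, `most_similar`) are hypothetical and are not part of CZ CELLxGENE or of any of the models mentioned above.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def most_similar(query, reference, k=1):
    """Return the k reference cells whose embeddings are closest to the query embedding."""
    scored = [(cosine_similarity(query, vec), label) for label, vec in reference.items()]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [(label, score) for score, label in scored[:k]]


# Hypothetical 3-dimensional embeddings; real model embeddings have many more dimensions.
reference_embeddings = {
    "ref_cell_A": [0.9, 0.1, 0.0],
    "ref_cell_B": [0.0, 1.0, 0.2],
    "ref_cell_C": [0.1, 0.0, 1.0],
}
query_embedding = [0.8, 0.2, 0.1]  # a researcher's own cell, projected into the same space

nearest = most_similar(query_embedding, reference_embeddings, k=2)
for label, score in nearest:
    print(f"{label}: cosine similarity {score:.3f}")
```

In practice the comparison would run over millions of reference cells with an approximate nearest-neighbour index rather than a full sort, but the underlying question is the same.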

Computer scientist Jure Leskovec, at Stanford University in California, used his Universal Cell Embeddings model², which he trained on CZ CELLxGENE data, to identify rare mouse kidney cells known as Norn cells. By then applying this ‘classifier’ to a larger data set of 36 million cells, he found that Norn cells were also present in the heart, lung and gonads. “This generalizability is the key capability of these models,” he says.

CZ CELLxGENE is not the only resource that aggregates and simplifies single-cell data analysis. The Human Cell Atlas, for instance, has its own data portal. And both the University of California, Santa Cruz, and the Broad Institute of MIT and Harvard in Cambridge, Massachusetts, among others, host tools for analysing select single-cell data sets online.

In March, Lior Pachter, a computational biologist at the California Institute of Technology, and his team described their Commons Cell Atlas infrastructure³,⁴ that stores and uniformly processes raw sequence data across data sets. (By contrast, CZ CELLxGENE retains data as ‘gene-count matrices’, although links to the original sequence data are also maintained, a spokesperson says.) These sequence data can be reanalysed as gene annotations change, Pachter notes, and his team exploited that to study gene-splice isoforms in human testis. “It’s really powerful and useful to be able to go back and rebuild the atlas again and again and again,” he says.

In September 2023, CZI announced that it would build a computing cluster of 1,000 graphical processing units (GPUs), which can rapidly accelerate or scale up model development.

This is helpful to researchers, Cool explains, because most labs doing single-cell research have access to maybe a handful of GPUs, which limits the complexity of the models that they can build and lengthens experiments. Using the new cluster, Cool says, researchers can begin to build more sophisticated, and more accurate, models. The cluster is expected to be “up and running by June”, a spokesperson says.

April 29, 2024

Spatiotemporal transcriptomics reveals insights into primate ovarian aging

The ovary is an essential organ for female fertility, and its age-dependent decline in function is a major cause of infertility. However, the molecular mechanisms underlying ovarian aging are still not well understood, particularly in higher vertebrates like primates. In this study, researchers used spatiotemporal transcriptomics to analyze the gene expression patterns in young and aged primate ovaries.

Key findings from the study include:

Somatic cells in the non-follicle region of the ovary undergo more significant transcriptional changes with age compared to those in the follicle region, suggesting that this area may be more susceptible to aging and could contribute to a hostile microenvironment that accelerates ovarian aging.

The study identified four primary contributors to ovarian aging (PCOA): inflammation, the senescence-associated secretory phenotype (SASP), cellular senescence, and fibrosis. These factors are likely to play crucial roles in the decline of ovarian function and fertility.

Researchers discovered spatial co-localization between a PCOA-featured spot and an unappreciated MT2 (Metallothionein 2) highly expressing spot (MT2high). This area is characterized by high levels of inflammation and may act as an “aging hotspot” within the primate ovary.

With advancing age, a subpopulation of MT2high cells accumulates, potentially spreading and amplifying senescent signals throughout the ovary.

This study provides the first comprehensive spatiotemporal transcriptomic atlas of the primate ovary, offering new insights into the molecular mechanisms underlying ovarian aging. These findings could lead to the identification of biomarkers and therapeutic targets for aging and age-related ovarian disorders, ultimately contributing to improved fertility treatments and interventions for women. The work, entitled “Aging hallmarks of the primate ovary revealed by spatiotemporal transcriptomics”, was published in Protein & Cell on Dec. 21, 2023.

Journal reference:

Lu, H., et al. (2023). Aging hallmarks of the primate ovary revealed by spatiotemporal transcriptomics. Protein & Cell. doi.org/10.1093/procel/pwad063.

April 19, 2024

Tumor microbiomes offer new insights for enhancing cancer therapies

In a recent study published in the journal Cell, researchers used metagenomics, genomics, and transcriptomics to examine microbiome genomes in over 4,000 metastatic tumor tissues. They analyzed the tumor microbiome and tumor microenvironment (TME), providing biological insights that could inform the development of bacteria-focused techniques to supplement and improve cancer treatments.

Microbial communities play a crucial role in the human body, influencing the immune system and anticancer therapies. They are present in primary tumors and interact with the commensal microbiota. The gut microbiota can modulate immune checkpoint blockers (ICB) and conventional chemotherapies. Fecal microbial transplants may improve clinical responsiveness to ICB agents. Understanding how tumor-resident bacteria shape tumor biology, immune infiltration, and treatment responsiveness is essential for understanding tumor response to ICB.

Study: A pan-cancer analysis of the microbiome in metastatic cancer

About the study

In the present study, researchers used bioinformatics to investigate the microbiota in metastatic malignancies, evaluating 4,160 specimens from diverse cancer types.

The researchers used mapping- and assembly-based metagenomics, genomics, transcriptomics, and clinical data to develop a pan-cancer repository that might help advance treatment techniques. They used two distinct computational approaches, PathSeq and Kraken2, to define tumor-resident microbiome communities at the genus level and a metagenomic assembly-based approach at the species level. The team then characterized the metastatic tumor microbiome by identifying the factors that influence its makeup and evaluating cancer-type-specific microbial communities. They used a characteristic hypoxia gene profile to assess the degree of hypoxia in metastatic cancers and then performed gene set enrichment analysis (GSEA). They also investigated whether microbial communities may affect host immunity and the TME.

The researchers investigated the relationship between gram-negative bacteria in metastases and Toll-like receptor (TLR) expression and whether lipopolysaccharide (LPS), obtained from dead or active bacteria, plays a primary role in TLR4 signaling in metastases. They additionally examined the relationship between bacterial makeup and tumor gene expression and the relationships between particular bacteria and immune cells.

To further understand the impact of metastatic heterogeneity and the stability of tumor-resident microorganisms over time, the team examined 370 repeated tumor specimens (185 pairs) obtained from 173 individuals. They examined changes in bacterial enrichment before and after treatment with immunotherapy, targeted therapy, or hormone therapy. They also investigated whether bacterial counts fell following immunotherapy in responsive patients and whether these bacteria were more prevalent in non-responsive individuals before treatment. Lastly, they examined pre-treatment bacterial communities associated with a lack of response to ICB in a monotherapy cohort of patients with non-small cell lung cancer (NSCLC).

Results

The researchers detected tumor-resident bacterial deoxyribonucleic acid (DNA) in a pan-cancer metastasis cohort, and assembling tumor-derived bacterial DNA provided species-level genomic characterization. Bacterial diversity correlated with cellular and molecular tumor immunity characteristics. In an NSCLC cohort, high levels of Fusobacterium DNA were associated with a poor immunotherapy response. Researchers found organ-specific microbe tropisms, enrichment of anaerobic bacteria in hypoxic tumors, links between microbial diversity and tumor-infiltrating neutrophils, and an association between Fusobacterium and resistance to ICB therapy in lung cancer.

Using mapping-based techniques and screening genera to eliminate technical contamination and seldom-seen genera, the team cataloged 165 microbial genera from 3,526 specimens, with 68% facultative/anaerobic and 49% gram-negative anaerobes. They built 514 metagenomic-assembled genomes (MAGs) of medium- to almost high-quality using tumor-derived microbial sequences. The most common tumor types were colorectal, breast, prostate, lung, and melanoma, with the lymph node, liver, and lung being the most common metastatic locations for tumor samples.

The quantity of bacteria-derived reads, expressed as a proportion of human-mapped reads, varied by cancer type, with higher fractions in renal and uterine malignancies and lower burdens in tumors originating from the brain and spinal cord. Renal and colorectal metastases were the most diverse, whereas head and neck metastatic tumors showed more dominant microbial genera.

Tumor-resident microbial communities were associated with tumor biology, with a strong correlation between LPS load and TLR4 signaling but not gram-positive lipoteichoic acid (LTA) load. Multivariate Cox proportional-hazards modeling showed lower overall survival (OS) and progression-free survival (PFS) rates significantly correlated with continuous Fusobacterium abundance, considering the genome-wide mutational load. Using the pan-cancer dataset, the researchers classified all tumors as Fuso-high or Fuso-low based on an upper quartile relative abundance cutoff similar to previously established criteria. Fuso-high tumors showed considerably decreased cytotoxic, interferon-gamma (IFN-γ), and major histocompatibility complex (MHC) class II gene expression profiles.
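The upper-quartile split into Fuso-high and Fuso-low tumors can be sketched as follows. This is only an illustration under stated assumptions: the relative-abundance values and sample names are made up, the helper `split_by_upper_quartile` is hypothetical, and the choice of quantile interpolation method and of a strict "greater than" comparison at the cutoff are assumptions rather than the study's documented procedure.

```python
import statistics


def split_by_upper_quartile(abundance):
    """Label each sample Fuso-high if its value exceeds the cohort's 75th percentile."""
    values = list(abundance.values())
    # statistics.quantiles with n=4 returns the three quartile cut points (Q1, Q2, Q3).
    _, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    labels = {
        sample: ("Fuso-high" if value > q3 else "Fuso-low")
        for sample, value in abundance.items()
    }
    return q3, labels


# Hypothetical relative abundances of Fusobacterium per tumor sample.
fuso_abundance = {
    "tumor_1": 0.00,
    "tumor_2": 0.01,
    "tumor_3": 0.02,
    "tumor_4": 0.03,
    "tumor_5": 0.04,
    "tumor_6": 0.05,
    "tumor_7": 0.20,
    "tumor_8": 0.40,
}

cutoff, fuso_labels = split_by_upper_quartile(fuso_abundance)
high_samples = sorted(s for s, label in fuso_labels.items() if label == "Fuso-high")
print(f"upper-quartile cutoff: {cutoff:.4f}")
print("Fuso-high samples:", high_samples)
```

Because the cutoff is derived from the cohort itself, roughly a quarter of samples land in the Fuso-high group by construction; the downstream comparison of immune gene expression profiles between the two groups is what carries the biological signal.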

The study provides the first large-scale pan-cancer map of intratumor microbiomes in metastatic malignancies, examining diversity across anatomical regions, primary tumor type, and treatment responses, including immunotherapy. The study showed that the metastatic microbiome partially comprises anaerobic bacteria that may be altered during treatment. The study also discovered links between intratumoral microorganisms and the activation of innate immune sensing pathways, indicating that the tumor microenvironment is altered through direct recognition of bacterial ligands.

April 11, 2024
