Tag: Transcriptomics

New ovarian atlas paves the way for extended fertility and hormone restoration

[ad_1]

A new “atlas” of the human ovary provides insights that could lead to treatments restoring ovarian hormone production and the ability to have biologically related children, according to University of Michigan engineers.

This deeper understanding of the ovary means researchers could potentially create artificial ovaries in the lab using tissues that were stored and frozen before exposure to toxic medical treatments such as chemotherapy and radiation. Currently, surgeons can implant previously frozen ovarian tissue to temporarily restore hormone and egg production. However, this does not work for long because so few follicles-;the structures that produce hormones and carry eggs-;survive through reimplantation, the researchers say.

The new atlas reveals the factors that enable a follicle to mature, as most follicles wither away without releasing hormones or an egg. Using new tools that can identify what genes are being expressed at a single-cell level within a tissue, the team was able to home in on ovarian follicles that carry the immature precursors of eggs, known as oocytes.

“Now that we know which genes are expressed in the oocytes, we can test whether affecting these genes could result in creating a functional follicle. This can be used to create an artificial ovary that could eventually be transplanted back into the body,” said Ariella Shikanov, U-M associate professor of biomedical engineering and corresponding author of the new study in Science Advances.

The majority of the follicles, called primordial follicles, remain dormant and are located in the outer layer of the ovary, called the cortex. A small portion of these follicles activate periodically and migrate into the ovary, to a region known as the growing pool. Only a few of those growing follicles go on to produce mature eggs that get released into the fallopian tube.

With the ability to guide follicle development and tune ovarian environment, the team believes that engineered ovarian tissue could function for much longer than unmodified implanted tissue. This means that patients would have a longer fertility window as well as a longer period in which their bodies produce hormones that help regulate the menstrual cycle and support muscular, skeletal, sexual and cardiovascular health.

We’re not talking about utilizing a surrogate mother, or artificial insemination. The magic we’re working toward is being able to trigger an immature cell into maturity, but without knowing which molecules drive that process, we’re blind.”

Jun Z. Li, associate chair of U-M’s Department of Computational Medicine and Bioinformatics and co-corresponding author of the study

U-M’s team utilized a relatively new technology, called spatial transcriptomics, to track all of the gene activity-;and where it occurs-;in tissue samples. They do this by reading strands of RNA, which are like notes taken from the DNA strand, revealing which genes are being read. Working with an organ procurement organization, U-M researchers performed RNA sequencing of ovaries from five human donors.

“This was the first time where we could target ovarian follicles and oocytes and perform a transcription analysis, which enables us to see which genes are active,” Shikanov said.

“The majority of ovarian follicles, already present at birth, never enter the growing pool and eventually self-destruct. This new data allows us to start building our understanding of what makes a good egg-;what determines which follicle is going to grow, ovulate, be fertilized and become a baby.”

U-M’s work is part of the Human Cell Atlas project, which seeks to create “maps of all the different cells, their molecular characteristics and where they are located, to understand how the human body works and what goes wrong in disease.”

Shikanov, Li and U-M collaborators such as Sue Hammoud, U-M associate professor of human genetics and urology, are mapping other parts of the female reproductive system, including the uterus, fallopian tubes and ovaries. Other contributors include Andrea Suzanne Kuliahsa Jones, formerly of U-M and now at Duke University, and D. Ford Hannum, a U-M graduate student research assistant in bioinformatics.

The research was partially funded by the Chan Zuckerberg Initiative. Additional financial support was provided by the National Institutes of Health.

Source:

Journal reference:

Jones, A. S. K., et al. (2024) Cellular atlas of the human ovary using morphologically guided spatial transcriptomics and single-cell sequencing. Science Advances. doi.org/10.1126/sciadv.adm7506.

[ad_2]

Source link

April 6, 2024
how a small team created the largest mouse-embryo atlas so far

[ad_1]

Building an atlas of all the cell types that make up the body typically requires multinational collaborations and massive budgets. But a technique that can analyse the genetic activity of hundreds of thousands of individual cells at a time has allowed one small team to produce a time-lapse atlas of an embryonic mouse’s cells over ten days of development. The atlas, which was created by three researchers in one year for approximately US$370,000, could help scientists to understand how stem cells turn into specific cell types, how organs develop and even how the body changes just after it is born.

The study, which was published in Nature in February¹, “is impressive at many levels, both the scale of what they achieved and how they achieved it”, says Bertie Göttgens, a stem-cell biologist at the University of Cambridge, UK, who was not involved in the study.

Geneticist Jay Shendure at the University of Washington in Seattle doesn’t normally study mouse development. His laboratory is known for establishing molecular-biology techniques, including one called sci-RNA-seq3 that allows researchers to survey the assemblage of messenger RNA (mRNA), known collectively as the transcriptome, in individual cells.

Which single-cell analysis tool is best? Scientists offer advice

Instead of looking at whole cells, which would be difficult to keep intact through the process, scientists grind up a sample — in this case, a whole mouse embryo — and isolate its cell nuclei. They split these nuclei into individual dishes and add a different molecular tag to the mRNA in each dish. Next, they combine the nuclei, separate them again, mark each dish with a new tag and repeat. Eventually, each nucleus acquires a unique collection of tags — a molecular barcode — that the researchers can use to determine which tags define the cell’s transcriptome. They can then sequence these cells’ mRNA and construct a ‘tree’ that models how one cell type can turn into another, doing so across multiple animals of different ages on the basis of the genes they express.

Missing moments

Two of Shendure’s lab members, postdoc Chengxiang Qiu and research scientist Beth Martin, decided to demonstrate sci-RNA-seq3 by charting the single-cell transcriptomes of embryonic mice during the animals’ roughly 19-day gestation period. At first, they collected embryos every 24 hours over a 5-day period, but the transcriptomes changed so much between time points that it was difficult to follow how stem cells turned into specific cell types over time². Shendure likens it to a video that is missing too many frames: more like a stop-motion animation than a smooth progression.

So Martin and Qiu partnered with research scientist Ian Welsh at the Jackson Laboratory, a research institute and mouse-breeding facility in Bar Harbor, Maine. Welsh painstakingly collected 83 mouse embryos at 2–6-hour intervals over 10 days of gestation, from the point at which organs start to develop up until just after the animal’s birth. Welsh snap-froze the embryos and sent them to Seattle, where Martin collected single-cell transcriptomes. Qiu then mapped the data into trees that show when and how each of 190 cell types — liver or bone-marrow cells, for instance — originates in an embryo.

Smart software untangles gene regulation in cells

To flesh out the tree, the researchers integrated their data, which began eight days into gestation, with existing work from Shendure’s team and others that had mapped the transcriptomes of these and younger embryos. This added another 110,000 cells to the mix, and these data formed the tree’s ‘roots’, allowing the researchers to follow the branching of early stem cells into specific types seen in the older embryos.

The resulting atlas, containing the transcriptomes of mice across 45 time points, is now available for developmental biologists to study in more depth. With 12.4 million cells, it is the largest mouse-embryo atlas so far and is nearly one-quarter the size of the cell data collected by the Human Cell Atlas collaboration, which comprises 700 labs attempting to map all of the cells in the human body.

A 2D visualization of the mouse-atlas data set, with colours corresponding to 26 major cell types.Credit: C. Qiu et al./Nature

“It’s a fantastic resource for the community,” says cellular geneticist and Human Cell Atlas co-founder Sarah Teichmann at the Wellcome Sanger Institute in Hinxton, UK. Teichmann points out that there is still work to be done on the mouse atlas. Some time points have more complete transcriptomes than others, and the researchers have not yet separated mice by sex to look at those differences. But she says it will enable a number of studies, including the ability to compare mouse and human development. Shendure says he and his team plan to create single-cell atlases of juvenile and adult mice from conception to death.

Stress effects

Although Shendure and his group aim to let others conduct in-depth biological analyses of the data, they did note two phenomena in their paper. The point at which the transcriptome changed most dramatically, they found, was in the hour just after birth, which Shendure calls “the most stressful moment in your life”. Some of those differences were expected — lung and fat cells changed activity to cope with being outside the uterus, for instance — but other changes are still unclear.

How single-cell multi-omics builds relationships

Pure luck led them to another finding. To get the timing just right, Welsh typically delivered the mice by caesarean section. But one day, he returned from lunch to an unexpected nest of newborn pups. Martin processed the mice anyway and found that their transcriptomes were significantly different from those of mice born by caesarean section. Those differences could explain the variation in health outcomes seen between people who were born by these two methods, the researchers say.

Yonatan Stelzer, an epigeneticist at the Weizmann Institute of Science in Rehovot, Israel, says the study is encouraging for future efforts to map the cells of individual organs or tissues. The next step for embryos, he says, will involve not only studying how cells develop over time, but also following them through space in 3D, tracking how they split and move to form a whole mouse. Future research, he adds, could also investigate questions such as how two cells with similar transcriptomes end up with different fates to become the right or left eye, for instance. “We’re still far from solving the entire embryonic puzzle,” he says.

[ad_2]

Source link

March 20, 2024
Neuron migration to brain regions key to memory and navigation continues into childhood
[ad_1]
- RESEARCH BRIEFINGS
- 20 March 2024
This study identifies a major migratory route for young neurons in the brains of young children. This route forms during pregnancy and links the birthplace of these nerve cells to their destination in highly interconnected brain regions that are responsible for memory and spatial processing.
[ad_2]

Source link
March 20, 2024
Spatially organized cellular communities form the developing human heart

[ad_1]

Experimental procedures

Tissue samples

Heart samples were collected in strict observance of the legal and institutional ethical regulations. The heart samples were collected under a University of California San Diego (UCSD) Human Research Protections Program Committee Institutional Review Board (IRB)-approved protocol (IRB number 081510) by the UCSD Perinatal Biorepository’s Developmental Biology Resource after informed consent was obtained from the donor families. All experiments were performed within the guidelines and regulations set forth by the IRB (IRB number 101021, registered with the Developmental Biology Resource). Ethical requirements for data privacy include that sequence-level data (for example, fastq files) be shared through controlled-access databases.

Tissue processing

Tissue samples were collected in buffer containing 10 mM HEPES pH 7.8, 130 mM NaCl, 5 mM KCl, 10 mM glucose, 10 mM BDM, 10 mM taurine, 1 mM EDTA and 0.5 mM NaH₂PO₄, and overall morphology was checked under a stereotaxic dissection microscope (Leica).

For single-cell dissociation, tissue samples from eight hearts were further cut into small pieces and enzymatically digested by incubating with collagenase type IV (Gibco) and Accutase (ThermoFisher) at 37 °C for 60 min. After removing the dissociation medium, cells were resuspended in PBS supplemented with 5% FBS and sorted using a Sony SH800 sorter. Samples were diluted to approximately 1,000 cells per µl before processing for scRNA-seq, as shown in Supplementary Fig. 1a.

Samples for MERFISH were washed with ice-cold PBS and then fixed in 4% paraformaldehyde at 4 °C overnight. On the second day, samples were washed in ice-cold PBS 3 times, 10 min each, and were incubated in 10% and 20% sucrose at 4 °C for 4 h each, and in 30% sucrose overnight, followed by immersion with OCT (Fisher, 23-730-571) and 30% sucrose (v/v) for 1 h. The samples were then embedded in OCT and stored at −80 °C until sectioning.

hPSCs

For the single-cell and bioprinting studies, a H9-hTnnTZ-pGZ-D2 hPSC line (TNNT2:eGFP hPSC cardiomyocyte reporter line) was purchased from WiCell and maintained as previously described⁴¹. For the bioprinting studies, an additional engineered TNNT2:NLS-mKATE2 RUES2 hPSC cardiomyocyte transgenic reporter line that specifically expresses the mKATE2 fluorescent protein containing a nuclear localization signal (NLS-mKATE2) in differentiated cardiomyocytes was used (Supplementary Fig. 18a). Both lines were routined authenticated with fluorescence microscopy, immunofluorescence and flow cytometry studies, and tested negative for mycoplasma contamination by PCR. To generate the TNNT2:NLS-mKATE2-T2A-BsdR RUES2 hPSC cardiomyocyte reporter line (TNNT2:NLS-mKATE2), we transfected a RUES2 hPSC line with a Piggybac (PB) construct expressing NLS-mKATE2-T2A-BsdR driven by the cardiomyocyte-specific TNNT2 promoter. To clone the PB-TNNT2:NLS-mKATE2-T2A-BsdR, we used the PB plasmid pcsj532 (a gift from K. Willert, UCSD) and used Gibson assembly (SGI, GA1200) to clone in a synthesized TNNT2 promoter⁴² (Integrated DNA Technologies), PCR-amplified NLS-mKATE2-T2A-BsdR (with polyA) from pgRNA-CKB⁴³ (a gift from B. Conklin, Gladstone; Addgene, plasmid 73501) and PCR-amplified PGK:PuroR from RT3GEPIR⁴⁴ (a gift from J. Zuber, IMP, Austria; Addgene, plasmid 111169). All three components were assembled in one Gibson assembly with pcsj532 digested using NheI (NEB R3131L). RUES2 hPSCs were transfected using Lipofectamine STEM (Invitrogen, STEM00015) with the PB-TNNT2:NLS-mKATE2-T2A-BsdR and a plasmid expressing a human-optimized PB transposase (pcsj533, a gift from K. Willert, UCSD) to integrate the PB. Two days after transfection, the cells were selected using 0.4 μg ml^–1 puromycin. The subsequent surviving cells behaved similarly to the parental and the TNNT2:eGFP hPSC lines in terms of proliferation and differentiation. Protocols were approved by the IRB (number 190561) at UCSD.

hPSC cardiac cell differentiations and sample preparation

hPSC lines were cultured in E8 medium and grown on Geltrex (Gibco)-coated plates. Differentiation of hPSCs into cardiomyocytes was performed using established protocols as previously described^41,45,46. In brief, hPSCs were grown to 80% confluency, and on day 0 (D0), cells were cultured with RPMI/B27 supplement without insulin (B27 minus insulin; ThermoFisher) containing 10 µM CHIR (Fisher Scientific). After 24 h of CHIR application, the medium was replaced with fresh B27 without insulin and the cells were cultured for another 48 h. Next (D3), 5 µM IWP2 (Tocris) was supplemented to B27 without insulin and cultured for another 48 h. At D5, the B27 without insulin and with IWP2 was replaced with fresh B27 without insulin for another 48 h. From D7 onwards, cells were maintained in RPMI/B27 with insulin (B27, ThermoFisher). On D15, this B27 medium was then supplemented with either NRG1 (50 ng ml^–1)⁴⁷ or PBS, and further cultured until D30 and greater, refreshing the medium every 3 days.

scRNA-seq studies performed on hPSC-derived samples were prepared as described in the ‘Tissue processing’ section. In brief, D25 hPSC-derived cardiac cells were enzymatically digested by incubating with collagenase type IV (Gibco) and Accutase (ThermoFisher) at 37 °C for 60 min. After removing the dissociation medium, cells were resuspended in PBS supplemented with 5% FBS and sorted using a Sony SH800 sorter.

Animal studies

Animal studies were conducted in strict compliance with the Guide for the Care and Use of Laboratory Animals published by the National Institutes of Health and protocols approved by the Institutional Animal Care and Use Committee of UCSD (A3033-01). Mice were maintained on a 12 h–12 h light–dark cycle in a controlled temperature (20–22 °C) and humidity (30–70%) environment. The generation of Tcf21-creERT2 and Sema3c^fl/fl mice has been previously described^48,49. To validate the genotype of the mice, genomic DNA was extracted by adding a 2 mm tail clipping to a 75 µl solution containing 25 mM NaOH and 0.2 mM EDTA, and then heating the sample for 30 min at 98 °C. Next, 75 µl of 40 mM Tris-HCl (pH 5.5) was then added to neutralize the reaction, and a 1:50 dilution of genomic DNA template was used for genotyping PCR. Both male and female embryos were used in this study; the embryos were not genotyped to determine sex. To determine the developmental stage of embryonic development during which tamoxifen treatment was administered, noon on the day of the vaginal plug was assumed to be E0.5. Tamoxifen (Sigma, T5648-1G, 0.1 mg g^–1 body weight) was fed to pregnant mice by gavage at E10.5, and hearts were collected at E12.5, E14.5, E17.5 and postnatal day 1. The fixed hearts were embedded in paraffin, sectioned and stained with haematoxylin and eosin by the UCSD Histology Core. Images were taken on a Hamamatsu Nanozoomer Slide Scanning system and an Olympus VS200 slide scanner, and processed using NDP View 2 software (Hamamatsu) and QuPath (v.0.4.3)⁵⁰, respectively. Phenotypic analyses of ventricular wall thickness were performed as previously described⁵¹. In brief, the thicknesses of ventricular compact and trabecular zones were measured at the level of the papillary muscle, with measurements taken from at least three areas per section, and at least three sections per mouse.

Single-cell transcriptome library preparation and sequencing

Single-cell droplet libraries using the cell suspensions from the Sony SH800 sorter were prepared according to the manufacturer’s instructions using the 10x Genomics Chromium controller, Chromium Single Cell 3′ Library and Gel Bead kit v2 (PN-120237) and Chromium i7 Multiplex kit (PN-120262). All libraries were sequenced on a HiSeq 4000 (Illumina) to a mean read depth of at least 65,000 total aligned reads per cell.

MERFISH gene selection

To spatially detect cell populations identified in the scRNA-seq dataset, we designed a panel of 238 genes specific for these subpopulations. These genes were then simultaneously imaged on cardiac samples using the combinatorial barcoded imaging technique MERFISH⁷. We initially identified gene markers differentially expressed for each of the 75 cell subpopulations by performing differential gene expression (DGE) analyses as well as applying a NS-Forest2 (ref. ⁵²) classifier on scRNA-seq data obtained from the aforementioned human hearts in Supplementary Fig. 1. All markers were combined from the binary gene analysis utilizing NS-Forest2 (ref. ⁵²) (159 genes) (Supplementary Table 7) and DGE analysis (7,557 genes) (Supplementary Table 3) of the cell subpopulations, and were then filtered for genes that were either not long enough to construct 48 target regions (each 30-nucleotides long) without overlap or for which expression levels were outside the range of 0.01–300 average unique molecular identifier (UMI) per cluster, as measured by scRNA-seq. The performance of identifying marker genes between NS-Forest2 and Spapros⁵³ pipelines was also compared. The initial result of Spapros produced 80 genes, which is half the number chosen by NS-Forest2. To compare a similar number of genes between NS-Forest2 and Spapros, these 80 genes were removed from the dataset and Spapros was run again, which selected another 90 genes. The combination of these two sets of genes were used for the Spapros gene list (Supplementary Table 9). To quantify the ability of the selected genes to re-identify cell subpopulations at the same granularity as annotated in the scRNA-seq data, the dimensionality reduction and neighbour graph were recalculated using only the genes selected by the algorithm (NS-Forest2 or Spapros) that was being evaluated. Each cell was then reassigned its cell subpopulation label based on the majority cell subpopulation of its five nearest neighbours in the new neighbour graph. The percentage of cells reassigned their original label was used as an accuracy metric. With this metric, we found that NS-Forest2 and Spapros chose genes with similar performance (Supplementary Table 10). Among the 238 MERFISH target genes, 63 were manually selected from the DGE and NS-Forest2 gene lists, including established markers for atrial, ventricular and non-chambered cardiomyocytes, as well as non-cardiomyocyte cell markers for fibroblasts, pericytes, VSMC, epicardial, endocardial, BEC, LEC and immune cells. Genes specific for platelet–red blood cells were not selected. To validate the final target gene list, we tested whether we could transcriptionally rederive the cell populations by cluster analyses using only the 238 target genes. To this end, we reduced the scRNA-seq dataset to only the 238 genes in the MERFISH gene panel and then performed dimensionality reduction, graph-based clustering and UMAP visualization. We observed a similar level of transcriptional separation and definition of cell classes between using the 238 target genes versus using the 3,000 variable genes chosen to annotate the cell classes in the scRNA-seq data (Fig. 1a and Extended Data Fig. 1b). In addition to the 238 MERFISH genes, we selected 11 genes that were imaged sequentially using smFISH (Supplementary Table 11), including genes that validated the combinatorial MERFISH imaging (Extended Data Fig. 2d).

MERFISH probe library design and construction

A total of 238 genes were identified as MERFISH target genes, which were subsequently used for probe generation and MERFISH assays as shown in Supplementary Table 11. To encode MERFISH RNA probes for spatial detection, a 22-bit modified Hamming distance 4 code was used⁸. Each of the 238 possible barcodes required at least 4 accumulated errors to be converted into another barcode. This property permitted the detection of errors up to any 2 bits, and the correction of errors to any single bit. In addition, this encoding scheme used a constant Hamming weight (that is, the number of 1 bits in each barcode) of 4 to avoid potential bias in the measurement of different barcodes due to a differential rate of 1 to 0 and 0 to 1 errors and because the optical density within each bit can interfere with resolving individual fluorescent spots, as previously described⁶. We used 238 out of the 7,315 possible barcodes to encode cellular RNAs and chose 10 barcodes that were left unassigned to serve as blank controls. The encoding probe set that we used contained 30–48 encoding probes per RNA, with each encoding probe containing 3 out of the 4 readout sequences assigned to each RNA. Encoding probes were designed using our own pipeline, namely, ProbeDesign. ProbeDesign was developed using a fully optimized algorithm in C++ for both DNA and RNA probes. ProbeDesign used the same principles of probe design utilized by various published algorithms (OligoArray⁵⁴, OligoMiner⁵⁵, OligoPaint⁵⁶ and ProbeDealer⁵⁷), for which off-targets are based on genome-wide 17-nucleotide off-target counts. Probes were selected with similar GC content or melting temperature, and the repetitive regions were used for off-target counting but not for probe design. ProbeDesign was implemented in three steps. (1) Build a 17-nucleotide index based on the reference genome (DNA) or genome annotation files (RNA). This step is fully optimized with bit-vector and hash tables for high-performance computation; (2) Scan selected loci or genome sequences to calculate the off-targets based on the 17-nucleotide counts in the previous step. And (3), filter and rank probe candidates based on predefined selection criteria. For the RNA probe design, we used the transcript sequences derived from the human reference genome sequences (hg38) downloaded from ncbi_refseq (https://hgdownload.soe.ucsc.edu/goldenPath/hg38/bigZips/genes/). The generation of the encoding probe sets were prepared from oligonucleotide pools, as previously described^58,59. In brief, we first used limited-cycle PCR to amplify the oligopools (Twist Biosciences). Then, we used these DNA sequences as the templates for in vitro transcription into RNA using T7 polymerase (NEB, E2040S). Subsequently, the RNA products were converted into single-stranded DNA with Maxima Reverse Transcriptase enzyme (Thermo Scientific, EP0751), and then the DNA was purified by alkaline hydrolysis (to remove the RNA templates) followed by DNA oligo purification kits (Zymo Research, D4060).

MERFISH sample preparation

Frozen hearts were sectioned at −20 °C on a cryostat (Leica CM3050S). A series of 12 μm coronal slices were cut at about 600 μm along the anterior–posterior axis of collected human hearts, which captured all of the major cardiac structures. MERFISH measurements of 238 genes with 10 non-targeting blank controls were performed as previously described using the encoded barcode sequences (Supplementary Table 11) and published readout probes⁶⁰. In brief, 12-μm-thick tissue sections were mounted on 40 mm no. 1.5 coverslips that were silanized and poly-l-lysine coated⁵⁸ and subsequently pre-cleared by immersing into 50% (v/v) ethanol, 70% (v/v) ethanol and 100% ethanol, each for 5 min. The tissue was then air-dried for 5 min, followed by treatment with Protease III (ACDBio) at 40 °C for 30 min, and then washed with PBS for 5 min. Tissues were then preincubated with hybridization wash buffer (30% (v/v) formamide in 2× SSC) for 10 min at room temperature. After preincubation, the coverslip was moved to a fresh 60 mm Petri dish, and the residual hybridization wash buffer was removed with a Kimwipe laboratory tissue. In the new dish, the coverslip was immersed with 50 μl of encoding probe hybridization buffer (2× SSC, 30% (v/v) formamide, 10% (w/v) dextran sulfate, 1 mg ml^–1 yeast tRNA and a total concentration of 5 μM encoding probes and 1 μM of anchor probe: a 15-nucleotide sequence of alternating dT and thymidine-locked nucleic acid (dT+) with a 5′-acrydite modification (Integrated DNA Technologies)). The sample was then placed in a humidified 37 °C oven for 36–48 h and then washed with hybridization wash buffer for 20 min at 37 °C and 20 min at room temperature. Samples were post-fixed with 4% (v/v) paraformaldehyde in 2× SSC and then washed with 2× SSC with murine RNase inhibitor for 5 min. To anchor the RNAs in place, the encoding probe-hybridized samples were embedded in thin, 4% polyacrylamide (PA) gels as previously described⁵⁸. In brief, the hybridized samples on coverslips were first washed with a de-gassed 4% PA solution, consisting of 4% (v/v) of 19:1 acrylamide/bis-acrylamide (Bio-Rad, 1610144), 60 mM Tris⋅HCl pH 8 (ThermoFisher, AM9856), 0.3 M NaCl (ThermoFisher, AM9759) and a 1:1,000 dilution of 0.1-µm-diameter blue fluorescent (350/440) beads (Life Technologies, F-8797). The beads served as fiducial markers for the alignment of images taken across multiple rounds of imaging. The coverslips were then washed again for 2 min with the same 4% PA gel solution supplemented with the polymerizing agents ammonium persulfate (Sigma, A3678) and TEMED (Sigma, T9281) at final concentrations of 0.03% (w/v) and 0.15% (v/v), respectively. The gel was then allowed to cast for 1.5 h at room temperature. The coverslip and the glass plate were then gently separated, and the PA film was incubated with a digestion buffer consisting of 50 mM Tris⋅HCl pH 8, 1 mM EDTA, 0.5% (v/v) Triton X-100 in nuclease-free water and 1% (v/v) proteinase K (New England Biolabs, P8107S). The sample was digested in this buffer for >36 h in a humidified, 37 °C incubator and then washed with 2× SSC 3 times. The samples were finally stained with an Alexa 488-conjugated anchor probe-readout oligonucleotide (Integrated DNA Technologies) and DAPI solution at 1 μg ml^–1. MERFISH measurements were conducted on a home-built system as previously described⁶⁰.

Immunofluorescence studies

For the immunofluorescence studies of the TNNT2:NLS-mKATE2 hPSC line, D25 hPSC-derived cardiac cells were dissociated, replated and then cultured for another 4 days before being fixed. The fixed cells were then immunostained, as previously described⁴¹, using an antibody for TNNT2 (A647 mouse anti-cardiac troponin T, BD 565744, 1:200). Nuclei were visualized using DAPI (1 µg ml^–1, Roche) staining. Immunofluorescent images were taken on a Nikon C2 confocal microscope.

Real-time quantitative PCR

RNA expression was measured by quantitative PCR (qPCR). RNA was extracted using TRIzol reagent (ThermoFisher) and a Direct-zol RNA MiniPrep kit (Zymo Research). cDNA was generated using 1,000 ng RNA mixed with iScript Reverse Transcription Supermix (Bio-Rad) and then diluted 1:10 in UltraPure DNase/RNase-free distilled water (ThermoFisher). qPCR was performed using Power SYBR Green master mix (ThermoFisher) according to the manufacturer’s recommendations, and a two-step amplification CFX_2stepAmp protocol on a Bio-Rad CFX Connect Real-Time PCR Detection system. Data were analysed using the 2^−ΔΔCt method. All gene expression was normalized to the expression of TATA box binding protein (TBP). Primer sequences are listed in Supplementary Table 22.

In vitro hPSC ventricular wall model

To create an in vitro hPSC ventricular wall model for studying ventricular wall morphogenesis, we bioprinted cardiomyocytes in multilayered constructs as shown in Fig. 5. To this end, we differentiated TNNT2:eGFP and TNNT2:NLS-mKATE2 hPSCs into D15 cardiomyocytes (hPSC-CMs) as described in the ‘hPSC cardiac differentiations and sample preparation’ section. D15 TNNT2:eGFP hPSC-CMs were further treated with NRG1 to create trabecular-like CMs as previously described⁴⁶. As controls, D15 TNNT2:eGFP and TNNT2:NLS-mKATE2 hPSC-CMs were treated with PBS. D30+ hPSC-CMs (>90% efficiency by flow cytometry) were then enzymatically dissociated with collagenase type IV (Gibco) and Accutase (ThermoFisher) and resuspended at 100 million cardiomyocytes per ml. The method to bioprint multilayered constructs involved printing D30+ hPSC-CMs that were treated with NRG1 or PBS into a rectangle with finger-like projections that was 500 × 700 × 600 µm (height × width × length) (inner-LV CC-like layer), followed by printing an adjacent rectangular structure (500 × 700 × 75 µm) (intermediate-LV CC-like layer) containing gelatin methacryloyl (GelMA)⁶¹ mixed with 100 ng ml^–1 of different combinations of SEMA3C, SEMA3D, SEMA6A or SEMA6B (R&D Systems) proteins as described in Fig. 5b. The concentration of SEMA3C used for the conditions in Fig. 5c was 1,000 ng ml^–1 because SEMA3C was located further from the inner-LV CC-like layer. The cell-encapsulated layer was fabricated using 6.25% GelMA and 0.33% LAP with 15 s of light exposure time, and the cells were mixed with the monomer solution directly before bioprinting. The adjacent layer containing the signalling factors was printed using 4% GelMA and 0.4% LAP with 15 s of light exposure time. Using a methacrylated coverslip fixed to the controller stage, a 20 µl cell–material mixture was placed into the space between the coverslip and a polydimethylsiloxane (PDMS) film attached to a glass slide. This cell–material mixture was then exposed to UV light (365 nm, 88 mW cm^–2) with a pattern generated by a digital micromirror device chip. After printing each layer, the construct was washed three times with warm PBS and aspirated dry. Finally, the multilayered construct was washed in both PBS and medium, and then stored in a cell culture incubator (37 °C, 5% CO₂). Medium was refreshed every other day.

Data analysis

Processing of raw sequencing reads

Raw sequencing reads were processed using the Cell Ranger (v.3.0.1) pipeline from 10x Genomics. Reads were demultiplexed and aligned to the human hg38 genome, and UMI counts were quantified per gene per cell to generate a gene–barcode matrix.

Cell filtering and clustering

After generating the gene–barcode matrix file from Cell Ranger, the individual count matrices were merged together and processed using the Seurat (v.4.0.1) R package⁶² (https://satijalab.org/seurat/). Further filtering and clustering analyses of the scRNA-seq cells were performed using the Seurat package, as described in the tutorials (https://satijalab.org/seurat/). Cells with at least 1,000 genes detected and a mitochondrial read percentage of less than 30% were used for downstream processing. Potential doublets were removed using DoubletFinder (v.2.0)⁶³ (https://github.com/chris-mcginnis-ucsf/DoubletFinder) using an anticipated doublet rate of 5%, which is the expected rate reported by 10x Genomics for the number of cells loaded onto the 10x Controller. For the aggregated dataset, gene expression was normalized for genes expressed per cell and total expression using the NormalizeData function. The top 3,000 variable genes were detected using the FindVariableFeatures function with default parameters. All of the genes were subsequently scaled using the ScaleData function, which utilizes a linear regression model to eliminate technical variability due to the number of genes detected, replicate differences and mitochondrial read percentage. Principal components were calculated using RunPCA, and the top 50 principal components (supported by ElbowPlot showing diminishing variance explained beyond the top 50 principal components) were used for creating the nearest neighbour graph utilizing the FindNeighbors function with k.param = 50. The generated nearest neighbour graph was then used for graph-based, semi-unsupervised clustering (FindClusters, default resolution of 0.8) and UMAP to project the cells into two dimensions. Marker genes were identified using a Wilcoxon rank-sum test (FindAllMarkers, default parameters) for one-versus-all comparisons for each of the cell clusters. Cell identities were assigned to the clusters by cross-referencing their marker genes with known cardiac cell type markers from both human and mouse studies, in addition to in situ hybridization data from the literature^{10,11,12,13,14,15}. On occasion, a cell cluster would emerge that expressed marker genes representing multiple populations, as well as contained cells with low UMI and gene counts that escaped the first filtering step. These cells were removed from downstream analyses. The clustering approach was then repeated for each compartment of cells (cardiomyocyte, mesenchymal, endothelial, neuronal and blood) as described above, and the clustering accuracy was evaluated using SCCAF (v.0.0.10)⁶⁴ with the following parameters: linear regression with L1 regularization with a 50/50 train-test split and a fivefold cross validation. For the adult ventricle scRNA-seq comparison, we combined previously published datasets³³ with our developing heart scRNA-seq dataset and re-ran the NormalizeData function to compare gene expression between these datasets.

Label transfer analysis

Cell annotations from the scRNA-seq dataset were compared to a recently published adult heart dataset¹⁸ utilizing scArches (v.0.5.9)⁶⁵. For scArches, both the adult and developing transcriptomic datasets were integrated using scVI (v.1.0.3)⁶⁶ (n_hidden=128, n_latent=50, n_layers=3, dispersion = ‘gene-batch’). A reference hierarchy tree was then constructed using the treeArches⁶⁷ workflow (https://docs.scarches.org/en/latest/treeArches_identifying_new_ct.html) with default parameters and “cell_state” labels on the adult heart published reference dataset¹⁸. Labels from the reference dataset were then transferred to the developing heart query dataset to predict the cell labels utilizing the scHPL.predict.predict_labels() function with default parameters. Rejected cells were calculated using the posterior probability (default option) with a 0.5 threshold.

Gene regulatory network analysis

To identify age-related changes in gene expression, we applied the pySCENIC (v.0.12.1)⁶⁸ gene regulatory network (GRN) inference tool to our scRNA-seq dataset. To this end, the scRNA-seq dataset was split by cell class, and pySCENIC analysis was performed to identify cell-class-specific regulons following the standard pipeline on GitHub (https://github.com/aertslab/SCENICprotocol/). In brief, we performed three steps to create a GRN for each cell class: (1) GRN inference using the GRNBoost2 algorithm, (2) transcription factor (TF) regulon predictions and (3) cellular enrichment area under the curve (AUC, measure of regulon activity) calculation for each cell. The resulting AUC matrix was then used to identify the regulons with the most significant change of activity over age by fitting a linear model to regulon activity and identifying regulons with the highest positive and negative rate of change. To find the functional pathways enriched in each set of regulons, we performed gene ontology enrichment analysis using the EnrichR Bioconductor package (v.3.1)⁶⁹. On the same regulons, we constructed a regulatory network with the top 100 non-redundant edges of the network by importance score, and visualized the edges, transcription factors and target genes using Cytoscape (v.3.8.0)⁷⁰. For the overall GRN of vCMs, we constructed a regulatory network with the top 50 transcription factors by centrality and then took the top 500 non-redundant edges of the network by importance score and visualized the edges and transcription factors using Cytoscape.

CCI analysis

We applied CellChat (v.1.6.1)⁷¹ to our scRNA-seq dataset to identify region-specific CCIs. Atrial cells and ventricular cells were divided based on their region of dissection (LA/RA for atrial and LV/RV/IVS for ventricular) and were analysed separately. Next, we followed the suggested workflow of CellChat, using its database of human ligand–receptor interactions (with the addition of the NRG1–ERBB2 signalling pathway owing to its known biological role during cardiac development^31,36,37), identifying overexpressed genes, computing interaction probabilities and discovering significant interactions based on default parameters. This pipeline was performed using all cell classes present in each region (except for platelet–red blood cells) to calculate potential CCIs.

Developmental trajectory analysis

To identify a developmental trajectory of vCMs within our scRNA-seq dataset, we used the Waddington-OT (v.1.0.8)⁷² package. The vCM cell class was isolated, which represents subpopulations C1–C11 of the cardiomyocyte compartment, and the corresponding cells were used for Waddington-OT trajectory inference as outlined in GitHub (https://broadinstitute.github.io/wot/tutorial/) without the optional step of estimating initial growth rates. Transport matrices were calculated for each adjacent pair of time points (9 p.c.w.–11 p.c.w., 11 p.c.w.–13 p.c.w., and 13 p.c.w.–15 p.c.w.) and then the trajectory was computed by calculating the descendent distributions at the 9 p.c.w stage. Normalized pseudotime values used for subsequent analyses were calculated by taking the quantile for each cell ranked by raw pseudotime value. Following the construction of the developmental trajectory, the resulting pseudotime for vCMs was utilized to order the GRN and CCIs of the vCMs by determining the expression-weighted pseudotime of each respective transcription factor and receptor or ligand expressed by vCMs as previously described⁷³.

MERFISH processing

For processing MERFISH data, individual RNA molecules were decoded using MERlin (v.0.6.1) as previously described⁸. Images were aligned across hybridization rounds by maximizing phase cross-correlation on the fiducial bead channel to adjust for drift in the position of the stage from round to round. Background was reduced by applying a high-pass filter, and decoding was then performed per pixel. For each pixel, a vector was constructed of the 22 brightness values from each of the 22 rounds of imaging. These vectors were then L2-normalized, and their Euclidean distances to each of the L2-normalized barcodes from MERFISH codebook were calculated. Pixels were assigned to the gene whose barcode they were closest to, unless the closest distance was greater than 0.512, in which case the pixel was not assigned a gene. Adjacent pixels assigned to the same gene were combined into a single RNA molecule. Molecules were filtered to remove potential false positives by comparing the mean brightness, pixel size and distance to the closest barcode of molecules assigned to blank barcodes versus molecules assigned to genes to achieve an estimated misidentification rate of 5%. The exact position of each molecule was calculated as the median position of all pixels consisting of the molecule.

Cellpose (v.1.0.2)⁷⁴ was used to perform image segmentation to determine the boundaries of cells and nuclei. Distinct RNA molecules were identified and assigned to individual cells as segmented by total polyadenylated RNA staining and DAPI staining, which allowed for the detection of cellular boundaries. The nuclei boundaries were determined by running Cellpose with the ‘nuclei’ model using default parameters on the DAPI stain channel of the pre-hybridization images. Cytoplasm boundaries were segmented with the ‘cyto’ model and default parameters using the polyT stain channel. RNA molecules identified by MERlin were assigned to cells and nuclei by applying these segmentation masks to the positions of the molecules. Any segmented cells that did not have any barcodes assigned were removed before constructing the cell-by-gene matrix.

smFISH computational analysis

Images were flatfield-corrected for the two gene channels (750 nm and 635 nm) and the fiducial marker (405 nm) channel. To reduce background noise for each hybridization round, the images of the preceding hybridization round were reduced in intensity and subtracted to obtain new background-subtracted images. The images were then locally normalized by subtracting a 15 × 15 blur from each pixel before undergoing maximum intensity projection into two dimensions. For transcript detection, the OpenCV function adaptiveThreshold was used with a block size of 41 pixels, and a subtracted constant ranging from −80 to −70 among our replicate smFISH experiments. This constant was empirically determined by choosing a value that ensured the resulting mask only captured visible fluorescent spots across diverse imaging planes for each gene. Using the regionprops function from Scikit-Image, we filtered out spots with an eccentricity value of 0 and cells with low pixel area (<4 pixels) to combat artefactual fluorescence detection. A global threshold was identified for the images of each gene by evaluating the values of features determined as nonspecific background (for example, irregular shape, low intensity). The coordinates of local brightness maxima that remained unattenuated after applying this global threshold were stored. Coordinates lying within the adaptiveThreshold mask boundaries were identified and counted as a single identified gene transcript. The images for each of the smFISH imaging rounds were aligned to the respective initial MERFISH hybridization round images to correct for microscopic drift using the fiducial marker channel. This was done by fitting spots to the fiducial bead markers of both sets of images, then minimizing the median distances between them. DAPI segmentation masks obtained from the MERFISH imaging were translated using this drift correction so that all identified gene transcript locations could accurately be assigned to the drift-corrected nuclei, which enabled reconstruction of a spatial mosaic of the cellular gene expression for each of our sequentially imaged gene targets. For the replicate smFISH experiments to validate MERFISH gene markers, each gene was imaged twice on the same colour channel but in different non-consecutive rounds, allowing for a more robust analysis by using the combination of the two images to reduce the effect of noise in the result. Genes were imaged on three colour channels: Alexa 750, Cy5 and Cy3. The genes were separated and analysed into batches of six, with the imaging being done in a pattern described as follows. In the first imaging round, genes A, B and C were imaged on the Alexa 750, Cy5 and Cy3 channels, respectively. On the second round, genes D, E and F were imaged on the same three channels. The third and fourth imaging rounds were then repeats of the first and second rounds. By imaging each gene twice, the data could be analysed as a pseudo-MERFISH experiment, whereby a codebook was designed with each gene having a barcode containing two ‘on’ bits in the two imaging rounds they were imaged. Using this codebook, the data was processed using the same method as the MERFISH data as previously described⁸.

Cell clustering analysis of MERFISH

With the cell-by-gene matrix, we followed a standard procedure as suggested in the Scanpy (v.1.8)⁷⁵ tutorial using Python (v.3.9) for processing MERFISH data. Count normalization, principal component analysis (PCA), neighbourhood graph construction and UMAP were performed using the default parameters in Scanpy. We performed Leiden clustering utilizing a resolution of 2 because the additional clusters gained at higher resolutions either did not have differentially expressed genes or were related to technical imaging artefacts. The top 20 differential genes identified by the rank_gene_groups() function were used to annotate each cluster. We further subclustered the vCM clusters using Leiden clustering at a default resolution of 1 to further annotate vCMs as compact and trabecular vCMs for both the left and right ventricles. To investigate the cellular populations in the ventricle specifically (both 13 p.c.w. and 15 p.c.w.), we manually defined the ventricular region, subsetted the MERFISH dataset to those cells populating the ventricle and performed Leiden clustering using a similar strategy to that used in the overall clustering (resolution of 5). Again, the top 20 genes identified by the rank_gene_groups() function were used to annotate each cluster.

Integration of the scRNAseq and MERFISH datasets

To integrate the scRNA-seq and MERFISH datasets, we first isolated both datasets to only the 238 MERFISH target genes interrogated by both modalities. We then utilized Scanpy’s implementation of Harmony to project both the scRNA-seq and MERFISH datasets into a shared PCA space⁷⁶. The dimensionality of the joint embedding was visualized using UMAP (min_dist=0.3, 30 nearest neighbours in Pearson correlation distance). The parameters matched those used in a previous publication of Harmony⁷⁶. To impute a complete expression profile and cell label for each MERFISH profile, we assigned the expression profile and cell label of the closest scRNA-seq cell to the MERFISH cell in the Harmony PCA space using the Euclidean distance metric (default number of PCs).

To evaluate the performance of the gene imputation method, we developed a predictability score for each gene which is the Pearson correlation between the imputed expression and measured image expression for all genes (Supplementary Fig. 14a). Because the shared embedding space is constructed using the 238 MERFISH target genes, it is expected that these genes would have higher predictability scores than genes not used in the construction. To avoid this bias, tenfold cross-validation was used to calculate independently the MERFISH and scRNA-seq gene predictability scores. To this end, a new shared embedding utilizing only 90% of the 238 MERFISH target genes was used to calculate the MERFISH and scRNA-seq gene predictability scores for the remaining 10% of genes that were not included for constructing the embedding. This process was repeated 10 times with a different 10% of genes being imputed by a different shared embedding each time to cover the full set of 238 genes. To calculate whether a predictability score represented a prediction that is a significant improvement over random prediction, we calculated predictability scores using a randomly connected neighbour graph. In other words, rather than predicting the expression from the cell with the most similar gene expression, the prediction was made from a randomly selected cell in the dataset, and then the predictability score was calculated between the measured expression values and these randomly predicted values. This process was repeated 100 times to estimate a normal distribution of predictability scores that result from random prediction. P values were then determined for the true predictability scores based on rejecting the null hypothesis that the true scores originated from the null normal distributions. Finally, these P values were corrected for multiple hypothesis testing using the Bonferroni method. We observed that the maximum scRNA-seq predictability score for a gene that failed this significance test (adjusted P value > 0.01) was 0.11.

Identifying CCs

We sought to define CCs that represented distinct shared cellular environments defined by the neighbouring cells in close proximity. To this end, we clustered each MERFISH cell based on the cell composition of neighbouring cells within a 150 μm zone, which represented a typical diffusion distance for extracellular signalling molecules⁷⁷. This zone sampled approximately 300 neighbours. The zone of each cell was therefore represented by a vector containing the relative proportions of each of the 27 identified cells in both the overall and ventricular subset of the MERFISH dataset. We then clustered the zones using Python’s scikit-learn (v.0.22) implementation of Kmeans with k = 13 for the overall MERFISH dataset or k = 9 for the ventricular subset of the MERFISH dataset, chosen by silhouette score. Thus, each MERFISH cell was assigned to 1 of the 13 or 9 CCs in the overall or ventricular subset of the MERFISH dataset, respectively.

To infer community-specific CCIs, cell annotations derived for each MERFISH cell were transferred to the nearest scRNA-seq profile in the Harmony joint embedding space and used for the pipeline of CellChat as described in the ‘CCI analysis’ section. The overall and ventricular communities were each analysed separately by analysing the scRNA-seq profiles assigned to those communities. For each overall or ventricular cellular community, we only considered cells that represented at least 5% or 3.5% of the community in the MERFISH dataset for CellChat CCI analysis.

Connectivity, ventricular wall depth and pseudotime analyses of vCMs

To visualize the similarity in the gene expression profiles of the vCMs, we isolated the vCM-LV-compact I, vCM-LV-compact II, vCM-LV-hybrid, vCM-LV-trabecular I, vCM-LV-trabecular II and vCM-LV/RV-Purkinje populations, and reperformed PCA, nearest neighbour graph construction and UMAP utilizing Scanpy with default parameters. We then used Scanpy’s implementation of partition based graph abstraction to construct a graph in which the nodes represent different vCMs, and the edges represent the degree of connectivity between the vCMs in the neighbourhood graph. This captured a measure of similarity between the vCMs.

To determine the distance of each MERFISH cell within the ventricular wall, the ventricular wall depth of each MERFISH cell was defined as the distance to the nearest epicardial cell or EPDC, which both lie on the outer layer of the heart. To compare ventricular wall depth between different ventricles, the wall depth values were normalized by dividing each value by the maximum depth of the corresponding ventricle. To identify depth correlated genes, we computed the Pearson correlation coefficient between expression and ventricular wall depth for each gene. We considered genes with a correlation greater than 0.2 to be depth correlated. Next, the diffusion pseudotime distance on the vCM nearest neighbour graph from the medoid of the vCM-LV-compact I cluster were calculated using Scanpy’s scanpy.tl.dpt() function with default parameters. We note that this metric is often reported as pseudotime to represent developmental changes, but in this case, we use it simply as a metric for expression similarity to vCM-LV-compact I cells. The scaled expression of the top genes correlating with pseudotime were plotted as a smoothed spline (Extended Data Fig. 6d) as previously described⁷⁸.

Migration distance measurement

The migration distance of the bioprinted hPSC-CMs was measured by using the D0 position as the starting point and calculating the distance migrated by the hPSC-CMs daily. In brief, brightfield and fluorescent (green and red) confocal images of the GFP⁺ and NLS-mKATE2⁺ hPSC-CMs were taken on a Leica SP5, with the brightfield images used to visualize the construct. Because of minor variations in size and cell number between printed constructs, we normalized the migration distances by first dividing the width of the construct into 15 even blocks for each image. Within each block, the distances from the D0 position to the furthest position (for each day) of the hPSC-CMs were calculated. We then averaged the distances measured for the 15 blocks to calculate the migration distance of each condition and line.

Statistics and reproducibility

Replicates and statistical tests are described in the figure legends. Sample sizes were not predetermined utilizing statistical methods. Tissue samples were not randomized, nor were the investigators blinded during collection as no subjective measurements were taken. Data for scRNA-seq and MERFISH were collected from all available samples and no randomization was necessary. For the studies utilizing human pluripotent stem cell lines, treatment with NRG1 was randomly assigned. For the animal studies, animals were randomly chosen from each genotype and stage. Blinding during analysis was not necessary as all of the results were analysed with the use of unbiased analysis and software tools that are not affected by the sample. All experiments were independently repeated at least three times with similar results, including experiments in Fig. 5e, Extended Data Figs. 2a and 12a and Supplementary Fig. 18a. To identify differentially expressed genes between clusters, a Wilcoxon rank-sum test was performed and the resulting P value was corrected using the Bonferroni procedure. For the scRNA-seq predictability scores, the P values were generated by using bootstrapping to generate a distribution of scores for each gene and then calculating (1– cumulative distribution function) to obtain the P value. For the migration distance, ventricular wall thickness and qPCR results, we used a one-way analysis of variance using R (v.4.2.0; https://www.r-project.org/).

The sample sizes for the violin plot in Fig. 2e are listed as follows: from top to bottom, n = 9,106, 7,661, 19,901, 3,791, 4,003, 60,810, 28,263, 16,369, 6,956, 21,087, 17,940, 5,135 and 27,613 cells. Fig. 4c: from left to right, n = 541, 849, 1,552, 719, 2,290, 1,112, 754, 499, 335, 55, 13, 701, 338, 177, 163 and 49 cells. Extended Data Fig. 5d: from left to right, n = 9,106, 7,661, 19,901, 3,791, 4,003, 60,810, 28,263, 16,369, 6,956, 21,087, 17,940, 5,135 and 27,613 cells. Extended Data Fig. 5e: from left to right, n = 27,613, 5,135, 17,940, 21,087, 7,661, 6,956, 3,791, 28,263, 9,106, 4,003, 60,810, 16,369 and 19,901 cells. Extended Data Fig. 8f: top panel from left to right (13 p.c.w./15 p.c.w.), n = 573/706, 976/160, 1,532/895, 354/303, 784/553, 720/NA, 187/NA, 1,440/866, 1,905/840 and 508/417 cells; bottom panel from left to right (13 p.c.w./15 p.c.w.), n = 711/274, 313/305, 548/711, 387/21, 1,444/938, 800/451, 557/409, 79/80 and 66/124 cells. Extended Data Fig. 9d: from left to right, n = 9,723, 18,908, 21,203, 8,042, 47,906, 16,225, 5,814, 6,307 and 18,592 cells. Extended Data Fig. 9e: from left to right, n = 18,592, 8,042, 5,814, 6,307, 47,906, 21,203, 18,908, 16,225 and 9,723. Extended Data Fig. 11e: from left to right, n = 81,880, 75,531, 34,953, 145,935, 19,949 and 18,485 RNA molecules. Violin plots consisting of cell numbers of ten or fewer are not shown and are labelled as ‘NA’ in those cases.

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.

[ad_2]

Source link

March 13, 2024
Incomplete transcripts dominate the Mycobacterium tuberculosis transcriptome

[ad_1]

Bacterial strains and growth conditions

The Mtb H37Rv strain (obtained from C. Sassetti) was grown at 37 °C in a minimal medium (Difco Middlebrook 7H9 broth (BD, 271310) supplemented with 0.5% (v/v) glycerol, 0.05% (v/v) tyloxapol (Sigma, T8761), 0.2 g l⁻¹ casamino acids (BD, 223050), and 10% (v/v) OADC (oleic acid, albumin, dextrose and catalase; BD, 212351)). The double-auxotrophic Mtb mc²6206 strain (H37Rv ΔpanCDΔleuCD)⁴² (obtained from W. Jacobs Jr) was grown in the minimal medium with an additional 50 mg l⁻¹ l-leucine (Sigma, L8000) and 24 mg l⁻¹ pantothenic acid (Sigma, P5155). The Msm mc²155 strain (obtained from S. Fortune) was grown in the Middlebrook 7H9 medium supplemented with 0.2% (v/v) glycerol, 0.05% (v/v) Tween-80 (VWR, M126), and 10% (v/v) albumin–dextrose–catalase. Liquid Mtb and Msm cultures were grown at 37 °C in Nalgene sterile square PETG medium bottles with constant agitation. The solid Mtb culture was grown on 7H11 agar (Sigma, M0428) supplemented as described above except for tyloxapol.

CRISPR interference

Plasmid pIRL58 (Addgene, 166886) bearing the Streptococcus thermophilus CRISPR–dCas9 system (dCas9_Sth1)¹³ was used to modulate the RNA expression level of target genes in Mtb mc²6206 cells. Oligonucleotides for single guide RNAs (sgRNAs; Integrated DNA Technologies) were cloned into pIRL58. After verification by Sanger sequencing, pIRL58 and pIRL19 (Addgene, 163634, which supplied the L5 integrase function on a separate suicide vector) were co-transformed into Mtb cells by electroporation using GenePulser (BioRad) at 2,500 V, 700 Ω, and 25 μF. Single colonies were picked from the solid culture plates with 20 μg ml⁻¹ kanamycin (Goldbio, K-120) selection after 14–21 days of culture. Target gene knockdown was induced by adding 100 ng ml⁻¹ ATc (Sigma, 37919). The sgRNA and primer sequences are listed in Supplementary Table 5.

SEnd-seq

RNA isolation

Bacterial cells were quenched by adding 1× vol of GTC buffer (600 g l⁻¹ guanidium thiocyanate, 5 g l⁻¹ N-laurylsarcosine, 7.1 g l⁻¹ sodium citrate, and 0.7% (v/v) β-mercaptoethanol) to the culture medium immediately before collection and placed at room temperature for 15 min. Cell pellets were collected by centrifugation at 4,000g for 10 min at 4 °C, and then thoroughly resuspended in 100 μl TE buffer (10 mM Tris-HCl pH 8.0 and 1 mM EDTA). After the addition of 1 ml TRIzol reagent (Invitrogen, 15596) and 350 mg of glass beads (Sigma, G1145), the cells were immediately lysed in a screw-cap tube by bead beating with the Precellys Evolution homogenizer (Bertin Technologies, 02520-300-RD000) at 10,000 r.p.m. for 4× 45-s cycles with a 60-s interval and chilled with dry ice. After removal of the beads by spinning samples at 12,000 r.p.m. for 5 min at 4 °C, the liquid phase was transferred to a new tube. A 200 μl volume of chloroform was added, and the sample was gently inverted several times until reaching homogeneity. The sample was then incubated for 15 min at room temperature before spinning at 12,000g for 10 min at 4 °C. The upper phase (about 600 μl) was gently collected and mixed at a 1:1 ratio with 100% isopropanol. The sample was incubated for 2 h at −20 °C and then centrifuged at 14,000 r.p.m. for 15 min at 4 °C. The pellet was washed twice with 1 ml of 75% (v/v) ethanol, air dried for 5 min and dissolved in nuclease-free water. RNA integrity was assessed with 1% (m/v) agarose gel and the Agilent 2100 Bioanalyzer System (Agilent Technologies, 5067-4626). For antibiotic treatment conditions, Mtb mc²6206 cells were exponentially grown to an optical density at 600 nm (OD₆₀₀) of about 0.8 followed by treatment with a specific antibiotic (30 μg ml⁻¹ linezolid (Sigma, PZ0014), 40 μg ml⁻¹ clarithromycin (Sigma, C9742), 300 μg ml⁻¹ streptomycin (Sigma, S9137), or 50 μg ml⁻¹ rifampicin (Sigma, R3501)). At each time point following the treatment, 4 ml of cell culture medium was withdrawn and mixed quickly with 4 ml GTC buffer. The cells were then collected, and the RNA was isolated as described above.

Library preparation for total RNA SEnd-seq

A 5 μg quantity of total RNA was mixed with pooled spike-in RNAs used in our previous study³ at a mass ratio of 300:1 in a total volume of 12 μl. The RNA sample was incubated with a 5′-adaptor ligation mix (1 μl of 100 μM 5′ adaptor (Supplementary Table 5), 0.5 μl of 50 mM ATP, 2 μl DMSO, 5 μl of 50% PEG8000, 1 μl RNase Inhibitor (New England BioLabs, M0314), and 1 μl of High Concentration T4 RNA Ligase 1 (New England BioLabs, M0437)) at 23 °C for 5 h. The sample was then diluted to 40 μl with nuclease-free water and cleaned twice with 1.5× vol of Agencourt RNAClean XP beads (Beckman Coulter, A63987). Immediately following the 5′ adaptor ligation, the eluted RNA was ligated to the 3′ adaptor (Supplementary Table 5) using the same procedure. After incubation at 23 °C for 5 h, the reaction was diluted to 40 μl with water and cleaned twice with 1.5× vol of Agencourt RNAClean XP beads to remove excess adaptors. The sample was subsequently eluted with 0.1× TE buffer and subjected to ribosomal RNA removal with RiboMinus Transcriptome Isolation Kit (Thermo Fisher, K155004) following the manufacturer’s instructions. After recovery by ethanol precipitation, the RNA was reverse transcribed to cDNA with Eubacterium rectale maturase (recombinantly purified from Eco, obtained from A. M. Pyle)⁴³ and 5′-phosphorylated and biotinylated reverse transcription primer (Supplementary Table 5). After purification, the cDNA was circularized with TS2126 RNA ligase⁴⁴ (obtained from K. Ryan). Double-stranded DNA was synthesized by DNA PolI (New England BioLabs, M0209S). After enzyme inactivation and DNA purification with 1.5× vol of AMPure beads (Beckman Coulter, A63882), the DNA was subsequently fragmented by dsDNA Fragmentase (New England BioLabs, M0348S) at 37 °C for 15 min. The reaction was stopped by adding 5 μl of 0.5 M EDTA and incubated at 65 °C for 15 min in the presence of 50 mM dithiothreitol (DTT). Next, the DNA was diluted to 40 μl with TE buffer and purified with 1.5× vol of AMPure beads. The eluted DNA was used for sequencing library preparation with NEBNext Ultra II DNA Library Prep Kit (New England BioLabs, E7645). Biotinylated DNA fragments were enriched by 5 μl of Dynabeads M-280 Streptavidin (Thermo Fisher, 11205D) and further amplified for 12 cycles by PCR.

Library preparation for primary RNA SEnd-seq

A 5 μg quantity of total RNA was used for primary transcript enrichment with our previously published method³. In brief, the 5′-triphosphorylated RNA species was specifically capped with 3′-desthiobiotin-GTP (New England BioLabs, N0761) by the Vaccinia Capping System (New England BioLabs, M2080S). The RNA was subjected to 3′ adaptor ligation using the same procedure as described above and subsequently enriched with Hydrophilic Streptavidin Magnetic Beads (New England BioLabs, S1421). After washing thoroughly, the RNA was eluted and reverse transcribed to cDNA as described above. The remaining steps were the same as those for library preparation for total RNA SEnd-seq, except that the DNA library was amplified for 15 cycles.

Illumina sequencing

Following PCR amplification, each amplicon was cleaned by 1× vol of AMPure XP beads twice and quantified with a Qubit 2.0 fluorometer (Invitrogen). The amplicon size and purity were further evaluated on an Agilent 2200 Tape Station (Agilent Technologies, 5067-5576). Equal amounts of amplicon were then multiplexed and sequenced with 2 × 150 cycles on an Illumina NextSeq500 or NovaSeq6000 platform (Rockefeller University Genomics Resource Center).

SEnd-seq data analysis

Data processing

After quality filtering and Illumina sequencing adaptor trimming with FASTX-Toolkit (v0.0.13), the raw paired-end reads were merged to single-end reads by using FLASh software (v1.2.11). The correlated 5′-end and 3′-end sequences were extracted by the custom script (fasta_to_paired.sh) using the SeqKit (v2.4.0) and Cutadapt (v4.1) packages. The inferred full-length reads were generated by Bedtools (v2.31.0) and Samtools (v1.17) after mapping to the reference genome (NC_000913.3 for Eco, NC_008596.1 for Msm and NC_018143.2 for Mtb) with Bowtie 2 (v2.5.1). The full-length reads with an insert length greater than 10,000 nt were discarded. The mapping results were visualized using the IGV genome viewer (v2.4.10). Data analysis and visualization scripts used Python packages including Matplotlib (v3.7.1), Numpy (v1.24.3), Scipy (v1.10.1), bioinfokit (v0.3), and pyCircos (v0.3.0).

RNA coverage

Each full-length read was first mapped to the genome in a specific direction. Directional RNA coverage was quantified by summing the number of aligned reads at each mapped nucleotide position. When comparing RNA coverage between samples, data were normalized by the total non-ribosomal RNA amount in each sample. For the samples treated with translation inhibitors, the abundance of spike-in RNAs was used for normalization. Coding TUs and asRNAs with high levels of expression were defined as those with an average RNA coverage of at least 10 for the first 100 nt downstream of the TSS. The circos plot was generated using the Python package pyCircos (github.com/ponnhide/pyCircos, version 0.2.0). The RNA coverage plots were generated using Matplotlib package⁴⁵ and custom Python scripts.

TSS identification

TSSs were identified from the primary RNA SEnd-seq data using a custom Python script. Only positions with more than 10 reads starting at that position, and with an increase of at least 50% in read coverage compared to its upstream neighbouring position (for example, 50 reads at position −1 and 150 reads at position 0), were retained. Candidate TSS positions within 5 nt in the same orientation were grouped together, and the position with the largest amount of read increase was used as the representative TSS position. Motif analysis around the TSS regions (−40 nt to +5 nt) was carried out by MEME (v5.5.2)⁴⁶.

TTS identification

Potential TTSs were identified from the total RNA SEnd-seq data at genomic positions with more than 10 reads ending at that position (outside rRNA genes) and with a reduction of more than 40% in read coverage compared to its upstream neighbouring position (for example, 100 reads at position −1 and 50 reads at position 0).

TU annotation

TUs were used in this work to analyse the transcription of coding genes. The genome was first segmented into preliminary TUs that contained annotated genes of the same direction. A preliminary unit was further segmented into multiple units if it contained any internal TSS with a strong activity (>2-fold increase in RNA coverage between downstream and upstream of the site for log-phase cell sample). As such, each TU contains a major TSS (TU start site) and possibly additional minor TSSs (<2-fold increase in RNA coverage). The end site of a TU was set to 10 nt before the start of a following co-directional TU, or the middle position between opposite genes that belong to two convergent TUs. TUs shorter than 700 nt and TUs annotated with only rRNA or tRNA genes were excluded from further analysis.

Antisense transcript annotation

asRNAs were called if there existed a strong antisense TSS within a given coding TU or if an opposite-direction TSS was found within the non-annotated 400-nt region downstream of a coding TU. The end site of an asRNA was set to the position where the RNA coverage dropped below 25% of the peak value.

PF analysis

Each coding TU was assigned with an upstream zone (from 0 to 200 nt downstream of the TSS) and a downstream zone (from 500 nt downstream of the TSS to the end of the TU). If there was another qualified TSS located within the downstream zone, the region downstream of that TSS was excluded from analysis. The ratio between the average RNA intensity of the downstream zone and that of the upstream zone was calculated as the PF for the corresponding TU. For asRNAs, the upstream and downstream zones were defined as 0–200 nt and 500–700 nt downstream of the TSS, respectively. The lower and upper bounds of PF values were set to be 0.0 and 2.0, respectively.

Gene ontology analysis

The Database for Annotation, Visualization, and Integrated Discovery (DAVID; v2023q2; https://david.ncifcrf.gov/)⁴⁷ was used to carry out gene ontology analysis for Mtb genes with different PF values. The complete list of genes within each set was uploaded to DAVID under the headings of Cellular Compartment, Biological Process, and Molecular Function. Enriched categories with a P value < 0.05 were presented.

NET-SEnd-seq

Cell collection, lysis, and elongation complex pulldown protocols were adapted from a published study¹⁶ with modifications. Briefly, an ATc-inducible pIRL58 backbone plasmid bearing Mtb rpoC–6×His was transformed into Mtb mc² 6206 cells and the genome-integrated expression strain was picked as described above. For each pulldown sample, 55 ml of cell culture was prepared. When the cell culture reached the mid-log phase (OD₆₀₀ = 0.5), 100 ng ml⁻¹ ATc or an equivalent volume of solvent methanol was added to the medium, and the cells were cultured for another 12 h. After removing 4 ml of cell culture for total RNA extraction, the remaining cell culture was mixed with an equal volume of frozen 2× crush buffer (20 mM Tris-HCl pH 7.8, 10 mM EDTA, 100 mM NaCl, 1 M urea, 25 mM NaN₃, 2 mM β-mercaptoethanol, 10% ethanol, 0.4% NP40, and 1 mM phenylmethylsulfonyl fluoride). The cells were subsequently precipitated by centrifugation at 4,000g for 10 min at 4 °C, immediately frozen in liquid nitrogen, and stored at −80 °C for at least 1 day. After thawing on ice, the cells were washed twice with 25 ml of cold PBS pH 7.4 and once with 5 ml of cold lysis buffer (20 mM KOH-HEPES pH 7.9, 50 mM KCl, 0.5 mM DTT, 5 mM CaCl₂, 10% glycerol, 0.3 mM MgCl_2, and 2.5 mM imidazole). The cells were then resuspended in 2 ml of lysis buffer, transferred to two 2-ml lysing matrix B tubes (MP Biomedicals, 116911050), and immediately lysed by bead beating with the Precellys Evolution homogenizer at 10,000 r.p.m. for 4× 45-s cycles with 60-s interval and chilled with dry ice. After centrifugation at 13,000g for 5 min, the supernatant was collected into a new 15-ml RNase-free tube. Each lysing matrix B tube was subjected to an additional round of bead beating with 1 ml of fresh lysis buffer and the supernatants were combined. Next, the collected sample was treated with 1 μl TURBO DNase (Life Technologies, AM2238) and incubated at room temperature for 10 min. After centrifugation at 4,000g for 10 min at 4 °C, the supernatant was transferred to a new 15-ml tube and incubated with 40 μl pre-washed Ni-NTA beads (Qiagen, 30230) for 1 h at 4 °C with continuous shaking at 100 r.p.m. After immobilization, the beads were washed four times with 5 ml of wash buffer (20 mM Tris-HCl pH 7.8, 1 M betaine, 5% glycerol, 2 mM β-mercaptoethanol, and 2.5 mM imidazole) and five times with 5 ml of pre-elution buffer (20 mM Tris-HCl pH 7.8, 40 mM KCl, 5% glycerol, 2 mM β-mercaptoethanol, and 2.5 mM imidazole). The immobilized complex was subsequently eluted with 300 μl of the pre-elution buffer containing 0.3 M imidazole. The nucleic acids in the eluates were extracted once with 200 μl phenol/chloroform/isoamyl alcohol (25:24:1, v/v/v) and once with 200 µl chloroform. The top aqueous phase was collected and precipitated by 3× volumes of ethanol, 0.1× vol of 3 M sodium acetate pH 5.2, and 2 μl glycogen (Thermo Fisher, AM9510). After precipitation at −20 °C overnight and maximum-speed centrifugation for 20 min, the pellet was washed twice with 300 µl of 75% ethanol. The pellet was then dissolved in 50 µl nuclease-free water and treated with 0.5 U Turbo DNase at 37 °C for 15 min. The residual RNA was extracted by phenol/chloroform/isoamyl alcohol, precipitated by ethanol and recovered in 11.5 µl of nuclease-free water. A 1 µl volume of spike-in RNA was added to each RNA sample, and the RNAs were ligated to a 3′ adaptor. The remaining steps were the same as those described above for total RNA SEnd-seq. The DNA library was amplified for 16 cycles by PCR.

ChIP–seq

A 50 ml volume of mid-log phase Mtb cells (OD₆₀₀ = 0.8–1.0) were treated with 1% formaldehyde while the culture was agitated at room temperature for 30 min. Crosslinking was quenched by adding glycine to a final concentration of 250 mM for another 30 min while stirring at room temperature. The cells were pelleted by centrifugation at 4,000g for 10 min at 4 °C and washed three times with 20 ml of cold PBS and 0.1× protease inhibitor (Sigma, P8465). The cell pellet was stored at −80 °C for at least one day. After thawing on ice, the cells were washed once with 5 ml of IP lysis buffer (20 mM KOH-HEPES pH 7.9, 50 mM KCl, 0.5 mM DTT, 5 mM CaCl₂, and 10% glycerol) and resuspended in 2 ml of IP lysis buffer. The cells were then transferred to two 2-ml lysing matrix B tubes (MP Biomedicals, 116911050) and immediately lysed by bead beating with the Precellys Evolution homogenizer at 10,000 r.p.m. for 4× 45-s cycles with a 60-s interval and chilled with dry ice. After centrifugation at 13,000g for 5 min, the supernatant was collected into a new 15-ml RNase-free tube. Each lysing matrix B tube was subjected to an additional round of bead beating after adding 1 ml of fresh IP lysis buffer. After centrifugation at 4,000g for 10 min at 4 °C and sampling for input control, 4 ml of supernatant was transferred to a new 15-ml tube and incubated with 0.75 µl of micrococcal nuclease (New England BioLabs, M0247S) at 37 °C for 15 min with continuous shaking. The reaction was stopped by adding EDTA at a final concentration of 25 mM, and the supernatant was transferred to a new 15-ml tube after centrifugation at 4,000g for 10 min at 4 °C. A 3 µl volume of anti-Eco σ⁷⁰-factor antibody (BioLegend, 663208; 1:1,333 dilution) or 5 µl of anti-Eco RNAP β-subunit antibody (BioLegend, 663903; 1:800 dilution) was used to immunoprecipitate Mtb σ^A-factor and Mtb RNAP, respectively. After overnight incubation, 40 µl of pre-washed protein A/G agarose beads (Thermo Fisher, 26159) were added and incubated for 2 h at 4 °C and for another 30 min at room temperature. The beads were then washed ten times with 5 ml IPP150 buffer (10 mM Tris-HCl pH 8.0, 150 mM NaCl, and 0.1% NP40) and once with 5 ml TE buffer. Next, the DNA was eluted with 150 µl of elution buffer (50 mM Tris-HCl pH 8.0, 10 mM EDTA, and 1% SDS) followed by 100 µl TE buffer with 1% SDS. After thoroughly removing the beads by centrifugation at 2,000g for 5 min at 4 °C, the combined supernatants were incubated with 1 mg ml⁻¹ Pronase (Sigma, 537088) at 42 °C for 2 h and then at 65 °C for 9 h. The sample was cleaned twice with 200 µl of phenol/chloroform/isoamyl alcohol (25:24:1, v/v/v) and recovered by ethanol precipitation. Finally, the sequencing libraries for immunoprecipitated DNA and input control were prepared using the NEBNext Ultra II DNA Library Prep Kit. After sequencing and quality filtering, the reads were mapped to the Mtb genome using Bowtie 2. The ChIP–seq signals were extracted and plotted using custom Python scripts.

Analysis of deposited RNA-seq data

The RNA-seq datasets SRR5689224 and SRR5689225 (BioProject PRJNA390669)¹² from log-phase Mtb cells cultured in dextrose-containing medium were used to compare the RNA coverage between SEnd-seq and RNA-seq. The RNA-seq datasets SRR5061507, SRR5061514, SRR5061706 and SRR5061510 (BioProject PRJNA354066)¹⁸ from Mtb cells with Rho depletion were used to compare to the rho-knockdown SEnd-seq datasets. The deposited datasets were downloaded from the National Center for Biotechnology Information. After read extraction and quality filtering, the reads were mapped to the Mtb genome using Bowtie 2 (v2.5.1). The RNA intensities were extracted and plotted using custom Python scripts.

Analysis of deposited Ribo-seq data

Mtb Ribo-seq data were downloaded from the EMBL-EBI database (E-MTAB-8835)²¹. After read extraction and quality filtering, the reads were mapped to the Mtb genome using Bowtie 2. The directional ribosome binding signals were extracted and plotted using a custom Python script.

Immunoblot

Mtb cells were lysed with TRIzol reagent as described above, and protein samples were extracted following a TRIzol-based protein extraction protocol provided by the manufacturer. Immunoblotting was carried out as described previously⁴⁸. Antibodies against His-tag (Santa Cruz, sc-8036; 1:1,000 dilution), Mtb Rho (obtained from D. Schnappinger; 1:200 dilution), and Eco RpoB (BioLegend, 663903; 1:1,000 dilution) were used.

qPCR

A 1–10 μg amount of total RNA was treated with 0.5 μl of TURBO DNase (Life Technologies, AM2238) at 37 °C for 30 min to remove the genomic DNA. The sample was diluted to 100 μl with RNase-free water and then cleaned three times with 100 μl of H₂O-saturated phenol/chloroform/isoamyl alcohol (25:24:1, v/v/v). After ethanol precipitation, 1 μg of RNA was reverse transcribed to cDNA with the High-Capacity cDNA Reverse Transcription Kit (Thermo Fisher, 4368814) following the manufacturer’s instructions. qPCR was conducted using synthesized primers and the SYBR green master mix (Thermo Fisher, 4309155) on a QuantStudio5 Real-Time PCR System (Thermo Fisher). The relative RNA abundance was presented as the signal ratio between the target transcript and the reference 16S rRNA from the same sample using the formula: 2^{Ct(16S) − Ct(target)}, in which C_t denotes the cycle threshold.

Inducible lacZ transcription in Mtb

Plasmid pIRL58 was modified by removing the sgRNA expression cassette and replacing the dCas9_Sth1 gene body with the Eco lacZ coding region, allowing the synthesis of lacZ RNA under the control of ATc-inducible promoter P_tet. The modified plasmid was co-transformed into Mtb mc²6206 cells with pIRL19 as described above. Cells from a single colony of Mtb P_tet-lacZ after selection were exponentially grown to an OD₆₀₀ of about 0.8 followed by the addition of 100 ng ml⁻¹ ATc to induce lacZ transcription. After induction, 4 ml of cell culture was withdrawn at indicated time points and mixed with 4 ml GTC buffer in a new tube as sample t (St). One extra sample taken immediately before ATc addition was referred to as S0. After RNA isolation and TURBO DNase treatment as described above, 1 µg of total RNA was used to synthesize the cDNA for qPCR. The relative lacZ mRNA abundance at each time point is defined as 2^{Ct(S0) − Ct(St)}, in which Ct denotes the cycle threshold.

In vitro transcription

DNA fragments were amplified by PCR from Mtb genomic DNA with primer sets listed in Supplementary Table 5. An AP3 promoter sequence was inserted into one end of the fragment and an intrinsic terminator (derived from TsynB in pIRL58) was placed at the other end. The DNA fragment was then incorporated into the pUC19 plasmid. The plasmid templates were prepared from Eco DH5α cells and subsequently treated with 2 µl RNase A (Thermo Fisher, EN0531) for 30 min and 2 µl Proteinase K (New England BioLabs, P8107S) for 1 h. The plasmid templates were cleaned three times with phenol/chloroform/isoamyl alcohol (25:24:1, v/v/v) and recovered by ethanol precipitation.

To prepare templates with a preformed bubble, the DNA fragment containing the intrinsic terminator was amplified from the plasmid DNA described above by PCR. The product was cleaned with QIAQuick PCR purification kit (Qiagen, 28104) and phenol/chloroform/isoamyl alcohol (25:24:1, v/v/v). The bubble template was constructed by ligating a DNA adaptor (NEBNext adaptor for Illumina) to each end of the DNA fragment using NEBNext Ultra II DNA Library Prep Kit. After XbaI digestion (cut site immediately after the terminator), the DNA template was purified using AMPure XP beads.

Purified Mtb RNAP, σ^A-factor, NusA, and NusG were prepared as described previously^38,49,50. The in vitro transcription mixture contained 2 μl of 10× transcription buffer (200 mM Tris-acetate pH 7.9, 0.5 M potassium acetate, 100 mM magnesium acetate, 10 mM DTT, and 50 µg ml⁻¹ BSA), 1 μl RNase inhibitor, 0.5 pmol of DNA template, and 2 pmol of Mtb RNAP holoenzyme (or core RNAP alone) in a 20 μl volume. The mixture was incubated at 37 °C for 15 min before the addition of rNTPs (100 μM each). At indicated time points, the reaction was quenched by adding EDTA at a final concentration of 20 mM and 2 μl of Proteinase K and incubating for 30 min. The reaction was then diluted to 100 μl with RNase-free H₂O and cleaned three times with phenol/chloroform/isoamyl alcohol (25:24:1, v/v/v). After ethanol precipitation and resuspension with 30 μl RNase-free H₂O, 0.5 μl DNase I (New England BioLabs, M0303S), 3.5 μl of DNase buffer, and 1 μl RNase inhibitor were added. After incubation at 37 °C for 30 min, the RNA product was cleaned three times with phenol/chloroform/isoamyl alcohol (25:24:1, v/v/v) and recovered by ethanol precipitation. Half of the RNA was converted to cDNA with the High-Capacity cDNA Reverse Transcription Kit and evaluated by qPCR as described above. RNA abundances were normalized to a diluted plasmid DNA sample with a concentration of 0.033 ng ml⁻¹.

Statistics

Statistical analyses were conducted with Excel (version 16.178.3) or GraphPad Prism (version 10.1.0). GraphPad Prism (version 10.1.0) or the Python Matplotlib package (version 3.7.1) was used for plotting.

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.

[ad_2]

Source link

February 28, 2024
Spatial transcriptomics reveal neuron–astrocyte synergy in long-term memory

[ad_1]

Mice

All animal experiments were conducted following protocols approved by the Administrative Panel on Laboratory Animal Care at Stanford University. The TRAP2:Ai14 mouse line was a gift from the Luo laboratory at Stanford. TRAP2¹⁰ mice were heterozygous for the Fos^2A-icreER allele, and homozygous for Ai14 in the C57BL/6 background. Gt(ROSA)26Sor^{tm1.1(CAG-cas9*,-EGFP)Fezh/J} mice were acquired from Jackson Laboratory. Mice were group-housed (maximum 5 mice per cage) on a 12 h light:dark cycle (07:00 to 19:00, light) with food and water freely available. Mice were kept with ambient temperature at 21.1 ± 1.1 °C and humidity at 55 ± 5%. Male mice 49–56 days of age were used for all the experiments. Mice were handled daily for 3 days before their first behavioural experiment. The animal protocol no. 20787 was approved by Stanford University APLAC and IACUC. All surgeries were performed under avertin anaesthesia and carprofen analgesia, and every effort was made to minimize suffering, pain and distress.

Genotyping

The following primers: TCCTGGGCATTGCCTACAAC (forward), CTTCACTCTGATTCTGGCAATTTCG (reverse) and ACCCTGCTGCGCATTG (reporter) were used for genotyping of the Fos^2A-icreER allele; CTGAGCTCACCCACGCT (forward), GGCTGCCTTGCCTTCTCT (reverse), ACTGCTCACAGGGCCAG (reporter) for wild-type allele; CGGCATGGACGAGCTGTA (forward), CAGGGCCGGCCTTGTA (reverse) and AATTGTGTTGCACTTAACG (reporter) were used for genotyping of the Rosa-Ai14 allele; TTCCCTCGTGATCTGCAACTC (forward), CTTTAAGCCTGCCCAGAAGACT (reverse) and CCGCCCATCTTCTAGAAAG (reporter) for Rosa wild-type allele.

Fear conditioning

The fear conditioning training was conducted according to previously described methods⁹. Each mouse was placed individually in the fear conditioning chamber (Coulbourn Instruments), which was positioned at the centre of a sound-attenuating cubicle. Prior to each session, the chamber was cleaned with 10% ethanol to provide a background odour, while a ventilation fan generated background noise at around 55 dB. The training began with a 2-min exploration period, after which the mice received three tone-foot shock pairings separated by 1-min intervals. Each tone, an 85 dB 2-kHz sound, lasted for 30 s, and was followed by a 2-s foot shock of 0.75 mA, with both ending simultaneously. Following each pairing, the mice remained in the chamber for an additional 60 s before being returned to their home cages. For context recall, the mice were reintroduced to the original conditioning chamber for 5 min, 16 days after the training. Injections of 4-hydroxytamoxifen injections were administered immediately prior to the recall experiments, within 30 min. In the HC and NR groups, 4-hydroxytamoxifen was injected at a similar time to the other two groups during the recall. The behaviour of the mice was recorded and analysed using FreezeFrame software (version 4; Coulbourn Instruments), with motionless bouts lasting over 1 s being considered as freezing.

Brain tissue dissociation and flow cytometry

Basolateral amygdala was microdissected from live sections cut by a vibratome (300 μm thick). Tissue pieces were enzymatically dissociated using a papain-based digestion system (LK003150, Worthington). In brief, tissue chunks were incubated with papain (containing L-cysteine), DNase I, and kynurenic acid for 1 h at 37 °C and 5% CO₂. After incubation, tissues were triturated with 300 µm glass pipette tips, then 200 µm glass pipette tips, and 100 µm glass pipette tips. Cell suspensions were then centrifuged at 350g for 10 min at room temperature, resuspended in 1 ml EBSS with 10% v/v ovomucoid inhibitor, 4.5% v/v Dnase I, and 0.1% v/v kynurenic acid, and centrifuged again. The supernatant was removed, and 1 ml artificial cerebrospinal fluid (ACSF) was added to the cells. ACSF contained 2.5 mM KCl, 7 mM MgCl2, 0.5 mM CaCl₂, 1.3 mM NaH₂PO₄, 110 mM choline chloride, 24 mM NaHCO₃, 1.3 mM sodium ascorbate, 20 mM glucose, and 3 mM sodium pyruvate, 2 mM thiourea, and 13.2 mM trehalose. Cells were then passed through a 70 μm cell strainer to remove debris. Hoechst stain (1:2,000; H3570, Life Technologies) was added and incubated in the dark at 4 °C for 10 min. Samples were centrifuged (350g for 8 min at 4 °C) and resuspended in 0.5 ml of ACSF and kept on ice for flow cytometry. Live cells were sorted using the BD Vulcan into 384-well plates (Bio-Rad) directly into lysis buffer, oligodT, and layered with mineral oil. After sorting, the plates were immediately snap frozen until reverse transcription.

Sequencing

The Smartseq3 protocol was used for whole-cell lysis, first-strand synthesis and cDNA synthesis, as previously described with modifications. Following cDNA amplification (23 cycles), the concentration of cDNA was determined via Pico Green quantitation assay (384-well format) and normalized to 0.4 ng µl⁻¹ using the TPP Labtech Mosquito HTS and Mantis (Formulatrix) robotic platforms. In-house Tn5 were used for cDNA tagmentation. Libraries were amplified using Kapa HiFi. The libraries were then sequenced on a Novaseq (illumina), using 2 × 100-bp paired-end reads and 2 × 12-bp index reads, with an average of 2 million reads per cell.

Bioinformatics and data analysis for scRNA-seq

Sequences from Nextseq or Novaseq were demultiplexed using bcl2fastq, and reads were aligned to the mm10 genome augmented with ERCC (External RNA Controls Consortium) sequences, using STARsolo 2.7.9a. We applied standard algorithms for cell filtration, feature selection and dimensionality reduction. In brief, genes that appeared in fewer than five cells, samples with fewer than 2,000 genes and samples with less than 50,000 reads were excluded from the analysis. Out of these cells, those with more than 10% of reads as ERCC or more than 20% mitochondrial were also excluded from analysis. Counts were log-normalized and then scaled where appropriate. Canonical correlation analysis (CCA) function from the Seurat⁷⁰ package was used to align raw data from multiple experiments. The top 20 canonical components were used. After alignment, relevant features were selected by filtering expressed genes to a set of 2,000 with the highest positive and negative pairwise correlations. Genes were then projected into principal component space using the robust principal component analysis. DEG analysis was done by applying the Mann–Whitney–Wilcoxon test on various cell populations.

To find memory-induced genes in each type of neurons, series of strict criteria were applied. First, we removed the background activation by excluding the DEGs resulted from FR versus NF among tdT negative neurons. This guarantees their specificity that DEGs are activity-dependent, rather than a general increase in all cells caused by experience. Second, DEGs must be differentially expressed when FR TRAPed cells are compared to NR and HC controls, ensuring that the DEGs were unique to neuronal ensembles associated with memory recall, and not a result of baseline activity (HC) or activity remaining from the initial fear learning (NR). Finally, each DEG had to meet the criteria of being expressed in a quater of cells and exhibiting at least a 1.75-fold change. By adhering to these standards, a total of 107 DEGs were recognized as ‘remote-memory-associated DEGs’ across 6 distinct neuron types, BLA.Int.Pvalb was not included in the analysis due to insufficient numbers of cells. EnrichR was used for GO, KEGG and REACTOME pathway analysis and classification of enriched genes (log₂FC > 0.5 and P < 0.05) in each subpopulation.

scRNA-seq data from mPFC cells were mapped to mm10 genome with full-length tdTomato construct (including Woodchuck Hepatitis Virus Posttranscriptional sequence included in Ai14 line⁷¹), which improved the sensitivity in calling tdT⁺ cells. Data from BLA and mPFC cells were integrated using CCA. TRAPed neurons from the each integrated population were analysed, except B-P.Int.Pvalb and B-P.Int.Gpr88 neurons, due to limited cell number. DEGs with P < 0.05 (Mann–Whitney–Wilcoxon test) were considered as significant DEGs (highlighted in orange in Fig. 5d and Extended Data Fig. 12f).

After unbiased clustering astrocytes, RNA velocyto⁴⁰ and Monocle3⁴¹ were applied to infer astrocytic trajectory. DEGs between FR and NF conditions were estimated using Mann–Whitney–Wilcoxon test on each astrocyte population. R, RStudio, Python were used for data analysis.

RNAscope

The RNAscope multiplex fluorescent reagent kit v2 (323100, ACD) and RNAscope 4-Plex probes were used to conduct the RNAscope experiment according to the manufacturer’s guidelines. The probes employed were either obtained from available stocks or specially created by ACD.

Gene selection for MERFISH measurements

We used a combination of single-cell RNA sequencing data and literature to select genes for MERFISH. Our selection criteria involved identifying cell-type-marker genes for a particular cell population using a one-vs-all approach. To do this, we performed a Mann–Whitney–Wilcoxon test for each gene between the cells within the cell population and all other cells not in that population, and corrected the resulting P values for multiple hypothesis testing to obtain false discovery rate-adjusted P values. A gene was considered a cell-type marker for a specific cell population if it met the following criteria: (1) it was expressed in at least 30% of cells within the specified population; (2) the false discovery rate-adjusted P value < 0.001; (3) gene expression in the specified population was at least fourfold higher than the average expression in all cells not in that population; and (4) expressed in a fraction of cells within the specified population that was at least 2 times higher than any other population of cells. We then sorted the marker genes for each population by fold change in expression relative to cells outside the population, and saved the top five marker genes for each population to use for marker selection. In addition to these markers, known genes related to microglia, astrocytes and OPCs from the literature and included. Finally, DEGs from remote memory-associated genes were added to the panel with a total number of 158 genes.

Tissue processing for MERFISH and RNAscope

Brain tissue samples were processed using a fixed-frozen protocol for both MERFISH and RNAscope. In brief, mice were euthanized using CO₂ and perfused with cold 4% paraformaldehyde. Brain tissue was dissected and followed by incubation at 4 °C in 4% paraformaldehyde overnight, 15% sucrose for 12 h, and 30% sucrose until sink. Brain tissue was frozen in OCT using dry ice and stored at −80 °C until sectioning. Sectioning was performed on a cryostat at −18 °C. Slices were removed and discarded until BLA region was reached.

Slices with 10 μm in thickness were captured onto Superfrost slides for RNAscope and MERSCOPE slides for MERFISH. The same anatomical region was identified for imaging post hoc after sample preparation, before the start of RNAscope or MERFISH imaging.

Sample preparation and MERFISH imaging

Slides with tissue sections were processed according to MERSCOPE protocol (Vizgen). In brief, slides with tissue sections were washed three times in PBS, and then stored in 70% ethanol at 4 °C for 18 h to permeabilize the tissue. Tissue slices from the same mouse were cut at the same time and distributed onto four coverslips. After permeabilization, the samples were removed from 70% ethanol and washed with Sample Prep Wash Buffer (PN 20300001), then incubated with Formamide Wash Buffer (PN 20300002) at 37 °C for 30 min. Gene Panel Mix (RNA probes) was incubated for 48 h at 37 °C. After hybridization, the samples were washed in Formamide Wash Buffer for 30 min at 47 °C for a total of 2 times to remove excess encoding probes and polyA-anchor probes. Tissue samples were then cleared to remove lipids and proteins that contribute fluorescence background. In brief, the samples were embedded in a thin 4% polyacrylamide gel and were then treated with Clearing Premix (PN 20300003) for 36 h at 37 °C. After digestion, the coverslips were washed in Sample Prep Wash Buffer 2 times and stain with DAPI/PolyT mix for 15 min. Slides were washed with Formamide Wash Buffer followed by Sample Prep Wash Buffer before imaging. Finally, slides were loaded to MERSCOPE Flow Chamber and imaged at both 20× and 63× magnification.

MERFISH data processing

MERFISH imaging data were processed with MERlin⁷² pipeline with cell segmentation using CellPose⁷³, a deep learning-based cell segmentation algorithm based on DAPI staining. Decoding molecules were then assigned to the segmented nuclei to produce a cell-by-gene matrix that list the number of molecules decoded for each gene in each cell. The MERFISH expression matrix for each sample was concatenated with the normalized, log-transformed with Scanpy⁷⁴ and integrated using Harmony⁷⁵ and Leiden⁷⁶ clustering was applied to separate the cells into distinct clusters. TRAPed neurons were assigned based on tdTomato expression. DEGs from a comparison of FR-TRAPed and NF-TRAPed conditions were estimated using Mann–Whitney–Wilcoxon test. Peri-engram cells were computed as follows: for each engram cell (tdT⁺), its peri-engram cells were counted within a radius of 30 μm.

CalEx injection and behavioural experiments

AAVs carrying CalEx⁵¹ or tdTomato were generated by Addgene based on the vector pZac2.1-GfaABC₁D-mCherry-hPMCA2w/b (AAV5, Addgene 111568) or pZac2.1 gfaABC₁D-tdTomato (AAV5, Addgene 44332). Stereotaxic procedure of viral microinjection has been described previously. In brief, mice with fear training (within 12 h or after 24 h) were anaesthetized and placed onto a stereotaxic frame (model 1900, KOPF). Mice were injected with Carprofen (5 mg kg⁻¹) subcutaneously before and after surgery. AAVs carrying hPMCA2w/b (CalEx) or control (tdTomato) viruses were loaded via a glass pipette connected with a 10 μl Hamilton syringe (Hamilton, 80308) on a syringe injection pump (WPI, SP101I) Bevelled glass pipettes (1B100–4; World Precision Instruments) filled with viruses were placed into the BLA (1.3 mm posterior to the bregma, 3.4 mm lateral and to the midline, and 4.6 mm from the pial surface). Either 0.3 μl of AAV5 GfaABC₁D mCherry-hPMCA2w/b (7 × 10¹² viral genomes (vg) per ml) or 0.3 μl AAV5 GfaABC₁D tdTomato (7 × 10¹² vg ml⁻¹) were injected at 100 nl min⁻¹. Glass pipettes were withdrawn after 10 min and scalps were cleaned and sutured with sterile surgical sutures. Mice were allowed to recover in clean cages for 7 days. behavioural experiments (recall) were performed three weeks after surgeries. Schematic illustrations (Figs. 1a and 4a,f and Extended Data Fig. 7h,o) created with BioRender.com.

Open field

Mice were placed in the centre of 40 × 40 cm white box and allowed to freely explore for 15 min. Videos were recorded and analysed by BIOBSERVE III software. The 20 × 20 cm region in the centre was defined as the central zone. The total distance travelled and the activity exploring the centre area were analysed to evaluate the subject’s locomotor ability and anxiety levels.

Oligos and antibodies

For quantitative PCR analysis, specific primers were designed to amplify the Igfbp2 gene: Igfbp2 FW (GTCTACATCCCGCGCTG) and Igfbp2 RV (GTCTCTTTTCACAGGTACCCG). Additionally, for CRISPR–Cas9 gene editing, six gRNAs (Igfbp2 guides 1–6) were selected to target distinct regions of the Igfbp2 gene. These gRNAs were designed based on predicted specificity and efficiency: Igfbp2 guide 1 (CTACGCTGCTATCCCAACCC), Igfbp2 guide 2 (GCCAGACGCTCGGGCGTGCA), Igfbp2 guide 3 (AGAAGGTCAATGAACAGCAC), Igfbp2 guide 4 (GCCCTCCTGCCGTGCGCACA), Igfbp2 guide 5 (CTCTCGCACCAGCTCGGCGC), and Igfbp2 guide 6 (CGTAGCGTCTGGGCGCAGCG).

Antibodies targeting mCherry (Thermo Fisher M11217) and cFOS (Synaptic Systems 226308) were applied for immunostaining following manufacturers’ manuals.

Inclusion and ethics statement

We, the authors of this manuscript, recognize the importance of inclusion and ethical considerations in scientific research. Our work is guided by the principles of fairness, transparency, and respect for human dignity.

We affirm our commitment to promoting diversity and inclusivity in science, recognizing that diverse perspectives, backgrounds, and experiences enrich research and enhance scientific discovery. We have made efforts to ensure that our study is conducted in a manner that respects and includes individuals of all races, ethnicities, genders, sexual orientations, abilities, and other aspects of human diversity.

We have obtained all necessary ethical approvals and have followed appropriate guidelines and regulations for the research conducted. We have taken measures to protect the privacy and confidentiality of research participants, including obtaining informed consent and ensuring data security.

We acknowledge the potential for harm in scientific research and have taken steps to minimize any potential harm to research participants or others affected by our work. We have carefully considered the potential implications of our research and have taken responsibility for ensuring that our work is conducted in a manner that upholds ethical and moral standards.

We recognize that scientific research has the potential to impact society in profound ways and we are committed to engaging in responsible research practices that promote the well-being of individuals and society as a whole.

In summary, we affirm our commitment to inclusive and ethical research practices and recognize our responsibility to conduct research that is conducted with integrity, respect, and social responsibility.

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.

[ad_2]

Source link

February 7, 2024
Study unmasks secrets of glioma’s invasive margins

[ad_1]

High-grade gliomas are cancerous tumors that spread quickly in the brain or spinal cord. In a new study led by Mayo Clinic, researchers found invasive brain tumor margins of high-grade glioma (HGG) contain biologically distinct genetic and molecular alterations that point to aggressive behavior and disease recurrence. The findings suggest insights into potential treatments that could modify the course of the disease.

The study published in Nature Communications, profiled 313 tumor biopsies from 68 HGG patients using genomics (study of genes), transcriptomics (study of gene expression at the mRNA level) and magnetic resonance imaging (MRI).

Glioma is a growth of cells that starts in the brain or spinal cord. The invasive margins of HGG have long remained a mystery due to the difficulties in surgical biopsy of these regions. The aggressive nature of most gliomas, and the visual and textural similarities between the affected regions and normal tissue, create a challenge for neurosurgeons during removal of the tumor. Some glioma cells may get left behind.

The cells in a glioma look like healthy brain cells called glial cells. As a glioma grows, it forms a mass of cells called a tumor. The tumor can grow to press on brain or spinal cord tissue, causing a range of symptoms. There are many types of glioma. Some grow slowly and aren’t considered to be cancers. Others are considered cancerous. Malignant gliomas grow quickly and can invade healthy brain tissue.

Leland Hu, M.D., a neuroradiologist at Mayo Clinic in Arizona, says the study also shows that MRI techniques, such as dynamic susceptibility contrast and diffusion tensor imaging, can help distinguish between the genetic and molecular alterations of invasive tumors, which is important for clinically characterizing areas that are difficult to surgically biopsy.

“We need to understand what is driving tumor progression,” says Dr. Hu. “Our results demonstrate an expanded role of advanced MRI for clinical decision-making for high-grade glioma.”

The study also provides insight into resistance to treatment that could improve future outcomes.

Our hope is that these clinical MRI techniques will lead to improved diagnosis, prognosis and treatment. We are looking at this research through the lens of therapeutic decision-making for patients.”

Nhan Tran, Ph.D., cancer biologist, Department of Cancer Biology at Mayo Clinic, Arizona

The entire dataset, including genomics, transcriptomics and MRI, is publicly available to other groups and institutions as a resource to fuel new discoveries beyond what Dr. Hu and colleagues have reported in the initial manuscript.

Sources:

Journal reference:

Hu, L. S., et al. (2023). Integrated molecular and multiparametric MRI mapping of high-grade glioma identifies regional biologic signatures. Nature Communications. doi.org/10.1038/s41467-023-41559-1.

[ad_2]

Source link

January 31, 2024