Emx2 underlies the development and evolution of marsupial gliding membranes

[ad_1]

The P. breviceps sugar glider breeding colony

All experiments performed were approved by the IACUC committee at Princeton University. Captive-born, adult sugar gliders were obtained from the US pet trade and thereafter maintained in a breeding colony at Princeton University. The colony is kept under a 12 h–12 h light–dark cycle (temperature, 20–27 °C; humidity, 30–70%). Animals are fed daily a diet consisting of dried food, fruits and protein. Animals are housed in breeding pairs or trios. Female sugar gliders are inspected daily for pouch young by gently palpating the maternal pouch. Pouch young discovered by palpation are visually examined by briefly anaesthetizing the mother with isoflurane and gently everting the pouch to expose the neonate. For tissue collection, joeys are gently detached from the nipple, euthanized and processed in the laboratory. Both male and female joeys were used in all experiments. More details on our breeding colony can be found elsewhere¹⁴.

Scanning electron microscopy analysis

Sugar glider joeys were fixed and stored in 2% glutaraldehyde at 4 °C. After making several small incisions in the abdomen to increase infiltration, we treated the embryos 1% osmium tetroxide for 90 min at room temperature followed by critical point drying. We sputter-coated all specimens with a 1-Å-thick coating of palladium and then imaged them on a SU3500 scanning electron microscope under a high vacuum.

Sample acquisition, genome sequencing and genome assemblies

The following samples were obtained from the ABTC collection of the South Australian Museum, following all established protocols and guidelines from museum authorities: A. pygmaeus (ABTC 77168, liver); D. trivirgata (ABTC 49304, kidney); D. pennatus (ABTC 44094, liver); P. volans (ABTC 83627, kidney); P. peregrinus (ABTC 138055, muscle); Pseudocheirus occidentalis (ABTC 7808, kidney); Pseudochirops cupreus (ABTC 45036, kidney); Pseudochirops corinnae (ABTC 49246, kidney); T. rostratus (ABTC 7742, kidney). A kidney sample from a female sugar glider, P. breviceps, was obtained from the Princeton University breeding colony. The samples were processed for genome sequencing and assembly using a combination of Illumina short-read sequencing and Hi-C for scaffolding. The full pipeline for sequencing and assembly used here has been described in detail previously^18,49. Using this approach, we were able to generate genome assemblies for 13 out of the 15 species. For D. trivirgata and T. rostratus, we were not able to generate a Hi-C assembly owing to poor sample quality. Thus, after generating short-read sequencing data for these two species, we performed contig assembly using MEGAHIT (v.1.1.4.2)⁵⁰ and used Redundans (v.0.14a)⁵¹ for haplotype purging, preliminary scaffolding, and gap closing. Lastly, we scaffolded against the Hi-C sugar glider genome with ragtag v.2.0.1 (parameters: -g 50 -m 10000000 -f 200)⁵². This approach is particularly suitable in our case, given the short phylogenetic divergence among petauroids. The representation of mammalian universal single-copy orthologues in the different genomes was assessed with BUSCO (v.5.4.4)²⁷ using the curated mammalian v.10 database¹⁹. The assembly metrics for all genome assembles in this study are reported in Supplementary Table 1.

ATAC–seq analysis

ATAC–seq was performed on live single-cell suspensions from the patagium primordium, which were prepared as follows: tissue was dissected from P5 sugar glider joeys, placed into 1× PBS, transferred into 0.2% dispase/RPMI and incubated at 37 °C for 1 h. After this incubation, the epidermis and dermis were separated using forceps. The dermis was washed once in 1× PBS, transferred to 0.2% collagenase/RPMI and incubated at 37 °C for 2 h. The resulting dermal cell suspension was passed through a 40 μm mesh filter, and the remaining enzyme was washed by adding 30 ml of 0.2% bovine serum albumin (BSA)/1× PBS and centrifuging the cells for 10 min at 500 rcf (4 °C). The supernatant was discarded, and the pellet was resuspended in 0.2% BSA/1× PBS and filtered again into a 1.5 ml Eppendorf tube. After determining cell viability using Trypan Blue (Sigma-Aldrich), a total of 100,000 live cells per sample (n = 2) was set aside and used for library preparation.

Library preparation was performed according to the Omni-ATAC method⁵³. In brief, cells were lysed for 3 min on ice. The transposition reaction on the permeabilized nuclei was performed using TDE1 transposase (Illumina) at 37 °C for 60 min, and then purified using the Zymo DNA Clean and Concentrator-5 kit (Zymo). Illumina sequencing adapters and barcodes were added to the transposed DNA fragments by PCR amplification. The purified ATAC–seq library products were examined on Agilent Bioanalyzer DNA High Sensitivity chips for size distribution, quantified using the Qubit fluorometer (Invitrogen) and pooled at equal molar amounts. The ATAC–seq library pool was sequenced on the Illumina NovaSeq 6000S Prime flowcell as paired-end 61 nucleotide reads. We generated 62,387,513 and 59,728,230 reads for libraries 1 and 2, respectively. Raw sequencing reads were filtered by the NovaSeq control software and only the pass-filter reads were used for further analysis. Raw ATAC–seq reads were trimmed using NGmerge v.0.2_dev and mapped to the Hi-C P. breviceps assembly using Bowtie2 (v.2.4.2)⁵⁴. Alignments were further processed by removing duplicates using picard MarkDuplicates SNAPSHOT v.2.21.4 (http://broadinstitute.github.io/picard), and samtools (v.1.12)⁵⁵ was used to filter reads and convert files into BAM format. Peak calling on each biological replicate (n = 2) was conducted using MACS2 (v.2.2.7.1)⁵⁶ (parameters: –nomodel -q 0.05 –keep-dup all –shift -100 –extsize 200 -g 2456432000 –nolambda). To assess the concordance of peak calls between our replicates, we used the Irreproducible Discovery Rate IDR (v.2.0.4.2)⁵⁷ and only those peaks passing a false-discovery rate threshold of 0.05 were used. We then used BEDTools intersect (v.2.27.1)⁵⁸ to remove regions of the ATAC peak calls that directly overlapped annotated exons.

ChIP–seq analysis

ChIP assays were performed using flash-frozen samples from the patagium primordium of P5 sugar glider joeys. For each reaction, patagium primordia from 6–7 joeys were pooled into a single tube. Approximately 20 mg of tissue was cross-linked with 1% formaldehyde and chromatin was extracted and sheared. The following antibodies were used in the ChIP assays: anti-H3K27ac antibody (Abcam, ab4729, GR232896-1, 4 μg), anti-EMX2 antibody (Novus, NBP2-39052, 27711) and a negative control anti-IgG antibody (Millipore, 12-370, 297424, 4 μg). For H3K27Ac ChIP–seq, chromatin was divided into two experimental samples (each experimental sample consisting of pooled tissue from 6–7 joeys) and one control sample, and ChIP assays were performed in triplicate for each of the experimental samples and in duplicate for the negative control sample. In all cases, we used 7 µg of chromatin and 4 µg of antibody. ChIP DNA was then processed into three standard Illumina ChIP–seq libraries (two experimental and one control) and sequenced to generate the following number of paired end reads: 56,614,872 (control); 56,957,310 (experimental 1); and 70,392,940 (experimental 2). EMX2 ChIP–seq analysis was performed in a similar manner, except that only one experimental (pooled tissue from 6–7 joeys) and one control library were generated and sequenced to generate the following number of single-end reads: 43,909,990 (control) and 41,672,698 (experimental). Raw ChIP–seq reads were trimmed using NGmerge v.0.2_dev and mapped to the Hi-C P. breviceps assembly using Bowtie2 (v.2.4.2)⁵⁴. Alignments were further processed by removing duplicates using picard MarkDuplicates SNAPSHOT v.2.21.4, and samtools (v.1.12)⁵⁵ was used to filter reads and convert files into BAM format. Broad peaks were processed and broad peaks on each biological replicate (n = 2) were called with MACS2 (v.2.2.7.1)⁵⁶ (parameters: –broad -f BAMPE -g 2456432000 -q 0.05 –nolambda), using the input as a background control. Broad peak calls could not be filtered using IDR and, instead, peaks between the two replicates were concatenated and overlapping peaks were merged using BEDTools merge (v.2.27.1)⁵⁸. To identify enriched motifs in our EMX2 ChIP–seq data, we scanned all called ChIP peaks using the Simple Enrichment Analysis (SEA) program⁵⁹ from the MEME suite (v.5.5.4)⁶⁰.

Computational analyses

Identifying GARs

To identify GARs, we used a statistical phylogenetic test for acceleration along a specific branch of a phylogeny, as implemented in PhyloP²⁵. PhyloP requires two inputs: (1) a species tree, typically estimated from genome-wide data; and (2) a multiple-sequence alignment for each genomic region to be tested for acceleration. To obtain a species tree, we aligned orthologous coding sequences from the seventeen species included in our analysis (Supplementary Tables 1 and 2) and extracted fourfold degenerate sites. We then used RAxML (v.8.2.12)⁶¹ (parameters: -f a -x 50217 -p 50217 -# 1000 -o Pcine,Vursi -m GTRGAMMA) and Phylofit (RPHAST suite v.1.6.9)⁶² to produce a guide species tree and mod file, respectively (Fig. 1c). In parallel, we generated a tree using the first and second codon positions and found that it was identical to that produced by fourfold degenerate sites (Supplementary Fig. 3). To obtain a set of candidate sequences (that is, candidate cis-regulatory elements) that we could then test for acceleration, we focused on overlapping peaks between our ATAC–seq and our ChIP–seq sugar glider datasets. Peaks were considered overlapping if they had at least 1 bp in common between both datasets. Using this approach, we identified 52,169 candidate cis-regulatory elements. The size distribution of the candidate cis-regulatory elements is shown in Supplementary Fig. 4.

To identify orthologues of the 52,169 sugar glider candidate cis-regulatory elements across the 15 other diprotodont genomes examined in our study, we used a comparative annotation approach. First, we annotated conserved coding genes in each species by lifting-over gene model from the high-quality RefSeq annotation of the koala genome to each other genome using LiftOff v.1.6.3 (parameters: -d 4). We then conducted a second lift-over of sugar glider candidate cis-regulatory elements to each other species using the same procedure but with the addition of a flanking sequence to improve candidate cis-regulatory element mappability and reduce the chances of multi-mapping (parameter: -flank 1). We next used synteny anchoring⁶³ to validate candidate orthologues of sugar glider candidate cis-regulatory elements. For each of the 15 non-sugar-glider marsupial genomes, we created a list of candidate cis-regulatory element orthologues and their flanking genes (excluding genes that were not annotated in the reference sugar glider genome). Then, for each candidate cis-regulatory element orthologue in each species, we compared the identities of their flanking genes to those in the sugar glider genome. We considered elements to be orthologues of their reference sugar glider candidate cis-regulatory element candidate if (1) the first flanking gene, upstream or downstream, matched that in the sugar glider genome and (2) if those flanking genes in the target species were no greater than four times the distance from the candidate cis-regulatory element measured in the sugar glider genome. Candidate cis-regulatory element orthologues that passed this synteny check were then extracted from their respective genomes using gffread (v.0.12.7)⁶⁴.

Candidate cis-regulatory element orthologues across all species were then combined into a multi-fasta file and aligned using MAFFT v.7.453 (parameters: –adjustdirectionaccurately –localpair –maxiterate 1000). As our downstream evolutionary analyses were based on nucleotide substitution rates, alignment filtering was designed to address two key considerations: first, that estimated substitution rates in very gappy alignment regions containing a large number of gaps may not be reliable; and, second, that filtering based on similarity risks biasing the results of analyses that are predicated on measuring sequence divergence. We therefore used a filtering approach similar to that implemented by a previous study⁶⁵ and the Avian Phylogenomics Project⁶⁶. First, we trimmed the flank region from each non-sugar-glider target species by removing alignment columns outside the bounds of the sugar glider reference candidate cis-regulatory element sequence. Next, gappy alignment ends were trimmed using a 20 bp sliding window until 75% of species used in our analyses had a window with less than or equal to 5 gaps (Ns). To mitigate internal gap columns, we first used TrimAI v.1.4.rev15 (parameter: –gappyout) and then removed individual orthologue sequences that, after our gap filtering, still contained greater than 25% gaps or retained any individual gaps greater than 10% of the alignment length. Finally, we retained only those alignments that included sequences from at least five species, including at least one glider and non-gliding sister species and least one species from each outgroup. GAR alignments are shown in Supplementary Data 6. We next used phyloP²⁵ from the RPHAST suite (v.1.6.9)⁶² (method=“LRT”, mode=“ACC”) to identify sequences exhibiting accelerated substitution rates, testing each of the three gliding species (P. breviceps, A. pygmaeus and P. volans) independently. PhyloP performs comparisons on a per-species basis. That is, it compares a given genomic region (ATAC/ChIP peaks in our case) to the average genome substitution rate in the corresponding species. Thus, each species will have its own set of accelerated regions. Once we had a set of accelerated elements for each species, the resulting lists were compared to each other to establish which of those showed overlap among the glider species.

The superfamily Petauroidea is part of the Diprodontia, the largest extant order of marsupials that also includes the superfamily Phalangeroidea and the suborders Vombatiformes and Macropodiformes. Although Vombatiformes are an accepted outgroup, the phylogenetic relationships among the other three groups remain unclear^13,67,68. In agreement with a previous study¹³, our analysis using whole-genome data, both from fourfold degenerate sites and first and second codon positions, supports the placing of Petauroidea + Macropodiformes as sister groups (Fig. 1c and Supplementary Fig. 3 (topology 1)). However, other studies have placed Petauroidea + Phalangeroidea (topology 2)⁶⁸ or Petauroidea + Phalangeroidea (topology 3)⁶⁷ as sister groups. To verify that topology had no effect on our results, we conducted additional phyloP analyses forcing topologies 2 and 3. Our analysis recovered around 97–99% of the elements in each glider species, including all the ones around Emx2. Specifically, topologies 2 and 3 recovered, respectively, 99.4% and 97.4% of elements for A. pygmaeus; 99.6% and 97.1% of elements for P. breviceps; and 99.8% and 97.9% of elements for P. volans. Thus, while resolving the phylogenetic relationships among Diprotodonts falls outside the scope of our study, our analyses show that our results are not affected by differences in topology.

Gene enrichment analysis

To test for enrichment of accelerated sequences near genes differentially expressed in the patagium, we first associated candidate cis-regulatory elements with their putative target genes using contact domains determined from our Micro-C data. First, contact domains called with windows sizes of 10, 25 and 50 kb were integrated using an approach described previously⁶⁹. In brief, contact domain calls were combined and filtered such that fully overlapping and fully nested contact domains across all call sets were merged, whereas single and partially overlapping contact domains were retained. For all genes, we grouped all transcripts, sorted them by length and selected the largest transcript. In other words, we used the longest transcript as a representative for each gene. The TSS for each gene was determined based on the location of the first annotated exon. If this exon did not start with an ATG, it was considered the 5′ untranslated region, and the TSS was annotated as the site 1 bp directly upstream of the first exon. For exons that did begin with an ATG codon, the TSS was estimated to be approximately 1 kb upstream from the translation start site. For each remaining TAD, its overlap with annotated TSSs and a candidate cis-regulatory element was calculated using BEDTools (v.2.27.1)⁵⁸, using the intersect function with parameter -wo. We then assigned each candidate cis-regulatory element to the nearest gene TSS using BEDTools closest v.2.27.1, using the default settings, and created a table that included information on whether the sequences assigned to each gene were GARs. For this analysis, we merged the list of GARs from each of the three gliding species, and all of the gene–enhancer annotations were conducted based on sugar glider genome coordinates. Out of 24,495 genes annotated in the sugar glider genome, we tested 11,044. The remaining genes were either not assigned to contact domains or had no candidate cis-regulatory elements assigned to them. For each gene we computed the one-tailed hypergeometric P value of observing a given number of GARs from the total number of candidate cis-regulatory elements near a gene, where the number of trials is the total number of GARs and candidate cis-regulatory elements in our dataset. This analysis yielded a total of 1116 genes enriched for GARs.

Our contact domain-based gene enrichment analysis represents a biologically meaningful approach because it relies on three-dimensional enomic interactions occurring when the patagium primordium is developing. However, previous studies have also used a distance-based strategy, in which genes are assigned to candidate cis-regulatory elements based on physical proximity along the chromosome⁶⁵. To compare the outcomes of using a contact domain-based and a distance-based method, we reanalysed our data using a distance-based approach. As described above, we computed the probability of observing a given number of accelerated candidate cis-regulatory elements (GARs) based on a binomial distribution, where the number of trials is the total number of candidate cis-regulatory elements assigned to a gene and the probability of success is the proportion of all accelerated candidate cis-regulatory elements (GARs) over the total number of candidate cis-regulatory elements. Among the 24,495 genes annotated in the sugar glider genome, we tested 15,178 (the remaining genes did not have any candidate cis-regulatory elements assigned to them) and found that 1,638 were enriched for GARs. Of the 1,638 genes enriched, 27 were upregulated in the developing patagium compared with both the dorsal and shoulder skin. Of these 27, 21 were recovered with our contact domain-based analysis, demonstrating that the two approaches yield comparable results.

Finally, to establish the extent to which our results were being biased by contact domain size, we examined the size distribution of our contact domains alongside the corresponding number of enhancers. Our analysis revealed a low correlation between contact domain size and the number of enhancers (R² = 0.05) (Supplementary Fig. 5a). The majority of contact domains had 25 or fewer enhancers and were between 100,000 and 1,000,000 bp in size (Supplementary Fig. 5b,c). Notably, the contact domains containing our genes of interest are no larger than other contact domains and do not contain the greatest number of enhancers. Moreover, the largest contact domain (8,000,000 bp) did not contain the highest number of enhancers, and the contact domain containing the highest number of enhancers (~4,000,000 bp and 449 enhancers) does not contain any of our 59 genes of interest. Together, this analysis supports the notion that longer contact domains do not contain more active enhancers.

Conservation between petauroid marsupials

To examine whether Emx2-associated GARs are evolutionarily conserved elements, we analysed all marsupial genomes shown in Fig. 1c and generated conservation scores for genomic regions surrounding these elements. In brief, ~20 kb genomic regions surrounding Emx2-associated GAR orthologues were extracted from each species. These regions were then aligned with MAFFT v.7.453 (parameters: –genafpair, –ep 0 and –maxiterate 1000). Per-base conservation scores were then calculated using phyloP (parameters: –method LRT and –wig-scores). Conservation scores were then visualized against the reference sugar glider sequence in IGV as a heat map.

Conservation between marsupials, laboratory mice and humans

We assessed the conservation of Emx2-associated GARs in eutherian mammals using the laboratory mouse (mm10) and human (hg38) genomes. We used the UCSC genome browser blat tool^70,71 to identify orthologous regions between human, mouse and marsupial GARs. The P. breviceps ATAC peaks sequences were used as queries. For 5 out of the 6 GARs (that is, GAR 41701, GAR 16519, GAR 51182, GAR 32020 or GAR 13585), there were no hits that had either a strong score or that were located on the same chromosome as Emx2, even though the syntenic relationships among eutherians and marsupials in the region surrounding Emx2 were conserved. For GAR 11730, located in the Emx2 promoter, we identified a clear eutherian orthologue. We then used blat to search for these same P. breviceps sequences in the genome of the grey short-tailed possum (M. domestica), a non-petauroid marsupial from the New World. We were able to identify high-confidence hits for all six GARs. We then used the multiz alignment conservation scores in human and mouse and found that, other that the promoter element, the rest of the GARs had either no or very sparse conservation. We used both NCBI BLASTn and discontinuous mega-blast⁷² to identify orthologous sequences for our six GARs. We performed a search in both mouse and human databases using the default parameters. Subsequently, we selected the top hit from each species located on the same chromosome as Emx2. We then conducted a reciprocal best BLAST against our P. breviceps database. Our findings mirrored the results obtained through our blat analysis, as we could only confidently recover an orthologous element for GAR 11730.

Gene Ontology and KEGG pathway analysis

Genes were examined for enrichment of KEGG Pathway and GO Biological Process terms using the Enrichr web server⁷³. Protein–protein predicted associations were assessed using the STRING web server³⁸.

Transcription-factor-binding analysis

Transcription-factor-binding motif analysis was performed using the MEME suite (v.5.5.4)⁶⁰. The P. breviceps GAR 11730 sequence was tested for enrichment of motifs using the D. trivirgata sequence used as a control. We also conducted the inverse to search for potential P. breviceps loss of motifs. Transcription factor motif scans across the orthologous GARs were performed using XSTREME²⁸. The sequence of each species was run using the default parameters. For indel analysis, aligned sequences were scanned by eye to identify P. breviceps-specific insertions and deletions. These regions, plus 5–10 bp flanking sequences, were then run through XSTREME, altering parameters to account for the length of the sequences used (-nmotifs 10 -minw 4 -maxw 30). Tomtom⁷⁴ was used to identify genes associated with the identified motifs.

To determine whether candidate genes were potentially regulated by Emx2, we scanned the contact domains containing our candidate genes and used Bedtools intersect v.2.27.1 to determine whether there was overlap between the ATAC peaks associated to each gene and the Emx2 ChIP–seq peaks.

For transcription factor motif conservation analysis, we first used FIMO³⁷ (MEME suite v.5.5.4) to identify sites matching the de novo identified sugar glider EMX2–binding motif, originally identified from the ChIP–seq experiment (ATTARCNV), with a P value of 0.01 or less. We then created a multiple-sequence alignment of the ATAC peaks from all of our species and looked for these identified sites. If they were present in at least half of the species, we considered the site conserved. If a site was not conserved in at least half of the species, we then assessed whether it displayed glider-specific conservation.

scRNA-seq analysis of laboratory mouse data

An existing scRNA-seq dataset⁷⁵ from dorsal skin of mouse embryos at E12.5, E13.5, E14.5 and E15.5 was reprocessed using the Seurat package (v.4.3.0)⁷⁶. Libraries from each timepoint were filtered to contain only cells with >200 and <4,000 expressed genes and <5% mitochondrial gene counts. After the filtering steps, Seurat Objects generated for reads from each timepoint were merged, gene counts were normalized and variable features were identified for each timepoint. The objects were then integrated by selection of integration features and anchors (n = 30 dimensions) and the integrated object was scaled. Significant principal components (n = 30) were identified and used to generate a uniform manifold approximation and projection (UMAP) for dimensional reduction. User specified dimensions were used to define neighbours and clusters. Cell type annotation was performed as previously described⁷⁵.

Dermal fibroblasts identified at all timepoints in the integrated object were subset based on established markers (Lum, Pdgfra, Crabp1 and Lox). The integrated object was split and reclustered, after normalization of the gene counts and reselection of variable features, integration features and integration anchors. Significant principal components, UMAP generation and neighbour/cluster identification were performed as described above. After subclustering, the expression of Emx2 and 13 patagium upregulated transcription factors was examined in each cluster using violin plots. For transcription factors showing expression in the subcluster marked by Emx2 (cluster 2), co-expression with Emx2 was explored by plotting normalized counts of each gene with a blended matrix of normalized expression of the two genes.

Micro-C analysis

Micro-C libraries were prepared using the Dovetail Micro-C Kit, according to the protocol suggested by the manufacturer (Dovetail Genomics). In brief, single-cell suspensions of the patagium primordium of P5 sugar glider joeys were generated as described for ATAC–seq and frozen at −80 °C. A total of ~1,000,000 pooled primary cells, corresponding to approximately 4–5 joeys, was used to generate each library. After thawing the cells, chromatin was fixed using disuccinimidyl glutarate and formaldehyde. Cross-linked chromatin was then digested in situ with micrococcal nuclease (MNase). After digestion, cells were lysed with SDS and chromatin fragments were bound to chromatin capture beads. Chromatin ends were then repaired and ligated to a biotinylated bridge adapter followed by proximity ligation of adapter-containing ends. After proximity ligation, cross-links were reversed, associated proteins were degraded, DNA was purified and a sequencing library was generated using Illumina-compatible adapters. In total, three libraries were generated. Before PCR amplification, biotin-containing fragments were isolated using Streptavidin beads and the libraries were sequenced on the Illumina HiSeq X platform to generate 660,352,634 (2 × 150 bp) reads.

Raw Micro-C reads were analysed according to the Dovetail documentation (Dovetail Genomics). Reads from the three libraries were pooled and mapped to the Hi-C P. breviceps assembly using BWA (v.0.7.15-r1188)⁷⁷. Subsequently, pairtools v.0.3.0 was used to remove PCR duplicates, sort and record valid ligation events. Moreover, pairtools select v.0.3.0 was used to determine unique pairs (with the ‘(pair_type==“UU”) or (pair_type==“UR”) or (pair_type==“RU”) or (pair_type==“uu”) or (pair_type==“Uu”) or (pair_type==“uU”)’ option) and samtools (v.1.12)⁵⁵ was used to convert files into BAM format. The Hi-C matrix was produced using juicer tools (v.1.22.01)⁷⁸ and contact domains (at 10 kb, 25 kb and 50 kb) were called using juicer tools arrowhead (v.1.22.01) (parameters: -k KR –ignore-sparsity). We used Mustache (v.1.0.1)⁷⁹ to conduct loop calling (at 5 kb and 10 kb) on individual scaffolds (parameters: -d 250000 -st 0.7 -pt 0.05) and Higlass-manage v0.8.0⁸⁰ to visualize the data. We used Cooler (v.0.8.5)⁸¹ to prepare files and generate cooler multi-resolution contact maps. Clodius (v.0.3.5) and Higlass-manage (v.0.8.0)⁸⁰ were used to format and ingest other files.

Generation of immortalized sugar glider fibroblasts

The method for generating immortalized dermal fibroblasts has been described in detail previously^75,82. In brief, a skin biopsy was obtained from the trunk region (encompassing dorsal and lateral skin) of a P10 sugar glider joey and fat/connective tissue was scraped away. The sample was digested overnight at 4 °C in a solution containing HBSS without Ca²⁺ and Mg²⁺ (Thermo Fisher Scientific), dispase (500 caseinolytic units per ml (Corning)), and an antibiotic/antimycotic solution (100 μg ml⁻¹ streptomycin, 100 IU ml⁻¹ penicillin and 250 ng ml⁻¹ amphotericin B (HyClone)). After removing and discarding the epidermis, the dermis was cut into small pieces and fibroblasts were expanded as described previously⁷⁵. To generate an immortalized dermal fibroblast cell line, cells were transduced with undiluted γ-retroviral pseudoparticles according to procedures described previously⁷⁵. Cells were verified as negative for mycoplasma by testing with the MycoAlert Mycoplasma Detection Assay kit (Lonza) according to the manufacturer’s instructions. We profiled our cell line using RNA-seq and found that it displays robust levels of Emx2 and other key genes that we previously identified as being expressed in the patagium tissue (such as Wnt5a, Tbx3, Tbx5, Hand3 and Osr1)¹⁴. Moreover, we did not detect expression of genes like Shh or Pax5, a result that is also consistent with our previous transcriptional analysis of patagium tissue¹⁴ (Supplementary Fig. 6 and Supplementary Data 7). These results suggest that our sugar glider cell line provides an adequate in vitro model to test hypotheses about the upstream and downstream regulation of Emx2.

Luciferase assays

GAR analysis

For each candidate GAR, we synthesized the sequence from the glider species in which the sequence was found to be accelerated as well as from the corresponding non-gliding sister species (Twist Biosciences). The size of the sequence was defined by the overlap between the ATAC and the H3K27Ac ChIP peak identified in the sugar glider tissue. GARs and corresponding orthologues were cloned either into the pGL4.23 Luciferase Enhancer Reporter Vector (Promega) (GARs 41701, 16519, 32020, 51182, 13585) or the pGL4.10 Luciferase Promoter Reporter Vector (Promega) (GAR 11730) using In-fusion cloning (Takara). All of the resulting constructs were verified by Sanger sequencing. Immortalized sugar glider cells were seeded at a density of 5,000 cells per well and, the next morning, were transfected with the experimental constructs by using Lipofectamine (Invitrogen) (300 nl Lipofectamine to 200 ng plasmid DNA per well). To control for transfection, a control Renilla reporter vector (Promega 4.73) was co-transfected into each well (20 ng). After 48 h, cells were collected and processed using the DualGlo Luciferase Assay System (Promega) according to the protocol guidelines, and luciferase production was measured using a SpectraMax L luminometer (Molecular Devices). Luciferase activity was normalized relative to Renilla activity.

Wnt5a–Emx2 interaction analysis

We synthesized a 241 bp fragment corresponding to the region immediately downstream of the sugar glider Wnt5a first exon and a second one, in which the three Emx2 binding sites in this sequence were replaced by G bases (that is, ATTA to GGGG) (Twist Biosciences). We used In-fusion cloning (Takara) to clone each of the fragments into the pGL4.10 Luciferase Promoter Reporter Vector (Promega), in front of the luciferase coding sequence. Co-transfection experiments were carried out with either a GFP expression plasmid (Addgene 11153) or an Emx2 expression plasmid (Origene, NM_010132). All of the resulting constructs were verified by Sanger sequencing. For co-transfection experiments, we used 300 nl Lipofectamine to 200 ng plasmid DNA per well (100 ng of Luciferase vector and 100 ng of co-transfection GFP/Emx2). Cell seeding and all downstream experiments and analyses were performed as described above.

Immunostaining

For sugar glider and mouse immunofluorescence and IHC analysis, tissue samples were collected and fixed in 4% paraformaldehyde (PFA) overnight, washed in 1× PBS and incubated in 30% sucrose at 4 °C overnight. The samples were then embedded in optimal cutting temperature (OCT), flash-frozen and cryosectioned (16 μm thickness) using the Leica CM3050S Cryostat. The slides were kept at −80 °C until use. Antibody stains were performed on tissue sections using standard procedures. In brief, slides were washed with 1× PBS with 0.1% Tween-20 (PBT) and blocked with 1× PBT/3% BSA for 1 h. Rabbit anti-EMX2 (Novus NBP2-39052; 1:50), chicken anti-GFP (Novus Biologicals, NB100-1614, 1:200) or rabbit anti-KRT14 (BioLegend, 905301, 1:1,000) primary antibodies were diluted in 1× PBT/3% BSA and slides were incubated at 4 °C overnight. The next morning, the slides were washed several times with 1× PBT and incubated with secondary antibodies (Alexa-Fluor 488 (Thermo Fisher Scientific, ab150169, dilution 1:500); goat anti-rabbit Biotinylated (Vector Labs, R.T.U. (BP-9100-50; ready to use dilution). The reactions were visualized with HRP–streptavidin and the AEC substrate kit (Vector Labs: 568SK4200) or Alexa-dye-conjugated secondary antibodies (Thermo Fisher Scientific). The slides were washed several times with 1× PBT and mounted for imaging. AEC images were taken on a Nikon NiE upright microscope and fluorescence images on the Nikon A1R confocal microscope. NIS Elements v5 (Nikon) was used to acquire microscopy images.

Statistics and reproducibility

Micrographs shown in Fig. 3b,c,g, 4a and 5a–c,e and Extended Data Figs. 5b, 7a,b and 8 constitute representative data from at least three biological samples.

In vitro shRNA experiments

The RNAi Consortium (TRC) mouse lentiviral library carries several Emx2 shRNA constructs⁸³. We aligned the laboratory mouse and sugar glider Emx2 coding sequence and designed five different sugar glider species shRNA constructs targeting the same regions that were previously used for targeting the laboratory mouse locus⁸³. Moreover, we used the scrambled control recommended by the RNAi Consortium⁸³ (a list of all sequences is provided in Supplementary Table 5). We cloned the different sequences into the LV-GFP plasmid³⁴, which is designed for RNU6-1 promoter-driven shRNA expression. Large-scale production of VSV-G pseudotyped lentivirus was performed using calcium phosphate transfections in HEK293FT cells and the helper plasmids PMD2.G and psPAX2 (Addgene plasmids 12259 and 12260, respectively), as described previously³⁴. For viral infections, we plated cells in six-well dishes at 300,000 cells per well, and incubated them overnight with unconcentrated lentiviruses in the presence of polybrene (100 μg ml⁻¹). The medium was replaced the next morning and cells were allowed to grow for 5 days. After this period, cells were sorted using FACS, grown for five additional days and sorted using fluorescence-activated cell sorting (FACS) a second time. Once cells reached confluency, RNA was extracted using the Zymo Directzol Kit (Zymo Research). The ability of the different shRNA constructs to induce Emx2 downregulation was determined using qPCR, using the primers listed in Supplementary Table 5. When introduced into cultured sugar glider immortalized dermal fibroblasts, the different shRNAs reduced Emx2 mRNA levels to 27% (shEmx2-1), 48% (shEmx2-2), 5% (shEmx2-3), 47% (shEmx2-4) and 38% (shEmx2-5), compared to a scrambled control shRNA (shScram). We selected shEmx2-2 and the shScram control for downstream analyses.

In-pouch lentiviral transgenesis

To develop sugar glider transgenesis, we used the LV-GFP plasmid³⁴. For subsequent shRNA experiments, we used shEmx2-3 (the most effective of the constructs that we tested) and the shScram control. Large-scale production of lentiviruses was performed as described for in vitro work, except that lentiviruses were subjected to ultraconcentration according to established protocols^34,84. For injections, we anaesthetized female sugar gliders, gently exposed a P3 joey from inside the pouch and intradermally injected ~2.5 μl of concentrated virus into the interlimb region (one side only) using a 33-gauge needle. After injection, the joey was gently placed back inside the pouch, and the mother was monitored until fully awake from the effect of anaesthesia. Several days later (P6 for RNA-seq/qPCR analysis and P9 for histology/measurements), females were anaesthetized, joeys were collected by gently detaching their jaw from the nipple, placed in a temporary container and brought back to the laboratory, where they were euthanized and processed for downstream analysis.

Phenotypic measurements

After confirming GFP expression by visually inspecting the tissue under a dissecting scope, joeys were fixed, embedded, cryosectioned and stained with DAPI. We used Fiji v.2.1.0 to measure the area of the left and right patagium in at least three non-consecutive sections per sample (n = 4 (shEmx2-3) and n = 4 (shScram) samples) and calculated the control/experimental ratio. Each measurement was performed three independent times and the results were averaged. All counts were performed in a blinded manner (that is, the researcher performing the measurements was unaware of which images corresponded to which genotypes/experimental conditions). Statistical differences were established using a mixed effects model one-way ANOVA test.

Tissue collection for qPCR and RNA-seq analysis

We injected the lateral skin of P3 joeys with either the shEmx2-3 or shScram lentivirus. Joeys were euthanized at P6 and infected patagium tissue, as visualized using GFP, was dissected and preserved in RNAlater. RNA was extracted using the RNeasy fibrous tissue mini kit (Qiagen) according to the manufacturer’s protocol. For qPCR, we used n = 5 shEmx2-3 samples and n = 5 shScram samples. For RNA-seq, we used n = 5 (shEmx2-3) and n = 5 (shScram) samples.

qPCR analysis

We used the qScript cDNA SuperMix (Quanta BioSciences) to generate complementary DNA (cDNA) and then performed qPCR using PerfeCTa SYBR Green FastMix (Quanta BioSciences). We assayed gene expression in triplicate for each sample and normalized the data using the housekeeping gene Actb. Primers used for qPCR were designed using the sugar glider genome and are reported in Supplementary Table 5. We analysed data from all qPCR experiments using the comparative C_t method and established statistical significance of expression differences using two-tailed t-tests.

Bulk RNA-seq

Tissues

RNA was extracted as described above and RNA-seq libraries were prepped using the TruSeq RNA Library Prep kit v2 (Illumina) and sequenced on the NovaSeq 6000 system (2 × 65 bp, paired-end). Pairwise differential expression analyses between the transcriptomes of shEmx2-3 or shScram skin were performed using DeSeq2 v.1.34.1 from BioConductor (https://bioconductor.org/)⁸⁵. Only genes differentially expressed with an adjusted P < 0.05 were considered.

Cells

Immortalized P. breviceps cells were grown in a six-well dish until confluent. RNA was collected using the TRizol reagent according to the manufacturer’s protocol. FASTQ reads were trimmed using Trimmomatic v.0.39 and aligned to the P. breviceps genome using STAR v.2.7.9a⁸⁶. Counts were generated using featureCounts (v.2.0.1)⁸⁷ (featureCounts -p -t transcript -g transcript_id -O –minOverlap 10) and RPKM was calculated using the calculation reads for a gene/(all reads for the sample/1,000,000) × (1,000/length of gene).

Whole-mount in situ hybridization analysis

The Emx2 mouse riboprobe was a generous gift from T. Capellini and has been previously described³³. Whole-mount in situ hybridizations was performed using previously described protocols⁸⁸. In brief, mouse E11.5 and E13.5 (CD-1) embryos were post-fixed with 4% PFA in 1× PBS, washed with 1× PBS, treated with 20 μg ml⁻¹ proteinase in 1× PBT for 45 min and incubated overnight with the Emx2 riboprobe at 65 °C. The next morning, the probe was washed with MABT (Maleic acid, NaCl, Tween-20) and incubated overnight with secondary anti-DIG antibodies (1:2,000) diluted in MABT, 2% Boehringer Blocking Reagent and 20% heat-treated sheep serum. After washing several times with MABT, the signal was developed by incubating with NBT/BCIP. Once a signal had developed sufficiently, the reaction was stopped by washing several times with PBT and fixing in 4% PFA overnight. The embryos were visualized using a SMZ18 stereo microscope (Nikon). NIS Elements v5 (Nikon) was used to acquire microscopy images.

HCR in situ hybridization

The sugar glider Wnt5a coding region was used to generate 20 probe binding sequences for in situ hybridization chain reaction (HCR). HCR was performed using the standard protocol for fixed frozen tissue sections available from Molecular Instruments. For visualizing Wnt5a and EMX2 co-expression, sections hybridized with the Wnt5a probe were subsequently incubated with the EMX2 antibody, according to the procedure described for immunostainings.

Stereoseq analysis

We analysed Emx2 and Wnt5a co-expression in a previously generated spatial Stereo-seq dataset⁸⁹. After examining all available datasets, we chose samples from E14.5 embryos, as this timepoint exhibited the most significant expression of both Emx2 and Wnt5a. We analysed the E1S1 sample, as it contained both the highest mean expressions of Emx2 and Wnt5a, as well as the highest percentage of spots with non-zero expression of Emx2 and Wnt5a. All of the datasets were processed and analysed using Scanpy².

We analysed the brain and craniofacial regions separately. First, we extracted the x coordinates and y coordinates of the spatial spots from the entire tissue, which we denote as X and Y, respectively. We reflected $Y\mapsto -Y$ so that the y coordinates were positive. We defined the brain region as the collection of spots, B, that were originally annotated as brain and where Y > 425. The latter threshold was chosen to remove any spots that may have been mislabelled as the brain region due to cell type deconvolution in the original study.

From trial and error and visual inspection, we defined the craniofacial region as the collection of points, C, such that:

$$\begin{array}{l}C=\{X,Y:X\le 165\,{\rm{and}}\,425 < Y < 540.96+1.22X-132.39\,{\rm{or}}\,165\\ < X < 275\,{\rm{or}}\,425\, < Y < 462-1.23(X-201.62)\}\end{array}$$

To focus specifically on craniofacial gene expression, we further removed any points that were originally annotated as belonging to either the brain or meninges regions⁵¹. To determine the likely cellular identities of spots expressing Emx2, we performed spatially informed dimensionality reduction using SpaceFlow⁹⁰. The latent embedding produced by SpaceFlow was used instead of principal component analysis to generate spatially coherent clusters, that is, clusters where spots are grouped based on both gene expression similarity and spatial proximity. We then generated a k-nearest neighbours (k-NN) graph (k = 15) to characterize spot–spot similarity. Using the k-NN graph, we used the Leiden algorithm to generate spatially resolved clusters that balanced spatial proximity with gene expression similarity, setting the clustering resolution parameter to Resolution=1.0. After constructing spatial clusters, we used differential expression analysis to determine the potential cellular identities of Emx2- and Wnt5a-expressing regions.

We used Mann–Whitney U-tests to identify spatial DEGs for each spatial cluster obtained by SpaceFlow. Specifically, we defined two particular groups of clusters. First, we aggregated the clusters in which Emx2 and Wnt5a were co-expressed significantly into a ‘Emx2 on/Wnt5a on’ cluster. Second, we aggregated the clusters in which Wnt5a was expressed significantly but Emx2 was absent into a ‘Emx2 off/Wnt5a on’ cluster. We then performed two separate differential expression tests, one between the Emx2 on/Wnt5a on cluster and the other SpaceFlow clusters and another between the Emx2 off/Wnt5a on cluster and the other clusters to identify the DEGs specific to Emx2^highWnt5a^high regions, enabling us to determine the likely cellular identities of these regions.

RNAscope

Frozen tissue sections (12 μm) of mouse embryos at E14.5 and E16.5 were used for RNA in situ hybridization using the RNAscope kit v2 (323100, Advanced Cell Diagnostics), according to the manufacturer’s instructions. The Mus musculus Wnt5a and Emx2 probes (316791 and 319001-C3, respectively; Advanced Cell Diagnostics) were used and labelled with OPAL 520 reagent (FP1487001KT, Akoya Biosciences) and TSA Vivid 650 dye (7536, Advanced Cell Diagnostics), respectively. For all of the experiments, slides were mounted using VECTASHIELD Antifade Mounting Medium with DAPI (H-1200-10; Vector Laboratories). Fluorescent images were captured with a Fluoview3000 laser-scanning microscope (Olympus) with UPLSAPO ×20/0.75 NA and ×40/1.25 NA Silicone objectives (Olympus).

Mouse transgenics

All mouse experiments performed were approved by the IACUC committee at Princeton University. All mouse strains were provided with food and water ad libitum and kept on a 12 h–12 h light–dark cycle, at a temperature of 20 °C and 60% humidity. Experiments were carried out in both males and females.

Emx2 overexpression

The Rosa^Emx2-GFP strain was a gift from D. Wu (NIH) and has been previously described⁴². The Emx2^cre strain was obtained from Riken (RBRC02272) and has been previously described⁴⁶. The Pdgfra^creERT2 strain was obtained from JAX (032770) and has been previously described⁴³. The FVB/N769Tg(tetO-Wnt5a)17Rva/J and B6.Cg Gt(ROSA)26Sortm1(rtTA*M2)Jae/J strains were obtained from JAX (022938 and 006965, respectively) and have been previously described^91,92. Both male and female mice were used in all experiments. For Emx2 induction, P30 mice were intraperitoneally injected with tamoxifen (100ul; 20 mg ml⁻¹) during 5 consecutive days. Then 7 days after the last injection, mice were intraperitoneally injected with EdU (200ul; 10 nM) and euthanized 4 h later. For Wnt5a induction, P30 mice were placed on 1 mg ml⁻¹ doxycycline-containing water ad libitum for 7 days (doxycycline water was replaced every other day). Skin tissue was collected, fixed in 4% PFA at 4 °C overnight and embedded in OCT for cryosectioning.

Phenotypic measurements

Epidermal thickness was quantified using Fiji v.2.1.0 by measuring tissue stained with haematoxylin and eosin from the base of the epidermis to the top of the epidermis. Measurements were taken exclusively from interfollicular regions (10 measurements per dorsal skin section, 3 dorsal skins sections per sample, 4 samples per treatment). Cell density and cell proliferation were quantified on tissue sections by counting the number of DAPI⁺ cells per surface area (measurements were done on three dorsal skin sections per sample, four samples per treatment). Statistical significance was assessed using a general mixed-effects model one-way ANOVA test (fixed effect = treatment; random effect = individual/sample).

LacZ enhancer reporter assays

We performed all mouse enhancer reporter assays using enSERT methodology, a recently developed strategy that uses CRISPR–Cas9 technology for site-directed insertion of enhancer reporter transgenes into the Igs2 intergenic safe-harbour site^44,93. As constructs are integrated into a specific genomic site, this method allows for reproducible and efficient testing of enhancer-reporter activity. After the constructs were synthesized, as described for our luciferase assays, we cloned them into a previously described LacZ targeting vector (Addgene, 139098)⁹³ containing a minimal Shh promoter. The resulting vectors were injected into mouse zygotes (FVB/NCrl strain; Charles River) in pools of 4–5 constructs, together with Igs2 gRNA and Cas9 protein (IDT), followed by embryo transfer into pseudopregnant females, as described previously⁴⁴. Embryos were collected at E9.5 and E11.5. After collection, the embryos were screened for the corresponding GAR insertion using junction PCR and Sanger sequencing, and stained with X-gal according to previously established protocols⁴⁴. At least 2 transgenic embryos with integration at the Igs2 locus per construct were analysed.

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.

[ad_2]

Source link

Emx2 underlies the development and evolution of marsupial gliding membranes

The P. breviceps sugar glider breeding colony

Scanning electron microscopy analysis

Sample acquisition, genome sequencing and genome assemblies

ATAC–seq analysis

ChIP–seq analysis

Computational analyses

Identifying GARs

Gene enrichment analysis

Conservation between petauroid marsupials

Conservation between marsupials, laboratory mice and humans

Gene Ontology and KEGG pathway analysis

Transcription-factor-binding analysis

scRNA-seq analysis of laboratory mouse data

Micro-C analysis

Generation of immortalized sugar glider fibroblasts

Luciferase assays

GAR analysis

Wnt5a–Emx2 interaction analysis

Immunostaining

Statistics and reproducibility

In vitro shRNA experiments

In-pouch lentiviral transgenesis

Phenotypic measurements

Tissue collection for qPCR and RNA-seq analysis

qPCR analysis

Bulk RNA-seq

Tissues

Cells

Whole-mount in situ hybridization analysis

HCR in situ hybridization

Stereoseq analysis

RNAscope

Mouse transgenics

Emx2 overexpression

Phenotypic measurements

LacZ enhancer reporter assays

Reporting summary

Comments

Leave a Reply Cancel reply

More posts