Connecting single-cell transcriptomes to projectomes in the mouse visual cortex

July 1, 2026

Animal care and use

Experimental procedures that involved the use of mice were all conducted with approved protocols in accordance with NIH (US National Institutes of Health) guidelines. They were also approved by the Allen Institute for Brain Science Institutional Animal Care and Use Committee.

Mice were housed with five or fewer mice per cage and were maintained on a 12-h light–dark cycle, in a humidity-controlled and temperature-controlled room with water and food available ad libitum.

Transgenic mice and sparse labelling

Transgenic driver and reporter mice used in Patch-seq and WNM studies are listed in Supplementary Table 1 (Patch-seq only) and Supplementary Table 2. Characterization of the expression pattern of many of the transgenic mouse lines can be found in the AIBS Transgenic Characterization database (http://connectivity.brain-map.org/transgenic/search/basic)⁵¹. Many of the brains used for WNM studies were described in a previous article²⁸. Additional brains were sparsely and robustly labelled for WNM studies using Supernova virus, which was provided as a gift by M. Luo as pAAV-TRE-fDIO-GFP-IRES-tTA (Addgene plasmid #118026; http://n2t.net/addgene:118026; RRID: Addgene_118026), and variants.

Tissue processing and slicing procedure

For preparation of acute brain slices, adult male and female mice (postnatal day 45 (P45)–P70 of age) were first fully anaesthetized by 5% isoflurane inhalation. Intracardiac perfusion was then performed with 25–50 ml of ice-cold cutting artificial cerebrospinal fluid (ACSF; 0.5 mM calcium chloride (dihydrate), 25 mM d-glucose, 20 mM HEPES buffer, 10 mM magnesium sulfate, 1.25 mM sodium phosphate monobasic monohydrate, 3 mM myo-inositol, 12 mM N-acetyl-l-cysteine, 96 mM N-methyl-d-glucamine chloride, 2.5 mM potassium chloride, 25 mM sodium bicarbonate, 5 mM sodium l-ascorbate, 3 mM sodium pyruvate, 0.01 mM taurine and 2 mM thiourea (pH 7.3), which had been continuously bubbling with a mixture of 95% O₂–5% CO₂). Sections (350 µm) were sliced on a vibrating microtome (Compresstome VF-300 vibrating microtome, Precisionary Instruments or VT1200S Vibratome, Leica Biosystems), either coronally or at a 17° angle from the coronal plane. For the VIS, this latter slice angle helps to maximize the integrity of neuronal processes. To optimize registration to the CCFv3, a block-face image was collected before each section was cut (Mako G125B PoE camera with custom integrated software). Immediately after slicing, brain slices were placed in warm (34 °C) oxygenated cutting ACSF for 10 min, then allowed to further recover in holding ACSF (2 mM calcium chloride (dihydrate), 25 mM d-glucose, 20 mM HEPES buffer, 2 mM magnesium sulfate, 1.25 mM sodium phosphate monobasic monohydrate, 3 mM myo-inositol, 12.3 mM N-acetyl-l-cysteine, 84 mM sodium chloride, 2.5 mM potassium chloride, 25 mM sodium bicarbonate, 5 mM sodium l-ascorbate, 3 mM sodium pyruvate, 0.01 mM taurine and 2 mM thiourea (pH 7.3)), bubbling with a mixture of 95% O₂–5% CO₂ at room temperature until transferred to the microscope for recordings.

Patch-clamp recording

Slices were bathed in warm (34 °C) recording ACSF (2 mM calcium chloride (dihydrate), 12.5 mM d-glucose, 1 mM magnesium sulfate, 1.25 mM sodium phosphate monobasic monohydrate, 2.5 mM potassium chloride, 26 mM sodium bicarbonate and 126 mM sodium chloride (pH 7.3)) and continuously bubbled with 95% O₂–5% CO₂. The bath solution contained blockers of fast glutamatergic (1 mM kynurenic acid) and GABAergic synaptic transmission (0.1 mM picrotoxin). Thick-walled borosilicate glass (G150F-3, Warner Instruments) electrodes were manufactured (Narishige PC-10) with a resistance of 4–5 MΩ. Before recording, the electrodes were filled with approximately 1.0–1.5 µl of internal solution with biocytin (110 mM potassium gluconate, 10.0 mM HEPES, 0.2 mM ethylene glycol-bis (2-aminoethylether)-N,N,N′,N′-tetraacetic acid, 4 mM potassium chloride, 0.3 mM guanosine 5′-triphosphate sodium salt hydrate, 10 mM phosphocreatine disodium salt hydrate, 1 mM adenosine 5′-triphosphate magnesium salt, 20 µg ml⁻¹ glycogen, 0.5 U µl⁻¹ RNAse inhibitor (2313A, Takara) and 0.5% biocytin (B4261, Sigma), pH 7.3). The pipette was mounted on a Multiclamp 700B amplifier headstage (Molecular Devices) fixed to a micromanipulator (PatchStar, Scientifica).

Electrophysiology signals were recorded using an ITC-18 Data Acquisition Interface (HEKA). Commands were generated, signals processed and amplifier metadata were acquired using MIES (https://github.com/AllenInstitute/MIES/), written in Igor Pro (Wavemetrics). Data were filtered (Bessel) at 10 kHz and digitized at 50 kHz. Data were reported uncorrected for the measured liquid junction potential of −14 mV between the electrode and bath solutions. Before data collection, all surfaces, equipment and materials were thoroughly cleaned in the following manner: a wipe down with DNA away (Thermo Scientific), RNAse Zap (Sigma-Aldrich) and finally with nuclease-free water.

After formation of a stable seal and break-in, the resting membrane potential of the neuron was recorded (typically within the first minute). A bias current was injected, either manually or automatically using algorithms within the MIES data acquisition package, for the remainder of the experiment to maintain that initial resting membrane potential. Bias currents remained stable for a minimum of 1 s before each stimulus current injection.

To be included in the analysis, neurons needed to have a seal of more than 1 GΩ recorded before break-in and an initial access resistance of less than 20 MΩ and less than 15% of the cell's R_input. To stay below this access resistance cut-off, cells with a low input resistance were targeted with larger electrodes. For an individual sweep to be included for analysis, the following criteria were applied: (1) the bridge balance was less than 20 MΩ and less than 15% of R_input; (2) the bias (leak) current was within ±100 pA; and (3) root mean square noise measurements in a short window (1.5 ms, to gauge high-frequency noise) and a longer window (500 ms, to measure patch instability) were less than 0.07 mV and less than 0.5 mV, respectively.
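
The inclusion criteria above can be written as two simple predicates. This is a minimal sketch, assuming resistances in the units given in the text; the function and argument names are hypothetical and are not taken from MIES or any analysis package.

```python
def cell_passes_qc(seal_gohm, access_mohm, input_mohm):
    """Cell-level criteria: seal > 1 GΩ before break-in, and initial access
    resistance < 20 MΩ and < 15% of the input resistance."""
    return (seal_gohm > 1.0
            and access_mohm < 20.0
            and access_mohm < 0.15 * input_mohm)


def sweep_passes_qc(bridge_mohm, input_mohm, bias_pa, rms_short_mv, rms_long_mv):
    """Sweep-level criteria: bridge balance, bias current and two noise windows."""
    return (bridge_mohm < 20.0
            and bridge_mohm < 0.15 * input_mohm
            and abs(bias_pa) <= 100.0      # bias (leak) current within ±100 pA
            and rms_short_mv < 0.07        # 1.5-ms window, high-frequency noise
            and rms_long_mv < 0.5)         # 500-ms window, patch instability
```

Note that the 15% rule is the stricter limit for low-resistance cells: with R_input = 100 MΩ, the access resistance must stay below 15 MΩ rather than 20 MΩ.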

After electrophysiological recording, the pipette was centred on the soma or placed near the nucleus (if visible). A small amount of negative pressure was applied (approximately −30 mbar) to begin cytosol extraction and to attract the nucleus to the tip of the pipette. After approximately 1 min, the soma visibly shrank and/or the nucleus was near the tip of the pipette. While maintaining negative pressure, the pipette was slowly retracted; slow, continuous movement was maintained while monitoring the pipette seal. Once the pipette seal reached more than 1 GΩ and the nucleus was visible on the tip of the pipette, the speed was increased to remove the pipette from the slice. The pipette containing internal solution, cytosol and the nucleus was removed from the pipette holder, and its contents were expelled into a PCR tube containing the lysis buffer (634894, Takara). Metadata for all Patch-seq neurons included in this study are in Supplementary Table 1.

Electrophysiology feature analysis

Electrophysiological features were measured from responses elicited by short (3 ms) current pulses and long (1 s) current steps as previously described⁴,²¹. Action potentials were detected, and the threshold, peak, fast trough and width (at half-height) were calculated for each action potential along with the ratio of the peak upstroke dV/dt to the peak downstroke dV/dt (upstroke:downstroke ratio). Several voltage trajectories (the initial action potential elicited by the lowest-amplitude current pulses and steps, the derivatives of those action potentials, and the interspike interval) were analysed as previously described. Action potential features across responses to long current steps were averaged in time bins and concatenated across step amplitudes; bins without action potentials had interpolated values from their neighbours. This was done for steps starting at a given rheobase for a cell and increasing at 10-pA intervals. Sweeps from intervals without data were interpolated from sweeps at neighbouring intervals. Subthreshold responses to hyperpolarizing current steps were analysed as before by downsampling to 10-ms bins and concatenating responses from different stimulus amplitudes (ranging from −90 pA to −10 pA). Sparse principal component analysis was performed separately on data from each of these categories (for example, action potential waveform, action potential features across current steps) and sparse principal components (sPCs) that exceeded 1% adjusted explained variance were kept. This yielded 62 sPCs in total from 12 data categories. The components were z-scored and combined to form the reduced dimension electrophysiology feature matrix.
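
The final step, z-scoring the retained components and combining them across data categories, can be sketched as follows. This is an illustration only: the function names are hypothetical, the use of the population standard deviation is an assumption, and the sparse PCA itself is not shown.

```python
from statistics import mean, pstdev


def zscore_columns(rows):
    """Z-score each column (component) of a cells x components table."""
    columns = list(zip(*rows))
    mus = [mean(col) for col in columns]
    sds = [pstdev(col) for col in columns]  # assumption: population s.d.
    return [[(v - m) / s if s > 0 else 0.0 for v, m, s in zip(row, mus, sds)]
            for row in rows]


def combine_categories(categories):
    """Z-score every category table, then concatenate them cell by cell."""
    n_cells = len(categories[0])
    if any(len(cat) != n_cells for cat in categories):
        raise ValueError("every category must describe the same cells")
    scaled = [zscore_columns(cat) for cat in categories]
    return [sum((cat[i] for cat in scaled), []) for i in range(n_cells)]
```

Each input table holds one row per cell and one column per retained sPC for a single data category; the output has one row per cell and one column per sPC across all categories.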

cDNA amplification and library construction

For Patch-seq experiments, the collected nuclear and cytosolic mRNA were reverse transcribed, and the resulting cDNA was sequenced using the SMART-Seq v4 method previously described¹⁶. We used the SMART-Seq v4 Ultra Low Input RNA Kit for Sequencing (634894, Takara) to reverse transcribe poly(A) RNA and amplify full-length cDNA according to the manufacturer’s instructions. We performed reverse transcription and cDNA amplification for 20 PCR cycles in 0.65-ml tubes, in sets of 88 tubes at a time. At least one control eight strip was used per amplification set, which contained four wells without cells and four wells with 10 pg control RNA. Control RNA was either Mouse Whole Brain Total RNA (MR-201, Zyagen) or control RNA provided in the SMART-Seq v4 kit. All samples proceeded through Nextera XT DNA Library Preparation (FC-131-1096, Illumina) using either Nextera XT Index Kit V2 Set A–D (FC-131-2001, FC-131-2002, FC-131-2003, FC-131-2004) or custom dual-indexes provided by Integrated DNA Technologies (IDT). Nextera XT DNA Library prep was performed according to the manufacturer’s instructions except that the volumes of all reagents including cDNA input were decreased either to 0.4× or to 0.2× by volume. Each sample was sequenced to approximately 500,000 to 1 million reads.

Sequencing data processing

Fifty-base-pair paired-end reads were aligned to the mm10 GENCODE vM23/Ensembl 98 reference genome, downloaded from 10X cell ranger (refdata-cellranger-arc-mm10-2020-A-2.0.0). Sequence alignment was performed using STAR aligner (v2.7.1a) with default settings. PCR duplicates were masked and removed using STAR option ‘bamRemoveDuplicates’. Only uniquely aligned reads were used for gene quantification. Gene counts were computed using the R GenomicAlignments package⁵² summarizeOverlaps function using ‘IntersectionNotEmpty’ mode for exonic and intronic regions separately. Exonic and intronic reads were added together to calculate total gene counts; this was done for both the reference dissociated cell dataset and the Patch-seq dataset. Data were analysed as counts per million reads (CPM).
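
The last two steps (summing exonic and intronic counts, then scaling to CPM) amount to the arithmetic below. This is a minimal sketch with hypothetical function names; it assumes per-cell dictionaries of gene counts and is not the pipeline code used in the study.

```python
def total_counts(exonic, intronic):
    """Add exonic and intronic read counts gene by gene for one cell."""
    genes = set(exonic) | set(intronic)
    return {g: exonic.get(g, 0) + intronic.get(g, 0) for g in genes}


def counts_per_million(counts):
    """Scale one cell's gene counts so that they sum to one million."""
    library_size = sum(counts.values())
    if library_size == 0:
        raise ValueError("cell has no counted reads")
    return {g: c * 1e6 / library_size for g, c in counts.items()}
```

For example, a gene with 30 exonic and 10 intronic reads in a cell with 100 counted reads in total has a CPM of 400,000.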

Transcriptomic mapping and analysis

We followed the procedures previously used⁴ to assign transcriptomic types to Patch-seq neurons by mapping Patch-seq transcriptomes to a reference dataset of single-cell RNA-seq transcriptomes obtained from dissociated cells collected by Tasic et al.¹⁶. We used the same reference taxonomy here as in Gouwens et al.⁴, starting with the 24,411 dissociated cells from VISp and ALM regions and 4,020 differentially expressed genes from Tasic et al.¹⁶, but keeping only neuronal cells from the VISp region and their corresponding T-types (13,464 cells encompassing 93 cell types). We note that the T-types and subclasses that used the ‘PT’ nomenclature in the original study have been renamed ‘ET’ here to be consistent with a recently generated whole-brain taxonomy¹.

Mapping to the VISp reference taxonomy

We mapped the transcriptomes of Patch-seq samples to the reference taxonomy using the methods previously described for inhibitory neurons⁴. In brief, for each Patch-seq transcriptome, we traversed the reference hierarchical transcriptomic tree, computing the correlation of its expression of select marker genes at each branch point of the tree with the expression profile of the reference dissociated cell types below that branch point. We chose the more correlated branch and repeated the process until the leaves (that is, T-types) of the hierarchical tree were reached. This procedure was bootstrapped with 100 iterations at each branch point using a random subsampling (70%) of markers and reference cells. We defined a mapping probability based on the fraction of times that a cell mapped to a leaf or node of the reference taxonomy. The T-type with the highest mapping probability was assigned to that Patch-seq cell.
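
The tree traversal with bootstrapping can be illustrated with a small sketch. Several simplifications are assumptions rather than details from the source: the tree is a plain dictionary, each branch is summarized by the mean expression of its subsampled reference cells, and the whole walk from the root is repeated on every iteration instead of bootstrapping at each branch point separately. All names are hypothetical.

```python
import random
from collections import Counter


def pearson(x, y):
    """Pearson correlation; returns 0.0 if either vector is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return 0.0 if sxx == 0 or syy == 0 else sxy / (sxx * syy) ** 0.5


def leaves_below(tree, node):
    """All leaf labels (T-types) under a node; a leaf has no entry in tree."""
    children = tree.get(node)
    if not children:
        return [node]
    return [leaf for child in children for leaf in leaves_below(tree, child)]


def map_once(cell, tree, markers, ref_cells, rng, frac=0.7):
    """One walk from the root to a leaf using subsampled markers and cells."""
    node = "root"
    while tree.get(node):
        genes = rng.sample(markers[node], max(2, round(frac * len(markers[node]))))
        query = [cell[g] for g in genes]
        best_child, best_r = None, -2.0
        for child in tree[node]:
            pool = [c for leaf in leaves_below(tree, child) for c in ref_cells[leaf]]
            pool = rng.sample(pool, max(1, round(frac * len(pool))))
            centroid = [sum(c[g] for c in pool) / len(pool) for g in genes]
            r = pearson(query, centroid)
            if r > best_r:
                best_child, best_r = child, r
        node = best_child
    return node


def mapping_probabilities(cell, tree, markers, ref_cells, n_iter=100, seed=0):
    """Fraction of iterations in which the cell reached each leaf."""
    rng = random.Random(seed)
    hits = Counter(map_once(cell, tree, markers, ref_cells, rng)
                   for _ in range(n_iter))
    return {leaf: count / n_iter for leaf, count in hits.items()}
```

The leaf with the highest value in the returned dictionary plays the role of the assigned T-type. Each marker list is assumed to contain at least two genes.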

Mapping to the whole-brain reference taxonomy

We also mapped Patch-seq cells to a recently generated whole-mouse brain taxonomy³ and examined the correspondence between transcriptomic types assigned from this taxonomy and the VISp-derived reference taxonomy. Here we used the hierarchical approximate nearest neighbour (HANN) method implemented in the scrattch-mapping package (https://github.com/alleninstitute/scrattch-mapping). This method involved traversing the taxonomy hierarchy, selecting offspring node-differentiating marker genes at each node, and finding the approximate nearest neighbour T-type using marker gene correlation as the distance metric.

Assessing mapping quality

As in Gouwens et al.⁴, we evaluated the T-type mappings by considering the confidence with which a Patch-seq transcriptome mapped to one or more reference T-types, and the expected level of ambiguity between reference T-types. We classified mapping quality measures (based on the correlation and the Kullback–Leibler divergence between the mapping probability distributions of Patch-seq cells and the reference mapping probability distribution; see Gouwens et al.⁴) into ‘highly consistent’, ‘moderately consistent’ and ‘inconsistent’ categories. Excitatory neurons (n = 1,528) that passed our quality control criteria for both electrophysiological and transcriptomic data were included in this study. Of these cells, 1,277 mapped to T-types with ‘high consistency’, a fraction similar to that found for inhibitory neurons using the same method⁴. In this study, we excluded cells with inconsistent mapping from further analyses.

Visualization of reference cells and Patch-seq cells

For visual comparison of reference dissociated and Patch-seq cells, we selected the 7,339 dissociated FACS-sorted neurons from the VISp from the Tasic et al.¹⁶ reference dataset that were within the glutamatergic branch of the hierarchy (32 T-types) and used 1,398 differentially expressed genes (the top 50 differentially expressed genes in each direction for all pairwise cluster comparisons within only those excitatory types). The log₂(CPM + 1) values of these differentially expressed genes were combined across the Patch-seq and reference cells and reduced to 20 components with principal component analysis (PCA). Three ‘technical bias’ principal components were removed as they were found to be correlated with the collection method (Pearson’s r = 0.65, 0.45 and 0.45). We visualized the variation in the remaining 17 principal components in two dimensions using UMAP⁵³.

Dimensionality reduction for continuous transcriptomic variation

Because Patch-seq transcriptomes are known to suffer from increased contamination and gene dropout¹¹,¹³,⁵⁴, we defined transcriptomic dimensions from reference dissociated cells collected from the mouse VIS. For each transcriptomic subclass (L2/3 IT, L4 and L5 IT, L6 IT, L5/L6 IT Car3, L5 ET, L5 NP, L6 CT, and L6b), we identified highly variable genes using Brennecke’s method (https://github.com/AllenInstitute/scrattch.hicat/). We then performed PCA and omitted principal components with a tolerance below 0.01 (that is, principal components with standard deviations ≤ 0.01 times the standard deviation of the first principal component), which resulted in 3–7 principal components per subclass. Data from Patch-seq cells assigned to the different MET-type groups were projected into this lower dimensional space using the gene loadings from PCA.
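
The tolerance rule for omitting principal components reduces to a single comparison per component. The sketch below assumes the component standard deviations are already available in decreasing order; the function name is hypothetical and the PCA itself is not shown.

```python
def keep_components(sdevs, tol=0.01):
    """Indices of components whose standard deviation exceeds tol times
    the standard deviation of the first component."""
    if not sdevs:
        return []
    cutoff = tol * sdevs[0]
    return [i for i, s in enumerate(sdevs) if s > cutoff]
```

With standard deviations of 5.0, 2.0, 0.5, 0.04 and 0.01, the cut-off is 0.05 and only the first three components are kept.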

Calculation of VISpm-projecting versus VISal-projecting transcriptomic signature

To examine whether the genes identified as differentially expressed between VISpm-projecting and VISal-projecting L2/3 cells⁴⁷ were related to other cellular properties, we projected L2/3 IT Patch-seq data into a principal component space derived from these genes. We first identified the cells in the Kim et al.⁴⁷ study with the best transcriptomic quality, selecting cells identified as ‘L23 AL’ or ‘L23 PM’ with highly consistent or moderately consistent quality when mapped to the same reference VISp taxonomy as Patch-seq cells. We limited analysis to genes with a log fold change > 1 and adjusted P < 0.05, combining the genes from Zinbwave-EdgeR or Zinbwave-DESeq2 analyses in the study. We then performed PCA, reducing the log-transformed expression of the resulting 838 differentially expressed genes in 345 upper cortical layer neurons to 20 features (total explained variance = 0.21). We projected Patch-seq data mapping to L2/3 IT T-types onto this common principal component space for further comparison of cellular properties with this HVA projection-associated transcriptomic signature.

Differential gene expression analysis

Differentially expressed ion channels were identified using the scrattch.hicat package (https://github.com/AllenInstitute/scrattch.hicat/) as previously described¹⁶, except that the proportion of cells expressing the gene in each type was not required to differ by 0.7 or more, as we did not want to limit our identified genes to only those expressed in an on/off manner. Only genes that were identified as being differentially expressed in both the reference and the Patch-seq datasets were included.

Morphological reconstruction

Biocytin histology

Neurons were filled with biocytin via the patch pipette. To visualize the label, a horseradish peroxidase enzyme reaction using diaminobenzidine as the chromogen was used after the electrophysiological recording. A 4′,6-diamidino-2-phenylindole (DAPI) stain was also used to identify cortical layers as previously described²¹.

Imaging

Slices from Patch-seq experiments were mounted on slides and imaged as previously described²¹. In brief, operators captured images on an upright AxioImager Z2 microscope (Zeiss) equipped with an Axiocam 506 monochrome camera and 0.63× optivar lens. Two-dimensional tiled overview images were also captured (Zeiss Plan-NEOFLUAR 20X/0.5) in brightfield transmission and fluorescence channels. Higher-resolution image stacks of individual cells were acquired in the transmission channel only for the purpose of morphological reconstruction. Light was transmitted using an oil-immersion condenser (1.4 NA). High-resolution, multi-tile image stacks were captured (Zeiss Plan-Apochromat ×63/1.4 Oil or Zeiss LD LCI Plan-Apochromat ×63/1.2 Imm Corr) at an interval of 0.28 µm (1.4 NA objective) or 0.44 µm (1.2 NA objective) along the z axis. Image tiles were stitched in ZEN software and exported as single-plane TIFF files.

Anatomical location of Patch-seq cells

Layer and anatomical location were determined based on DAPI-stained overview images mentioned above. The soma position of reconstructed neurons, as well as the pia, white matter and L1–L6b borders (using DAPI for reconstructed neurons) were drawn and used in subsequent analyses. Individual cells were manually aligned to the CCFv3 by matching the overview image of the slice with a ‘virtual’ slice at an appropriate location and orientation within the CCFv3. Laminar locations were calculated by finding the path connecting the pia and white matter that passed through the coordinate of the cell, identifying its distance to the pia and white matter, as well as the position within its layer, then aligning those values to an average set of layer thicknesses.
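
One way to read the alignment step is as a piecewise-linear rescaling: the fractional position of the cell within its own layer is carried over to a template built from average layer thicknesses. The sketch below follows that reading, which is an assumption; the function name, the data layout and the thickness values in the example are hypothetical.

```python
def aligned_depth(cell_depth, borders, avg_thickness):
    """Map a cell's depth along the pia-to-white-matter path onto a template.

    borders: ordered (layer, top, bottom) depths from the pia for this slice.
    avg_thickness: layer -> average thickness used as the common template.
    Returns (layer, depth from the pia in the template).
    """
    template_top = 0.0
    for layer, top, bottom in borders:
        if top <= cell_depth <= bottom:
            fraction = (cell_depth - top) / (bottom - top)  # position in layer
            return layer, template_top + fraction * avg_thickness[layer]
        template_top += avg_thickness[layer]
    raise ValueError("cell depth lies outside the drawn layers")
```

For a cell 200 µm below the pia in a slice where L2/3 spans 100–300 µm, the position within the layer is 0.5; with template thicknesses of 120 µm (L1) and 220 µm (L2/3), the aligned depth is 120 + 0.5 × 220 = 230 µm.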

Computer-assisted morphological reconstruction of Patch-seq neurons

Dendritic reconstructions were performed for a subset of neurons with good-quality transcriptomics, electrophysiology and labelling. Reconstructions were generated based on 63X image stacks described above. Stacks were run through a Vaa3D-based image processing and reconstruction pipeline⁵⁵. An automated reconstruction of the neuron was produced using TReMAP⁵⁶. Alternatively, initial reconstructions were created manually using the reconstruction software PyKNOSSOS (Ariadne-service) or through the citizen neuroscience game Mozak (Mozak.science)⁵⁷. Automated or manually initiated reconstructions were then extensively manually corrected and extended using a range of tools (for example, virtual finger or polyline) in the Mozak extension (Z. Popovic, Center for Game Science, University of Washington) of Terafly tools⁵⁸,⁵⁹ in Vaa3D. Where possible, the local axon was also reconstructed. After 3D reconstruction, morphological features were calculated (Supplementary Table 5 and also as previously described⁴,²¹).

Automated morphological representations

All neurons that were eligible for reconstruction were automatically segmented and post-processed to produce a quantifiable neuron reconstruction using the approach described in Gliko et al.⁶⁰. These automated reconstructions were used to make inferred MET-type assignments for cells from T-types that split across different MET-types (see below).

MET-type definition

We defined MET-types starting from our Patch-seq dataset of cells with all three data modalities (transcriptomic, electrophysiological and morphological data from a manual reconstruction; n = 389 cells) using a modified method from that used in a previous study⁴. We first used electrophysiological and morphological features to define ME clusters by several clustering methods and defined consensus clusters from the combined results⁴,²¹. We then constructed a graph where nodes represented either cells or ME-type–T-type combinations. Edges connected cell nodes to ME-type–T-type nodes with edge weight equal to the T-type mapping probability of the cell (see ‘Mapping to the VISp reference taxonomy’ above) and ME-cluster mapping probability (by subsampled random forest classification). Cells were also connected to each other by the average of their pairwise correlation across all three modalities; only the top 1%, 1.5% or 2% of correlation edges were used. We then used the Leiden community detection algorithm⁶¹ to group strongly connected nodes into MET-types (see Extended Data Fig. 1f). The procedure was repeated with 20 different random seeds for each of the three edge weight cut-offs. Final MET-types were defined from consensus clustering (as with ME types) from across the 60 total runs. Cells that did not reliably co-cluster with other cells in the consensus cluster (less than 50% average co-clustering rate) were not given a final MET-type assignment (n = 5). This procedure resulted in 384 cells being assigned to 17 MET-types.
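
The co-clustering filter applied after consensus clustering can be sketched as follows. The code assumes that the label vectors from all runs and the consensus labels are already available; how the consensus itself is derived is not shown, and the function names are hypothetical.

```python
def coclustering_rate(runs, i, j):
    """Fraction of runs in which cells i and j received the same label."""
    return sum(labels[i] == labels[j] for labels in runs) / len(runs)


def unreliable_cells(runs, consensus, min_rate=0.5):
    """Cells whose average co-clustering rate with the other members of
    their consensus cluster falls below min_rate."""
    flagged = []
    for i, cluster in enumerate(consensus):
        mates = [j for j, c in enumerate(consensus) if c == cluster and j != i]
        if not mates:
            flagged.append(i)  # assumption: singletons count as unreliable
            continue
        mean_rate = sum(coclustering_rate(runs, i, j) for j in mates) / len(mates)
        if mean_rate < min_rate:
            flagged.append(i)
    return flagged
```

In the study, the list of runs would hold the 60 label vectors (20 seeds for each of the three edge cut-offs), and flagged cells would be left without a final MET-type.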

We observed that nearly all T-types were strongly associated with a single MET-type; therefore, T-types were used to infer MET-type labels for an additional 1,090 Patch-seq neurons that lacked a manually curated reconstruction. For the handful of T-types that split across two MET-types (L6 IT VISp Penk Col27a1, L6 IT VISp Penk Fst, L6 IT VISp Col18a1, L6 IT VISp Col23a1 Adamts2, L5 ET VISp Krt80 and L5 ET VISp Lgr5), we used either an electrophysiology-based random forest classifier (L5 ET-types) or a morphology-based random forest classifier using features from automated morphological reconstructions (L6 IT-types) to assign the final MET-type label (91 neurons from those L6 IT T-types lacked an automated reconstruction and hence were not assigned an inferred MET-type).

Sparse RRR

To identify the transcriptomic expression patterns that best predicted the electrophysiological and morphological phenotypes of the Patch-seq neurons, we performed sparse RRR⁵,¹⁴. With this method (schematic in Fig. 4a), a multivariate regression was performed to predict either electrophysiological or morphological features from gene expression data using a small number of latent factors as an intermediate layer (RRR), combined with an elastic net regularization to select a sparse set of contributing genes and constrain the model weights¹⁴.
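
For orientation, a common way to write a sparse RRR objective with an elastic net penalty is given below. The notation is an assumption and does not come from this section: X is the n × p gene expression matrix, Y the n × q feature matrix, W a p × r matrix of gene weights, V a q × r matrix of feature loadings and r the rank. The row-wise lasso term and the scaling constants follow the usual glmnet-style convention and may differ from the exact form used in the study.

```latex
\min_{W,\,V}\;
  \frac{1}{2n}\,\bigl\lVert Y - X W V^{\top} \bigr\rVert_F^{2}
  \;+\; \lambda \Bigl[\, \alpha \sum_{i=1}^{p} \bigl\lVert W_{i\cdot} \bigr\rVert_2
  \;+\; \frac{1-\alpha}{2}\, \bigl\lVert W \bigr\rVert_F^{2} \Bigr]
  \qquad \text{subject to } V^{\top} V = I_r
```

Here α = 1 gives a pure lasso penalty that sets whole rows of W (genes) to zero, α = 0 gives a pure ridge penalty, and λ sets the overall penalty strength.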

For this study, we re-implemented the original sparse RRR Python code (https://github.com/berenslab/patch-seq-rrr) in the R language using the glmnet package⁶² for improved performance when selecting hyperparameters and integration with other aspects of our code base. We performed sparse RRR separately for neurons in each transcriptomic subclass (L2/3 IT, L4 and L5 IT, L6 IT, L5/L6 IT Car3, L5 ET, L5 NP, L6 CT, and L6b) and for each data modality (electrophysiology and morphology). For the fits, we first selected highly variable genes for each subclass using Brennecke’s method (https://github.com/AllenInstitute/scrattch.hicat/). We also selected features to fit in the multivariate sparse RRR by first performing elastic net fits based on gene expression on individual features and keeping features fit with a cross-validated R² > 0.1. Next, we used cross-validation to select the optimal rank (that is, number of latent factors), alpha (parameter controlling the trade-off between ridge and lasso penalties) and lambda (overall penalty strength) hyperparameters for each subclass and modality (see Extended Data Fig. 12a–c). For each subclass and modality, we also held out 15% of the cells as a test dataset that was not a part of hyperparameter selection or final regression fitting.

After performing sparse RRR with the selected hyperparameters, we visualized the results using side-by-side bi-plots⁴,¹⁴ with a few selected highly correlated genes and features (see Fig. 4 and Extended Data Fig. 13). The electrophysiological sparse RRR was performed on the electrophysiology sPCs (see above); to visualize the relationship between the latent factors and traditionally defined electrophysiology features (for example, mean action potential width), we predicted the value of the sPC with the highest correlation to the traditional feature and estimated its value using a linear fit between the sPC and the traditional feature. To estimate latent factors for WNM neurons, we used the sparse RRR weights on the matched morphological feature set (as when calculating the morphology side of the paired bi-plots using Patch-seq data).

fMOST imaging

As previously described²⁸, resin-embedded, GFP-labelled brains underwent chemical reactivation to recover GFP fluorescence and facilitate wide-field or two-photon block-face imaging³²,⁶³,⁶⁴. For the entire mouse brain, a 15–20 TB dataset containing 10,000 coronal planes of 0.2–0.3 µm x–y resolution and 1-µm z sampling rate was generated within 2 weeks. Tissue was prepared and imaged as previously described⁶⁵,⁶⁶. A 40X water-immersion lens with NA 0.8 was used to provide an optical resolution (at 520 nm) of 0.35 µm in xy axes and voxel size of 0.35 × 0.35 × 1.0 µm, appropriate for neuron reconstruction. GFP was imaged with an excitation wavelength of 488 nm and a bandpass emission filter of 510–550 nm.

WNM reconstruction and analysis

Vaa3D-TeraVR was used for WNM reconstructions of fMOST images. All dendrites and the complete local and long-range axonal arbor were traced using the virtual finger or polyline tool. Special care was taken to mark all putative axonal terminals, which were identified based on a large, well-labelled bouton, for secondary review by an experienced annotator. For this quality control step, the entire reconstruction was reviewed using TeraVR. At high magnifications, the axon proximal to the soma or the main branches of distal axon collaterals were carefully examined for missed branches. Post-processing steps were run on completed reconstructions to ensure that there were no errors (that is, breaks or loops).

fMOST image registration to CCF

Whole-brain fMOST images were registered to the average mouse brain template of CCFv3 (ref. ⁶⁷) by one of two methods: BrainAligner²⁸ or DeepMAPI⁶⁸. For the DeepMAPI method, reconstruction data were supplied for several neurons labelled across individual brains and registration was performed iteratively as previously described. For the BrainAligner method, in brief, images were downsampled by 64 × 64 × 16 (x, y, z), and outer contours were affine-aligned using the robust landmark points matching algorithm. Intensity was then normalized by matching the local average intensity of raw fMOST images to that of the CCFv3, and local alignment was then iteratively deformed. As a final step, mBrainAligner was used, as necessary, to manually or semi-automatically adjust the boundaries of brain regions. With either method, once images were CCF aligned, the reconstructed neurons were transformed into the CCFv3 space using the generated deformation fields. DeepMAPI-registered reconstructions were used in the WNM analyses presented throughout the paper. With this method, VIS WNM projection targets largely agreed with what has previously been described in the literature for population studies⁴³,⁶⁹,⁷⁰,⁷¹; differences may result from issues with registration accuracy, particularly for smaller structures (for example, SCig), differences in the location and/or type of neurons labelled, and/or the type of method used.

Calculating the WNM projection matrix

CCF-registered reconstructions were translated such that all somas were positioned in the left hemisphere. SWC files were subsequently resampled to ensure uniform spacing between nodes. To quantify the pattern of axonal projection targets, a projection matrix was derived based on total axonal length per anatomical target structure (see structure list below). Target regions were represented in both ipsilateral and contralateral hemispheres. To better reflect the targets that were actually innervated by a neuron versus those with just fibres of passage, only CCF leaf structures containing a branch and tip node were included. For cortical targets, the total axon length was calculated by summing the axon lengths across all leaf regions under a given parent region that met the branch-and-tip criteria (for example, VISp1, VISp2/3, VISp4, VISp5, VISp6a and VISp6b are leaf structures under the parent region VISp). Examples are listed in the following section. Regions with non-zero values were reported as ‘targets’. For alternative approaches to target identification, please see refs. ¹¹,²⁵,²⁷,²⁸,⁷².
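
The branch-and-tip rule and the roll-up to parent regions can be sketched for a single neuron as follows. This is a simplified illustration: each axon node is assumed to carry its CCF leaf region, a node kind and the length of the segment to its parent node, hemispheres are not distinguished, and all names are hypothetical.

```python
from collections import defaultdict


def projection_lengths(nodes, parent_of):
    """Total axon length per target region for one neuron.

    nodes: iterable of (leaf_region, kind, segment_length), where kind is
           'branch', 'tip' or 'other'.
    parent_of: leaf region -> parent region (for example, 'VISp5' -> 'VISp').
    Only leaf regions containing both a branch node and a tip node count.
    """
    length = defaultdict(float)
    kinds = defaultdict(set)
    for region, kind, seg_len in nodes:
        length[region] += seg_len
        kinds[region].add(kind)

    targets = defaultdict(float)
    for region, total in length.items():
        if {"branch", "tip"} <= kinds[region]:  # skip fibres of passage
            targets[parent_of.get(region, region)] += total
    return dict(targets)
```

Stacking these per-neuron dictionaries over a common list of regions gives one row of the projection matrix per neuron, with non-zero entries corresponding to reported targets.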

CCFv3 nomenclature and abbreviations of target brain regions referred to in this study

Isocortex

Primary motor area (MOp), secondary motor area (MOs), primary somatosensory area, nose (SSp-n), primary somatosensory area, barrel field (SSp-bfd), primary somatosensory area, lower limb (SSp-ll), primary somatosensory area, mouth (SSp-m), primary somatosensory area, upper limb (SSp-ul), primary somatosensory area, trunk (SSp-tr), primary somatosensory area, unassigned (SSp-un), supplemental somatosensory area (SSs), gustatory areas (GU), visceral area (VISC), dorsal auditory area (AUDd), primary auditory area (AUDp), posterior auditory area (AUDpo), ventral auditory area (AUDv), VISal, VISam, VISl, VISp, posterolateral visual area (VISpl), VISpm, VISli, VISpor, anterior cingulate area, dorsal part (ACAd), anterior cingulate area, ventral part (ACAv), prelimbic area (PL), orbital area, lateral part (ORBl), orbital area, ventrolateral part (ORBvl), agranular insular area, dorsal part (AId), agranular insular area, posterior part (AIp), agranular insular area, ventral part (AIv), retrosplenial area, lateral agranular part (RSPagl), retrosplenial area, dorsal part (RSPd), retrosplenial area, ventral part (RSPv), anterior area (VISa), VISrl, temporal association areas (TEa), perirhinal area (PERI) and ectorhinal area (ECT).

Olfactory areas

Anterior olfactory nucleus (AON), taenia tecta, dorsal part (TTd) and piriform area (PIR). Olfactory areas (OLF)-unspecified corresponds to areas of the OLF that have not been assigned to a child structure.

Hippocampal formation

Field CA3 (CA3), dentate gyrus, molecular layer (DG-mo), entorhinal area, lateral part (ENTl), entorhinal area, medial part, dorsal zone (ENTm), parasubiculum (PAR), postsubiculum (POST), presubiculum (PRE) and area prostriata (APr). Hippocampal formation (HPF)-unspecified corresponds to areas of the HPF that have not been assigned to a child structure.

Cortical subplate

Claustrum (CLA), endopiriform nucleus, dorsal part (EPd), lateral amygdalar nucleus (LA), basolateral amygdalar nucleus, anterior part (BLAa) and basolateral amygdalar nucleus, posterior part (BLAp). Cortical subplate (CTXsp)-unspecified corresponds to areas of the CTXsp that have not been assigned to a child structure.

Cerebral nuclei (CNU)

Striatum (STR), caudoputamen (CP), lateral septal nucleus, caudal (caudodorsal) part (LSc), lateral septal nucleus, rostral (rostroventral) part (LSr), central amygdalar nucleus, capsular part (CEAc), central amygdalar nucleus, lateral part (CEAl) and globus pallidus, external segment (GPe).

Thalamus

Ventral anterior-lateral complex of the thalamus (VAL), ventral medial nucleus of the thalamus (VM), ventral posterolateral nucleus of the thalamus (VPL), ventral posteromedial nucleus of the thalamus (VPM), ventral posteromedial nucleus of the thalamus, parvicellular part (VPMpc), posterior triangular thalamic nucleus (PoT), subparafascicular area (SPA), peripeduncular nucleus (PP), medial geniculate complex, dorsal part (MGd), medial geniculate complex, ventral part (MGv), medial geniculate complex, medial part (MGm), dorsal part of the lateral geniculate complex, shell (LGd-sh), dorsal part of the lateral geniculate complex, core (LGd-co), dorsal part of the lateral geniculate complex, ipsilateral zone (LGd-ip), lateral posterior nucleus of the thalamus (LP), posterior complex of the thalamus (PO), posterior limiting nucleus of the thalamus (POL), suprageniculate nucleus (SGN), ethmoid nucleus of the thalamus (Eth), anteroventral nucleus of thalamus (AV), anteromedial nucleus, dorsal part (AMd), anterodorsal nucleus (AD), lateral dorsal nucleus of thalamus (LD), intermediodorsal nucleus of the thalamus (IMD), submedial nucleus of the thalamus (SMT), perireunensis nucleus (PR), paraventricular nucleus of the thalamus (PVT), parataenial nucleus (PT), nucleus of reuniens (RE), xiphoid thalamic nucleus (Xi), central medial nucleus of the thalamus (CM), central lateral nucleus of the thalamus (CL), parafascicular nucleus (PF), posterior intralaminar thalamic nucleus (PIL), reticular nucleus of the thalamus (RT), intergeniculate leaflet of the lateral geniculate complex (IGL), intermediate geniculate nucleus (IntG), ventral part of the lateral geniculate complex (LGv), subgeniculate nucleus (SubG) and lateral habenula (LH). Thalamus (TH)-unspecified corresponds to areas of the TH that have not been assigned to a child structure.

Hypothalamus

Periventricular hypothalamic nucleus, intermediate part (PVi), supramammillary nucleus (SUM), posterior hypothalamic nucleus (PH), subthalamic nucleus (STN), zona incerta (ZI) and fields of Forel (FF). Hypothalamus (HY)-unspecified corresponds to areas of the HY that have not been assigned to a child structure.

Midbrain

Superior colliculus, optic layer (SCop), superior colliculus, superficial grey layer (SCsg), superior colliculus, zonal layer (SCzo), inferior colliculus, central nucleus (ICc), inferior colliculus, dorsal nucleus (ICd), inferior colliculus, external nucleus (ICe), nucleus sagulum (SAG), parabigeminal nucleus (PBG), substantia nigra, reticular part (SNr), ventral tegmental area (VTA), midbrain reticular nucleus (MRN), superior colliculus, motor related, deep grey layer (SCdg), superior colliculus, motor related, deep white layer (SCdw), superior colliculus, motor related, intermediate white layer (SCiw), superior colliculus, motor related, intermediate grey layer (SCig), periaqueductal grey (PAG), precommissural nucleus (PRC), anterior pretectal nucleus (APN), medial pretectal area (MPT), nucleus of the optic tract (NOT), nucleus of the posterior commissure (NPC), olivary pretectal nucleus (OP), posterior pretectal nucleus (PPT), retroparafascicular nucleus (RPF), cuneiform nucleus (CUN), red nucleus (RN), lateral terminal nucleus of the accessory optic tract (LT) and pedunculopontine nucleus (PPN). Midbrain (MB)-unspecified corresponds to areas of the MB that have not been assigned to a child structure.

Pons

Nucleus of the lateral lemniscus (NLL), parabrachial nucleus (PB), dorsal tegmental nucleus (DTN), posterodorsal tegmental nucleus (PDTg), pontine central grey (PCG), pontine grey (PG), pontine reticular nucleus, caudal part (PRNc), tegmental reticular nucleus (TRN), superior central nucleus raphe (CS), laterodorsal tegmental nucleus (LDT), nucleus incertus (NI), pontine reticular nucleus (PRNr), subceruleus nucleus (SLC) and sublaterodorsal nucleus (SLD). Pons (P)-unspecified corresponds to areas of the P that have not been assigned to a child structure.

Medulla

Nucleus of the trapezoid body (NTB), gigantocellular reticular nucleus (GRN), magnocellular reticular nucleus (MARN) and nucleus prepositus (PRP). Medulla (MY)-unspecified corresponds to areas of the MY that have not been assigned to a child structure.

Generating local morphology

To analyse the local axon of the WNM, axon nodes that were more than 500 µm from the soma in the x–z dimensions were excised. Any orphaned segments were also removed. In 115 cells, a fraction of superficial axon nodes were registered outside of the cortex. To correct this, the cell was translated along the streamline passing nearest to the soma until the stopping criterion was met. The stopping criterion was that either all superficial axon nodes were in the cortex or that the soma was at the L6b–white matter boundary.
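
The excision step can be sketched in Python as follows. This is an illustration under stated assumptions rather than the code used here: nodes are stored as `id -> (x, y, z, parent_id)`, the soma is the root of the tree, and `local_axon` is a hypothetical name. The streamline-based translation of cells with nodes outside the cortex is not shown.

```python
import math

def local_axon(nodes, soma_id, radius_um=500.0):
    """nodes: id -> (x, y, z, parent_id or None), with the soma as root.
    Keeps nodes within radius_um of the soma in the x-z plane, then drops
    orphaned nodes whose path back to the soma passes through an excised node."""
    sx, _, sz, _ = nodes[soma_id]
    near = {nid for nid, (x, _, z, _) in nodes.items()
            if math.hypot(x - sx, z - sz) <= radius_um}

    connected, orphaned = {soma_id}, set()
    for nid in near:
        path, cur = [], nid
        # Walk towards the soma until the fate of this path is known.
        while (cur is not None and cur in near
               and cur not in connected and cur not in orphaned):
            path.append(cur)
            cur = nodes[cur][3]
        (connected if cur in connected else orphaned).update(path)
    return {nid: nodes[nid] for nid in connected}
```

Because only the x and z coordinates enter the distance, nodes far from the soma along y are retained as long as they stay connected.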

Calculating morphological features in WNM

To extract local features for WNM data, a CCF-driven protocol was developed to replicate the Patch-seq laminar annotations. A 2D slice was drawn through the CCF, which passed through the soma of a given cell. The slice was drawn such that it minimized the curvature of the cortex at both the pial and white matter surfaces. Cortical layers were annotated on the 2D slice using the CCF structure annotations.

Morphological features were then extracted as described for Patch-seq cells. However, in this study, we did not differentiate between apical and basal dendrites when calculating interaction features with local axons (such as the percentage of overlap).

A subset of morphological features were extracted from the complete axon arbor, including complete axon depth, complete axon height, complete axon width, complete axon maximum branch order, complete axon maximum Euclidean distance, complete axon maximum path distance, complete axon mean contraction, complete axon number of branches and complete axon number of tips. Other features were derived from the projection matrix, including the total number of projection targets, total projection length, total length in VIS, total length in the ipsilateral VIS, the length of axon within the soma-containing structure, total number of targets in the VIS, total number of targets in the contralateral VIS and proportion of the total axon length within the soma-containing structure.

Morphological feature alignment

Reconstructions from the WNM dataset were uprighted and cut to imitate the slicing that occurs when preparing Patch-seq samples from the VIS. The slice thickness and average soma depth-in-slice for Patch-seq cells were 350 µm and 48.2 ± 12.6 µm, respectively. To imitate this, WNM reconstructions were positioned 48.2 µm into a 350-µm-thick rostrocaudal bounding box. Any dendrites extending beyond this rostrocaudal bounding box were excised.
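
A simplified sketch of this virtual slicing is given below. The node layout (`id -> ((x, y, z), parent_id)`), the function name `cut_to_slice`, the choice of `axis=0` as the rostrocaudal axis and the side on which the slice surface lies are all assumptions made for illustration.

```python
from collections import defaultdict, deque

def cut_to_slice(nodes, soma_id, axis=0, depth_um=48.2, thickness_um=350.0):
    """nodes: id -> ((x, y, z), parent_id or None).
    Places the soma depth_um inside a slab of thickness_um along `axis`
    and keeps only nodes reachable from the soma without leaving the slab."""
    soma = nodes[soma_id][0][axis]
    lo = soma - depth_um
    hi = lo + thickness_um

    children = defaultdict(list)
    for nid, (_, pid) in nodes.items():
        if pid is not None:
            children[pid].append(nid)

    kept, queue = {soma_id}, deque([soma_id])
    while queue:
        for child in children[queue.popleft()]:
            if lo <= nodes[child][0][axis] <= hi:
                kept.add(child)
                queue.append(child)
            # Otherwise the child and everything distal to it is cut away.
    return {nid: nodes[nid] for nid in kept}
```

Traversing outward from the soma means that a branch re-entering the slab after leaving it is also removed, as it would be in a physical slice.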

To further align the Patch-seq and WNM datasets, we minimized the Chamfer distance between 2D point clouds constructed from morphological features. Let F = {f₁, f₂, …, f_K} denote the set of dendritic morphological features. For each feature f_k ∈ F, we constructed a depth-by-feature point cloud for Patch-seq (P) and WNM (W):

$${P}_{k}=\{{p}_{1}^{(k)},\,\ldots ,{p}_{n}^{(k)}\}\subset {{\mathbb{R}}}^{2}\,{\rm{and}}\,{W}_{k}=\{{w}_{1}^{(k)},\,\ldots ,{w}_{m}^{(k)}\}\subset {{\mathbb{R}}}^{2}$$

where each point is defined as:

$${p}_{n}^{(k)}=({d}_{n},\,{f}_{k}^{(n)})\,{\rm{and}}\,{w}_{m}^{(k)}=({d}_{m},\,{f}_{k}^{(m)})$$

with d_n and d_m denoting the soma depths, and ${f}_{k}^{(n)}$ and ${f}_{k}^{(m)}$ representing the value of feature f_k for the n-th Patch-seq and m-th WNM cell, respectively. The Chamfer distance (CD) between P_k and W_k is defined as:

$${\rm{CD}}({P}_{k},{W}_{k})=\frac{1}{|{P}_{k}|}\sum _{p\in {P}_{k}}\mathop{\min }\limits_{w\in {W}_{k}}{\Vert p-w\Vert }_{2}^{2}+\frac{1}{|{W}_{k}|}\sum _{w\in {W}_{k}}\mathop{\min }\limits_{p\in {P}_{k}}{\Vert w-p\Vert }_{2}^{2}$$

A linear transformation ${T}_{k}(f)={\alpha }_{k}\,f+{\beta }_{k}$ was fit for each feature f_k, where f represents only the feature value component ${f}_{k}^{(m)}$ of a point ${w}_{m}^{(k)}$, not the depth component ${d}_{m}$. The transformation was applied to the feature component of each WNM point to produce an aligned point cloud:

$${T}_{k}({W}_{k})=\{({d}_{m},{T}_{k}({f}_{k}^{(m)}))|{w}_{m}^{(k)}=({d}_{m},\,{f}_{k}^{(m)})\in {W}_{k}\}$$

The parameters α_k and β_k were optimized by minimizing the Chamfer distance between the transformed WNM point cloud and the Patch-seq reference point cloud. This alignment procedure was repeated independently for each feature in F.
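
A minimal Python sketch of this alignment for a single feature is shown below. The Chamfer distance follows the definition above; the optimizer is not specified in the text, so a coarse grid search over candidate α and β values is used here purely as a stand-in, and the function names are hypothetical.

```python
def chamfer(P, W):
    """Symmetric Chamfer distance between two 2D point clouds,
    using squared Euclidean distances as in the definition above."""
    def one_way(A, B):
        return sum(min((a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2 for b in B)
                   for a in A) / len(A)
    return one_way(P, W) + one_way(W, P)

def transform(W, alpha, beta):
    """Apply T(f) = alpha * f + beta to the feature component only."""
    return [(d, alpha * f + beta) for d, f in W]

def fit_alignment(P, W, alphas, betas):
    """Grid search (an assumption, not the study's optimizer) for the
    alpha, beta that minimize CD(P, T(W))."""
    cd, alpha, beta = min((chamfer(P, transform(W, a, b)), a, b)
                          for a in alphas for b in betas)
    return alpha, beta, cd

# Toy example: Patch-seq feature values equal 2 * WNM + 1 at matched depths.
patch_seq = [(0.1, 2.0), (0.5, 4.0), (0.9, 6.0)]
wnm = [(0.1, 0.5), (0.5, 1.5), (0.9, 2.5)]
alpha, beta, cd = fit_alignment(patch_seq, wnm,
                                alphas=[0.5, 1.0, 1.5, 2.0, 2.5],
                                betas=[0.0, 0.5, 1.0, 1.5])
```

Whether depth and feature values are rescaled to comparable ranges before computing distances is not stated in the text; the sketch uses raw values.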

Multistep MET-type prediction of WNM

A systematic multimodal–multistep approach was developed to predict MET-types within the WNM dataset. The method first assigned a broad axonal projection class label — namely, IT-NP-L6b, ET or CT — and then routed cells to specialized classifiers trained to predict Patch-seq–defined MET-types from dendritic features.

WNM broad projection class assignment (step 1)

Long-range axonal projection patterns were used to first assign a broad projection class label (ET, CT or IT-NP-L6b) to WNMs. A simple decision tree determined the class from the projection target pattern of each neuron: for example, neurons that projected to the thalamus but not to other deep brain structures were assigned to the CT projection class (Extended Data Fig. 2a (1)).

To validate the broad projection class assignments, a secondary projection class label was assigned based on dendritic morphology features. A cross-validated random forest classifier was trained in a leave-one-out framework using both WNM and Patch-seq data. For Patch-seq cells, projection class training labels were given by aggregating their MET-type labels (for example, L2/3 IT→IT-NP-L6b, L5 ET-3→ET and L6 CT-1→CT), whereas WNM cells had projection class assignments from the previous step (Extended Data Fig. 2a (2)). Dendritic-derived labels for WNM cells were obtained through a leave-one-out procedure: for each WNM cell, the classifier was fit using dendritic features from all Patch-seq samples and all WNM cells but the held-out one and then used to predict the class label for the held-out cell. Most WNM cells had matching classifications from both procedures (n = 328 cells).

In the small number of cases in which the projection-derived and dendritic-derived labels conflicted (n = 13 cells), local axon morphology was used to resolve the discrepancy (Extended Data Fig. 2a (3)). WNMs with consistent dendritic and projection class labels (n = 328 cells) were used as a reference dataset, establishing projection class-specific distributions within local axon feature space. For each conflicting sample i, a silhouette score s_k(i) was calculated with respect to each candidate projection class k, defined as:

$${s}_{k}(i)=\frac{{b}_{k}(i)-{a}_{k}(i)}{\max \{{a}_{k}(i),{b}_{k}(i)\}}$$

where a_k(i) is the average distance between sample i and all reference cells in class k (intra-projection class distance), and b_k(i) is the average distance between i and reference cells in the next-closest class (inter-projection class distance). The final local axon-derived projection class label ${\hat{k}}_{i}$ was assigned as:

$${\hat{k}}_{i}={\rm{\arg }}\mathop{\max }\limits_{k}{s}_{k}(i)$$

Only the 328 reference cells were used to compute distances; projection class centroids and reference distributions were not updated after resolving the conflicting labels for the other 13 cells. All three major projection classes were considered as candidates for assignment; however, the local axon-based assignment always agreed with either the projection-derived or dendrite-derived class. The local axon-based assignment thus served as a tie-breaker when assigning the final broad projection class label for each WNM cell.
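
The tie-breaking rule can be sketched in Python as follows. The sketch assumes Euclidean distance in local axon feature space and interprets the next-closest class as the other class with the smallest mean distance to the sample; the function names and the toy feature vectors are hypothetical.

```python
import math

def mean_dist(x, cells):
    """Average Euclidean distance from sample x to a list of reference cells."""
    return sum(math.dist(x, c) for c in cells) / len(cells)

def silhouette_label(x, reference):
    """reference: class label -> list of feature vectors (reference cells only).
    Returns the class with the highest silhouette score s_k and all scores."""
    avg = {k: mean_dist(x, cells) for k, cells in reference.items()}
    scores = {}
    for k, a in avg.items():
        # b: mean distance to the nearest class other than k.
        b = min(v for j, v in avg.items() if j != k)
        scores[k] = (b - a) / max(a, b)
    return max(scores, key=scores.get), scores

reference = {
    "ET": [(0.0, 0.0), (0.0, 2.0)],
    "CT": [(10.0, 0.0), (10.0, 2.0)],
    "IT-NP-L6b": [(0.0, 10.0), (2.0, 10.0)],
}
label, scores = silhouette_label((1.0, 1.0), reference)
```

The reference dictionary is never modified, which mirrors the statement that reference distributions were not updated after conflicts were resolved.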

Training MET-type classifiers (step 2)

Using Patch-seq data, specialized MET-type classifiers were trained to predict the MET-types within each major projection class using dendritic morphology features (Supplementary Table 5). In cases in which MET-types showed subtle morphological separability, closely related MET-types were initially grouped into aggregate classes to improve classifier performance. Specifically, L5 IT-1, L5 IT-2 and L5 IT-3 Pld5 were grouped into an ‘L5 IT Agg.’ class; L6 IT-1, L6 IT-2 and L6 IT-3 were grouped into an ‘L6 IT Agg.’ class; and L5 ET-2 and L5 ET-3 were grouped into an ‘L5 ET-2/3 Agg.’ class. A hierarchical classification strategy was then applied in which cells assigned to an aggregate category were passed through a secondary classifier to determine the final MET-type label. Cross-validation confusion matrices generated from the Patch-seq training data (Extended Data Fig. 2c) demonstrated the reliability and predictive accuracy of the hierarchical classification framework.

Predicting MET-type for WNM cells (step 3)

With broad projection class labels established for each WNM cell, each cell was routed to the appropriate specialized MET-type classifier (Extended Data Fig. 2b). These classifiers, trained exclusively on Patch-seq dendritic morphology features, were designed to distinguish MET-types within each broad projection class (for example, the CT classifier differentiating between L6 CT-1 and L6 CT-2 MET-types). Over 500 iterations, the Patch-seq training data were sampled without replacement at 95%, with selection probabilities proportional to MET-type class size. During each iteration, using the hyperparameters defined above, a new classifier was trained on the sub-sampled dataset and used to predict MET-types for all projection-class-appropriate WNM cells. The most frequently predicted MET-type across iterations was used as the final assignment, and prediction probabilities were reported as the fraction of iterations a cell was assigned to that label.
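
The subsample-and-vote procedure can be sketched as below. This is an illustration only: a one-dimensional nearest-centroid classifier stands in for the actual MET-type classifier, uniform sampling without replacement is assumed (so each MET-type is drawn in proportion to its size on average), and all function names are hypothetical.

```python
import random
from collections import Counter

def fit_nearest_centroid(samples):
    """Stand-in classifier: samples are (feature, label) pairs."""
    sums = {}
    for feature, label in samples:
        total, n = sums.get(label, (0.0, 0))
        sums[label] = (total + feature, n + 1)
    centroids = {label: total / n for label, (total, n) in sums.items()}
    return lambda x: min(centroids, key=lambda label: abs(x - centroids[label]))

def vote_over_subsamples(train, wnm_cells, fit, n_iter=500, frac=0.95, seed=0):
    """Train on repeated 95% subsamples and return, for each WNM cell,
    the most frequent label and the fraction of iterations that chose it."""
    rng = random.Random(seed)
    votes = [Counter() for _ in wnm_cells]
    n_keep = round(frac * len(train))
    for _ in range(n_iter):
        predict = fit(rng.sample(train, n_keep))  # sampling without replacement
        for counter, cell in zip(votes, wnm_cells):
            counter[predict(cell)] += 1
    results = []
    for counter in votes:
        label, count = counter.most_common(1)[0]
        results.append((label, count / n_iter))
    return results
```

The second element of each returned pair corresponds to the prediction probability described above, that is, the fraction of iterations in which the winning label was assigned.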

Cells assigned to an L6 IT MET-type that had a soma in the white matter were reassigned to the L6b class. One cell, which exhibited notably sparse apical dendrite obliques given its overall local morphology, was initially labelled as L5 NP. However, owing to its prominent contralateral projections, it was reassigned to L4/L5 IT.

Logistic regression models

The probabilities of individual VISp WNM neurons projecting to specific target regions were modelled by logistic regression. Only regions that were targeted by at least four VISp cells in the dataset per subclass were used, and only subclasses that had at least 10 VISp cells were analysed. For each region, binomial generalized linear models were fit using the morphological latent factors (calculated using weights derived from sparse RRR fits of Patch-seq data) and/or the cortical surface location as predictors. Models were fit separately for each subclass. Each region was fit by only latent factors, only cortical location, and by both latent factors and cortical location. To select among the model types for each area, the AICc was calculated for each model, and the model with the lowest AICc was chosen. Regions in which none of the fitted models had an AICc lower than a null model were not analysed further. For each selected model, we calculated a pseudo R² = 1 − log(L_model)/log(L_null), where L_model was the likelihood of the data with the selected model and L_null was the likelihood of the data with a null model⁷³. This pseudo R², computed by leave-one-out cross-validation, was used to estimate the variance explained by the selected model.
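
The model selection and pseudo R² calculations can be sketched in Python as follows. The sketch uses the standard small-sample AICc formula and takes log-likelihoods and parameter counts as given; the fitting of the binomial models and the leave-one-out loop are omitted, and the function names and example numbers are hypothetical.

```python
import math

def bernoulli_log_lik(y, p):
    """Log-likelihood of binary outcomes y given predicted probabilities p."""
    return sum(math.log(pi if yi else 1.0 - pi) for yi, pi in zip(y, p))

def aicc(log_lik, k, n):
    """Corrected Akaike information criterion for k parameters, n cells."""
    return 2 * k - 2 * log_lik + (2 * k * (k + 1)) / (n - k - 1)

def select_model(candidates, null, n):
    """candidates: name -> (log_lik, n_params); null: (log_lik, n_params).
    Returns the lowest-AICc candidate, or None if it does not beat the null."""
    scored = {name: aicc(ll, k, n) for name, (ll, k) in candidates.items()}
    best = min(scored, key=scored.get)
    return best if scored[best] < aicc(*null, n) else None

def pseudo_r2(ll_model, ll_null):
    """Pseudo R^2 = 1 - log(L_model) / log(L_null)."""
    return 1.0 - ll_model / ll_null

# Toy example: four cells, two of which project to the target region.
y = [1, 1, 0, 0]
ll_null = bernoulli_log_lik(y, [0.5] * 4)
ll_model = bernoulli_log_lik(y, [0.9, 0.8, 0.2, 0.1])
r2 = pseudo_r2(ll_model, ll_null)
```

Returning `None` when no candidate beats the null model corresponds to the regions that were not analysed further.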

To estimate the effects of latent factors on target projection probability (Fig. 6a,c), probabilities were calculated at the average cortical location of the neurons in the examined subclass. For the effects of cortical location on target projection probability (Fig. 6b,d), the probabilities were calculated using the median latent factor values for neurons of that subclass. The prediction for each individual cell was made using a model trained on all other cells of its subclass (that is, a leave-one-out strategy).

Statistics and research design

No statistical methods were used to predetermine sample sizes, but the sample sizes here are similar to those reported in previous publications. No randomization was used during data collection as there was a single experimental condition for all acquired data. The different stimulus protocols were not presented in a randomized order. Data collection and analyses were not performed blind to the conditions of the experiments as there was a single experimental condition for all acquired data.

Correlations were measured by the non-parametric Spearman rank correlation coefficient unless otherwise noted. The Kruskal–Wallis test followed by post-hoc Dunn’s test was used to identify significant differences across multiple groups. The P values of multiple comparisons (for example, correlations between all morphological latent factors and axonal features) were adjusted by the Benjamini–Hochberg method⁷⁴ to control the false discovery rate at 0.05.
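
For reference, the Benjamini–Hochberg step-up adjustment can be written in a few lines of Python. This is a generic sketch of the standard procedure, not the code used for the analyses here, and the function name is hypothetical.

```python
def benjamini_hochberg(pvals):
    """Return Benjamini-Hochberg adjusted P values in the input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Step from the largest P value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

adjusted = benjamini_hochberg([0.01, 0.04, 0.03, 0.5])
```

A comparison is then called significant when its adjusted P value is at or below the chosen threshold of 0.05.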