Two subsets of human marginal zone B cells resolved by global analysis of lymphoid tissues and blood

B cells generate antibodies that are essential for immune protection, but their subgroups are poorly defined. Here, we perform undirected deep profiling of B cells in matched human lymphoid tissues from deceased transplant organ donors and blood. In addition to identifying unanticipated features of tissue-based B cell differentiation, we resolve two subsets of marginal zone B (MZB) cells differing in cell surface and transcriptomic profiles, clonal relationships to other subsets, enrichment of genes in the NOTCH pathway, distribution bias within splenic marginal zone microenvironment, and immunoglobulin repertoire diversity and hypermutation frequency. Each subset is present in spleen, gut-associated lymphoid tissue, mesenteric lymph nodes, and blood. MZB cells and the lineage from which they are derived are depleted in lupus nephritis. Here, we show that this depletion is of only one MZB subset. The other remains unchanged as a proportion of total B cells compared with health. Thus, it is important to factor MZB cell heterogeneity into studies of human B cell responses and pathology. Description Human MZB cell subsets differ in cell surface, transcriptome, clone relatives, mutation burden, microanatomy, and relevance to disease. Two subsets of human marginal zone B cells B cells are immune regulators that secrete antibodies, present antigen, and secrete cytokines. Despite their importance in humoral immunity, human B cell subsets in lymphoid organs are poorly defined. Siu et al. performed deep profiling of B cell subsets in matched samples across human spleen, appendix, and lymph nodes. They identified organ-specific differences in activated B cell populations and resolved two populations of marginal zone B cells, MZB-1 and MZB-2. These differ in tissue abundance and distribution, transcriptional profiles, immunoglobulin repertoire diversity, and frequency of SHM. MZB-1 but not MZB-2 was reduced in the blood of patients with systemic lupus erythematosus (SLE). These findings represent a comprehensive analysis of human B cell subsets and suggest distinct roles of MZB-1 and MZB-2 in health and disease.


INTRODUCTION
B cells maintain health by generating affinity-matured and innatelike antibody responses and as immune regulators that present antigen and secrete cytokines. B cell responses occur in lymphoid tissues that differ in fundamental microarchitecture and mechanisms of antigen acquisition (1). For example, gut-associated lymphoid tissue (GALT) in the Peyer's patches and appendix chronically sample predominantly particulate antigens from the gut lumen via a specialized follicle-associated epithelium. This results in sustained germinal center (GC) responses (2). In contrast, lymph nodes (LNs) receive soluble or complexed antigens via afferent lymphatics (3). Apart from mesenteric LNs (mLNs) that are also associated with modulating immunity on mucosal surfaces and that are chronically stimulated by gut-derived antigen, most LNs in healthy adults tend to be quiescent and lack GCs (4,5). Similarly, the spleen, which receives antigens delivered from the blood, is also generally immunologically quiescent in a healthy human (6).
Despite the intrinsic importance of tissue microanatomy and antigen acquisition for B cell responses, there has been a tendency to depend on studies of blood by flow cytometry, which is highly directed, to establish definitions of human B cell subsets and changes within these subsets in disease (7). Blood, which only accommodates a small fraction of all lymphocytes, contains a conglomerate population of cells with different migratory biases as demonstrated by differences in expression of receptors for tissue site-associated endothelial ligands (8,9). Some subsets in blood, such as the circulating marginal zone B (MZB) cells, are named according to splenic microanatomy by shared phenotype, but the extent to which they are truly analogous is unclear (10)(11)(12)(13). In addition, although the connectivity between different tissues that are bridged by migrating cells has been identified by repertoire analysis and deep phenotypic comparisons, the extent to which different B cell subsets share clonal relatives and whether they remain tissue resident or migrate between tissue sites are unknown (14)(15)(16)(17).
Here, we determine the characteristics and relative distribution of human B cell subsets in matched samples of spleen, mLN, and appendiceal GALT using deep phenotyping, single-cell transcriptomics, and clonotype analysis. The strategy was to use experimentally unconnected methods to group cells and then to infer identities of the populations thus identified from published work. We observe several subsets of activated B cells (AcBs) in tissues and uncover evidence for tissue-based maturation from transitional (TS) to naive B cells. We then focus our attention on two clear subsets of MZB cells in tissues and blood and that we designate MZB-1 and MZB-2, because the nature of such cells is a controversial area needing further clarification (18).
Cells with the phenotype CD27 + IgM + IgD + have previously been demonstrated to develop from precursors expressing CD45RB via a pathway involving ligation of NOTCH2 (9,13,(17)(18)(19)(20)(21)(22). Despite this, these cells are often referred to as "unswitched memory" cells, especially those in blood (23)(24)(25). Going forward, it will be important to distinguish not only the unique derivation of CD27 + IgM + IgD + cells but also variants of MZB cells that arise from and differentiate along different cellular pathways and that are associated differently with disease into our understanding of B cell responses and pathology.
We analyzed eight matched samples of appendix, mLN, and spleen from deceased transplant organ donors with no known disease (table S2). After quality control (fig. S1, A and B), data were batchnormalized using the internal spleen control (fig. S1C) (26,27). We then made a three-way undirected comparison of the differential abundance of B cells in the tissues according to both position on the Uniform Manifold Approximation and Projection (UMAP) and intensity of expression of all markers except immunoglobulin G (IgG), using the Cydar package (27). Rather than clustering, "hyperspheres" generated by Cydar representing the cells with both similar position on the UMAP and similar median marker intensity were analyzed (28). Only hyperspheres containing 100 cells or more were used for downstream analysis (fig. S1D).
The expression of markers within individual hyperspheres, including CD27, IgD, IgM, and IgA that are helpful for designation of broad B cell classifications, was visualized on the UMAP (Fig. 1A) (19,21,22,29). Cell counts of hyperspheres identified similar distribution between donors within spleen and appendix samples, whereas mLN samples were more variable, falling between the other organs or overlapping with GALT ( fig. S1E). The cell count for individual hyperspheres on the UMAP demonstrated similar cell densities, with a slightly higher number of cells within CD27 − and CD27 + IgM high hyperspheres than those representing class-switched cells (fig. S1F). A three-way comparison of differential abundance of hyperspheres between tissues was performed. Hyperspheres with significantly different abundances between tissues were predominantly CD27 + IgM + , suggesting that the predominant intertissue variability is among unswitched, antigen-experienced cells (Fig. 1, A and B). For technical reasons, we were not able to use IgG in our analysis. However, a region of the UMAP in between IgA + -and IgM + -expressing cells that was IgA − IgM − IgD − and mostly CD27 + did not contain hyperspheres representing significant abundance differences between tissues; this likely corresponded to IgG-expressing B cells.
We next compared the phenotypic features of cells in the hyperspheres that differed significantly in abundance between tissues, using hierarchical clustering and k-means. This identified five major CD27 + clusters of hyperspheres and one CD27 − cluster (Fig. 1C). The hyperspheres in each cluster were color-coded according to the dendrogram and located on the UMAP for each tissue (Fig. 1D).
The single hypersphere in cluster 2, which was CD27 − , had significantly higher abundance in spleen, contained cells expressing IgM, IgD, CCR7, and CD10, although it did not express high levels of characteristic TS B cell markers CD38 or CD24. This small cluster was most closely linked in the dendrogram to the two further clusters of hyperspheres (clusters 3 and 4) that were also both relatively more abundant in spleen and that had the CD27 + IgM + IgD + phenotype of MZB cells. Although both phenotypically resembled MZB cells, one subset, which was designated MZB-1, had significantly higher expression of CCR7, BAFFR, CD24, and CD27 than the other, which was designated MZB-2 (Fig. 1D). The remaining clusters of hyperspheres 1, 5, and 6, which were all most abundant in GALT and mLN, comprised IgA-expressing B cells including some CD10-expressing GC variants (cluster 1), IgM-expressing GC cells (cluster 5), and IgM-only cells with the phenotype CD27 + IgM + IgD − (cluster 6) (Fig. 1, B to D).
We were interested in understanding MZB-1 and MZB-2 groups of cells, which Cydar identified as areas of interest, in more detail. To quantify the abundance differences between MZB-1 and MZB-2 more directly, B cell subsets were identified and grouped into bubbles using viSNE and SPADE as shown in the schematic diagram ( Fig. 1E and fig. S1G). The MZB subset was subdivided on the basis of CCR7 expression (Fig. 1F). BAFFR, CD24, and CD27 expression agreed with the phenotypes associated with the two MZB subsets found in the unsupervised analysis (Fig. 1G). MZB-1 cells were significantly more abundant than MZB-2 cells when data from the three tissues were combined [P < 0.05, analysis of variance (ANOVA)]. Both MZB subsets were more abundant in the spleen than in the other tissues (Fig. 1H). Therefore, undirected analysis of relative abundance of B cells defined by phenotype between tissues identified that most differences were in the CD27 + subsets and suggested the existence of MZB heterogeneity.

B cell complexity across tissues dissected by single-cell transcriptomics
To understand B cell variability in tissues in more depth, three representative sets of tissues from those analyzed by CyTOF were selected for analysis by single-cell RNA sequencing (scRNA-seq) (table S2). To facilitate identification of B cell subsets, sorted CD19 + cells were surface-labeled with Total-Seq-C antibodies before capture on the 10X Chromium controller ( fig. S2A and table S1). Gene expression, antibody detection tag (ADT), and V(D)J libraries were then prepared according to the manufacturer's instructions and sequenced. After quality control and normalization, data from all tissues and donors were integrated ( fig. S2, B to D). After checking that cells did not separate according to donor on a UMAP ( fig. S2D), cell clusters were subsequently identified according to the expression of 3000 variable genes. The expression of key genes and cell surface markers was visualized on the UMAP ( Fig. 2A).
Fifteen distinct clusters were identified and named by reference to transcript profile and cell surface phenotype (tables S3 and S4), visualized on the UMAP (Fig. 2B), and summarized in Fig. 2C. Clusters including TS, naive, activated naive (aNAV), GC, and class-switched memory B cells were identifiable according to previous studies (22,30). Markers that identified them are shown in table S3. The aNAV subset, which is a reported precursor to antibody-secreting cells derived from the extrafollicular response, was distinguished from naive and TS B cell subsets by its high expression of CD19 and ITGAX (CD11c) and low expression of CXCR5, CD24, and CD38 ( fig. S2E) (21,23).
Four clusters of cells that expressed markers of activation but no classic markers of GC (BCL6 and BCL7a) were identified and designated AcB1 to AcB4. AcB1 resembled GC B cells most closely, expressing JUN, JUNB, and FOS. AcB2 had strong expression of the proliferation antigen PCNA but lacked other markers of activation. AcB3 was designated "activated" because it expressed the long noncoding RNA MALAT1 (table S3). MALAT1 is linked to class switching through its role in the alternative-nonhomozygous end joining pathway (31). This cluster also expressed the ZEB2 transcription factor associated with aNAV and the extrafollicular response (7). AcB4 had a strong signature of interferon-regulated genes, suggesting prior activation via this route. Two small nonadjacent clusters were termed double negative (DN) because they lacked both CD27 and IgD (Fig. 2C). However, they did not separate according to the classical features that discriminate between subsets of DN cells in blood ( fig. S2F) (23). Two clusters of MZB cells defined by cell surface CD27 + IgM + IgD + phenotype and the presence of CD27, IGHM, IGHD, CD1C, and PLD4 transcripts were identified (Fig. 2, A to C) (22).
Frequencies of somatic hypermutation (SHM) of IGHV genes of cells in clusters were generally consistent with expected properties   . Statistics for differential abundance between matched tissues for each subset were assessed by estimated marginal means with Bonferroni correction for multiple comparisons. *P < 0.05, **P < 0.01, ***P < 0.001, and ****P < 0.0001. according to cluster definitions by reference to previously published work ( Fig. 2D and fig. S3A) (17,32). The extent of SHM was highest in memory B cells and lowest in naive and TS B cells. GC B cells had undergone relatively few rounds of SHM. Clusters AcB1 to AcB3 each had undergone moderate SHM. Cells in MZB-2 had significantly lower levels of SHM than MZB-1 across all tissues ( fig. S3, B and C). The relative abundance of the B cell subsets between tissues was compared (Fig. 2, E and F). GC cells and AcB1 cells were both more abundant in the appendix than in mLN or spleen. The frequency of GC cells in these frozen samples of GALT, about 15%, was very similar to frequencies observed in freshly isolated GALT in a previous study (17). IgM-only B cells were more abundant in mLN than appendix as were AcB3 cells. The MZB-1 subset was proportionally underrepresented in appendix compared with spleen with mLN containing similar numbers as the latter. MZB-2 cells were lower in frequency than MZB-1 cells but not significantly more or less proportionately abundant at any tissue site. Thus, this analysis identified complexity in B cell subsets in tissues and provided further evidence for the existence of MZB subsets.

Developmental relationships between B cell subsets in tissues
We investigated the relationships between cells in subsets using B cell antigen receptor (BCR) and RNA velocity analysis. Using the Immcantation pipeline for analysis of adaptive immune receptor repertoire in heavy-chain genes, clones of two or more cells were identified with the following criteria: identical IGH V-J gene usage, identical CDR3 length, and minimum sequence similarity in CDR3 based on the Hamming distance between each sequence and its nearest neighbor ( fig. S4, A and B) (33). Clonal abundance (cells in clones as a proportion of the entire repertoire) was greater in spleen than other tissues ( fig. S4C). Of the antibody isotypes, IgG tended to have greater clonal abundance ( fig. S4C). Clonal diversity between isotypes, tissues, and subsets was visualized using a diversity profile curve; spleen and IgG had lower diversity compared with other tissues and isotypes ( fig. S4D). There was no significant CDR3 length difference across isotypes, tissues, or subsets ( fig. S4E). A total of 29,614 cells with both transcriptomic and IGH sequences were observed across all subsets, tissues, and donors including 4482 clones (groups of two or more related cells; table S5). Of the 4482 clones, 4025 were observed within a single tissue, and 457 were observed across two or three tissue sites. We considered the possibility that expectation is shaped by data from bulk sequencing protocols that pool identical sequences in a sample to a single sequence. We therefore looked at the frequencies of cells in clones with intraclonal identity or intraclonal variation. Of the 3696 clones observed in single tissue sites, 3393 (91.8%) contained identical clone members and would therefore have been pooled as single sequences in bulk sequencing data. Of the 457 clones with members in more than one tissue, 116 (25.4%) contained identical clone members. The B cell subset containing cells with the greatest tendency to be found in local clones in each tissue was AcB2, which was designated as an activated subset because of its expression proliferation antigen, PCNA (Fig. 3A).
We used a correlation matrix to evaluate the frequency of clone sharing between different B cell subsets to identify potential developmental relationships between subsets within tissue sites (Fig. 3B). Significant clone sharing was observed between TS and naive B cells in both appendix and spleen. Clone members were unmutated and only observed locally within the same tissue and not shared between different tissues. When both heavy-and light-chain rearrangements were captured, these were shared by clone members (fig. S5, A and B). Thus, this indicates proliferation once heavy-and light-chain genes have rearranged, rather than proliferation at the pre-B stage, and is consistent with the hypothesis that TS cells mature to naive B cells in GALT and spleen involving cell division ( fig. S5, A and B). The lack of shared clones between tissues suggests that clone members are coclustered spatially and that clone members become diluted once they leave the tissue site. The existence of such local clones involving immature cells in tissues is consistent with a low level of proliferation at these stages as observed previously (9,34).
In the appendix, MZB-1 and IgM-only B cell subsets were significantly clonally related, as were GC cells and DN-B cells, which locate adjacent to each other on the UMAP plot. In spleen, significant clonal relationships were observed between the DN-B and AcB1 subsets that were also proximal in the UMAP. No significant clonal relationships between subsets were observed in mLN (Fig. 3B).
Circos plots were used to visualize clone sharing between subsets across tissues. The MZB clones most likely spanning tissues were those containing MZB-1, although it should be noted that the tendency for a clone to span tissues is related to both the properties of the cells and the sizes of the clones that will both contribute to the probability of identifying them. In two of the three donors, clones containing MZB-1 spanned all three tissues. Clones containing widely disseminated memory cells and IgM-only B cells were also observed ( Fig. 3C).
Potential developmental trajectories were analyzed by RNA velocity. The value and restrictions of this method are debated (35). However, it provides a useful tool to enable speculation on developmental continuity between clusters of cells. The analysis largely supported the findings above ( Fig. 3D) (36). Transcriptomic progression from TS to naive B cells was observed in GALT and spleen. Developmental connections between GC, AcB1, and DN-B were apparent. AcB2 that has the greatest tendency to contain clonally related cells and has a proliferation signature, potentially reflecting an extrafollicular response, appeared to have a connection to aNAV, supporting other studies (23). There was evidence of memory B cell association with GC in GALT (32). MZB-1 and MZB-2 showed no notable connection (Fig. 3D). The cluster AcB3 characterized by the expression of MALAT1 and ZEB2 appeared to be a common "destination" for developmental trajectories, despite no evidence on the repertoire analysis of developmental associations with other subsets, possibly representing a transcriptomic state, rather than a stage in progression of specific subsets.
This analysis suggests tissue-based homeostatic proliferation during TS to naive B cell maturation. No evidence of clonal relationship between MZB-1 and MZB-2 was observed.

Human MZB subsets have different derivation and occupy different microanatomical space
The above analyses indicate the presence in humans of two subsets of MZB cells that, so far, differ in their phenotype (Fig. 1C), SHM ( Fig. 2D and fig. S3B), transcriptome (Fig. 2, B and C), their clonal relationships with other clusters (Fig. 3B), and the tendency for clone members to be identified at multiple sites (Fig. 3, A and C) (although it should be noted that the tendency for clone members to be observed at multiple sites is influenced by population and clone size). No suggestion of a developmental link between MZB-1 and MZB-2 was observed by RNA velocity analysis (Fig. 3D). Differentially expressed genes between MZB-1 and MZB-2 subsets are illustrated in a volcano plot (Fig. 4A). Genes selectively expressed by MZB-2 cells tended to be associated with cellular activation such as the human leukocyte antigen alleles, CD83 and MIF (37,38). In addition, MZB-2 cells selectively expressed the RNA helicase DDX21, which has been implicated in recognition of viral RNA (39). CD83 is consistently more abundantly expressed in the appendix in both MZB-1 and MZB-2 subsets. The preferential expression of CD37 by MZB-1 cells is consistent with its potential higher mobility (Fig. 4, A and B) (40).
MZB development in mice and humans involves ligation of NOTCH2 on B cells by delta-like 1 (19,41,42). We therefore sought evidence of enrichment of genes in the NOTCH pathway in the MZB subsets and found it in the MZB-2 subset only (Fig. 4C). The term "marginal zone" originally referred to the zone of B cells on the periphery of the white pulp and in the perifollicular region in human GALT (12,43). We therefore asked whether the subset complexity in MZB cells we observe by analysis of cells in suspension was also present in the splenic MZ by imaging mass cytometry (44).
The GC, composed of B cells, T cells, and macrophages, had areas of high B cell proliferation (Fig. 5A). The GC was surrounded by a mantle zone of IgD + CD27 − naive B cells (Fig. 5B). The mantle was, in turn, surrounded by IgD +/− CD1c + CD27 + cells, corresponding phenotypically to MZB cells identified in cell suspension (Fig. 5, B to D, and fig. S6A). Visualization of masked cells highlighted that the most peripheral B cell subset in the MZ was CD1c − and likely corresponded to memory B cells, which also had a peripheral distribution in GALT in an earlier study (Fig. 5, B and D) (17). DDX21 expression, a feature of MZB-2 cell subset (Fig. 4B), was significantly higher in CD1c + than CD1c − cells in the MZ (Fig. 5E).
Consistent with the expression of DDX21 by an MZB subset, we observed punctate nuclear and cytoplasmic DDX21 staining in a proportion of MZB cells (Fig. 5F). We identified a distribution bias in DDX21-expressing CD1c + cells in the MZ, with these being closer to the center than cells lacking DDX21. Relative to the distance between a cell and the center point (average X and Y position of all cells per image), the proportion of high and low DDX21 expression in CD1c + cells was compared (Fig. 5, G to K, and fig. S6B). If there was no spatial bias along the radial axis, then the expected slope of the weighted linear regression would be zero. In other words, on the basis of the linear regression slopes, DDX21 had a radial-biased distribution, whereas DNA did not (Fig. 5J). Visualization of DDX21high compared with DDX21-low MZB masked cells illustrated this radial-biased distribution (Fig. 5K).
In summary, only MZB-2 was enriched in genes of the NOTCH pathway. Microanatomically, CD1c + MZB and memory B cells occupied different locations within the MZ. In addition, of the intermingled CD1c-expressing MZB subsets, DDX21 + MZB-2 cells located significantly closer to the GC.

Analysis of MZB subsets in blood
We and others have previously studied MZB cells in blood (11,22,30,45) and identified a differentiation pathway from IgM hi TS cells that is reduced in severe systemic lupus erythematosus (SLE) (22). We therefore explored whether either or both of the MZB subsets identified here are analogous to previously studied circulating MZB cells.
To address this, we analyzed scRNA-seq data acquired from sorted blood CD19 + B cells from three healthy control donors and three patients with severe SLE (table S2). After normalization and quality control, data from all six donors were pooled ( fig. S7, A to C) and clustered according to 2000 variable genes. The 10 clusters identified by gene and surface protein expression (table S6) included 2 MZB clusters with the surface phenotype CD27 + IgM + IgD + and expression of CD1C and PLD4 that are hallmarks of MZB cells (Fig. 6, A to C, and fig. S7D) (22). We named these clusters MZB-1 and MZB-2 by comparison with MZB-1 and MZB-2 identified in tissues above. By analysis of the VDJ libraries ( fig. S8, A and B), we identified that the frequency of SHM in each circulating subset was consistent with the corresponding subset in tissue (Figs. 2D and 6D and figs. S3A and S8C). The MZB-1 subset had higher SHM than MZB-2 subset in blood ( fig. S8C). In both healthy individuals and patients with severe SLE, there was a significant tendency for MZB-1 cells to have clonal relationships with IgM-only B cells as in tissues (Fig. 6E). In contrast, MZB-2 showed a significant tendency to be clonally related to aNAV and DN-B cells in health. A clonal link between aNAV and DN-B cells was observed in health and severe SLE, consistent with published work (Fig. 6E) (22,23). To cross-reference the independent datasets acquired by mass cytometry and singlecell transcriptomic analyses of tissues and blood, we compared gene expression of those proteins that distinguished the two subsets of tissue MZB in the original mass cytometry analysis (Fig. 1C); this gene set also discriminated MZB-1 and MZB-2 in tissues and blood (Fig. 6, G and H).
We recently described that MZB cells can derive from IgM hi TS B cells, a feature that is associated with gut homing and that is depleted in severe lupus (22). We therefore asked how this pathway relates to MZB-1 and MZB-2. Although both subsets of MZB in blood were IgM hi , the MZB-1 subset was visibly linked to the naive B cell pool by an IgM hi bridge as described previously (Fig. 6F). In addition, expression of gut-homing 7 integrin was associated with MZB-1 in both blood and tissues (Fig. 6, I and J).
We observed that MZB-1 cells tend to be more abundant than MZB-2 cells in the blood in health but that the frequency of the MZB-1 subset was consistently reduced in severe SLE (Fig. 6K). As in previous studies, DN, aNAV, and class-switched memory B cells were increased in frequency in severe SLE (Fig. 6K) (22,23).
We then used CCR7 to distinguish between MZB-1 and MZB-2 by flow cytometry (fig. S9), using a panel comprising antibodies to CD19, CD27, IgD, IgM, CD1c, CCR7, and 7 integrin. Although flow cytometry does not have the power to resolve two clearly distinct entities as sharply as the single-cell approaches developed in the paper, it was clear that the frequency of MZB-1 cells was significantly greater than that of MZB-2 in health but not in severe SLE (Fig. 6L). MZB-1 was associated with greater 7 integrin expression than MZB-2 in health (P < 0.01, paired t test; Fig. 6M), consistent with Fig. 6I.
In summary, MZB-1 and MZB-2 are identifiable in blood and share features of these subsets in tissues. MZB-1 but not MZB-2 cells are depleted from the blood in lupus nephritis.

DISCUSSION
Here, we describe subsets of human B cells in matched appendix, mLN, and spleen samples from deceased transplant donors using multiparameter unsupervised methods. In addition to expected and previously unidentified tissue-associated subsets of AcBs, we observed two subsets of MZB cells, both of which are found at highest frequency in the spleen, within the MZ. MZB-1 cells are consistently more abundant than MZB-2 across all tissues independently and blood by CyTOF, scRNA-seq, and flow cytometry. The MZB-2 subset has a more diverse repertoire and fewer V region mutations.
Our recent analysis of circulating human B cell subsets identified IgM hi intermediary stages in the development of human MZB cells from TS B cells (22). Here, we refine those findings by showing that the IgM hi developmental pathway is relevant to the MZB-1 subset. In comparison, the MZB-2 subset has a signature of NOTCH-related genes, suggesting that this subset matures by NOTCH ligation, as described previously in humans and mice (19,41). Our findings suggest that the two MZB subsets are developmentally unrelated because although they each showed clonal relationships to cells in other clusters that are similar between tissues and blood, they were not significantly clonally related to each other in any dataset. RNA velocity analysis identified no evidence of developmental links between MZB subsets.
The MZ is traditionally associated with innate-like immune responses to bacterial antigens with repeating subunit structures, socalled T-independent type 2 antigens. Children less than 2 years of age (when MZB cells are poorly developed and consist of nonclonal cells with low mutation loads) and individuals who were splenectomized early in life are particularly at risk of infections with bacterial pathogens (46)(47)(48). Patients with severe SLE are also susceptible to infection with pneumococci, suggesting that it is principally the MZB-1 population, which we show here to be depleted in severe SLE, that provides protection (49). In contrast, the MZB-2 subset expressed the RNA helicase DDX21. DDX21, which is associated with regulation of RNA (50), is also associated with antiviral immunity (39,51). When members of this subset were detected in blood in health or lupus nephritis, they exhibited clonal relationships with the aNAV and DN subsets. aNAV and DN cells have been linked to the extrafollicular response in lupus and also severe acute respiratory syndrome coronavirus 2 infection (23,52).
Mouse MZB cell development is dependent on NOTCH2 (41,42). Mouse MZB cells are generally a naive subset with low frequencies of SHM, suggesting that human MZB-2 may be analogous to the mouse MZB subset (53). Heterogeneity within the mouse MZB subset has been described, for instance, NOTCH2-mediated plasticity in mouse MZB lineage (54)(55)(56). Thus, the differences between human and mouse MZB cells that have been noted for many years may relate to differences in relative proportions of MZB cell subsets derived through different pathways (10).
Whereas the importance of IgA in mucosal immunology is well defined, the role of IgM is less clear. There are both memory and clonally related plasma cells that express only IgM in human gut, suggesting that IgM-only cells are not merely recently aNAV B cells (14). The high number of mutations in the IgM-only B cell subset supports this concept. We have previously identified that CD27 + IgM + IgD + cells in GALT can be clonally related to IgM-only cells (17). However, IgM-only cells, class-switched memory cells, and CD27 + IgM + IgD + cells were not observed in the same clonal families in this previous study (17). This suggests that IgM-only cells in tissues are a heterogeneous population including both unswitched memory and MZB subtypes (17). The separate nature of MZB and classical memory is also supported by their different microanatomical distributions in GALT in a previous study (17) and in the spleen here. Thus, the B cell subset located most peripherally in the MZ appears to be memory cells rather than MZB themselves. Here, we identify IgM-only cells as a distinct transcriptome-driven cluster, with MZB-1 relatives and clone members that are disseminated between the three tissues. Of note, many IgM-only cells were clustered alongside IgA and IgG class-switched memory cells, consistent with the concept that the IgM-only subset contains distinct, albeit yet uncharacterized, functional groups. The GC cells predominatly used the IgM isotype and had high relative abundance of cells with low mutation frequencies. These features suggest that the role of GALT GCs in maturation of the naive repertoire and in sustaining IgM-only responses may have been underestimated (57).
We observed small clones of TS and naive B cells in appendix and spleen. These could only be observed within single sites of lymphoid tissue, suggesting that related cells occur within the same piece of tissue and that when they join the systemic circulating pool, clone members become separated and no longer detectable due to dilution. Detection of clones at multiple sites reflects not only the tendency of cells to migrate but also the size of the clone and therefore the probability of identifying clone members. When light chains were captured in addition to the heavy chains by which clones were identified, these were identical between clone members, supporting the concept of local proliferation of cells that were already fully mature in terms of BCR rearrangement. Activated TS cells have been observed previously in GALT (9). Selection occurs as B cells mature from TS to naive follicular and MZ cells, and it is possible that checkpoints in appendix and spleen involving cell division are involved in shaping the naive B cell repertoire (58).
We identified four subsets of AcB that are likely to represent different activation states or stages of differentiation. Only one, AcB3, was also identifiable in blood. AcB1 is transcriptionally similar to GC B cells; however, it does not express classic GC markers such as BCL6. In addition, although clonally related to GC cells, the AcB1 subset has higher mutation frequency, suggesting that it is a more terminally differentiated subset. Thus, the AcB1 subset could reflect cells that have just exited the GC, possibly on their way to undergo class switch recombination. AcB2 cells, which had the greatest tendency to be observed as clones, also expressed the proliferation antigen PCNA. This subset may represent extrafollicular proliferations of B cells. This subset was associated with aNAV cells that are components of extrafollicular responses in other studies (23).
Malignancies of MZB cells comprise different histogenetic types including MZB cell lymphoma of mucosa-associated lymphoid tissue (MALT lymphoma) (59) and splenic MZ lymphoma (SMZL) (60). It is possible that the two subsets of MZB cells that we describe could represent their benign analogs. MZB-1 could be benign analogs of MALT lymphoma. This malignancy can be driven by bacterial infections (59). These tumors tend to express IgM and can circulate from GALT via blood to the spleen where they tend to occupy the MZ (61). In contrast, MZB-2 could be benign analogs of SMZL. Cases of SMZL tend to localize to the spleen but not to other secondary lymphoid tissues, and they may arise through translocations involving the NOTCH pathway, activation of which is observed here in MZB-2 only (60). Understanding benign analogs of lymphoma subtypes may help future identification of drivers and thus potential therapeutic pathway inhibitors.
It is important to consider the caveats of the work we present. It should be borne in mind that this study is a deep observational study. Although it is possible to extrapoloate function and developmental origin from such datasets, it will be important to carry out functional validation assays of such inferred data in the future. Most of the tissues analyzed in the study were frozen samples of cells from tissue and blood donors. Although this allows us to carry out experiments with precious samples, validation using unprocessed cells will be important in the future. The blood studied here was exclusively from female donors because of the strong female gender bias of lupus nephritis. It will be important to validate this work across samples from donors representing the diversity of gender and ethnicity of the human population in the future.
Overall, the deep analysis of B cells in tissues that we present, which combined undirected methods of grouping similar cells with knowledge and reference-based subset alignments to the groups identified, provides a more accurate vision of tissue-based subsets and their interrelatedness within and between tissues than was previously available. There were organ-specific expression patterns within subsets, demonstrating that the local microarchitecture and milieu will determine cellular functions. The human MZB cell subset is considered to have putative innate immune function; our demonstration of the complexities within this subset highlights the requirement to better understand how its distinct microanatomical features relate to function and disease.

Study design
This study sought to understand the diverse features of and interrelationships between B cells in human tissues and their counterparts in blood in health and the severe autoimmune disease lupus nephritis. A series of deep observational methods were used including interrogation of cell surface phenotype by mass cytometry, of transcriptome at the single-cell level, and of imaging mass cytometry to place cells defined by complex marker sets in histological context.

Sample processing
Matched appendix, mLN, and spleen were collected from deceased transplant organ donors and stored in University of Wisconsin solution at 4°C (62). All tissue preparation and lymphocyte isolation procedures were performed with RPMI 1640 containing heatinactivated 10% fetal bovine serum (FBS), 2 mM l-glutamine, penicillin (100 IU/ml), and streptomycin (100 g/ml; RPMI-P/S) unless stated otherwise.
Cell suspensions from spleen were taken from about 2-cm 3 pieces of tissue. Whole mLNs were taken, and cell suspensions were prepared from the entire node. Cell suspensions from appendix used half of the appendix. Appendix, cut into 1-to 2-mm pieces, was incubated at 37°C for 30 min in medium with collagenase IV (1 mg/ml; Sigma-Aldrich) and deoxyribonuclease (DNase) I (1 mg/ml; Roche). Lightly digested appendix and fresh mLN were then teased through a 70-m nylon cell strainer, washed with medium, and resuspended for cryopreservation. Small sections of spleen were placed in gentle-MACS C tubes (Miltenyi) topped up with phosphate-buffered saline (PBS) + 2% FBS where the gentleMACS setting B ran three times in the gentleMACS Dissociator. The solution was then filtered through a 70-m nylon cell strainer (diluted with PBS as necessary). Spleen solution was layered onto Lymphoprep (STEMCELL Technologies) according to the manufacturer's instructions and then centrifuged at 800g for 20 min at 4°C with no brakes. The buffy coat was collected, washed with medium, centrifuged, and resuspended in red blood cell lysis buffer (0.17 M ammonium chloride) for 7 min at room temperature to lyse any remaining erythrocytes. Cells from each tissue were cryopreserved in freezing medium of FBS + 10% dimethyl sulfoxide (DMSO) in aliquots of 1 × 10 7 cells.
Blood samples were diluted 1:1 in RPMI-P/S and then layered onto Ficoll before centrifugation. The buffy coat layer was collected, and cells were washed. Peripheral blood mononuclear cells (PBMCs) were cryopreserved in FBS + 10% DMSO in aliquots of 1 × 10 7 cells.

Mass cytometry
Frozen samples were thawed in a 37°C water bath and washed in 2 ml of RPMI 1640 + 10% FBS and DNase I (0.1 mg/ml; Roche) and transferred to a larger volume of 10 ml and stained as described previously (22).

Preprocessing and normalization of mass cytometry data
Flow cytometry standard (FCS) files were normalized using bead standards and the Normalizer program developed by Nolan's group (v0.3) (26). The result of the bead normalization is visualized in fig. S1A. The mass cytometry data were initially manually gated in Cytobank (https://mrc.cytobank.org/) using the gating strategy shown in fig. S1B to identify live CD19 + B cells for downstream analysis.
For batch normalization purposes, each paired tissue set was run in the same batch with an internal biological sample from the same spleen as reference. Internal controls between batches were normalized by transforming the pooled intensity distribution of all batches toward a reference distribution (27). The range normalization method scaled the marker intensities so that the distribution range was the same for each batch. By comparing and matching the 1st and 99th percentiles of each batch to the reference distribution, a linear function was then defined and applied to all markers of all samples during normalization. In other words, overall distribution of each batch was adjusted on the basis of the minor distribution shifts of the internal control for that batch. The effects of normalization on the internal controls are visualized in fig. S1C.

Differential abundance analysis of mass cytometry data
After the data were normalized, cells were allocated into hyperspheres and then tested for differential abundance between tissues for each hypersphere while controlling for false discovery rate (27). Hyperspheres represent cells in regions of the data with similar marker phenotype. Unlike clustering, a cell can be allocated into more than one hypersphere. Low-abundance hyperspheres with average counts below 100 were filtered out ( fig. S1D). Multidimensional scaling plot was used to determine whether abundance differences were attributed to different tissue type and not biological variation ( fig. S1E). As developed previously by Lun et al. (27), empirical Bayes method was used to allow hypersphere-specific variation estimates even when replicate numbers were small (27,63).
To compare the cell counts between samples, significant differences in cell abundance between conditions were tested. The null hypothesis of no change in the average counts of cells between conditions for each hypersphere was tested using the negative binomial generalized linear model (NB GLMs) implementation in edgeR package (63). To control false discovery rates, spatial false discovery rate of 5% was used, which considers hypersphere densities.
Manual gating of mass cytometry data using SPADE viSNE was run on with 1,500,000 total events using all B cell markers except for CD19 and CD45 with 10,000 iterations, perplexity 50, and theta 0.5 on the Cytobank platform. SPADE was then run on the viSNE coordinates, and B cell subsets were identified and grouped into bubbles ( fig. S1G).

Cell sorting and CITE-seq antibody staining
Cryopreserved samples were thawed in a 37°C water bath and washed in RPMI with penicillin and streptomycin. After cell counting, 3 million cells per sample resuspended in PBS with 1% bovine serum albumin were transferred to Eppendorf LoBind Microcentrifuge tubes and washed. Cells were processed and stained as described previously with antibodies in table S2 (22) before sorted CD19 + cells were loaded onto the 10X Chromium controller.

Imaging mass cytometry
A panel containing 10 metal-tagged antibodies (table S1) was designed to identify and characterize immune populations expected to be present in the splenic white pulp. Formalin-fixed paraffinembedded samples of human spleen from three anonymous healthy adult donors were cut in 4-m-thick sections. Briefly, sections were deparaffinized, rehydrated, and subjected to antigen retrieval using a pressure cooker. Tissues were then blocked and incubated with the mix of metal-conjugated antibodies contained in the panel overnight at 4°C and subsequently incubated with the DNA intercalator Iridium Cell-ID Intercalator-Ir (Fluidigm) before being air-dried. Slides were then inserted into the Hyperion Imaging System and photographed to aid region selection. A total of 10 regions of about 1 mm 2 were selected to contain at least one identifiable GC and subsequently laser-ablated at 200-Hz frequency at 1 m/pixel resolution.

Imaging mass cytometry analysis
Single-cell segmentation was performed before single-cell protein expression analysis. Using CellProfiler, DNA staining was used to identify cell nuclei, and B cells were detected by masking nuclei that also contained CD20 (64). Resulting images showed morphologically appropriate cell boundaries and centers. Pixel-level composite images were created using histoCAT (65). Cytomapper was used to overlay the single-cell metadata onto the cell segmentation masks (66). Marker expressions were scaled by arcsinh with a factor of 0.1 for visualization purposes. The CD1c threshold of 1.2 was used. Rather than manually designating the center of the GC, the center point was determined by calculating the average X and Y position of every CD1c + cell for each image. Images with multiple follicles were excluded from downstream analysis (SPL_5 and SPL_7). The distance between the center point and each CD1c + cell was calculated.
This distance distribution was scaled such that the overall radius of each image was 1. In the histogram, counts were scaled by image such that the maximum count was 1 and the bins for the spatial distances were 0.05. Marker expression for each cell was classified as high or low if they were above or below the median marker expression per image. Linear regression was weighted by the cell counts in each bin. Paired and one-sample t tests were performed and illustrated using ggpubr (67).

scRNA-seq library preparation
Sorted CD19 + cell populations from three donors were loaded onto a 10X Genomics Chromium controller, and the libraries [5′ gene expression, V(D)J, and ADT] were prepared according to the manufacturer's guidelines (table S2). The Illumina HiSeq 2500 High Output platform was used for sequencing (30-100-100 sequencing configuration). Transcript alignment and generation of featurebarcode matrices for downstream analysis were performed using the 10X Genomics Cell Ranger workflow.

Single-cell transcriptome analysis of tissues
Using the Seurat R package (version 3.2.2), sorted CD19 + cells with high mitochondrial transcripts, low/high number of unique genes per cell, and low/high total RNA transcripts were filtered out (68,69). The threshold used was three times the mean absolute deviation of each sample ( fig. S2, B and C). B cells were isolated on the basis of the expression of B cell-specific genes (CD79A, CD79B, CD19, and MS4A1) and the absence of T cell-specific genes (CD2, CD3D, CD3E, CD3G, CD4, CD8B, and CD7) and TCR genes. Genes that were only expressed in 10 cells or fewer in the entire dataset were filtered out. In addition, IGHV and TCR genes were removed before downstream analysis. The data were transformed and integrated using the SCTransform and Integration workflow ( fig. S2D). For more efficient clustering and dimension reduction, principal components analysis (PCA) was performed on the top 3000 transcriptomic variable features, and the top 25 principal components were used for downstream analysis. Cells were clustered using the standard Seurat clustering with resolution of 1.3. Genes were deemed significantly differentially expressed using the Wilcoxon rank sum test with a log fold change threshold of 0.25 and expressed in 10% of either population (Seurat::FindClusters function). Clusters with similar differentially expressed gene lists were combined, and all the clusters were annotated using the gene lists shown in table S3.
Gene set enrichment analysis (GSEA) was done using the Broad Institute's GSEA software (70,71). All relevant NOTCH gene sets were downloaded from the Molecular Signatures Database and corrected for our dataset's background. Spliced and unspliced RNA count matrices were obtained by scVelo (72). RNA velocity was calculated using scVelo (36).

Single-cell transcriptome analysis of blood
A similar workflow as described above was performed on sorted CD19 + blood from healthy individuals and SLE patients with the following modifications. Genes that were only expressed in three cells or fewer in the entire dataset were filtered out. The standard Seurat integration workflow was used to normalize and integrated the samples (fig. S7, A to C). PCA was performed on the top 2000 variable features, and the top 20 principal components were used for downstream analysis. Cells were clustered using the standard Seurat graph-based approach with resolution of 1.8. Clusters were annotated using differentially expressed gene lists (table S6) where genes were deemed significantly differentially expressed as described above.

Single-cell BCR analysis
The BCR repertoire analysis used the Immcantation framework (version 4.0) (33). V, D, and J genes were assigned using IgBLAST. Nonproductive sequences were removed. The following criteria identified clones: identical IGH V-J gene usage, identical CDR3 junctional sequence length, and CDR3 junctional sequences having a minimum percentage sequence similarity based on the Hamming distance between each sequence and its nearest neighbor. Clonal threshold at 0.15 or 0.1 was determined from the Hamming distance (figs. S4B and S8B). To normalize the donor effect when looking at the ratio of detected disseminated clones to those found at one site only, the ratio in each subset was divided by the average dissemination ratio in that donor. To determine the clonal correlation within and between tissues (subset compositions on a per-cell basis within the clone), the Spearman correlations per donor along with the pairwise P values were calculated. These P values were then corrected for multiple interference using Holm's method in the R package RcmdrMisc (73). SHM was calculated using only the heavy-chain and productive rearrangements where the entire sequence is compared with the germline sequences to identify R and S mutations.

Flow cytometry
Cryopreserved PBMCs from healthy donors (n = 10) and patients with lupus nephritis (n = 10) were thawed, washed, and rested in RPMI with DNase and incubated for 45 min at 37°C before staining on ice for 15 min with pretitrated antibodies. All antibodies used were mouse anti-human, except Beta7 fluorescein isothiocyanate that was rat anti-human (table S1). Data were acquired on a BD Fortessa flow cytometer. FCS files were analyzed with FlowJo version 10.8.1 (Treestar) as described in fig. S9.

Statistics, data analysis, and visualization
All statistics, data analysis, and visualization were done in R (3.5.2 and 4.0.2) unless stated otherwise (74). Boxplots are in the style of Tukey where lower and upper hinges correspond to 25th and 75th percentiles, and whiskers extend from the hinge to the largest value no further than 1.5 times the interquartile range from the hinge. Data beyond the end of the whiskers are considered outlying points and plotted individually. We used paired t test and paired Wilcoxon test to compare two groups with paired variables that are normally and not normally distributed, respectively. We used two-way ANOVA to analyze experiments with multiple groups. Outliers were assessed by boxplot method, normality was assessed using Shapiro-Wilk's normality test, and homogeneity of variances was assessed by Levene's test. There were no extreme outliers (points three times the interquartile range), residuals were normally distributed (P > 0.05), and there was homogeneity of variances (P > 0.05). Consequently, an analysis of simple main effects for subsets was performed with statistical significance receiving a Tukey post hoc test adjustment. All pairwise comparisons (estimated marginal means) were analyzed between the different sample types organized by subsets, and P values were adjusted using Bonferroni correction. Differentially expressed gene P values were calculated using Wilcoxon test with Bonferroni correction for multiple comparisons. Statistically significant correlations were identified using Spearman correlation along with pairwise P values that were then corrected using Holm's method. Significance is indicated as follows: *P < 0.05; **P < 0.01; ***P < 0.001; ****P < 0.0001.