feat: nicheVI release #3172

Open
wants to merge 44 commits into
base: main
ce557a8
code for nicheVI
LevyNat Feb 3, 2025
2c27373
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Feb 3, 2025
0e2808a
fix tests
LevyNat Feb 3, 2025
9bba0db
fix test folder name
LevyNat Feb 3, 2025
50b3ff7
remove duplicated name
LevyNat Feb 3, 2025
d08ce84
cleaning
LevyNat Mar 2, 2025
7981061
Merge remote-tracking branch 'origin/main' into nathan-nicheVI
LevyNat Mar 2, 2025
25434f7
move NicheLossOutput to module file
LevyNat Mar 2, 2025
c268f4a
change one-hot to torch.nn.functional implementation/ update classif …
LevyNat Mar 2, 2025
cbf2b35
update dataclass/classifier names
LevyNat Mar 2, 2025
24c89f3
rename _niche_de_core, remove commented code
LevyNat Mar 10, 2025
e63ccdb
update alpha/eta decoder tests
LevyNat Mar 10, 2025
16691c9
copy NicheRNASeqMixin/NicheVAEMixin content to model
LevyNat Mar 10, 2025
f4c8da2
Merge branch 'main' into nathan-nicheVI
LevyNat Mar 10, 2025
fead31c
remove vaemixin
LevyNat Mar 10, 2025
b481e34
add training assertion
LevyNat Mar 10, 2025
5070243
remove rnaseqmixin
LevyNat Mar 10, 2025
2a249e7
NicheLossOutput inherits from LossOutput
LevyNat Mar 10, 2025
be4aef9
update changelog
LevyNat Mar 10, 2025
f2a0a40
initialize NicheVI doc from resolVI
LevyNat Mar 10, 2025
6cf60f6
update doc until preliminary
LevyNat Mar 19, 2025
b82fec1
update to spatial_weight for loss
LevyNat Mar 20, 2025
dd80ec4
doc up to descriptive model
LevyNat Mar 20, 2025
c4c19e1
Merge branch 'main' into nathan-nicheVI
LevyNat Mar 20, 2025
69f58c5
full documentation
LevyNat Mar 23, 2025
0d09cd5
organise docstrings and imports for model
LevyNat Mar 23, 2025
6e4ea7e
docstrings for module
LevyNat Mar 23, 2025
b6c7f85
docstrings for DE
LevyNat Mar 23, 2025
9247f5d
docstrings for components
LevyNat Mar 23, 2025
513d2e1
update prints for preprocessing
LevyNat Mar 24, 2025
89c15d5
update MoG samples from 30 to 10
LevyNat Mar 24, 2025
e903329
Merge branch 'main' into nathan-nicheVI
ori-kron-wis Mar 25, 2025
b2643dc
minor refactor
LevyNat Mar 26, 2025
a076d4d
gene_likelihood default to poisson
LevyNat Mar 27, 2025
fb4bb0b
default radius for DE set to None
LevyNat Mar 27, 2025
326a1a4
spatial_weight=10 to default
LevyNat Mar 28, 2025
ccfa16f
include plot_DE_results as a dataclass method
LevyNat Mar 30, 2025
143b591
explain LFC
LevyNat Mar 30, 2025
b4966ec
remove plot_DE_import
LevyNat Apr 1, 2025
83cff0b
Merge branch 'main' into nathan-nicheVI
LevyNat Apr 1, 2025
80ab2a1
Merge branch 'main' into nathan-nicheVI
LevyNat Apr 1, 2025
27d8a12
correct for typos in doc
LevyNat Apr 1, 2025
d0aa175
Merge branch 'nathan-nicheVI' of github.com:scverse/scvi-tools into n…
LevyNat Apr 1, 2025
fdd6279
minor doc changes
LevyNat Apr 1, 2025
2 changes: 2 additions & 0 deletions CHANGELOG.md
@@ -15,6 +15,8 @@ to [Semantic Versioning]. Full commit history is available in the
- Add supervised module class {class}`scvi.module.base.SupervisedModuleClass`. {pr}`3237`.
- Add get normalized function model property for any generative model {pr}`3238` and changed
get_accessibility_estimates to get_normalized_accessibility, where needed.
- Add {class}`scvi.external.NICHEVI` for representation of cells and
their environments in spatial transcriptomics {pr}`3172`.
- Add Early stopping KL warmup steps. {pr}`3262`.

#### Fixed
1 change: 1 addition & 0 deletions docs/api/user.md
@@ -64,6 +64,7 @@ import scvi
external.Decipher
external.RESOLVI
external.SysVI
external.NICHEVI
```

## Data loading
1 change: 1 addition & 0 deletions docs/tutorials/index_spatial.md
@@ -10,4 +10,5 @@ notebooks/spatial/gimvi_tutorial
notebooks/spatial/tangram_scvi_tools
notebooks/spatial/stereoscope_heart_LV_tutorial
notebooks/spatial/cell2location_lymph_node_spatial_tutorial
notebooks/spatial/NicheVI_tutorial
```
3 changes: 3 additions & 0 deletions docs/user_guide/index.md
@@ -153,6 +153,9 @@ scvi-tools is composed of models that can perform one or many analysis tasks. In
* - :doc:`/user_guide/models/resolvi`
- Generative model of single-cell resolved spatial transcriptomics
- :cite:p:`Ergen25`
* - :doc:`/user_guide/models/nichevi`
- Representation of cells and their environments in spatial transcriptomics
- :cite:p:`Levy25`
```

## General purpose analysis
150 changes: 150 additions & 0 deletions docs/user_guide/models/nichevi.md
@@ -0,0 +1,150 @@
# NicheVI

**NicheVI** (Python class {class}`~scvi.external.NICHEVI`) is a generative model of single-cell resolved spatial
transcriptomics that can subsequently be used for many common downstream tasks.

The advantages of NicheVI are:

- Provides a probabilistic low-dimensional representation of the state of each cell that is corrected for batch effects
and captures its gene expression profile and its environment.
- Enables differential expression analysis across niches while accounting for wrong assignment of molecules to cells.
- Scalable to very large datasets (>1 million cells).

The limitations of NicheVI include:

- Effectively requires a GPU for fast inference.
- Latent space is not interpretable, unlike that of a linear method.
- Assumes single cells are observed and does not work with low resolution ST like Visium or Slide-Seq.

```{topic} Tutorials:

- {doc}`/tutorials/notebooks/spatial/nicheVI_tutorial`
```

## Preliminaries

NicheVI takes as input single-cell resolved spatial transcriptomics data. In addition to the gene expression matrix $X$ with $N$ cells and $G$ genes,
it requires for each cell $n$:
- the spatial coordinates of the cell $y_n$
- the cell type assignment (possibly coarse) $c_n \in \{1, ..., T\}$
- the batch assignment $s_n$.


As preprocessing, we define the niche of a cell as its $K$ nearest neighbors under the Euclidean distance in physical space.
We characterize the niche by its cell-type composition and gene expression. We denote by $\alpha_n$ the $T$-dimensional vector of cell-type
proportions among the $K$ nearest neighbors of cell $n$; it lies on the probability simplex.
The niche gene expression is defined as the average expression of each cell type present in the niche.
In practice, we leverage gene expression embeddings (PCA, scVI or similar) and characterize a cell type expression profile as the local average
embedding of cells of the same type. The average embeddings are stored in the matrix ${\eta_n} \in \mathbb{R}^{T \times D}$, where $D$ is the embedding dimension.
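
As a rough illustration of the preprocessing described above, the following NumPy sketch computes $\alpha_n$ and $\eta_n$ from coordinates, cell-type labels and embeddings. It is not the scvi-tools implementation: the function name `niche_features`, the brute-force distance matrix and the choice to exclude a cell from its own neighborhood are assumptions made for the example.

```python
import numpy as np


def niche_features(coords, cell_types, embeddings, n_types, k):
    """Return (alpha, eta) computed from the k nearest spatial neighbors of each cell."""
    n_cells, dim = embeddings.shape
    # Pairwise Euclidean distances in physical space (O(N^2), toy data only).
    dists = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    np.fill_diagonal(dists, np.inf)  # assumption: a cell is not its own neighbor
    neighbors = np.argsort(dists, axis=1)[:, :k]

    alpha = np.zeros((n_cells, n_types))    # cell-type proportions per niche
    eta = np.zeros((n_cells, n_types, dim))  # average embedding per cell type
    for n in range(n_cells):
        for t in range(n_types):
            members = neighbors[n][cell_types[neighbors[n]] == t]
            alpha[n, t] = len(members) / k
            if len(members) > 0:
                eta[n, t] = embeddings[members].mean(axis=0)
    return alpha, eta
```

Rows of `eta` for cell types that are absent from a niche stay at zero, which matches the convention used for $\eta_{nt}$ in the model description.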

## Descriptive model

We propose a latent variable model aiming to capture both gene expression heterogeneity and spatial variation resulting from the micro-environment.
We assume these two sources of variability are both captured by a $P$-dimensional latent variable ($P \ll G$):

```{math}
:nowrap: true
\begin{align}
z_n \sim \mathbf{MixtureOfGaussians}(\mu_1, ..., \mu_M; \Sigma_1, ..., \Sigma_M; \pi_1, ..., \pi_M)
\end{align}
```

We assume that the observed counts for cell $n$ and gene $g$, $x_{ng}$, are generated from the following process:

```{math}
:nowrap: true
\begin{align}
\rho _n &= f_{w}\left( z_n, s_n \right) \\
x_{ng} &\sim \mathbf{NegativeBinomial}(\ell_n \rho_{ng}, \theta_g),
\end{align}
```
where $\rho_n$ is the vector of normalized gene expression of cell $n$ (with entries $\rho_{ng}$), $\ell_n$ is the library size of cell $n$ and $\theta_g$ is the dispersion parameter for gene $g$. See {doc}`/user_guide/models/scvi` for more details.
The cell-type proportions of the cell's $K$ nearest neighbors are obtained as

```{math}
:nowrap: true
\begin{align}
\alpha_n &\sim \mathbf{Dirichlet}\left( f_{\omega}(z_n) \right).
\end{align}
```

Last, we assume that the neighboring cells' average expression profiles are obtained as

```{math}
:nowrap: true
\begin{equation}
\eta_{nt} \sim
\begin{cases}
\mathcal{N} \left(f_{\nu}^{t}(z_n) \right), & \text{if } \alpha_{nt} > 0 \\
0, & \text{otherwise}
\end{cases}
\end{equation}
```

where $t = 1, ..., T$, and $w$, $\omega$ and $\nu$ are neural network parameters.
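
To make the generative process above concrete, here is a toy NumPy sketch that draws one cell from it. Everything beyond the distributions named in the text is an assumption for illustration: linear maps (`W_rho`, `W_alpha`, `W_eta`) stand in for the neural networks $f_w$, $f_\omega$ and $f_\nu$, the mixture components have diagonal covariances, the Gaussian on $\eta$ has unit variance, and the batch covariate $s_n$ is left out.

```python
import numpy as np


def softmax(a):
    """Numerically stable softmax over the last axis."""
    e = np.exp(a - a.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)


def sample_cell(rng, mus, sigmas, pis, W_rho, W_alpha, W_eta, theta, library_size):
    """Draw (z, x, alpha, eta) for one cell from a toy version of the model."""
    # z_n ~ mixture of Gaussians (diagonal covariances are an assumption).
    m = rng.choice(len(pis), p=pis)
    z = rng.normal(mus[m], sigmas[m])

    # rho_n = f_w(z_n): the softmax keeps normalized expression on the simplex.
    rho = softmax(z @ W_rho)

    # Negative binomial with mean l_n * rho_ng and dispersion theta_g,
    # drawn as a gamma-Poisson mixture.
    mean = library_size * rho
    x = rng.poisson(rng.gamma(shape=theta, scale=mean / theta))

    # alpha_n ~ Dirichlet(f_omega(z_n)); softplus makes the concentrations positive.
    alpha = rng.dirichlet(np.logaddexp(0.0, z @ W_alpha))

    # eta_nt ~ Normal(f_nu^t(z_n), 1) where cell type t is present, 0 otherwise.
    eta_mean = np.einsum("p,tpd->td", z, W_eta)
    eta = np.where(alpha[:, None] > 0, rng.normal(eta_mean, 1.0), 0.0)
    return z, x, alpha, eta
```

A Dirichlet draw is almost surely strictly positive, so the mask on `eta` matters mainly for observed data, where a cell type can be entirely absent from the $K$ nearest neighbors.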


## Inference

We want to maximize the evidence of the data, which can be decomposed as:

```{math}
:nowrap: true
\begin{align}
\log p \left( \alpha, x, \eta \mid s \right) = \log p \left(x \mid s \right) + \log p \left( \alpha, \eta \mid x, s \right).
\end{align}
```

NicheVI uses variational inference, specifically auto-encoding variational Bayes
(see {doc}`/user_guide/background/variational_inference`) to learn both the model parameters
(the neural network parameters, dispersion parameters, etc.) and an approximate posterior distribution.
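
For reference, the generative model above implies a standard evidence lower bound. The expression below is a generic derivation that assumes $x$, $\alpha$ and $\eta$ are conditionally independent given $z$ (with $\eta$ additionally depending on $\alpha$ through the presence mask) and an encoder of the form $q_\phi(z \mid x)$; it is not a transcription of the implemented loss, which may weight the terms differently.

```{math}
:nowrap: true
\begin{align}
\log p \left( x, \alpha, \eta \mid s \right) \geq \mathbb{E}_{q_\phi(z \mid x)} \left[ \log p \left( x \mid z, s \right) + \log p \left( \alpha \mid z \right) + \log p \left( \eta \mid z, \alpha \right) \right] - \mathrm{KL} \left( q_\phi(z \mid x) \,\|\, p(z) \right).
\end{align}
```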

## Tasks

Here we provide an overview of some of the tasks that NicheVI can perform. Please see {class}`scvi.external.NICHEVI`
for the full API reference.

### Dimensionality reduction

For dimensionality reduction, the mean of the approximate posterior $q_\phi(z \mid x)$ is returned by default.
This is achieved using the method:

```
>>> adata.obsm["X_nichevi"] = model.get_latent_representation()
```

Here, $\phi$ denotes the parameters of the inference neural networks (encoders).
Users may also return samples from this distribution, as opposed to the mean, by passing the argument `give_mean=False`.
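
The difference between the two modes can be illustrated with a small NumPy sketch. It assumes a diagonal Gaussian approximate posterior with per-cell mean `q_mean` and standard deviation `q_std`; the helper name `latent_from_posterior` is hypothetical and is not part of the scvi-tools API.

```python
import numpy as np


def latent_from_posterior(q_mean, q_std, give_mean=True, rng=None):
    """Return the posterior mean, or one reparameterized sample, per cell."""
    if give_mean:
        return q_mean
    rng = np.random.default_rng() if rng is None else rng
    # Reparameterization: z = mean + std * eps with eps ~ N(0, I).
    return q_mean + q_std * rng.standard_normal(q_mean.shape)
```

The mean is deterministic and is usually the more convenient choice for clustering or visualization, while samples reflect the uncertainty of the posterior.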

### Estimation of normalized expression

In {meth}`~scvi.external.NICHEVI.get_normalized_expression`, NicheVI returns the expected value of the normalized expression $\rho_n$ under the approximate posterior. For one cell $n$, this can be written as:

```{math}
:nowrap: true

\begin{align}
\mathbb{E}_{q_\phi(z_n \mid x_n)}\left[f_{w}\left(z_{n}, s_n \right) \right]
\end{align}
```
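
In practice an expectation of this form is typically approximated by Monte Carlo: draw several $z_n$ from the approximate posterior, decode each one, and average. The sketch below shows that idea with NumPy under stated assumptions: a diagonal Gaussian posterior, a user-supplied `decoder` callable standing in for $f_w$, and a hypothetical helper name. The batch covariate is omitted.

```python
import numpy as np


def expected_normalized_expression(q_mean, q_std, decoder, n_samples=25, rng=None):
    """Monte Carlo estimate of E_q[decoder(z)] for each cell."""
    rng = np.random.default_rng() if rng is None else rng
    draws = [
        decoder(q_mean + q_std * rng.standard_normal(q_mean.shape))
        for _ in range(n_samples)
    ]
    # Average the decoded expression over posterior samples.
    return np.mean(draws, axis=0)
```

Because each decoded draw lies on the simplex when the decoder ends in a softmax, the averaged estimate does as well.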

### Differential expression

Differential expression analysis is achieved with {meth}`~scvi.external.NICHEVI.differential_expression`. \
We leverage the lvm-DE method (see {doc}`/user_guide/background/differential_expression`) and adapt it to spatial data by taking into account cell neighborhood expression in a bid to discard false positives due to contamination. \
Considering two groups $\textit{G1}$ and $\textit{G2}$ corresponding to different spatial contexts (for instance, astrocytes in two brain regions), the goal is to determine which genes have different expression levels between the two groups. When setting `niche_mode="true"`, we compute the group spatial neighborhoods $\textit{N1}$ and $\textit{N2}$, which are the spatial nearest neighbors of $\textit{G1}$ and $\textit{G2}$, respectively, restricted to cells of a different type than the group itself.


To determine the upregulated genes of $\textit{G1 vs G2}$, we compute DE between $\{\textit{G1, G2}\}$, $\{\textit{N1, G2}\}$ and $\{\textit{G1, N1}\}$: using lvm-DE, we test differences in expression levels $\rho_{n}$ to compute Log-Fold Changes (LFC). \
The upregulated genes for $\textit{G1 vs N1}$ define a set of local cell type markers, denoted $\mathcal{S}_1$. Conversely, if a gene is more highly expressed both in $\textit{N1}$ compared to $\textit{G1}$ and in $\textit{G1}$ compared to $\textit{G2}$, it is likely that the increased expression in $\textit{G1}$ is spurious.
We argue that the probability of a gene being a $\textit{local marker}$ could be a relevant score to filter spurious genes. To compute this score, we consider the upregulation of a gene in one group relative to the upregulation in its neighborhood: a local marker $g$ should satisfy

```{math}
:nowrap: true
\begin{align}
\mathit{LFC^{~g}_{G1~vs~G2}} > \mathit{LFC^{~g}_{N1~vs~G2}},
\end{align}
```

which means that the signal comes from cells in $\textit{G1}$ rather than their neighbors $\textit{N1}$. \
We select genes for which $\mathit{LFC_{G1~vs~G2}} > 0$ and treat the genes in $\mathcal{S}_1$ as truly differentially expressed. We also define $\mathcal{N}_1 = \{g|\mathit{LFC^{~g}_{G1~vs~G2}} > 0,~g \notin \mathcal{S}_1 \}$. \
We train a Gaussian process classifier on $\mathbf{X} = [LFC_{G1~vs~G2}~,~LFC_{N1~vs~G2}]$ to classify between the $\textit{local markers}$ $\mathcal{S}_1$ and the $\textit{neighborhood genes}$ $\mathcal{N}_1$. Once fitted, the classifier returns a local marker probability $p_g=\mathit{p}(g \in \mathcal{S}_1 | \mathbf{X})$ for each gene $g$, which we can compare to a given threshold $\tau$ to filter out the neighborhood genes.
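
The construction of the classifier's training set can be sketched as follows. This is an illustration under assumptions, not the code in this PR: the function name and argument names are hypothetical, and the inputs are taken to be per-gene arrays of log-fold changes plus a boolean array marking the genes found upregulated in $\textit{G1 vs N1}$.

```python
import numpy as np


def local_marker_training_set(lfc_g1_vs_g2, lfc_n1_vs_g2, is_up_g1_vs_n1):
    """Build features X and labels (1 = local marker, 0 = neighborhood gene)."""
    # Only genes upregulated in G1 vs G2 are candidates.
    candidates = lfc_g1_vs_g2 > 0
    # Two features per gene: [LFC_G1_vs_G2, LFC_N1_vs_G2].
    features = np.column_stack([lfc_g1_vs_g2, lfc_n1_vs_g2])[candidates]
    # Candidates in S_1 are local markers; the remaining candidates form N_1.
    labels = is_up_g1_vs_n1[candidates].astype(int)
    return features, labels
```

The resulting `features` and `labels` could then be passed to any probabilistic classifier, for example scikit-learn's `GaussianProcessClassifier`, whose `predict_proba` output would play the role of $p_g$.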
2 changes: 2 additions & 0 deletions src/scvi/external/__init__.py
@@ -4,6 +4,7 @@
from .gimvi import GIMVI
from .methylvi import METHYLANVI, METHYLVI
from .mrvi import MRVI
from .nichevi import nicheSCVI
Member

At least first letter large for classes. Just call the class NICHEVI - will be more user friendly.

Member

Duplicate.

from .poissonvi import POISSONVI
from .resolvi import RESOLVI
from .scar import SCAR
@@ -32,4 +33,5 @@
"METHYLVI",
"METHYLANVI",
"RESOLVI",
"nicheSCVI",
Collaborator

There is a convention that we start a class name with Capital letter

Collaborator Author

as soon as I finish running experiments, because you can't load models if you changed the class

Member

You need to release a reproducibility repo with the version of code that you used and use this for revision. See resolVI archive in my GitHub.

]
11 changes: 11 additions & 0 deletions src/scvi/external/nichevi/__init__.py
@@ -0,0 +1,11 @@
from ._components import DirichletDecoder, NicheDecoder
from ._constants import NICHEVI_REGISTRY_KEYS
from ._model import nicheSCVI
from ._module import nicheVAE

__all__ = [
"nicheSCVI",
"nicheVAE",
"NicheDecoder",
"DirichletDecoder",
]