Probabilistic Mapping of Dark Matter with Neural Score Matching

Benjamin Remy

With: Francois Lanusse, Niall Jeffrey, Jia Liu, J.-L. Starck, Ken Osato





slides at b-remy.github.io/talks/ML-IAP2021

Gravitational lensing

Galaxy shapes as estimators for gravitational shear
$$ e = \gamma + e_i \qquad \mbox{ with } \qquad e_i \sim \mathcal{N}(0, I)$$
  • We measure the ellipticity $e$ of galaxies as an estimator of the gravitational shear $\gamma$; the intrinsic ellipticity $e_i$ acts as noise.

The Weak Lensing Mass-Mapping as an Inverse Problem

Shear $\gamma$
Convergence $\kappa$
$$\gamma_1 = \frac{1}{2} (\partial_1^2 - \partial_2^2) \ \Psi \quad;\quad \gamma_2 = \partial_1 \partial_2 \ \Psi \quad;\quad \kappa = \frac{1}{2} (\partial_1^2 + \partial_2^2) \ \Psi$$
$$\boxed{\gamma = \mathbf{P} \kappa}$$
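
As a minimal sketch (not the exact pipeline of the talk), the forward operator $\mathbf{P}$ can be applied in Fourier space with the Kaiser-Squires kernel; the grid size, flat-sky geometry, and periodic boundaries are simplifying assumptions here:

```python
# Hypothetical sketch of the forward model gamma = P kappa:
# apply the Kaiser-Squires kernel in Fourier space on a periodic, flat-sky grid.
import jax.numpy as jnp

def forward_shear(kappa):
    """Map a convergence map kappa of shape (ny, nx) to the two shear components."""
    ny, nx = kappa.shape
    k1 = jnp.fft.fftfreq(nx)[None, :]          # frequencies along axis 1
    k2 = jnp.fft.fftfreq(ny)[:, None]          # frequencies along axis 0
    ksq = k1**2 + k2**2
    ksq_safe = jnp.where(ksq > 0, ksq, 1.0)    # avoid dividing by zero at k = 0
    # Kaiser-Squires kernel; the k = 0 mode is left unconstrained (mass-sheet degeneracy)
    p1 = jnp.where(ksq > 0, (k1**2 - k2**2) / ksq_safe, 0.0)
    p2 = jnp.where(ksq > 0, 2.0 * k1 * k2 / ksq_safe, 0.0)
    kappa_ft = jnp.fft.fft2(kappa)
    gamma1 = jnp.fft.ifft2(p1 * kappa_ft).real
    gamma2 = jnp.fft.ifft2(p2 * kappa_ft).real
    return gamma1, gamma2
```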

Illustration on the Dark Energy Survey (DES) Y3

Jeffrey et al. (2021)

Linear inverse problems

$\boxed{\gamma = \mathbf{A}\kappa + n}$

$\mathbf{A}$ is known and encodes our physical understanding of the problem.
$\Longrightarrow$ When $\mathbf{A}$ is non-invertible or ill-conditioned, the inverse problem is ill-posed and has no unique solution $\kappa$


The Bayesian view of the problem:
$$ p(\kappa | \gamma) \propto p(\gamma | \kappa) \ p(\kappa) $$
  • $p(\gamma | \kappa)$ is the data likelihood, which contains the physics

  • $p(\kappa)$ is the prior knowledge on the solution.


We can, for instance, estimate the Maximum A Posteriori (MAP) solution:
$$\hat{\kappa} = \arg\max\limits_\kappa \ \log p(\gamma \ | \ \kappa) + \log p(\kappa) = \arg\max\limits_\kappa \ - \frac{1}{2} \parallel \gamma - \mathbf{A} \kappa \parallel_{\mathbf{\Sigma}}^2 + \log p(\kappa)$$
Or estimate from the full posterior $p(\kappa|\gamma)$ with MCMC or Variational Inference methods.
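
To make the MAP estimate concrete, here is a minimal gradient-ascent sketch under a Gaussian noise model with diagonal covariance, reusing the hypothetical `forward_shear` above; `log_prior` is a placeholder callable, and the step size and iteration count are illustrative:

```python
import jax
import jax.numpy as jnp

def log_likelihood(kappa, gamma_obs, sigma_n=0.1):
    """Gaussian log-likelihood -1/2 ||gamma - A kappa||^2_Sigma, with Sigma = sigma_n^2 I."""
    g1, g2 = forward_shear(kappa)              # forward model A kappa
    gamma_model = jnp.stack([g1, g2])
    return -0.5 * jnp.sum((gamma_obs - gamma_model) ** 2) / sigma_n**2

def map_estimate(gamma_obs, log_prior, n_steps=500, tau=1e-2):
    """Gradient ascent on log p(gamma | kappa) + log p(kappa)."""
    log_post = lambda k: log_likelihood(k, gamma_obs) + log_prior(k)
    grad_fn = jax.jit(jax.grad(log_post))
    kappa = jnp.zeros(gamma_obs.shape[1:])     # start from a flat map
    for _ in range(n_steps):
        kappa = kappa + tau * grad_fn(kappa)
    return kappa
```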


But what should we use for $\boxed{\log p(\kappa)}$?

Classical examples of signal priors

Sparse $$ \log p(x) = - \parallel \mathbf{W} x \parallel_1 + \mathrm{cst} $$
Gaussian $$ \log p(x) = - \frac{1}{2} x^t \mathbf{\Sigma}^{-1} x + \mathrm{cst} $$
Total Variation $$ \log p(x) = - \parallel \nabla x \parallel_1 + \mathrm{cst} $$
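
As a sketch, these classical log-priors (up to additive constants) can be written as simple functions of a map $x$; the sparsifying transform `W` is assumed to be some callable such as a wavelet transform:

```python
import jax.numpy as jnp

def log_prior_sparse(x, W):
    """Sparsity prior: log p(x) = -||W x||_1 + cst, W a sparsifying transform (callable)."""
    return -jnp.sum(jnp.abs(W(x)))

def log_prior_gaussian(x, power_spectrum):
    """Gaussian prior with a covariance that is diagonal in Fourier space."""
    x_ft = jnp.fft.fft2(x)
    return -0.5 * jnp.sum(jnp.abs(x_ft) ** 2 / power_spectrum)  # up to FFT normalization

def log_prior_tv(x):
    """Total-variation prior: log p(x) = -||grad x||_1 + cst (finite differences)."""
    return -(jnp.sum(jnp.abs(jnp.diff(x, axis=0))) + jnp.sum(jnp.abs(jnp.diff(x, axis=1))))
```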

Illustration on the Dark Energy Survey (DES) Y3

Jeffrey et al. (2021)

But what about learning the prior
with deep generative models?

The score is all you need!


  • Whether you are looking for the MAP or sampling with HMC or MALA, you only need access to the score of the posterior (a code sketch follows this list): $$\color{orange}{\nabla_x \log p(x \ | \ y)}$$
    • Gradient ascent (MAP): $x_{t+1} = x_t + \tau \nabla_x \log p(x_t | y) $
    • Langevin algorithm: $x_{t+1} = x_t + \tau \nabla_x \log p(x_t | y) + \sqrt{2\tau}\, n_t$, with $n_t \sim \mathcal{N}(0, \mathbf{I})$



  • The score of the full posterior is simply: $$\nabla_\kappa \log p(\kappa |\gamma) = \underbrace{\nabla_\kappa \log p(\gamma |\kappa)}_{\mbox{known}} \quad + \quad \underbrace{\nabla_\kappa \log p(\kappa)}_{\mbox{can be learned}}$$ $\Longrightarrow$ all we have to do is model/learn the score of the prior.
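
A minimal sketch of how the two terms combine, assuming the hypothetical `log_likelihood` above and some callable `prior_score_fn` (e.g. a trained score network); the names and step size are illustrative:

```python
import jax
import jax.numpy as jnp

def posterior_score(kappa, gamma_obs, prior_score_fn, sigma_n=0.1):
    """Posterior score = likelihood score (known physics) + prior score (learned)."""
    likelihood_score = jax.grad(lambda k: log_likelihood(k, gamma_obs, sigma_n))(kappa)
    return likelihood_score + prior_score_fn(kappa)

def langevin_step(kappa, gamma_obs, prior_score_fn, key, tau=1e-3):
    """One unadjusted Langevin update: kappa + tau * score + sqrt(2 tau) * n_t."""
    n_t = jax.random.normal(key, kappa.shape)
    score = posterior_score(kappa, gamma_obs, prior_score_fn)
    return kappa + tau * score + jnp.sqrt(2.0 * tau) * n_t
```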

Neural Score Estimation by Denoising Score Matching

  • Denoising Score Matching: An optimal Gaussian denoiser learns the score of a given distribution.
    • If $x \sim \mathbb{P}$ is corrupted by additive Gaussian noise $u \sim \mathcal{N}(0, \sigma^2 \mathbf{I})$ to yield $$x^\prime = x + u$$
    • Let's consider a denoiser $r_\theta$ trained under an $\ell_2$ loss: $$\mathcal{L}=\mathbb{E} \parallel x - r_\theta(x^\prime, \sigma) \parallel_2^2$$
    • The optimal denoiser $r_{\theta^\star}$ verifies: $$\boxed{\boldsymbol{r}_{\theta^\star}(\boldsymbol{x}', \sigma) = \boldsymbol{x}' + \sigma^2 \color{orange}{\nabla_{\boldsymbol{x}} \log p_{\sigma^2}(\boldsymbol{x}')}}$$
$$\big(\, \boldsymbol{r}_{\theta^\star}(\boldsymbol{x}', \sigma) - \boldsymbol{x}' \,\big)~/~\sigma^2 = \color{orange}{\nabla_{\boldsymbol{x}} \log p_{\sigma^2}(\boldsymbol{x}')}$$
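
A minimal sketch of the denoising score matching objective and of reading the score back from the trained denoiser; `denoiser_apply(params, x, sigma)` stands in for whatever network is used, and all names are assumptions for illustration:

```python
import jax
import jax.numpy as jnp

def dsm_loss(params, denoiser_apply, x, key, sigma):
    """L = || x - r_theta(x + sigma * u, sigma) ||_2^2, averaged over a batch of clean maps x."""
    u = jax.random.normal(key, x.shape)            # Gaussian corruption
    x_noisy = x + sigma * u
    x_denoised = denoiser_apply(params, x_noisy, sigma)
    return jnp.mean(jnp.sum((x - x_denoised) ** 2, axis=(-2, -1)))

def prior_score_from_denoiser(params, denoiser_apply, x, sigma):
    """The optimal denoiser encodes the noise-convolved score:
    (r_theta(x, sigma) - x) / sigma^2 ~= grad_x log p_{sigma^2}(x)."""
    return (denoiser_apply(params, x, sigma) - x) / sigma**2
```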

Efficient sampling by Annealed HMC

  • Even with gradients, sampling in high dimension is difficult because of:
    • Curse of dimensionality
    • Highly correlated chains
  • $\Longrightarrow$ Use a parallel annealing strategy to sample efficiently from the full distribution.
  • We use the fact that our score network $\mathbf{r}_\theta(x, \sigma)$ is learning a noise-convolved distribution $\nabla \log p_\sigma$, where $$p_\sigma(x) = \int p_{\mathrm{data}}(x')\,\mathcal{N}(x\,|\,x', \sigma^2)\,dx' \qquad \mbox{with noise levels } \sigma_1 > \sigma_2 > \sigma_3 > \sigma_4 $$
  • Run many HMC chains in parallel, progressively annealing $\sigma$ to 0, and keep the last point of each chain as an independent sample.
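
A toy version of the annealing loop, using unadjusted Langevin steps in place of the HMC transitions used in the actual method; `score_fn(kappa, sigma)` is assumed to return the posterior score at noise level $\sigma$, and the schedule, step size, and iteration counts are illustrative:

```python
import jax
import jax.numpy as jnp

def annealed_sampling(score_fn, shape, key, sigmas=(0.5, 0.2, 0.1, 0.05),
                      n_steps=200, tau=1e-3):
    """Sample the noise-convolved posteriors for sigma_1 > ... > sigma_T,
    keeping the last state of the chain as one (approximately independent) sample."""
    key, sub = jax.random.split(key)
    kappa = sigmas[0] * jax.random.normal(sub, shape)   # start from a high-noise state
    for sigma in sigmas:                                # anneal sigma towards 0
        for _ in range(n_steps):
            key, sub = jax.random.split(key)
            n_t = jax.random.normal(sub, shape)
            kappa = kappa + tau * score_fn(kappa, sigma) + jnp.sqrt(2.0 * tau) * n_t
    return kappa
```

Running many such chains in parallel (e.g. over a batch of random keys) yields a set of posterior samples, from which the posterior mean map and pixel-wise uncertainties can be estimated.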

Illustration on $\kappa$-TNG simulations

$$\nabla_\kappa \log p_\sigma(\kappa |\gamma) = \nabla_\kappa \log p_\sigma(\gamma |\kappa) \quad + \quad \color{orange}{\nabla_\kappa \log p_\sigma(\kappa)}$$
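
Putting the pieces together, a sketch of the score function such an annealed sampler would call, reusing the hypothetical `log_likelihood` and `prior_score_from_denoiser` above (for simplicity the likelihood term is kept at its $\sigma = 0$ form here):

```python
import jax

def make_score_fn(gamma_obs, params, denoiser_apply, sigma_n=0.1):
    """Build score_fn(kappa, sigma) = grad_k log p(gamma | kappa) + grad_k log p_sigma(kappa)."""
    def score_fn(kappa, sigma):
        likelihood_score = jax.grad(lambda k: log_likelihood(k, gamma_obs, sigma_n))(kappa)
        prior_score = prior_score_from_denoiser(params, denoiser_apply, kappa, sigma)
        return likelihood_score + prior_score
    return score_fn
```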

Illustration on $\kappa$-TNG simulations


Comparison of the true convergence map with the traditional Kaiser-Squires reconstruction, the Wiener filter, and the posterior mean (ours).



Posterior samples

Probabilistic Mass-Mapping of the HST COSMOS field


  • COSMOS shear data from Schrabback et al. (2010)

  • Prior learned from MassiveNuS at fiducial cosmology (320x320 maps at 0.4 arcsec resolution).

  • Known massive X-ray clusters are indicated with crosses, along with their redshifts; the right panel shows cutouts of the central cluster from several posterior samples.


Takeaways



  • Hybrid physical/deep learning modeling:
    • Deep generative models can be used to provide data driven priors.

    • Explicit likelihood, which uses all of our physical knowledge.
      $\Longrightarrow$ The method can be applied with a varying PSF, different noise levels, or even different instruments!

  • Neural Score Estimation is a scalable approach to learn a prior score.

  • Knowledge of the posterior score is all we need for Bayesian inference, i.e. uncertainty quantification.

  • We implemented a new class of mass-mapping methods providing the full posterior
    $\Longrightarrow$ recovered a very high quality convergence map of the COSMOS field.




Thank you!