Many data analysis problems in cosmology are ill-posed inverse problems, e.g. when the measurements are heavily mixed or degraded by survey masks, noise corruption, etc.
21 cm foreground removal
Weak lensing mass-map reconstruction (survey mask + shape noise)
Classical examples of signal priors
Sparse
$$ \log p(x) = - \parallel \mathbf{W} x \parallel_1 + \mathrm{cst} $$
Gaussian
$$ \log p(x) = - \frac{1}{2} x^t \mathbf{\Sigma}^{-1} x + \mathrm{cst} $$
Total Variation
$$ \log p(x) = - \parallel \nabla x \parallel_1 + \mathrm{cst} $$
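As a quick illustration, here is a minimal numpy sketch evaluating these three log-priors (up to their normalization constants) on a 2D image. The `wavelet_transform` callable is a hypothetical placeholder for $\mathbf{W}$, and all regularization weights are set to one; neither choice comes from the original.

```python
import numpy as np

def log_prior_sparse(x, wavelet_transform):
    """Sparsity prior: log p(x) = -||W x||_1, up to a constant."""
    return -np.sum(np.abs(wavelet_transform(x)))

def log_prior_gaussian(x, sigma_inv):
    """Gaussian prior: log p(x) = -0.5 x^T Sigma^{-1} x, up to a constant."""
    x = np.ravel(x)
    return -0.5 * x @ sigma_inv @ x

def log_prior_total_variation(x):
    """Anisotropic total-variation prior: log p(x) = -||grad x||_1, up to a constant."""
    return -(np.sum(np.abs(np.diff(x, axis=0))) + np.sum(np.abs(np.diff(x, axis=1))))
```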
But what about learning the prior with deep generative models?
Mass-maps
Jeffrey et al. (2021) DES Y3
2pt functions and higher-order statistics (3pt functions, wavelet peak counts, $\ell_1$-norm, scattering transform, neural summaries, ...)
Peel et al. (2021) Abell 520 cluster
Search for localized features: peaks visible in mass-maps but with no optical counterpart; uncertainty quantification
Gravitational lensing
Galaxy shapes as estimators for gravitational shear
$$ e = \gamma + e_i \qquad \mbox{ with } \qquad e_i \sim \mathcal{N}(0, I)$$
We are trying to measure the ellipticity $e$ of galaxies as an estimator for the gravitational shear $\gamma$.
The Weak Lensing Mass-Mapping as an Inverse Problem
Explicit likelihood: train the network only once, data fidelity guaranteed at inference ✅
Remy et al. (2022): posterior mean and posterior samples.
Bayesian Modeling
$$\boxed{\gamma = \mathbf{P}\kappa + n}$$
$\mathbf{P}$ is known and encodes our physical understanding of the problem
$\Longrightarrow$ $\mathbf{P}$ is non-invertible (survey mask, shape noise): the inverse problem is ill-posed, with no unique solution $\kappa$
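In the weak lensing case, $\mathbf{P}$ relates the convergence $\kappa$ to the shear $\gamma$ through a Kaiser-Squires-type relation in Fourier space, restricted to the survey footprint. The flat-sky numpy sketch below illustrates this; the Fourier conventions and masking details are assumptions for illustration, not necessarily those of the original pipeline.

```python
import numpy as np

def forward_shear(kappa, mask):
    """Sketch of gamma = P kappa: Kaiser-Squires-type shear from the convergence
    map in Fourier space, restricted to the survey mask (conventions may differ)."""
    ny, nx = kappa.shape
    k1 = np.fft.fftfreq(nx)[None, :]
    k2 = np.fft.fftfreq(ny)[:, None]
    ksq = k1**2 + k2**2
    ksq[0, 0] = 1.0                       # avoid division by zero at the k = 0 mode
    kernel = ((k1**2 - k2**2) + 2j * k1 * k2) / ksq
    kernel[0, 0] = 0.0                    # the mean shear is unconstrained
    gamma = np.fft.ifft2(kernel * np.fft.fft2(kappa))
    return mask * gamma                   # shear is only observed inside the mask
```

The observed data then follow $\gamma = \mathbf{P}\kappa + n$, with the shape-noise term $n$ added on top of this operator.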
The Bayesian view of the problem:
$$ \boxed{p(\kappa | \gamma) \propto p(\gamma | \kappa) \ p(\kappa)} $$
$p(\gamma | \kappa)$ is the data likelihood, which contains the physics
$p(\kappa)$ is the prior knowledge on the solution.
In this perspective we can provide point estimates (posterior mean, maximum, median, etc.) and the full posterior $p(\kappa|\gamma)$ with Markov Chain Monte Carlo or Variational Inference methods.
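Given samples $\kappa_1, \ldots, \kappa_N$ produced by such a method, point estimates and per-pixel credible intervals are simple reductions over the sample axis; a minimal sketch (array shapes are assumed for illustration):

```python
import numpy as np

def summarize_posterior(samples):
    """samples: array of shape (N, ny, nx) of kappa draws from p(kappa|gamma)."""
    post_mean = samples.mean(axis=0)                    # posterior mean map
    post_median = np.median(samples, axis=0)            # posterior median map
    lo, hi = np.percentile(samples, [16, 84], axis=0)   # per-pixel 68% credible interval
    return post_mean, post_median, lo, hi
```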
There is no closed-form expression for the full non-Gaussian prior of the convergence.
However:
We do have access to samples from the full implicit prior through simulations: $X = \{x_0, x_1, \ldots, x_n \}$ with $x_i \sim \mathbb{P}$
$\kappa$TNG (Osato et al. 2021)
$\Longrightarrow$ Our strategy: learn the prior from simulations, then sample the full posterior.
Sampling method: Annealed Hamiltonian Monte Carlo
Sampling convergence maps $\kappa \sim p(\kappa|\gamma)$ is very difficult due to the high dimensionality of the space ( $360 \times 360 \approx 10^{5}$ parameters).
This is especially true for MCMC algorithms, where the curse of dimensionality leads to highly correlated chains.
We need to design an efficient sampler: $\kappa_1, \kappa_2, \ldots, \kappa_N \sim p(\kappa|\gamma)$
Hamiltonian Monte Carlo proposal for a step size $\alpha$:
$$\begin{array}{lcl} \boldsymbol{m}_{t+\frac{\alpha}{2}} & = & \boldsymbol{m}_t + \dfrac{\alpha}{2} \color{orange}{\nabla_{\kappa}\log p(\boldsymbol{\kappa}_t|\gamma)} \\
\boldsymbol{\kappa}_{t+\alpha} & = & \boldsymbol{\kappa}_t + \alpha \boldsymbol{\mathrm{M}}^{-1} \boldsymbol{m}_{t+\frac{\alpha}{2}} \\
\boldsymbol{m}_{t+\alpha} & = & \boldsymbol{m}_{t+\frac{\alpha}{2}} + \dfrac{\alpha}{2} \color{orange}{\nabla_\kappa \log p (\boldsymbol{\kappa}_{t+\alpha}|\gamma)}
\end{array}$$
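As a concrete reading of these updates, here is a minimal sketch of one leapfrog step; the `score_posterior` callable (returning $\nabla_\kappa \log p(\kappa|\gamma)$) and the scalar mass matrix are illustrative placeholders, not the original implementation.

```python
def leapfrog_step(kappa, m, alpha, score_posterior, M_inv=1.0):
    """One leapfrog step: half momentum kick, full position drift, half momentum kick."""
    m = m + 0.5 * alpha * score_posterior(kappa)    # m_{t + alpha/2}
    kappa = kappa + alpha * M_inv * m               # kappa_{t + alpha}
    m = m + 0.5 * alpha * score_posterior(kappa)    # m_{t + alpha}
    return kappa, m
```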
Annealing: convolve the posterior with a wide Gaussian so that the sampler always remains in regions of high probability density.
$$p_\sigma(x) = \int p_{\mathrm{data}}(x')\mathcal{N}(x|x', \sigma^2)dx', ~~~~~~~~\sigma_1 > \sigma_2 > \sigma_3 > \sigma_4 $$
Annealing scheme: initialize at a high noise level $\sigma_T$ and anneal down to the target $\sigma_0 \approx 0$.
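A geometrically spaced sequence of noise levels is one common way to realize such a schedule; this is an assumption for illustration, not necessarily the exact schedule used here.

```python
import numpy as np

def annealing_schedule(sigma_T=1.0, sigma_0=1e-2, num_levels=10):
    """Noise levels sigma_T > ... > sigma_0, geometrically spaced, for annealed HMC."""
    return np.geomspace(sigma_T, sigma_0, num_levels)
```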
The score is all you need!
With the HMC proposal, you only need access to the score of the posterior:
$$\color{orange}{\nabla_\kappa \log p (\kappa|\gamma)}$$
The score of the full posterior is simply:
$$\nabla_x \log p(x |y) = \underbrace{\nabla_x \log p(y |x)}_{\mbox{known}} \quad + \quad \underbrace{\nabla_x \log p(x)}_{\mbox{can be learned}}$$
$\Longrightarrow$ all we have to do is model/learn the score of the prior.
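Concretely, assuming independent Gaussian shape noise in the forward model $\gamma = \mathbf{P}\kappa + n$, the likelihood score is analytic and the prior score comes from the trained network. The `forward_op`, `adjoint_op` and `prior_score_net` callables below are placeholders for $\mathbf{P}$, $\mathbf{P}^T$ and the learned score, not the original code.

```python
def posterior_score(kappa, gamma, forward_op, adjoint_op, noise_var, prior_score_net):
    """Score of the posterior = likelihood score + prior score, assuming
    gamma = P kappa + n with n ~ N(0, noise_var * I)."""
    residual = gamma - forward_op(kappa)
    likelihood_score = adjoint_op(residual / noise_var)   # P^T Sigma^{-1} (gamma - P kappa)
    return likelihood_score + prior_score_net(kappa)      # learned nabla_kappa log p(kappa)
```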
Neural Score Estimation by Denoising Score Matching
Denoising Score Matching: An optimal Gaussian denoiser learns the score of a given distribution.
If $x \sim \mathbb{P}$ is corrupted by additive Gaussian noise $u \sim \mathcal{N}(0, \sigma^2 I)$ to yield
$$x^\prime = x + u$$
Let's consider a denoiser $r_\theta$ trained under an $\ell_2$ loss:
$$\mathcal{L}=\parallel x - r_\theta(x^\prime, \sigma) \parallel_2^2$$
The minimizer of this loss is $r_{\theta^\star}(x^\prime, \sigma) = x^\prime + \sigma^2 \nabla_{x^\prime} \log p_\sigma(x^\prime)$ (Tweedie's formula), so the denoising residual gives direct access to the score.
Deep generative models can be used to provide data driven priors.
Explicit likelihood, which uses all of our physical knowledge.
$\Longrightarrow$ The method can be applied for varying PSF, noise, or even different instruments!
We implemented a new class of mass mapping method, providing the full posterior
$\Longrightarrow$ Find the highest quality convergence map of the COSMOS field online: https://zenodo.org/record/5825654
Thank you!
Training a Neural Score Estimator in practice
A standard UNet
We use a very standard residual UNet, and we adopt a residual score matching loss:
$$ \mathcal{L}_{DSM} = \underset{\boldsymbol{x} \sim P}{\mathbb{E}} \underset{\begin{subarray}{c}
\boldsymbol{u} \sim \mathcal{N}(0, I) \\
\sigma_s \sim \mathcal{N}(0, s^2)
\end{subarray}}{\mathbb{E}} \parallel \boldsymbol{u} + \sigma_s \boldsymbol{r}_{\theta}(\boldsymbol{x} + \sigma_s \boldsymbol{u}, \sigma_s) \parallel_2^2$$
$\Longrightarrow$ direct estimator of the score $\nabla \log p_\sigma(x)$
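A minimal numpy sketch of a Monte Carlo estimate of this loss for one batch; `score_net` stands in for the residual UNet $r_\theta$, and the sampling details (e.g. taking $|\sigma_s|$) are implementation choices for illustration, not the authors' exact training code.

```python
import numpy as np

def residual_dsm_loss(score_net, x_batch, s=1.0, rng=None):
    """Monte Carlo estimate of L_DSM = E || u + sigma_s r_theta(x + sigma_s u, sigma_s) ||^2
    with u ~ N(0, I) and sigma_s ~ N(0, s^2), one draw per training example."""
    rng = np.random.default_rng() if rng is None else rng
    u = rng.standard_normal(x_batch.shape)
    # |sigma_s| is used as the noise scale; a negative draw would only flip the sign of u.
    sigma_s = np.abs(rng.normal(0.0, s, size=(len(x_batch),) + (1,) * (x_batch.ndim - 1)))
    residual = u + sigma_s * score_net(x_batch + sigma_s * u, sigma_s)
    return np.mean(np.sum(residual**2, axis=tuple(range(1, x_batch.ndim))))
```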