$\mathbf{A}$ is known and encodes our physical understanding of the problem.
$\Longrightarrow$ When non-invertible or ill-conditioned, the inverse problem is ill-posed with no unique solution $x$
Deconvolution
Inpainting
Denoising
$\boxed{y = \mathbf{A}x + n}$
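As a minimal illustration (not from the talk), the sketch below builds two such operators $\mathbf{A}$ in JAX: a periodic blur for deconvolution and a binary mask for inpainting. Shapes, mask pattern, and noise level are arbitrary assumptions.

```python
# Sketch of the linear forward model y = A x + n for two of the examples above.
import jax
import jax.numpy as jnp

def forward_deconv(x, psf_fft):
    """Blur x with a PSF given in Fourier space (periodic convolution)."""
    return jnp.real(jnp.fft.ifft2(jnp.fft.fft2(x) * psf_fft))

def forward_inpaint(x, mask):
    """Keep only observed pixels (mask is 1 where data exist, 0 elsewhere)."""
    return mask * x

key = jax.random.PRNGKey(0)
kx, km, kn = jax.random.split(key, 3)
x = jax.random.normal(kx, (64, 64))                       # stand-in "true" image
mask = (jax.random.uniform(km, (64, 64)) > 0.5).astype(jnp.float32)
sigma_n = 0.1                                             # assumed noise level
y = forward_inpaint(x, mask) + sigma_n * jax.random.normal(kn, (64, 64))
```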
The Bayesian view of the problem:
$$ p(x | y) \propto p(y | x) \ p(x) $$
$p(y | x)$ is the data likelihood, which contains the physics
$p(x)$ is the prior knowledge on the solution.
With these concepts in hand, we can estimate for instance the Maximum A Posteriori solution:
$$\hat{x} = \arg\max\limits_x \ \log p(y \ | \ x) + \log p(x)$$
For instance, if $n$ is Gaussian with covariance $\mathbf{\Sigma}$, $\hat x = \arg\max\limits_x \ - \frac{1}{2} \parallel y - \mathbf{A} x \parallel_{\mathbf{\Sigma}}^2 + \log p(x)$
Or sample from the full posterior $p(x|y)$ with MCMC or Variational Inference methods.
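A hedged sketch of the MAP estimate above, assuming a Gaussian likelihood with noise level `sigma_n` and a generic `log_prior`; plain gradient ascent and the step size are placeholder choices.

```python
# Sketch: MAP estimation by gradient ascent on log p(y|x) + log p(x).
import jax
import jax.numpy as jnp

def log_likelihood(x, y, forward, sigma_n):
    # Gaussian likelihood with isotropic noise of std sigma_n
    return -0.5 * jnp.sum((y - forward(x)) ** 2) / sigma_n**2

def map_estimate(y, forward, log_prior, x0, sigma_n=0.1, lr=1e-2, n_steps=500):
    log_post = lambda x: log_likelihood(x, y, forward, sigma_n) + log_prior(x)
    grad_fn = jax.grad(log_post)
    x = x0
    for _ in range(n_steps):
        x = x + lr * grad_fn(x)   # simple gradient ascent; Adam/L-BFGS also common
    return x
```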
How do you choose the prior?
Classical examples of signal priors
Sparse
$$ \log p(x) \propto - \parallel \mathbf{W} x \parallel_1 $$
Gaussian
$$ \log p(x) \propto - \frac{1}{2} x^T \mathbf{\Sigma}^{-1} x $$
Total Variation
$$ \log p(x) \propto - \parallel \nabla x \parallel_1 $$
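For illustration, these three log-priors can be written as plain functions of an image $x$; the wavelet transform `W` is a stand-in and the regularization weights are assumptions.

```python
# Illustrative (not the talk's code): the three classical log-priors above.
import jax.numpy as jnp

def log_prior_sparse(x, W, lam=1.0):
    return -lam * jnp.sum(jnp.abs(W(x)))                   # sparsity in a transform domain

def log_prior_gaussian(x, Sigma_inv):
    return -0.5 * x.ravel() @ (Sigma_inv @ x.ravel())      # Gaussian prior

def log_prior_tv(x, lam=1.0):
    gx = jnp.diff(x, axis=0)
    gy = jnp.diff(x, axis=1)
    return -lam * (jnp.sum(jnp.abs(gx)) + jnp.sum(jnp.abs(gy)))  # total variation
```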
But what about learning the prior with deep generative models?
The evolution of generative models
Deep Belief Network (Hinton et al. 2006)
Variational AutoEncoder (Kingma & Welling 2014)
Generative Adversarial Network (Goodfellow et al. 2014)
Wasserstein GAN (Arjovsky et al. 2017)
Not all generative models are created equal
Grathwohl et al. 2018
GANs and VAEs are very common and successful, but they do not fit our purpose here.
We want a model with an explicit likelihood, i.e. one that can evaluate $\log p_\theta(x)$.
The score is all you need!
Whether you are looking for the MAP or sampling with HMC or MALA, you
only need access to the score of the posterior:
$$\color{orange}{\frac{\mathrm{d} \log p(x \,|\, y)}{\mathrm{d} x}}$$
The score of the full posterior is simply:
$$\nabla_x \log p(x |y) = \underbrace{\nabla_x \log p(y |x)}_{\mbox{known}} \quad + \quad \underbrace{\nabla_x \log p(x)}_{\mbox{can be learned}}$$
$\Longrightarrow$ all we have to do is model/learn the score of the prior.
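A sketch of this decomposition in code, assuming a Gaussian likelihood and a trained prior score network `score_prior` (hypothetical name):

```python
# Sketch: posterior score = known likelihood score + learned prior score.
import jax
import jax.numpy as jnp

def posterior_score(x, y, forward, sigma_n, score_prior):
    # d/dx log p(y|x) for a Gaussian likelihood, known analytically via autodiff
    likelihood_score = jax.grad(
        lambda x_: -0.5 * jnp.sum((y - forward(x_)) ** 2) / sigma_n**2
    )(x)
    return likelihood_score + score_prior(x)   # d/dx log p(x|y)
```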
Neural Score Estimation by Denoising Score Matching
Denoising Score Matching: An optimal Gaussian denoiser learns the score of a given distribution.
If $x \sim \mathbb{P}$ is corrupted by additive Gaussian noise $u \sim \mathcal{N}(0, \sigma^2)$ to yield
$$x^\prime = x + u$$
Let's consider a denoiser $r_\theta$ trained under an $\ell_2$ loss:
$$\mathcal{L}=\parallel x - r_\theta(x^\prime, \sigma) \parallel_2^2$$
We use a standard residual UNet and adopt a denoising score matching loss:
$$\mathcal{L}_{\mathrm{DAE}} = \mathbb{E}_{\boldsymbol{x}^\prime \sim p_{\sigma^2}}\left[ \|\boldsymbol{x} - \boldsymbol{r}_\theta(\boldsymbol{x}', \sigma) \|_2^2 \right]$$
At the optimum, the denoiser is directly related to the score of the noise-convolved distribution (Tweedie's formula): $\boldsymbol{r}_{\theta^\star}(\boldsymbol{x}', \sigma) = \boldsymbol{x}' + \sigma^2 \nabla_{\boldsymbol{x}'} \log p_{\sigma^2}(\boldsymbol{x}')$.
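A minimal sketch of this training objective; `r_theta`, its parameters, and the noise-level range are assumptions rather than the actual implementation.

```python
# Sketch of the denoising score matching objective above.
import jax
import jax.numpy as jnp

def dsm_loss(params, r_theta, x, key, sigma_min=0.01, sigma_max=1.0):
    k1, k2 = jax.random.split(key)
    sigma = jax.random.uniform(k1, (), minval=sigma_min, maxval=sigma_max)
    x_noisy = x + sigma * jax.random.normal(k2, x.shape)    # x' = x + u
    return jnp.mean((x - r_theta(params, x_noisy, sigma)) ** 2)

# At the optimum, the score of the noise-convolved prior is recovered as
#   score(x', sigma) = (r_theta(x', sigma) - x') / sigma**2
```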
Lipschitz regularization to improve robustness:
(Figure: comparison without vs. with regularization.)
Sampling method: Annealed Hamiltonian Monte Carlo
Sampling convergence maps $\kappa \sim p(\kappa|\gamma)$ is very difficult due to the high dimensionality of the space ( $360 \times 360 \approx 10^{5}$ parameters).
Especially for MCMC algorithms, where the curse of dimensionality leads to highly correlated chains.
We need to design an efficient sampler. $~~~~\kappa_1, \kappa_2, ..., \kappa_N \sim p(\kappa| \gamma)$
Hamiltonian Monte Carlo proposal for a step size $\alpha$:
$$\begin{array}{lcl} \boldsymbol{m}_{t+\frac{\alpha}{2}} & = & \boldsymbol{m}_t + \dfrac{\alpha}{2} \color{orange}{\nabla_{\kappa}\log p(\boldsymbol{\kappa}_t|\gamma)} \\
\boldsymbol{\kappa}_{t+\alpha} & = & \boldsymbol{\kappa}_t + \alpha \boldsymbol{\mathrm{M}}^{-1} \boldsymbol{m}_{t+\frac{\alpha}{2}} \\
\boldsymbol{m}_{t+\alpha} & = & \boldsymbol{m}_{t+\frac{\alpha}{2}} + \dfrac{\alpha}{2} \color{orange}{\nabla_\kappa \log p (\boldsymbol{\kappa}_{t+\alpha}|\gamma)}
\end{array}$$
Annealing: convolve the posterior with a wide Gaussian so that the sampler always remains in regions of high probability density.
$$p_\sigma(x) = \int p_{\mathrm{data}}(x')\mathcal{N}(x|x', \sigma^2)dx', ~~~~~~~~\sigma_1 > \sigma_2 > \sigma_3 > \sigma_4 $$
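A sketch of one annealing stage of this sampler, assuming a unit mass matrix and a hypothetical annealed posterior score `score_post(kappa, sigma)`:

```python
# Sketch of annealed HMC: leapfrog steps at decreasing noise levels sigma.
import jax
import jax.numpy as jnp

def leapfrog_step(kappa, m, score_post, sigma, alpha):
    m = m + 0.5 * alpha * score_post(kappa, sigma)   # half-step on the momentum
    kappa = kappa + alpha * m                        # full step on the position (M = I)
    m = m + 0.5 * alpha * score_post(kappa, sigma)   # second momentum half-step
    return kappa, m

def annealed_hmc(kappa0, score_post, sigmas, key, alpha=1e-2, n_leapfrog=10):
    kappa = kappa0
    for sigma in sigmas:                             # sigma_1 > sigma_2 > ... ~ 0
        key, k = jax.random.split(key)
        m = jax.random.normal(k, kappa.shape)        # resample the momentum
        for _ in range(n_leapfrog):
            kappa, m = leapfrog_step(kappa, m, score_post, sigma, alpha)
        # (a Metropolis-Hastings accept/reject step can be added here)
    return kappa
```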
Sampling method: Annealed Hamiltonian Monte Carlo
(Figure: annealing scheme, from a high initial noise level $\sigma_T$ down to the target $\sigma_0 \approx 0$.)
Mass-maps
Jeffrey et al. (2021) DES Y3
2pt functions and higher order statistics (3pt functions, Wavelet peak counts, $\ell_1$-norm, scattering transform, neural summaries, ...)
Peel et al. (2021) Abell 520 cluster
Search for localized features: peaks visible in mass-maps but with no optical counterpart, with uncertainty quantification
Gravitational lensing
Galaxy shapes as estimators for gravitational shear
$$ e = \gamma + e_i \qquad \mbox{ with } \qquad e_i \sim \mathcal{N}(0, I)$$
We are trying to measure the ellipticity $e$ of galaxies as an estimator for the gravitational shear $\gamma$
The Weak Lensing Mass-Mapping as an Inverse Problem
Explicit likelihood: train the network only once, data fidelity guaranteed at inference ✅
Remy et al. (2022) Posterior mean
Remy et al. (2022) Posterior samples
Bayesian Modeling
$$\boxed{\gamma = \mathbf{P}\kappa + n}$$
$\mathbf{P}$ is known and encodes our physical understanding of the problem
$\Longrightarrow$ Non-invertible (survey mask, shape noise), the inverse problem is ill-posed with no unique solution $\kappa$
The Bayesian view of the problem:
$$ \boxed{p(\kappa | \gamma) \propto p(\gamma | \kappa) \ p(\kappa)} $$
$p(\gamma | \kappa)$ is the data likelihood, which contains the physics
$p(\kappa)$ is the prior knowledge on the solution.
(Figure: shear field $\gamma$ and convergence map $\kappa$.)
In this framework we can provide point estimates (posterior mean, maximum, median, etc.)
and access the full posterior $p(\kappa|\gamma)$ with Markov Chain Monte Carlo or Variational Inference methods
We learn a hybrid denoiser: theoretical Gaussian on large scales, data-driven on small scales using N-body simulations.
$$\underbrace{\nabla_{\boldsymbol{\kappa}} \log p(\boldsymbol{\kappa})}_\text{full prior} = \underbrace{\nabla_{\boldsymbol{\kappa}} \log p_{th}(\boldsymbol{\kappa})}_\text{gaussian prior} +
\underbrace{\boldsymbol{r}_\theta(\boldsymbol{\kappa}, \nabla_{\boldsymbol{\kappa}} \log p_{th}(\boldsymbol{\kappa}))}_\text{learned residuals}$$
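A sketch of this hybrid score, assuming a known power spectrum for the Gaussian part (on the Fourier grid) and a hypothetical residual network `r_theta`:

```python
# Sketch: analytic Gaussian score on large scales + learned small-scale residual.
import jax.numpy as jnp

def gaussian_score(kappa, power_spectrum):
    # Score of a Gaussian prior with known power spectrum, applied in Fourier space
    kappa_f = jnp.fft.fft2(kappa)
    return -jnp.real(jnp.fft.ifft2(kappa_f / power_spectrum))

def hybrid_prior_score(kappa, params, r_theta, power_spectrum):
    s_th = gaussian_score(kappa, power_spectrum)
    return s_th + r_theta(params, kappa, s_th)   # learned residual on small scales
```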
Uncertainty quantification in Magnetic Resonance Imaging (MRI)
Ramzi, Remy, Lanusse et al. 2020
$$\boxed{y = \mathbf{M} \mathbf{F} x + n}$$
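A sketch of this forward operator, with the k-space sampling mask and noise level as assumptions:

```python
# Sketch of the MRI forward model: Fourier transform followed by a k-space mask.
import jax
import jax.numpy as jnp

def mri_forward(x, mask):
    return mask * jnp.fft.fft2(x)                          # y = M F x

key = jax.random.PRNGKey(0)
kx, km, kr, ki = jax.random.split(key, 4)
x = jax.random.normal(kx, (128, 128))                      # stand-in image
mask = (jax.random.uniform(km, (128, 128)) > 0.7).astype(jnp.float32)  # sampling mask
sigma_n = 0.05                                             # assumed noise level
n = sigma_n * (jax.random.normal(kr, (128, 128))
               + 1j * jax.random.normal(ki, (128, 128)))   # complex noise
y = mri_forward(x, mask) + n                               # undersampled, noisy k-space data
```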
$\Longrightarrow$ We can see which parts of the image are well constrained by data, and which regions are uncertain.
Takeaways
Hybrid physical/deep learning modeling:
Deep generative models can be used to provide data driven priors.
Explicit likelihood, which makes use of all of our physical knowledge.
$\Longrightarrow$ The method can be applied for varying PSF, noise, or even different instruments!
Neural Score Estimation is a scalable approach to learn a prior score.
We implemented a new class of mass-mapping methods, providing access to the full posterior
$\Longrightarrow$ Find the highest quality convergence map of the COSMOS field online: https://zenodo.org/record/5825654