
Diffusion Models

Generative models that produce realistic, high-resolution images by learning to reverse a gradual noising process.

Diffusion Models are a class of generative models designed to produce realistic, high-resolution images. They work by iteratively applying Gaussian noise to the original data in a forward diffusion process and then learning to recover the data by reversing that noising process. After training, the diffusion model can generate new data by passing randomly sampled noise through the learned denoising process. In addition to achieving strong image quality, Diffusion Models offer several practical advantages: they do not require adversarial training, and they are scalable and parallelizable.

Let's understand the diffusion process in detail:

Denoising diffusion modelling involves a two-step process:

  • Forward Diffusion Process: A Markov chain of diffusion steps is performed in which Gaussian noise is gradually and systematically added to the original data. This simulates a diffusion process in which noise accumulates over time, producing a sequence of increasingly noisy data points.
  • Reverse Diffusion Process: The reverse diffusion process learns to undo the forward diffusion. By iteratively removing the added noise in a controlled manner, it aims to reconstruct the original data from its noised version, effectively restoring it to its initial state.

Forward Diffusion Process

In the forward diffusion process, Gaussian noise is incrementally introduced to the input image x₀ over a sequence of T steps. The process begins by sampling a data point x₀ from the real data distribution q(x) (represented as x₀ ~ q(x)). Subsequently, Gaussian noise with a variance parameter βₜ is added to the previous latent variable xₜ₋₁, generating a new latent variable xₜ. This newly generated variable follows a distribution q(xₜ | xₜ₋₁), reflecting the conditional distribution of xₜ given xₜ₋₁. The gradual addition of noise over the T steps simulates the diffusion process, transforming the original input image into a sequence of progressively noised data points.

(Image credit: https://lilianweng.github.io/posts/2021-07-11-diffusion-models/)

Here, q(xₜ ∣ xₜ₋₁) is a Gaussian whose mean and covariance are defined as:

q(xₜ ∣ xₜ₋₁) = N(xₜ; μₜ = √(1 − βₜ) · xₜ₋₁, Σₜ = βₜ · I)

where I is the identity matrix, so Σₜ is always a diagonal matrix of variances. As the number of steps T approaches infinity, the final latent x_T converges to an isotropic Gaussian distribution.
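As a concrete illustration, here is a minimal sketch of a single forward-diffusion step in PyTorch. The function name forward_step, the linear βₜ schedule, and the tensor shapes are illustrative assumptions, not part of any specific implementation:

```python
# A minimal sketch of one forward-diffusion step, assuming a PyTorch tensor `x_prev`
# holding x_{t-1} and a scalar variance `beta_t` taken from an assumed noise schedule.
import torch

def forward_step(x_prev: torch.Tensor, beta_t: float) -> torch.Tensor:
    """Sample x_t ~ q(x_t | x_{t-1}) = N(sqrt(1 - beta_t) * x_{t-1}, beta_t * I)."""
    noise = torch.randn_like(x_prev)                     # epsilon ~ N(0, I)
    return (1.0 - beta_t) ** 0.5 * x_prev + beta_t ** 0.5 * noise

# Example: progressively noise a batch of 64x64 "images" with a linear beta schedule.
T = 1000
betas = torch.linspace(1e-4, 0.02, T)                    # assumed schedule, not prescribed
x = torch.randn(8, 3, 64, 64)                            # stand-in for real data x_0
for t in range(T):
    x = forward_step(x, betas[t].item())                 # x drifts toward pure noise
```

Running this loop one step at a time for all T steps is exactly the cost that the reparameterization trick below avoids.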

Reparameterization trick

The reparameterization trick addresses the computational cost of sampling from q(xₜ | xₜ₋₁) repeatedly, which becomes expensive when the number of steps is large. Instead of sampling xₜ step by step, the trick expresses the sampling operation in a form that separates the randomness from the parameters. Defining αₜ = 1 − βₜ and ᾱₜ = α₁ · α₂ ⋯ αₜ, we can write xₜ = √ᾱₜ · x₀ + √(1 − ᾱₜ) · ε with ε ~ N(0, I), which makes it possible to sample xₜ efficiently at any time step directly from x₀.

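The closed-form expression above translates directly into code. Below is a minimal sketch, reusing the same assumed linear βₜ schedule as the previous snippet (all names are illustrative):

```python
# Sketch of the reparameterization trick: jump from x_0 to x_t in one shot using
# alpha_t = 1 - beta_t and alpha_bar_t = alpha_1 * ... * alpha_t.
import torch

T = 1000
betas = torch.linspace(1e-4, 0.02, T)                    # assumed linear schedule
alphas = 1.0 - betas
alpha_bars = torch.cumprod(alphas, dim=0)                # cumulative products over steps

def q_sample(x0: torch.Tensor, t: int):
    """Sample x_t = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * eps directly."""
    eps = torch.randn_like(x0)                           # the separated-out randomness
    a_bar = alpha_bars[t]
    return a_bar.sqrt() * x0 + (1.0 - a_bar).sqrt() * eps, eps

x0 = torch.randn(8, 3, 64, 64)                           # stand-in for real images
x_t, eps = q_sample(x0, t=500)                           # no 500-step loop required
```

Because xₜ now depends on x₀ only through a deterministic scaling plus independent noise, gradients can also flow through this sampling operation during training.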

Reverse Diffusion Process

The reverse diffusion process trains a neural network to reconstruct the original data by undoing the noise introduced during the forward pass. The true reverse distribution q(xₜ₋₁ | xₜ) is intractable to estimate, since it would require knowledge of the entire data distribution. To overcome this, a parameterized model p_θ (a neural network) is employed to approximate it. When βₜ is sufficiently small, each reverse step is approximately Gaussian, so it can be parameterized by just a mean and a variance. This allows the neural network to effectively learn how to reverse the noise-induced changes and recover the original data.

(Image credit: https://lilianweng.github.io/posts/2021-07-11-diffusion-models/)

Each reverse step is modeled as p_θ(xₜ₋₁ ∣ xₜ) = N(xₜ₋₁; μ_θ(xₜ, t), Σ_θ(xₜ, t)), and the neural network is trained to predict the mean μ_θ(xₜ, t) and the covariance matrix Σ_θ(xₜ, t) for each time step.
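A minimal training-step sketch is shown below. It assumes a hypothetical network model(x_t, t) (typically a U-Net) and reuses q_sample, T, and alpha_bars from the previous snippet. Following the common DDPM simplification, the network predicts the added noise ε rather than the mean directly; μ_θ(xₜ, t) can then be recovered from this prediction up to known constants:

```python
# Sketch of one reverse-process training step (noise-prediction parameterization).
import torch
import torch.nn.functional as F

def training_step(model, x0: torch.Tensor, optimizer) -> float:
    t = torch.randint(0, T, (1,)).item()                 # random time step for this batch
    x_t, eps = q_sample(x0, t)                           # noised input and the true noise
    eps_pred = model(x_t, torch.tensor([t]))             # network's estimate of the noise
    loss = F.mse_loss(eps_pred, eps)                     # simplified DDPM objective
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```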

Once trained, new images are generated by running the learned reverse chain starting from pure Gaussian noise, as in the sketch below. The generated samples can then be used for a variety of applications, such as data augmentation, simulation, and creative content generation.
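Here is a minimal sampling sketch, again assuming the hypothetical model and the schedules (betas, alphas, alpha_bars, T) defined in the earlier snippets, and using the simple choice σₜ² = βₜ for the reverse-step variance:

```python
# Sketch of generation: start from pure Gaussian noise and apply the learned
# denoising step T times, following the DDPM posterior-mean formula.
import torch

@torch.no_grad()
def sample(model, shape=(1, 3, 64, 64)) -> torch.Tensor:
    x = torch.randn(shape)                               # x_T ~ N(0, I)
    for t in reversed(range(T)):
        eps_pred = model(x, torch.tensor([t]))           # predicted noise at step t
        coef = (1.0 - alphas[t]) / (1.0 - alpha_bars[t]).sqrt()
        mean = (x - coef * eps_pred) / alphas[t].sqrt()  # mu_theta(x_t, t)
        noise = torch.randn_like(x) if t > 0 else torch.zeros_like(x)
        x = mean + betas[t].sqrt() * noise               # one reverse step: x_t -> x_{t-1}
    return x
```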

Some popular diffusion models include GLIDE and DALL·E 3 from OpenAI, Imagen from Google, and Stable Diffusion.
