[review] High-Resolution Image Synthesis with Latent Diffusion Models

Rayi 2024. 5. 14. 23:43

This paper comes from Ludwig Maximilian University of Munich and Heidelberg University and presents Latent Diffusion Models, the approach behind what is widely known as Stable Diffusion.

The paper is available at the link below.

https://arxiv.org/abs/2112.10752

Abstract

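Diffusion Models (DMs) are powerful generative models, but because they operate directly in pixel space they consume a large amount of GPU resources. To address this, the paper proposes the Latent Diffusion Model (LDM), which moves the diffusion process into a latent space. LDMs keep the quality of DMs while reducing the computational complexity that was their main drawback, showing that they are strong generative models.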

1. Introduction

1.1. Democratizing High-Resolution Image Synthesis

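DMs are fundamentally likelihood-based models. A common problem of likelihood-based models is that they spend a great deal of computation modeling details that we can barely perceive. DMs try to mitigate this with a reweighted variational objective that undersamples the initial denoising steps, but this is not a fundamental fix: training and evaluating the model still requires repeated function evaluations and gradient computations in the high-dimensional pixel space.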

1.2. Departure to Latent Space

Training a likelihood-based model can be roughly divided into two stages.

1) Perceptual compression, which removes fine details

2) Semantic compression, which learns the actual semantic content of the data

[Figure 2] Perceptual compression and semantic compression

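[Figure 2] shows the rate-distortion trade-off of trained models (the lower the rate, the higher the distortion). For Autoencoder+GAN, the distortion changes very little relative to how much the image is compressed. This indicates that the perceptual compression stage spends a lot of computation on changes that are hard to perceive. The paper therefore targets the perceptual compression part, which drives up training time. The resulting LDM first trains an autoencoder that provides a latent space perceptually equivalent to pixel space, and then trains the DM inside that space.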
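For reference, the contrast between training in pixel space and training in latent space can be written with the objectives from the LDM paper. The symbols are not defined in this post and are taken from the paper: $\epsilon_\theta$ is the denoising network, $\mathcal{E}$ is the pretrained encoder, and $x_t$ and $z_t$ are noised versions of the image $x$ and of its latent $z = \mathcal{E}(x)$ at timestep $t$.

$$
L_{DM} = \mathbb{E}_{x,\, \epsilon \sim \mathcal{N}(0,1),\, t}\left[ \lVert \epsilon - \epsilon_\theta(x_t, t) \rVert_2^2 \right]
$$

$$
L_{LDM} := \mathbb{E}_{\mathcal{E}(x),\, \epsilon \sim \mathcal{N}(0,1),\, t}\left[ \lVert \epsilon - \epsilon_\theta(z_t, t) \rVert_2^2 \right]
$$

The form of the loss is unchanged. Only the space in which the network is evaluated differs, so each training step touches far fewer dimensions.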

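An important point about LDM is that the autoencoding stage only needs to be trained once and can then be reused across different DMs. In addition, adding a Transformer extends the model to token-based conditioning mechanisms (see Section 3.3).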
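The two-stage idea (compress first, then run diffusion in the compressed space) can be illustrated with a toy sketch. Everything here is an assumption made for illustration and not the paper's implementation: `encode` and `decode` are hypothetical stand-ins for a trained autoencoder (simple average pooling and value repetition on a 1-D signal), the noising step follows the standard DDPM forward formula, and the "denoiser" is a placeholder that always predicts zero noise.

```python
import math
import random


def encode(x, f=4):
    """Toy stand-in for a trained encoder: average-pool a 1-D signal by factor f."""
    if len(x) % f != 0:
        raise ValueError("signal length must be divisible by f")
    return [sum(x[i:i + f]) / f for i in range(0, len(x), f)]


def decode(z, f=4):
    """Toy stand-in for a trained decoder: repeat each latent value f times."""
    return [v for v in z for _ in range(f)]


def add_noise(z0, alpha_bar, rng):
    """DDPM-style forward step: z_t = sqrt(a) * z_0 + sqrt(1 - a) * eps."""
    eps = [rng.gauss(0.0, 1.0) for _ in z0]
    zt = [math.sqrt(alpha_bar) * z + math.sqrt(1.0 - alpha_bar) * e
          for z, e in zip(z0, eps)]
    return zt, eps


def noise_prediction_loss(eps, eps_pred):
    """Mean squared error between the true and the predicted noise."""
    return sum((e - p) ** 2 for e, p in zip(eps, eps_pred)) / len(eps)


rng = random.Random(0)
x = [float(i % 8) for i in range(64)]      # "pixel space" signal, 64 values
z0 = encode(x)                             # latent, 16 values
zt, eps = add_noise(z0, alpha_bar=0.5, rng=rng)

eps_pred = [0.0] * len(zt)                 # placeholder denoiser output
loss = noise_prediction_loss(eps, eps_pred)

print(len(x), len(z0), len(decode(z0)))    # sizes of signal, latent, reconstruction
```

The noising and the loss only ever see the 16-value latent, while the 64-value signal is touched once by the encoder. In the real model the encoder and decoder are learned, and the denoiser is a UNet trained to minimize this loss.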

2. Related works

- Generative Models for Image Synthesis

- Diffusion Probabilistic Model (DDPM)

- Two-Stage Image Synthesis

3. Method

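As mentioned in Section 1, the perceptual compression stage is separated out with an autoencoder to reduce the resources it consumes. This approach has three main advantages.

1) Training samples are drawn from a lower-dimensional space, so computation is more efficient

2) The inductive bias that DMs inherit from their UNet architecture can be exploited, which makes them effective on spatially structured data during both training and inference

3) An autoencoder trained once can be reused for other generative models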
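To get a rough feel for the first advantage, the short calculation below compares how many values a denoising network has to process per sample in pixel space and in latent space. The concrete numbers (a 512x512 RGB image, a downsampling factor of 8, and 4 latent channels) are illustrative assumptions and do not come from this post.

```python
def num_elements(height, width, channels):
    """Number of scalar values in a height x width x channels tensor."""
    return height * width * channels


def latent_shape(height, width, f, latent_channels):
    """Spatial size after downsampling both sides by factor f."""
    if height % f != 0 or width % f != 0:
        raise ValueError("height and width must be divisible by f")
    return height // f, width // f, latent_channels


# Assumed example configuration (hypothetical values for illustration).
pixel = num_elements(512, 512, 3)
h, w, c = latent_shape(512, 512, f=8, latent_channels=4)
latent = num_elements(h, w, c)
reduction = pixel / latent

print(pixel, (h, w, c), latent, reduction)
# prints: 786432 (64, 64, 4) 16384 48.0
```

Under these assumptions each denoising step works on 48 times fewer values, which is where the efficiency gain in point 1) comes from.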

3.1. Perceptual Image Compression

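As in prior work, the perceptual compression model is an autoencoder trained with a combination of a perceptual loss and a patch-based adversarial objective, rather than relying only on pixel-space L1 / L2 losses, which tend to produce blurry results. This keeps the reconstructions confined to the image manifold.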

$x$
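In the paper's notation (assumed here, since this post does not spell it out), given an RGB image $x \in \mathbb{R}^{H \times W \times 3}$, the encoder $\mathcal{E}$ maps it to a latent representation $z$, and the decoder $\mathcal{D}$ reconstructs the image from that latent:

$$
z = \mathcal{E}(x) \in \mathbb{R}^{h \times w \times c}, \qquad \tilde{x} = \mathcal{D}(z) = \mathcal{D}(\mathcal{E}(x)), \qquad f = H/h = W/w
$$

Here $f$ is the downsampling factor of the encoder, and $\tilde{x}$ is the reconstruction that the losses above push toward the image manifold.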

3.2. Latent Diffusion Models
