---
library_name: diffusers
license: mit
datasets:
- uoft-cs/cifar10
- nyanko7/danbooru2023
language:
- en
pipeline_tag: text-to-image
---
# DDPM Project

This repository contains an implementation of Denoising Diffusion Probabilistic Models (DDPM).

## Table of Contents
- [Introduction](#introduction)
- [Installation](#installation)
- [Usage](#usage)
- [Game](#game)
- [Explanations and Mathematics](#explanations-and-mathematics)
- [Resources](#resources)
- [Pretrained Weights](#pretrained-weights)
- [Contributing](#contributing)
- [Future Ideas](#future-ideas)

## Introduction
Denoising Diffusion Probabilistic Models (DDPM) are a class of generative models that learn to generate data by reversing a gradual noising (diffusion) process: Gaussian noise is added to training images step by step, and a neural network is trained to undo that corruption one step at a time. This repository provides a comprehensive implementation of DDPM.
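
As a concrete picture of the forward process: noise is added to a clean image `x_0` over `T` steps, and `x_t` can be sampled from `x_0` in closed form. The sketch below is illustrative only (it is not code from this repository) and uses the linear beta schedule from the original paper:

```python
import torch

# Linear beta schedule from the DDPM paper: 1e-4 to 0.02 over 1000 steps.
T = 1000
betas = torch.linspace(1e-4, 0.02, T)
alphas = 1.0 - betas
alphas_cumprod = torch.cumprod(alphas, dim=0)

def forward_diffusion(x0, t):
    """Sample x_t ~ q(x_t | x_0) in closed form; return x_t and the noise that was added."""
    noise = torch.randn_like(x0)
    sqrt_ab = alphas_cumprod[t].sqrt().view(-1, 1, 1, 1)
    sqrt_one_minus_ab = (1.0 - alphas_cumprod[t]).sqrt().view(-1, 1, 1, 1)
    return sqrt_ab * x0 + sqrt_one_minus_ab * noise, noise
```

The network is trained to predict the `noise` returned here; reversing the process then amounts to repeatedly subtracting the predicted noise.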

## Installation
To install the necessary dependencies, run:
```bash
pip install -r requirements.txt
```

## Usage
To train the model, use the following command:
```bash
python train.py
```
To generate samples, use:
```bash
python generate.py
```
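
The training objective behind `train.py` is presumably the simplified loss from the DDPM paper: pick a random timestep, noise the image, and regress the network's output onto the noise that was added. A minimal sketch (function and argument names are illustrative, not this repo's exact code):

```python
import torch
import torch.nn.functional as F

def ddpm_training_loss(model, x0, alphas_cumprod):
    """Simplified DDPM loss: L_simple = E || eps - eps_theta(x_t, t) ||^2."""
    alphas_cumprod = alphas_cumprod.to(x0.device)
    t = torch.randint(0, len(alphas_cumprod), (x0.shape[0],), device=x0.device)
    noise = torch.randn_like(x0)
    ab = alphas_cumprod[t].view(-1, 1, 1, 1)
    x_t = ab.sqrt() * x0 + (1.0 - ab).sqrt() * noise
    return F.mse_loss(model(x_t, t), noise)  # assumes the model takes (x, t) and predicts the noise
```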

## Game
To build intuition for how the model works, we are building a cute little game in which the player takes the role of the U-Net/reverse-diffusion model and is tasked with denoising images whose noise is made of grids of lines.

A primitive version of the game is available at [learndiffusion.vercel.app](https://learndiffusion.vercel.app). You can also contribute by checking out the `diffusion_game` branch. A model showcase is also planned: the model's weights will be downloaded from the internet and loaded into a Gradio interface for direct use/inference on Vercel. Feel free to contribute to this as well; an issue is open for it.

## Explanations and Mathematics
- Slides from the presentation: 
- Notes/explanations: [HERE](slides/notes)
- A cute lab talk PPT: 
- Plato's allegory: \<link to REPUBLIC>

## Resources
- Original paper: https://arxiv.org/pdf/2006.11239
- Improvement paper: https://arxiv.org/abs/2102.09672
- Improvement by OpenAI: https://arxiv.org/pdf/2105.05233
- Stable Diffusion paper: https://arxiv.org/abs/2112.10752

### Papers for background
- U-Net paper for biomedical image segmentation
- Autoencoder
- Variational Autoencoder
- Markov Hierarchical VAE
- Introductory lectures on diffusion processes

### Youtube videos and courses
#### Mathematics
- Outliers
- Omar Jahil

#### Pytorch Implementation
- [Deep Findr](https://www.youtube.com/watch?v=a4Yfz2FxXiY)
- [Notebook from Deep Findr](https://colab.research.google.com/drive/1sjy9odlSSy0RBVgMTgP7s99NXsqglsUL?usp=sharing)

## Pretrained Weights
Weights for the trained model can be found in [pretrained_weights](https://drive.google.com/drive/folders/1NiQDI3e67I9FITVnrzNPP2Az0LABRpic?usp=sharing).

To load the pretrained weights (the `SimpleUnet` import path below is indicative; point it at wherever the class is defined in this repo):
```python
import torch

# Adjust this import to the module where SimpleUnet is defined in this repository.
from model import SimpleUnet

model2 = SimpleUnet()
model2.load_state_dict(torch.load("/content/drive/MyDrive/Research Work/mlsa/DDPM/model_weights.pth"))
model2.eval()
```
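
If you are loading the weights on a CPU-only machine, pass `map_location` to `torch.load` so weights saved from a GPU load correctly:

```python
state_dict = torch.load("model_weights.pth", map_location="cpu")
model2.load_state_dict(state_dict)
```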

To run inference:

TODO: The sampling function currently has errors (boolean errors, among others). Issues will be opened so others can fix them as an exercise if needed.
```python
import torch
from torchvision.utils import save_image

num_samples = 8  # Number of images to generate
image_size = (3, 32, 32)  # Example for CIFAR10
noise = torch.randn(num_samples, *image_size).to("cuda")

model2.to("cuda")
# Generate images by denoising
with torch.no_grad():
    generated_images = model2.sample(noise)

# Save the generated images
save_image(generated_images, "generated_images.png", nrow=4, normalize=True)
```
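
Until the repo's sampling function is fixed, the sketch below shows the standard DDPM ancestral sampling loop (Algorithm 2 in the original paper) for reference. It is illustrative, not this repo's exact `sample` implementation, and it assumes the model takes `(x, t)` and predicts the added noise:

```python
import torch

@torch.no_grad()
def ddpm_sample(model, shape, betas, device="cuda"):
    """Start from pure noise and apply the learned reverse step T times."""
    betas = betas.to(device)
    alphas = 1.0 - betas
    alphas_cumprod = torch.cumprod(alphas, dim=0)

    x = torch.randn(shape, device=device)
    for t in reversed(range(len(betas))):
        t_batch = torch.full((shape[0],), t, device=device, dtype=torch.long)
        eps = model(x, t_batch)  # predicted noise

        # Posterior mean: (x_t - beta_t / sqrt(1 - alpha_bar_t) * eps) / sqrt(alpha_t)
        coef = betas[t] / torch.sqrt(1.0 - alphas_cumprod[t])
        mean = (x - coef * eps) / torch.sqrt(alphas[t])

        if t > 0:
            x = mean + torch.sqrt(betas[t]) * torch.randn_like(x)  # add sigma_t * z
        else:
            x = mean
    return x
```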


## Contributing
Contributions are welcome! Please open an issue or submit a pull request.


## Future Ideas
- Make the model ONNX-compatible for training and inference on Intel GPUs (a rough export sketch follows this list)
- Build a Stable Diffusion-style text-to-image model using a CLIP implementation
- Train the current model on a much larger dataset to capture more generalization and nuance
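
For the ONNX idea, a rough starting point is sketched below. It assumes `SimpleUnet.forward` takes a noisy image batch and a timestep tensor; the import path, weight filename, input shapes, and opset are illustrative and would need to be adapted to this repo:

```python
import torch
# Adjust this import to the module where SimpleUnet is defined in this repository.
from model import SimpleUnet

model = SimpleUnet()
model.load_state_dict(torch.load("model_weights.pth", map_location="cpu"))
model.eval()

# Dummy inputs: a CIFAR10-sized image batch and a timestep tensor.
dummy_x = torch.randn(1, 3, 32, 32)
dummy_t = torch.tensor([10], dtype=torch.long)

torch.onnx.export(
    model,
    (dummy_x, dummy_t),
    "ddpm_unet.onnx",
    input_names=["noisy_image", "timestep"],
    output_names=["predicted_noise"],
    opset_version=17,
)
```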