Add evaluation results and reconstructions

Files changed (4)

.gitattributes +2 -0
README.md +198 -50
input_grid.png +3 -0
recon_grid.png +3 -0

.gitattributes CHANGED

@@ -33,3 +33,5 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
 *.zip filter=lfs diff=lfs merge=lfs -text
 *.zst filter=lfs diff=lfs merge=lfs -text
 *tfevents* filter=lfs diff=lfs merge=lfs -text

+input_grid.png filter=lfs diff=lfs merge=lfs -text
+recon_grid.png filter=lfs diff=lfs merge=lfs -text

README.md CHANGED

@@ -1,66 +1,214 @@
 ---
-tags:
-- pytorch
-- vae
-- diffusion
-- image-generation
-- cc3m
 license: mit
-datasets:
-- pixparse/cc3m-wds
-library_name: transformers
-pipeline_tag: image-to-image
 ---
-# VAE - UNet-Style Autoencoder for 256x256 Image Reconstruction
-This model is a UNet-style Variational Autoencoder (VAE) trained on the [CC3M](https://huggingface.co/datasets/pixparse/cc3m-wds) dataset for high-quality image reconstruction and generation. It integrates adversarial, perceptual, and identity-preserving loss terms to improve semantic and visual fidelity.
-## Architecture
-- **Encoder/Decoder**: Multi-scale UNet architecture
-- **Latent Space**: 8-channel latent bottleneck with reparameterization (mu, logvar)
-- **Losses**:
-  - L1 reconstruction loss
-  - KL divergence with annealing
-  - LPIPS perceptual loss (VGG backbone)
-  - Identity loss via MoCo-v2 embeddings
-  - Adversarial loss via Patch Discriminator w/ Spectral Norm
-$$
-\mathcal{L}_{total} = \mathcal{L}_{recon} + \mathcal{L}_{PIPS} + 0.5 * \mathcal{L}_{GAN} + 0.1 *\mathcal{L}_{ID} + 10^{-6} *\mathcal{L}_{KL}
-$$
-## Training Config
-| Hyperparameter        | Value                      |
-|-----------------------|----------------------------|
-| Dataset               | CC3M (850k images)         |
-| Image Resolution      | 256 x 256                  |
-| Batch Size            | 16                         |
-| Optimizer             | AdamW                      |
-| Learning Rate         | 5e-5                       |
-| Precision             | bf16 (mixed precision)     |
-| Total Steps           | 210,000                    |
-| GAN Start Step        | 50,000                     |
-| KL Annealing          | Yes (10% of training)      |
-| Augmentations         | Crop, flip, jitter, blur, rotation |
-Trained using a cosine learning rate schedule with gradient clipping and automatic mixed precision (`torch.cuda.amp`)
-## Usage Example
-```python
-import torch
-from transfusion.modeling.vae.vae import VAE
-from transfusion.config.model import VAEConfig
-config = VAEConfig(...)
-vae = VAE(config, is_training=False)
-ckpt = torch.load("vae_final_model.pt", map_location="cpu")
-vae.load_state_dict(ckpt["vae_state_dict"], strict=False)
-vae.eval()
-with torch.no_grad():
-    output, _, _ = vae(input_tensor)

 ---
+language: en
 license: mit
+model-index:
+- name: vae-256px-cc3m
+  results:
+  - task:
+      type: image-generation
+    dataset:
+      name: cc3m-val
+      type: image
+    metrics:
+    - type: FID
+      value: 9.458456993103027
+    - type: LPIPS
+      value: 0.16319363744094453
+    - type: MoCo-ID-Loss
+      value: 0.0010187711972133096
 ---
+# Model Card for Model ID
+<!-- Provide a quick summary of what the model is/does. -->
+## Model Details
+### Model Description
+<!-- Provide a longer summary of what this model is. -->
+- **Developed by:** [More Information Needed]
+- **Funded by [optional]:** [More Information Needed]
+- **Shared by [optional]:** [More Information Needed]
+- **Model type:** [More Information Needed]
+- **Language(s) (NLP):** en
+- **License:** mit
+- **Finetuned from model [optional]:** [More Information Needed]
+### Model Sources [optional]
+<!-- Provide the basic links for the model. -->
+- **Repository:** [More Information Needed]
+- **Paper [optional]:** [More Information Needed]
+- **Demo [optional]:** [More Information Needed]
+## Uses
+<!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
+### Direct Use
+<!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
+[More Information Needed]
+### Downstream Use [optional]
+<!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
+[More Information Needed]
+### Out-of-Scope Use
+<!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
+[More Information Needed]
+## Bias, Risks, and Limitations
+<!-- This section is meant to convey both technical and sociotechnical limitations. -->
+[More Information Needed]
+### Recommendations
+<!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
+Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
+## How to Get Started with the Model
+Use the code below to get started with the model.
+[More Information Needed]
+## Training Details
+### Training Data
+<!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
+[More Information Needed]
+### Training Procedure
+<!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
+#### Preprocessing [optional]
+[More Information Needed]
+#### Training Hyperparameters
+- **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
+#### Speeds, Sizes, Times [optional]
+<!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
+[More Information Needed]
+## Evaluation
+<!-- This section describes the evaluation protocols and provides the results. -->
+### Testing Data, Factors & Metrics
+#### Testing Data
+<!-- This should link to a Dataset Card if possible. -->
+[More Information Needed]
+#### Factors
+<!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
+[More Information Needed]
+#### Metrics
+<!-- These are the evaluation metrics being used, ideally with a description of why. -->
+[More Information Needed]
+### Results
+[More Information Needed]
+#### Summary
+## Model Examination [optional]
+<!-- Relevant interpretability work for the model goes here -->
+[More Information Needed]
+## Environmental Impact
+<!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
+Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
+- **Hardware Type:** [More Information Needed]
+- **Hours used:** [More Information Needed]
+- **Cloud Provider:** [More Information Needed]
+- **Compute Region:** [More Information Needed]
+- **Carbon Emitted:** [More Information Needed]
+## Technical Specifications [optional]
+### Model Architecture and Objective
+[More Information Needed]
+### Compute Infrastructure
+[More Information Needed]
+#### Hardware
+[More Information Needed]
+#### Software
+[More Information Needed]
+## Citation [optional]
+<!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
+**BibTeX:**
+[More Information Needed]
+**APA:**
+[More Information Needed]
+## Glossary [optional]
+<!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->
+[More Information Needed]
+## More Information [optional]
+[More Information Needed]
+## Model Card Authors [optional]
+[More Information Needed]
+## Model Card Contact
+[More Information Needed]

input_grid.png ADDED

Git LFS Details

SHA256: 754f10c32157ac3cb1cc7ee4659407b4e2ada6b713a50d8faef52d517b1f2922
Pointer size: 133 Bytes
Size of remote file: 71.3 MB

recon_grid.png ADDED

Git LFS Details

SHA256: cf023fc8f811689cb1a8eda86dc7362f823d0e709a7c8412eae92bf7563d6a78
Pointer size: 133 Bytes
Size of remote file: 75.2 MB
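
The SHA256 values listed above are the Git LFS object IDs, which are the SHA-256 digests of the full image files. A minimal sketch for checking local copies against them follows. The function names and the assumption that both PNGs sit in the current directory are illustrative and not part of this commit.

```python
import hashlib
from pathlib import Path

# Digests copied from the "Git LFS Details" entries in this commit.
EXPECTED_SHA256 = {
    "input_grid.png": "754f10c32157ac3cb1cc7ee4659407b4e2ada6b713a50d8faef52d517b1f2922",
    "recon_grid.png": "cf023fc8f811689cb1a8eda86dc7362f823d0e709a7c8412eae92bf7563d6a78",
}


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, read in chunks to bound memory use."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_lfs_files(directory="."):
    """Map each expected file name to True (match), False (mismatch), or None (missing)."""
    results = {}
    for name, expected in EXPECTED_SHA256.items():
        path = Path(directory) / name
        if not path.is_file():
            results[name] = None
        else:
            results[name] = sha256_of_file(path) == expected
    return results


if __name__ == "__main__":
    for name, ok in verify_lfs_files().items():
        status = "missing" if ok is None else ("ok" if ok else "MISMATCH")
        print(f"{name}: {status}")
```

A mismatch on a file that is only about 133 bytes long most likely means the checkout still holds the LFS pointer rather than the image itself; in that case fetching the LFS objects (for example with `git lfs pull`) should resolve it.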