GoodEnough/NiT-XL-Models


nielsr (HF Staff) committed 8 days ago

Commit 0a856dc (verified) · 1 parent: e6b94cd

Improve model card with pipeline tag, library name, and links


This PR improves the model card by:

- Adding the `pipeline_tag: unconditional-image-generation` for discoverability.
- Specifying the `library_name: diffusers`.
- Including links to the project page and the code repository for easy access to additional information.
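
The metadata changes listed above can be checked locally before merging. The following is a minimal sketch, assuming the card's front matter is the flat `key: value` YAML shown in this diff; the helper name `read_front_matter` is hypothetical, and a real YAML parser would be more robust than this hand-rolled one.

```python
def read_front_matter(card_text: str) -> dict:
    """Parse the leading '---' block of a model card into a dict.

    Hypothetical helper: handles only flat `key: value` pairs and
    `- item` lists, which is all this README uses. It is not a
    general YAML parser.
    """
    lines = card_text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}  # no front matter present
    meta, current_key = {}, None
    for line in lines[1:]:
        stripped = line.strip()
        if stripped == "---":  # closing delimiter ends the metadata block
            break
        if (
            stripped.startswith("- ")
            and current_key is not None
            and isinstance(meta[current_key], list)
        ):
            # list item belonging to the most recent `key:` line
            meta[current_key].append(stripped[2:].strip())
        elif ":" in stripped:
            key, _, value = stripped.partition(":")
            current_key = key.strip()
            # an empty value means a list follows on the next lines
            meta[current_key] = value.strip() if value.strip() else []
    return meta


# Front matter as it appears on the "+" side of this commit.
CARD = """---
datasets:
- GoodEnough/NiT-Preprocessed-ImageNet1K
license: mit
pipeline_tag: unconditional-image-generation
library_name: diffusers
---
# Native-Resolution Image Synthesis
"""

meta = read_front_matter(CARD)
# Keys this PR says it adds or keeps; an empty list means none are missing.
missing = [k for k in ("license", "pipeline_tag", "library_name") if k not in meta]
print(meta["pipeline_tag"], missing)
```

The check only confirms that the keys are present; whether the Hub accepts a given `pipeline_tag` or `library_name` value is decided by the Hub itself, not by this sketch.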

Files changed (1)

README.md +18 -3


@@ -1,8 +1,23 @@
 ---
-license: mit
 datasets:
 - GoodEnough/NiT-Preprocessed-ImageNet1K
+license: mit
+pipeline_tag: unconditional-image-generation
+library_name: diffusers
 ---
-Paper:
-- arxiv.org/abs/2506.03131
+# Native-Resolution Image Synthesis
+The model was presented in the paper [Native-Resolution Image Synthesis](https://huggingface.co/papers/2506.03131).
+# Paper abstract
+We introduce native-resolution image synthesis, a novel generative modeling paradigm that enables the synthesis of images at arbitrary resolutions and aspect ratios. This approach overcomes the limitations of conventional fixed-resolution, square-image methods by natively handling variable-length visual tokens, a core challenge for traditional techniques. To this end, we introduce the Native-resolution diffusion Transformer (NiT), an architecture designed to explicitly model varying resolutions and aspect ratios within its denoising process. Free from the constraints of fixed formats, NiT learns intrinsic visual distributions from images spanning a broad range of resolutions and aspect ratios. Notably, a single NiT model simultaneously achieves the state-of-the-art performance on both ImageNet-256x256 and 512x512 benchmarks. Surprisingly, akin to the robust zero-shot capabilities seen in advanced large language models, NiT, trained solely on ImageNet, demonstrates excellent zero-shot generalization performance. It successfully generates high-fidelity images at previously unseen high resolutions (e.g., 1536 x 1536) and diverse aspect ratios (e.g., 16:9, 3:1, 4:3), as shown in Figure 1. These findings indicate the significant potential of native-resolution modeling as a bridge between visual generative modeling and advanced LLM methodologies.
+# Project Page
+https://wzdthu.github.io/NiT
+# Code
+https://github.com/WZDTHU/NiT