NYU VisionX

university

https://www.sainingxie.com/


AI & ML interests

None defined yet.

Recent Activity

sayakpaul authored a paper 15 days ago

From Reflection to Perfection: Scaling Inference-Time Optimization for Text-to-Image Diffusion Models via Reflection Tuning

sainx authored a paper 21 days ago

REPA-E: Unlocking VAE for End-to-End Tuning with Latent Diffusion Transformers

sainx authored a paper about 1 month ago

Scaling Language-Free Visual Representation Learning


nyu-visionx's activity

sayakpaul

authored a paper 15 days ago

From Reflection to Perfection: Scaling Inference-Time Optimization for Text-to-Image Diffusion Models via Reflection Tuning

Paper • 2504.16080 • Published 15 days ago • 15

sainx

authored a paper 21 days ago

REPA-E: Unlocking VAE for End-to-End Tuning with Latent Diffusion Transformers

Paper • 2504.10483 • Published 23 days ago • 21

sainx

authored a paper about 1 month ago

Scaling Language-Free Visual Representation Learning

Paper • 2504.01017 • Published Apr 1 • 29

tsbpp

authored a paper about 1 month ago

Scaling Language-Free Visual Representation Learning

Paper • 2504.01017 • Published Apr 1 • 29

ellisbrown

updated a dataset about 1 month ago

nyu-visionx/CV-Bench

Viewer • Updated Apr 1 • 5.28k • 6.69k • 30

xcpan

updated a dataset about 1 month ago

nyu-visionx/pyramid_flow_ft_results

Viewer • Updated Mar 30 • 8.42k • 23

xcpan

published a dataset about 1 month ago

nyu-visionx/pyramid_flow_ft_results

Viewer • Updated Mar 30 • 8.42k • 23

xcpan

updated a model about 1 month ago

nyu-visionx/pyramid_flow_ft_ckpt

Updated Mar 30

xcpan

published a model about 1 month ago

nyu-visionx/pyramid_flow_ft_ckpt

Updated Mar 30

Riiiickkk

updated a dataset about 2 months ago

nyu-visionx/pisa-experiments

Updated Mar 18 • 123 • 1

sayakpaul

authored a paper about 2 months ago

SANA-Sprint: One-Step Diffusion with Continuous-Time Consistency Distillation

Paper • 2503.09641 • Published Mar 12 • 38

Riiiickkk

published a dataset about 2 months ago

nyu-visionx/pisa-experiments

Updated Mar 18 • 123 • 1

jihanyang

authored a paper 2 months ago

UniTok: A Unified Tokenizer for Visual Generation and Understanding

Paper • 2502.20321 • Published Feb 27 • 30

xcpan

updated a dataset 2 months ago

nyu-visionx/oro_depth_reward

Viewer • Updated Feb 23 • 889k • 4

sayakpaul

posted an update 3 months ago


Inference-time scaling meets Flux.1-Dev (and others) 🔥

Presenting a simple re-implementation of "Inference-Time Scaling for Diffusion Models beyond Scaling Denoising Steps" by Ma et al.

I implemented the simplest strategy, random search, but results could likely be improved with better-guided search methods.

The code supports Gemini 2 Flash and Qwen2.5 as verifiers for "LLMGrading" 🤗

The steps are simple:

For each round:

1. Sample 2 starting noises with different seeds.
2. Score the generations with respect to a metric.
3. Keep the best generation from the current round.

If you have more compute budget, go to the next search round: scale the noise pool to 2 ** search_round noises and repeat steps 1-3.

This is the random search method described in the Google DeepMind paper.
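The round structure above can be sketched in Python as follows. This is a minimal sketch, not the code from the tt-scale-flux repository: `random_search`, `generate`, and `score` are hypothetical names standing in for a diffusion pipeline call seeded with a given noise and for a verifier such as an LLM grader. The toy versions at the bottom only exist so the snippet runs without a model.

```python
import random
from typing import Callable, List, Tuple


def random_search(
    generate: Callable[[int], object],  # hypothetical: seed -> generated sample
    score: Callable[[object], float],   # hypothetical verifier: higher is better
    num_rounds: int,
    base_seed: int = 0,
) -> List[Tuple[int, int, float]]:
    """Return (search_round, best_seed, best_score) for every search round."""
    rng = random.Random(base_seed)  # makes the sampled seeds reproducible
    per_round_best = []
    for search_round in range(1, num_rounds + 1):
        # The noise pool grows as 2 ** search_round: 2, 4, 8, ... candidates.
        pool_size = 2 ** search_round
        # Step 1: draw distinct seeds, one per starting noise.
        seeds = rng.sample(range(2**31), pool_size)
        # Step 2: generate from each noise and score the result.
        scored = [(score(generate(seed)), seed) for seed in seeds]
        # Step 3: keep the best generation of this round.
        best_score, best_seed = max(scored)
        per_round_best.append((search_round, best_seed, best_score))
    return per_round_best


# Toy stand-ins so the sketch runs without a diffusion model or verifier.
calls = []


def toy_generate(seed: int) -> int:
    calls.append(seed)
    return seed % 1000  # pretend the "image" is just a number


def toy_score(sample: int) -> float:
    return -abs(sample - 500)  # closer to 500 scores higher


results = random_search(toy_generate, toy_score, num_rounds=3)
for search_round, seed, best in results:
    print(f"round {search_round}: seed={seed} score={best}")
```

Keeping only the per-round best mirrors step 3; tracking a single global best across rounds would be a small change to the loop.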

Code, more results, and a bunch of other stuff are in the repository. Check it out here: https://github.com/sayakpaul/tt-scale-flux/ 🤗

xcpan

published a dataset 3 months ago

nyu-visionx/oro_depth_reward

Viewer • Updated Feb 23 • 889k • 4

sayakpaul

posted an update 3 months ago


We have been cooking a couple of fine-tuning runs on CogVideoX with finetrainers, smol datasets, and LoRA to generate cool video effects like crushing, dissolving, etc.

We are also releasing a utility to extract a LoRA from a fully fine-tuned checkpoint. I know such tools have existed forever, but the quality on video models was nothing short of spectacular. Below are some links:

* Models and datasets: finetrainers
* finetrainers: https://github.com/a-r-r-o-w/finetrainers
* LoRA extraction: https://github.com/huggingface/diffusers/blob/main/scripts/extract_lora_from_model.py
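The general idea behind extracting a LoRA from a fully fine-tuned checkpoint can be illustrated with a truncated SVD of the weight difference for a single linear layer. This is a minimal NumPy sketch of that technique, assuming the base and fine-tuned weights are plain matrices; it is not the diffusers `extract_lora_from_model.py` script, and the function name `extract_lora` is hypothetical.

```python
import numpy as np


def extract_lora(w_base: np.ndarray, w_ft: np.ndarray, rank: int):
    """Approximate (w_ft - w_base) by lora_b @ lora_a with the given rank."""
    delta = w_ft - w_base  # what fine-tuning changed in this layer
    u, s, vt = np.linalg.svd(delta, full_matrices=False)
    sqrt_s = np.sqrt(s[:rank])  # split singular values evenly between factors
    lora_b = u[:, :rank] * sqrt_s          # shape (out_features, rank)
    lora_a = sqrt_s[:, None] * vt[:rank]   # shape (rank, in_features)
    return lora_a, lora_b


# Toy layer: a random base weight plus an exactly rank-4 fine-tuning update.
rng = np.random.default_rng(0)
w_base = rng.standard_normal((64, 32))
true_delta = rng.standard_normal((64, 4)) @ rng.standard_normal((4, 32))
w_ft = w_base + true_delta

lora_a, lora_b = extract_lora(w_base, w_ft, rank=4)
err_rank4 = np.abs(w_base + lora_b @ lora_a - w_ft).max()

lora_a2, lora_b2 = extract_lora(w_base, w_ft, rank=2)
err_rank2 = np.abs(w_base + lora_b2 @ lora_a2 - w_ft).max()

print(f"rank 4 max error: {err_rank4:.2e}, rank 2 max error: {err_rank2:.2e}")
```

Choosing `rank` trades adapter size against how much of the fine-tuned delta the adapter can reproduce; the toy delta here is exactly rank 4, so rank 4 recovers it up to floating-point error while rank 2 cannot.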


sainx

authored a paper 3 months ago

SFT Memorizes, RL Generalizes: A Comparative Study of Foundation Model Post-training

Paper • 2501.17161 • Published Jan 28 • 121

jihanyang

authored a paper 3 months ago

SFT Memorizes, RL Generalizes: A Comparative Study of Foundation Model Post-training

Paper • 2501.17161 • Published Jan 28 • 121

sayakpaul

posted an update 3 months ago


We wrote a post covering the state of video generation in the Diffusers ecosystem 🧨

We cover the supported models, the optimization knobs users can turn, fine-tuning, and more 🔥

5-6 GB of memory for HunyuanVideo; the sky is the limit 🌌 🤗
https://huggingface.co/blog/video_gen

Team members 16