Technical Report | Project Page | Hugging Face | ModelScope | GitHub
Figure 1: Conceptual comparison and qualitative examples of MingTok.
import torch
import torchvision.transforms as T
from PIL import Image

from mingtok.modeling_mingtok import MingTok
# Assumed import path for CenterCropProcessor; adjust it to wherever the repo defines the class.
from mingtok.utils import CenterCropProcessor

# build MingTok
mingtok_model = MingTok.from_pretrained("inclusionAI/MingTok-Vision")
mingtok_model = mingtok_model.cuda()

img_path = "mingtok/asset/mingtok.png"
save_path = "mingtok/asset/mingtok_recon.png"

# load the original image, center-crop it to 512x512, and normalize it to [-1, 1]
image = Image.open(img_path).convert("RGB")
processor = CenterCropProcessor(image_size=512, mean=[0.5, 0.5, 0.5], std=[0.5, 0.5, 0.5])
image = processor(image).cuda().unsqueeze(0)

# perform reconstruction
with torch.no_grad():
    image_recon = mingtok_model.forward_enc_dec(image)
    # step-by-step alternative:
    # latent = mingtok_model.low_level_encoder(image)
    # semantic_feat = mingtok_model.semantic_decoder(latent)['x_norm_patchtokens']
    # image_recon = mingtok_model.forward_pixel_decoder(semantic_feat)

# undo the normalization, clamp to the valid pixel range, and save
output_mean = torch.tensor([0.5, 0.5, 0.5]).view(1, -1, 1, 1).cuda()
output_std = torch.tensor([0.5, 0.5, 0.5]).view(1, -1, 1, 1).cuda()
output_image = (image_recon * output_std + output_mean)[0].clamp(0, 1)
output_image = T.ToPILImage()(output_image)
output_image.save(save_path)
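To sanity-check a reconstruction numerically, one can compare it with the input after undoing the normalization. The following is a minimal sketch, not the evaluation protocol behind any reported numbers: `denormalize` and `psnr` are hypothetical helper names, NumPy arrays stand in for the tensors used in the example, and pixel values are assumed to lie in [0, 1] after de-normalization.

```python
import numpy as np


def denormalize(x, mean=0.5, std=0.5):
    """Undo (x - mean) / std and clip to the valid [0, 1] pixel range."""
    return np.clip(np.asarray(x, dtype=np.float64) * std + mean, 0.0, 1.0)


def psnr(reference, estimate, max_value=1.0):
    """Peak signal-to-noise ratio in dB between two arrays of equal shape."""
    reference = np.asarray(reference, dtype=np.float64)
    estimate = np.asarray(estimate, dtype=np.float64)
    mse = np.mean((reference - estimate) ** 2)
    if mse == 0.0:
        return float("inf")  # identical inputs
    return float(10.0 * np.log10(max_value ** 2 / mse))


# Stand-in data (not real model output): a random "image" in [-1, 1]
# and a slightly perturbed copy playing the role of the reconstruction.
rng = np.random.default_rng(0)
image = rng.uniform(-1.0, 1.0, size=(3, 64, 64))
image_recon = image + rng.normal(0.0, 0.02, size=image.shape)

score = psnr(denormalize(image), denormalize(image_recon))
print(f"PSNR: {score:.2f} dB")
```

With real data, the same two helpers could be applied to `image[0].cpu().numpy()` and `image_recon[0].cpu().numpy()` from the reconstruction example.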
| Tokenizer | Res. | # Tokens | rFID ↓ | PSNR ↑ | SSIM ↑ | LPIPS ↓ |
|---|---|---|---|---|---|---|
| Specialized tokenizers | | | | | | |
| SD-VAE | 256 | 1024 | 1.06 | 28.62 | 0.86 | - |
| GigaTok | 256 | 256 | 0.51 | 21.32 | 0.69 | 0.21 |
| VA-VAE | 256 | 256 | 0.26 | 28.59 | 0.80 | 0.09 |
| HieraTok | 256 | 256 | 1.04 | 23.90 | 0.72 | 0.09 |
| DC-AE | 512 | 64 | 0.22 | 26.15 | 0.71 | 0.08 |
| MAE-Tok | 512 | 128 | 0.62 | - | - | - |
| TexTok | 512 | 256 | 0.73 | 24.45 | 0.66 | 0.19 |
| Unified tokenizers | | | | | | |
| UniTok | 256 | 256 | 0.38 | - | - | - |
| TokenFlow | 384 | 729 | 0.63 | 22.77 | 0.73 | - |
| MingTok-Vision | 512 | 256 | 0.54 | 30.77 | 0.62 | 0.14 |
| MingTok-Vision † | 512 | 256 | 0.38 | 31.09 | 0.64 | 0.12 |
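
The Res. and # Tokens columns together indicate how much image area each token has to cover. As a rough illustration only, the sketch below computes pixels per token for a few rows of the table; `pixels_per_token` is a hypothetical helper, images are assumed to be square, and the per-token channel dimension is ignored, so this is not a full compression ratio.

```python
# (resolution, number of tokens) pairs copied from the table.
tokenizers = {
    "SD-VAE": (256, 1024),
    "VA-VAE": (256, 256),
    "DC-AE": (512, 64),
    "MingTok-Vision": (512, 256),
}


def pixels_per_token(resolution, num_tokens):
    """Number of pixels of a square image that each token represents."""
    return resolution * resolution / num_tokens


for name, (res, n_tokens) in tokenizers.items():
    print(f"{name}: {pixels_per_token(res, n_tokens):.0f} pixels per token")
```

By this measure, MingTok-Vision at 512 resolution with 256 tokens represents 1024 pixels per token, compared with 64 for SD-VAE at 256 resolution with 1024 tokens.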
@article{huang2025mingunivision,
title={Ming-UniVision: Joint Image Understanding and Generation with a Unified Continuous Tokenizer},
author={Huang, Ziyuan and Zheng, DanDan and Zou, Cheng and Liu, Rui and Wang, Xiaolong and Ji, Kaixiang and Chai, Weilong and Sun, Jianxin and Wang, Libin and Lv, Yongjie and Huang, Taozhi and Liu, Jiajia and Guo, Qingpei and Yang, Ming and Chen, Jingdong and Zhou, Jun},
journal={arXiv preprint arXiv:2510.06590},
year={2025}
}