PyTorch Image Models

https://github.com/rwightman/pytorch-image-models

Activity Feed

AI & ML interests

Computer Vision

Recent Activity

rwightman new activity about 4 hours ago

timm/vit_pe_lang_gigantic_patch14_448.fb:Add project page link

rwightman new activity 2 days ago

timm/vit_pwee_patch16_reg1_gap_256.sbb_in1k:Update train_hparams.yaml

rwightman updated a model 8 days ago

timm/vit_pe_core_gigantic_patch14_448.fb

rwightman

in timm/vit_pe_lang_gigantic_patch14_448.fb about 4 hours ago

Add project page link

#1 opened about 22 hours ago by

nielsr

rwightman

in timm/vit_pwee_patch16_reg1_gap_256.sbb_in1k 2 days ago

Update train_hparams.yaml

#1 opened 3 days ago by

tony0278611

merve

posted an update 3 days ago

Dataset Viewer for PDFs just landed on Hugging Face 📖🤗 you can now preview all the PDFs easier than before!

on top of this, there's PdfFolder format to load the PDF datasets quicker 💨
> to use it, your dataset should follow a directory format like folder/train/doc1.pdf, folder/train/doc2.pdf
> if you want to include bounding boxes, labels etc. you can keep them in a metadata.csv file in the same folder 🤝

read document dataset docs https://huggingface.co/docs/datasets/main/en/document_dataset
check all the document datasets here https://huggingface.co/datasets?modality=modality:document&sort=trending 📖
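
The layout described in the post can be sketched with the standard library alone. This is a minimal illustration, not the library's own tooling: the helper names `build_pdf_folder` and `read_metadata` are hypothetical, the `label` column is just an example, and the `file_name` column is assumed from the convention used by the other folder-based loaders in `datasets`. The files written here are placeholder bytes, not parseable PDFs.

```python
import csv
import tempfile
from pathlib import Path

# Placeholder bytes only: enough to create a file, NOT a valid, parseable PDF.
PLACEHOLDER_PDF = b"%PDF-1.4\n%%EOF\n"


def build_pdf_folder(root, split, labels):
    """Create root/split/<name>.pdf files plus a metadata.csv beside them.

    `labels` maps a file name (e.g. "doc1.pdf") to a label string.
    The "file_name" column is an assumed convention; "label" is illustrative.
    """
    split_dir = Path(root) / split
    split_dir.mkdir(parents=True, exist_ok=True)
    for name in labels:
        (split_dir / name).write_bytes(PLACEHOLDER_PDF)
    with (split_dir / "metadata.csv").open("w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(["file_name", "label"])
        for name, label in labels.items():
            writer.writerow([name, label])
    return split_dir


def read_metadata(split_dir):
    """Read metadata.csv back as a list of row dictionaries."""
    with (Path(split_dir) / "metadata.csv").open(newline="") as handle:
        return list(csv.DictReader(handle))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        split_dir = build_pdf_folder(
            Path(tmp) / "folder",
            "train",
            {"doc1.pdf": "invoice", "doc2.pdf": "report"},
        )
        print(sorted(p.name for p in split_dir.iterdir()))
        print(read_metadata(split_dir))
        # With real PDFs in place, loading would look roughly like this
        # (an assumption based on the linked docs, not run here):
        #   from datasets import load_dataset
        #   ds = load_dataset("pdffolder", data_dir=str(split_dir.parent))
```

The commented-out `load_dataset` call is left unexecuted on purpose; check the linked document dataset docs for the exact loader name and any extra dependencies before relying on it.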

merve

posted an update 4 days ago

we've merged LightGlue keypoint matcher to Hugging Face transformers! it allows commercial use when paired with an open-source keypoint detector 🙏🏻

it works very well, try it yourself: ETH-CVG/LightGlue

here's an in-the-wild test with two images of the same place ⤵️

merve

posted an update 5 days ago

Release picks of the past week are here! Find more models, datasets, and Spaces here: merve/june-20-releases-68594824d1f4dfa61aee3433

🖼️ VLMs/OCR
> moonshotai/Kimi-VL-A3B-Thinking-2506 is a powerful reasoning vision LM, 3B active params, smarter with less tokens, supports long documents, videos 👏 (OS)
> nanonets/Nanonets-OCR-s is a 3.75B-param OCR model based on Qwen2.5VL-3B-Instruct (OS)

💬 LLMs
> moonshotai/Kimi-Dev-72B is a strong coding model based on Qwen2.5-72B (OS)
> Mistral released mistralai/Mistral-Small-3.2-24B-Instruct-2506, an update to their former model with better function calling & instruction following (OS)

🗣️ Audio
> Google released google/magenta-realtime, real time music generation & audio synthesis (cc-by-4)
> kyutai released new speech-to-text models that come in 1B & 2B (kyutai/stt-1b-en_fr, stt-2b-en_fr) with 0.5s and 2.5s delay

3D
> Tencent released tencent/Hunyuan3D-2.1 an image-to-3D model (see below)

merve

posted an update 6 days ago

fav open-source multimodal reasoning model just got an update 🔥

moonshotai/Kimi-VL-A3B-Thinking-2506 has
> smarter with less tokens, small size (only 3B active params!!!)
> better accuracy
> video reasoning
> higher resolution 🤯
Read their blog https://huggingface.co/blog/moonshotai/kimi-vl-a3b-thinking-2506

rwightman

updated 6 models 8 days ago

rwightman

published 6 models 8 days ago

timm/vit_pe_spatial_gigantic_patch14_448.fb

Image Feature Extraction • Updated 8 days ago • 21

timm/vit_pe_lang_large_patch14_448.fb

Image Feature Extraction • Updated 8 days ago • 191 • 1

timm/vit_pe_lang_gigantic_patch14_448.fb

Image Feature Extraction • Updated about 4 hours ago • 23

timm/vit_pe_core_large_patch14_336.fb

Image Feature Extraction • Updated 8 days ago • 186

timm/vit_pe_core_gigantic_patch14_448.fb

Image Feature Extraction • Updated 8 days ago • 28

timm/vit_pe_core_base_patch16_224.fb

Image Feature Extraction • Updated 8 days ago • 507 • 1

merve

posted an update 8 days ago

y'all have been asking my opinion on how OCR models compare to each other 👀
I will leave three apps to compare newest models by @prithivMLmods instead ⤵️
> compare Nanonets-OCR-s, Qwen2-VL-OCR-2B-Instruct, RolmOCR, Aya-Vision prithivMLmods/Multimodal-OCR
> SmolDocling, Nanonets-OCR-s, MonkeyOCR, Typhoon-OCR-7B prithivMLmods/Multimodal-OCR2
> docscopeOCR, MonkeyOCR, coreOCR prithivMLmods/core-OCR

rwightman

updated a model 9 days ago

timm/naflexvit_base_patch16_parfac_gap.e300_s576_in1k

Image Classification • Updated 9 days ago • 41 • 2

Team members 5
