g-ronimo (geronimo)

upvoted an article 2 days ago

Article

KV Cache from scratch in nanoVLM

By

and 4 others •

3 days ago

• 55

upvoted an article 12 days ago

Article

nanoVLM: The simplest repository to train your VLM in pure PyTorch

By

and 6 others •

17 days ago

• 140

upvoted a paper 29 days ago

OpenVision: A Fully-Open, Cost-Effective Family of Advanced Vision Encoders for Multimodal Learning

Paper • 2505.04601 • Published about 1 month ago • 26

upvoted a collection about 2 months ago

Vision

Collection

130 items • Updated 3 days ago • 1

upvoted an article about 2 months ago

Article

Remote VAEs for decoding with HF endpoints 🤗

By

and 1 other •

Feb 24

• 39

upvoted a paper about 2 months ago

SmolVLM: Redefining small and efficient multimodal models

Paper • 2504.05299 • Published Apr 7 • 188

upvoted an article 3 months ago

Article

SmolVLM2: Bringing Video Understanding to Every Device

By

and 6 others •

Feb 20

• 262

upvoted a paper 4 months ago

SmolLM2: When Smol Goes Big -- Data-Centric Training of a Small Language Model

Paper • 2502.02737 • Published Feb 4 • 232

upvoted a paper 5 months ago

Dolphin: Closed-loop Open-ended Auto-research through Thinking, Practice, and Feedback

Paper • 2501.03916 • Published Jan 7 • 16

upvoted an article 5 months ago

Article

Fine-tune ModernBERT for text classification using synthetic data

By

•

Dec 30, 2024

• 37

upvoted 2 papers 6 months ago

CLEAR: Conv-Like Linearization Revs Pre-Trained Diffusion Transformers Up

Paper • 2412.16112 • Published Dec 20, 2024 • 23

VisualLens: Personalization through Visual History

Paper • 2411.16034 • Published Nov 25, 2024 • 18

upvoted 3 papers 7 months ago

UnifiedCrawl: Aggregated Common Crawl for Affordable Adaptation of LLMs on Low-Resource Languages

Paper • 2411.14343 • Published Nov 21, 2024 • 7

Add-it: Training-Free Object Insertion in Images With Pretrained Diffusion Models

Paper • 2411.07232 • Published Nov 11, 2024 • 67

JanusFlow: Harmonizing Autoregression and Rectified Flow for Unified Multimodal Understanding and Generation

Paper • 2411.07975 • Published Nov 12, 2024 • 31

upvoted 2 articles 9 months ago

Article

"Diffusers Image Fill" guide

By

•

Sep 13, 2024

• 53

Article

Extending Transformer layers as Painters to DiT's

By

•

Aug 31, 2024

• 11

upvoted a paper about 1 year ago

LoRA Learns Less and Forgets Less

Paper • 2405.09673 • Published May 15, 2024 • 89

upvoted 2 articles about 1 year ago

Article

Train custom AI models with the trainer API and adapt them to 🤗

By

•

Jun 29, 2024

• 33

Article

SeeMoE: Implementing a MoE Vision Language Model from Scratch

By

•

Jun 23, 2024

• 34

geronimo PRO

AI & ML interests

Organizations

g-ronimo's activity

