andito (Andres Marafioti)

published an article 3 months ago

Article

Streaming datasets: 100x More Efficient

+3

Oct 27, 2025

•

79

published an article 3 months ago

Article

Supercharge your OCR Pipelines with Open Models

+5

Oct 21, 2025

•

295

published an article 6 months ago

Article

TimeScope: How Long Can Your Video Large Multimodal Model Go?

+2

Jul 23, 2025

•

47

published an article 6 months ago

Article

Efficient MultiModal Data Pipeline

+3

Jul 8, 2025

•

69

published an article 7 months ago

Article

KV Cache from scratch in nanoVLM

+3

Jun 4, 2025

•

109

published an article 7 months ago

Article

SmolVLA: Efficient Vision-Language-Action Model trained on Lerobot Community Data

+7

Jun 3, 2025

•

307

published an article 8 months ago

Article

nanoVLM: The simplest repository to train your VLM in pure PyTorch

+5

May 21, 2025

•

247

published an article 8 months ago

Article

Vision Language Models (Better, faster, stronger)

+3

May 12, 2025

•

584

published an article 11 months ago

Article

SmolVLM2: Bringing Video Understanding to Every Device

+5

Feb 20, 2025

•

321

published an article 12 months ago

Article

SmolVLM Grows Smaller – Introducing the 256M & 500M Models!

+1

Jan 23, 2025

•

189

published an article about 1 year ago

Article

SmolVLM - small yet mighty Vision Language Model

+3

Nov 26, 2024

•

401

published an article about 1 year ago

Article

Deploying Speech-to-Speech on Hugging Face

+2

Oct 22, 2024

•

45

published an article over 1 year ago

Article

FineVideo: behind the scenes

+4

Sep 23, 2024

•

35

published an article over 1 year ago

Article

LAVE: Zero-shot VQA Evaluation on Docmatix with LLMs - Do We Still Need Fine-Tuning?

Jul 25, 2024

•

17

published an article over 1 year ago

Article

Docmatix - a huge dataset for Document Visual Question Answering

Jul 18, 2024

•

77

published an article over 1 year ago

Article

Fine-tuning Florence-2 - Microsoft's Cutting-edge Vision Language Models

+1

Jun 24, 2024

•

205
