Aritra Roy Gosthipaty PRO

ariG23498

https://arig23498.github.io/

AI & ML interests

Deep Representation Learning

Recent Activity

updated a dataset about 20 hours ago

model-metadata/trending_models

commented on their article 1 day ago

KV Cache from scratch in nanoVLM

posted an update 1 day ago

🚨 Implement KV Cache from scratch in pure PyTorch. 🚨 We have documented everything we learned while adding KV Cache to nanoVLM. Joint work with @kashif @lusxvr @andito @pcuenq. Blog: hf.co/blog/kv-cache

ariG23498's activity

upvoted an article 1 day ago

KV Cache from scratch in nanoVLM

Article • By (author not shown) and 4 others • 2 days ago • 50

upvoted a paper 2 days ago

FlowMo: Variance-Based Flow Guidance for Coherent Motion in Video Generation

Paper • 2506.01144 • Published 4 days ago • 14

upvoted an article 2 days ago

SmolVLA: Efficient Vision-Language-Action Model trained on Lerobot Community Data

Article • By (author not shown) and 8 others • 3 days ago • 87

upvoted a paper 3 days ago

SmolVLA: A Vision-Language-Action Model for Affordable and Efficient Robotics

Paper • 2506.01844 • Published 3 days ago • 68

upvoted 2 changelogs 10 days ago

Xet is now the default storage option for new users and organizations

Changelog • 14 days ago • 57

Static Spaces can now have a build step

Changelog • 14 days ago • 89

upvoted an article 11 days ago

🐯 Liger GRPO meets TRL

Article • By (author not shown) and 5 others • 12 days ago • 36

upvoted an article 15 days ago

The Transformers Library: standardizing model definitions

Article • By (author not shown) and 3 others • 22 days ago • 110

upvoted 2 articles 16 days ago

nanoVLM: The simplest repository to train your VLM in pure PyTorch

Article • By (author not shown) and 6 others • 16 days ago • 136

Microsoft and Hugging Face expand collaboration

Article • By (author not shown) and 2 others • 18 days ago • 20

upvoted a collection 22 days ago

MobileCLIP Models + DataCompDR Data

MobileCLIP: Mobile-friendly image-text models with SOTA zero-shot capabilities. DataCompDR: Improved datasets for training image-text SOTA models. • 22 items • Updated Oct 4, 2024 • 29

upvoted an article 22 days ago

Improving Hugging Face Model Access for Kaggle Users

Article • By (author not shown) and 4 others • 23 days ago • 27

upvoted an article 24 days ago

Vision Language Models (Better, Faster, Stronger)

Article • By (author not shown) and 4 others • 25 days ago • 414

upvoted a paper 29 days ago

MoE-LLaVA: Mixture of Experts for Large Vision-Language Models

Paper • 2401.15947 • Published Jan 29, 2024 • 53

upvoted an article 30 days ago

A Dive into Pretraining Strategies for Vision-Language Models

Article • By (author not shown) and 1 other • Feb 3, 2023 • 66

upvoted a paper 30 days ago

Kimi-VL Technical Report

Paper • 2504.07491 • Published Apr 10 • 125

upvoted an article 30 days ago

Vision Language Models Explained

Article • By (author not shown) and 1 other • Apr 11, 2024 • 370

upvoted a paper about 1 month ago

CubeDiff: Repurposing Diffusion-Based Image Models for Panorama Generation

Paper • 2501.17162 • Published Jan 28 • 1

upvoted a collection about 1 month ago

D-FINE

State-of-the-art real-time object detection model with Apache 2.0 licence • 15 items • Updated May 5 • 55

upvoted an article about 1 month ago

Welcoming Llama Guard 4 on Hugging Face Hub

Article • By (author not shown) and 3 others • Apr 29 • 37