Gyanateet Dutta

Ryukijano

https://ryukijano.github.io

AI & ML interests

Computer graphics, general artificial intelligence, model merging, massive ASR for data collection, 3D ML, on-device ML, quantization, model judging, ML in the browser, healthcare applications, education, and the intersection of art and ML.

Recent Activity

liked a model about 14 hours ago

facebook/vjepa2-vitl-fpc64-256

updated a collection 14 days ago

Vision_transformer_robotics

upvoted a collection 10 days ago

V-JEPA 2

A frontier video understanding model developed by FAIR, Meta, which extends the pretraining objectives of https://ai.meta.com/blog/v-jepa-yann • 8 items • Updated 14 days ago • 128

upvoted a paper about 1 month ago

Hybrid 3D-4D Gaussian Splatting for Fast Dynamic Scene Representation

Paper • 2505.13215 • Published May 19 • 28

upvoted a paper about 2 months ago

Reinforcement Learning for Reasoning in Large Language Models with One Training Example

Paper • 2504.20571 • Published Apr 29 • 94

upvoted a paper 2 months ago

WORLDMEM: Long-term Consistent World Simulation with Memory

Paper • 2504.12369 • Published Apr 16 • 34

upvoted 2 papers 3 months ago

DDT: Decoupled Diffusion Transformer

Paper • 2504.05741 • Published Apr 8 • 76

VAPO: Efficient and Reliable Reinforcement Learning for Advanced Reasoning Tasks

Paper • 2504.05118 • Published Apr 7 • 25

upvoted a collection 3 months ago

TxGemma Release

Collection of open models to accelerate the development of therapeutics. • 5 items • Updated 29 days ago • 59

upvoted 2 papers 3 months ago

Differential Transformer

Paper • 2410.05258 • Published Oct 7, 2024 • 179

Progressive Rendering Distillation: Adapting Stable Diffusion for Instant Text-to-Mesh Generation without 3D Data

Paper • 2503.21694 • Published Mar 27 • 16

upvoted a collection 3 months ago

💫StarVector Models

StarVector is a multimodal LLM for Scalable Vector Graphics (SVG) generation, producing structured SVG code directly from images and text. • 2 items • Updated Mar 20 • 96

upvoted an article 4 months ago

Open-source DeepResearch – Freeing our search agents

Article • 5 authors • Feb 4 • 1.26k

upvoted a collection 5 months ago

Eagle 2

Eagle 2 is a family of frontier vision-language models with vision-centric design. The model supports 4K HD input, long-context video, and grounding. • 10 items • Updated 1 day ago • 36

upvoted a collection 7 months ago

VILA: On Pre-training for Visual Language Models

10 items • Updated Apr 17 • 54

upvoted an article 7 months ago

LLaVA-o1: Let Vision Language Models Reason Step-by-Step

Article • Nov 19, 2024 • 12

upvoted 2 papers 7 months ago

LLaVA-o1: Let Vision Language Models Reason Step-by-Step

Paper • 2411.10440 • Published Nov 15, 2024 • 126

Grounding Image Matching in 3D with MASt3R

Paper • 2406.09756 • Published Jun 14, 2024 • 1

upvoted an article 7 months ago

How to run Gemini Nano locally in your browser

Article • Jul 11, 2024 • 46

upvoted 3 collections 8 months ago

Sparsh

Models and datasets for Sparsh: Self-supervised touch representations for vision-based tactile sensing • 15 items • Updated Oct 24, 2024 • 13

MobileLLM

Optimizing Sub-billion Parameter Language Models for On-Device Use Cases (ICML 2024) https://arxiv.org/abs/2402.14905 • 40 items • Updated 4 days ago • 118

Stable Diffusion 3.5

6 items • Updated Jan 9 • 165