CoMP: Continual Multimodal Pre-training for Vision Foundation Models • arXiv:2503.18931 • Published 5 days ago • 29 upvotes
Long-Context Autoregressive Video Modeling with Next-Frame Prediction • arXiv:2503.19325 • Published 4 days ago • 68 upvotes
Training-free Diffusion Acceleration with Bottleneck Sampling • arXiv:2503.18940 • Published 5 days ago • 12 upvotes
PLADIS: Pushing the Limits of Attention in Diffusion Models at Inference Time by Leveraging Sparsity • arXiv:2503.07677 • Published 19 days ago • 81 upvotes
ReCamMaster: Camera-Controlled Generative Rendering from A Single Video • arXiv:2503.11647 • Published 15 days ago • 123 upvotes
Reangle-A-Video: 4D Video Generation as Video-to-Video Translation • arXiv:2503.09151 • Published 17 days ago • 29 upvotes
YuE: Scaling Open Foundation Models for Long-Form Music Generation • arXiv:2503.08638 • Published 18 days ago • 60 upvotes
LLaVE: Large Language and Vision Embedding Models with Hardness-Weighted Contrastive Learning • arXiv:2503.04812 • Published 25 days ago • 13 upvotes
EasyControl: Adding Efficient and Flexible Control for Diffusion Transformer • arXiv:2503.07027 • Published 19 days ago • 26 upvotes
Token-Efficient Long Video Understanding for Multimodal LLMs • arXiv:2503.04130 • Published 23 days ago • 86 upvotes
UniTok: A Unified Tokenizer for Visual Generation and Understanding • arXiv:2502.20321 • Published 30 days ago • 29 upvotes
DICEPTION: A Generalist Diffusion Model for Visual Perceptual Tasks • arXiv:2502.17157 • Published Feb 24 • 51 upvotes
MLGym: A New Framework and Benchmark for Advancing AI Research Agents • arXiv:2502.14499 • Published Feb 20 • 188 upvotes
SigLIP 2: Multilingual Vision-Language Encoders with Improved Semantic Understanding, Localization, and Dense Features • arXiv:2502.14786 • Published Feb 20 • 138 upvotes
Over-Tokenized Transformer: Vocabulary is Generally Worth Scaling • arXiv:2501.16975 • Published Jan 28 • 28 upvotes
Diffusion Adversarial Post-Training for One-Step Video Generation • arXiv:2501.08316 • Published Jan 14 • 33 upvotes