Xi

xi0v

AI & ML interests

Reinforcement Learning, Diffusion Model Merging, LLM Merging, Model Editing, and Vision/Multimodal Model Fine-tuning.

Recent Activity

updated a model about 5 hours ago

xi0v/Illu01-v-H100bs256rate2e_2debiasoffset-step56350

published a model about 5 hours ago

xi0v/Illu01-v-H100bs256rate2e_2debiasoffset-step56350

liked a model about 7 hours ago

TRI-ML/mistral-supra

upvoted an article about 19 hours ago

Gemma 3n fully available in the open-source ecosystem!

Article • By … and 7 others • 2 days ago • 71

upvoted a paper 6 days ago

A Rank Stabilization Scaling Factor for Fine-Tuning with LoRA

Paper • 2312.03732 • Published Nov 28, 2023 • 10

upvoted a paper 8 days ago

MoTE: Mixture of Ternary Experts for Memory-efficient Large Multimodal Models

Paper • 2506.14435 • Published 11 days ago • 8

upvoted an article 11 days ago

Tiny Agents in Python: a MCP-powered agent in ~70 lines of code

Article • By … and 3 others • May 23 • 137

upvoted a paper 17 days ago

Reinforcement Pre-Training

Paper • 2506.08007 • Published 18 days ago • 234

upvoted a paper 23 days ago

Sparse-vDiT: Unleashing the Power of Sparse Attention to Accelerate Video Diffusion Transformers

Paper • 2506.03065 • Published 24 days ago • 27

upvoted 2 articles 24 days ago

No GPU left behind: Unlocking Efficiency with Co-located vLLM in TRL

Article • By … and 5 others • 25 days ago • 60

SmolVLA: Efficient Vision-Language-Action Model trained on Lerobot Community Data

Article • By … and 8 others • 25 days ago • 167

upvoted 2 papers 26 days ago

One RL to See Them All: Visual Triple Unified Reinforcement Learning

Paper • 2505.18129 • Published May 23 • 59

Fast-dLLM: Training-free Acceleration of Diffusion LLM by Enabling KV Cache and Parallel Decoding

Paper • 2505.22618 • Published about 1 month ago • 42

upvoted an article 28 days ago

🌙 Introducing Moon: Storytelling Generator Model

Article • By … and 1 other • 29 days ago • 6

upvoted a paper 29 days ago

D-AR: Diffusion via Autoregressive Models

Paper • 2505.23660 • Published 29 days ago • 34

upvoted a paper 30 days ago

Unsupervised Post-Training for Multi-Modal LLM Reasoning via GRPO

Paper • 2505.22453 • Published about 1 month ago • 45

upvoted an article 30 days ago

Introducing AutoRound: Intel’s Advanced Quantization for LLMs and VLMs

Article • By … and 8 others • Apr 29 • 33

upvoted 5 papers about 1 month ago

s3: You Don't Need That Much Data to Train a Search Agent via RL

Paper • 2505.14146 • Published May 20 • 17

QwenLong-L1: Towards Long-Context Large Reasoning Models with Reinforcement Learning

Paper • 2505.17667 • Published May 23 • 87

Distilling LLM Agent into Small Models with Retrieval and Code Tools

Paper • 2505.17612 • Published May 23 • 78

Fixing Data That Hurts Performance: Cascading LLMs to Relabel Hard Negatives for Robust Information Retrieval

Paper • 2505.16967 • Published May 22 • 23

Neuro-Symbolic Query Compiler

Paper • 2505.11932 • Published May 17 • 16

upvoted an article about 1 month ago

nanoVLM: The simplest repository to train your VLM in pure PyTorch

Article • By … and 6 others • May 21 • 174