
Xi

xi0v

AI & ML interests

Reinforcement learning, Diffusion Model Merging, LLM Merging, Model Editing and Vision/Multimodal Model Fine-tuning.

Recent Activity

reacted to Kseniase's post with 👀 about 3 hours ago
8 types of RoPE

As we always use Transformers, it's helpful to understand RoPE (Rotary Position Embedding). Since token order matters, RoPE encodes it by rotating token embeddings based on their position, so the model knows which token comes first, second, and so on.

Here are 8 types of RoPE that can be implemented in different cases:

1. Original RoPE -> https://huggingface.co/papers/2104.09864
Encodes token positions by rotating token embeddings in the complex plane via a position-based rotation matrix, thereby providing the self-attention mechanism with relative positional info.

2. LongRoPE -> https://huggingface.co/papers/2402.13753
Extends the context window of pre-trained LLMs to 2048k tokens, leveraging non-uniformities in positional interpolation with an efficient search.

3. LongRoPE2 -> https://huggingface.co/papers/2502.20082
Extends the effective context window of pre-trained LLMs to the target length, rescaling RoPE guided by "needle-driven" perplexity.

4. Multimodal RoPE (MRoPE) -> https://huggingface.co/papers/2502.13923
Decomposes positional embedding into 3 components (temporal, height, and width) so that positional features are aligned across modalities: text, images, and videos.

5. Directional RoPE (DRoPE) -> https://huggingface.co/papers/2503.15029
Adds an identity scalar, improving how angles are handled without extra complexity. It helps balance accuracy, speed, and memory usage.

6. VideoRoPE -> https://huggingface.co/papers/2502.05173
Adapts RoPE for video, featuring 3D structure, low-frequency temporal allocation, diagonal layout, and adjustable spacing.

7. VRoPE -> https://huggingface.co/papers/2502.11664
Another RoPE for video, which restructures positional indices and balances encoding for uniform spatial focus.

8. XPos (Extrapolatable Position Embedding) -> https://huggingface.co/papers/2212.10
Introduces an exponential decay factor into the rotation matrix, improving stability on long sequences.
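To make the "rotation gives relative positional info" idea from the original RoPE entry concrete, here is a minimal pure-Python sketch. It is not taken from the post or from any library: the function name `rope_rotate`, the `base=10000.0` default, and the choice to pair consecutive dimensions are assumptions for illustration (some implementations pair the first half of the vector with the second half instead).

```python
import math

def rope_rotate(vec, position, base=10000.0):
    """Rotate each consecutive pair (vec[i], vec[i+1]) by a position-dependent angle.

    Hypothetical helper for illustration; `base` and the pairing scheme are assumptions.
    """
    d = len(vec)
    if d % 2 != 0:
        raise ValueError("embedding dimension must be even")
    out = []
    for i in range(0, d, 2):
        # Pair k = i/2 uses frequency base^(-2k/d), which equals base^(-i/d).
        theta = position * base ** (-i / d)
        c, s = math.cos(theta), math.sin(theta)
        x, y = vec[i], vec[i + 1]
        # Standard 2D rotation of the pair (x, y) by angle theta.
        out.extend([x * c - y * s, x * s + y * c])
    return out

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

# Toy query and key vectors (made-up values).
q = [0.3, -1.2, 0.8, 0.5]
k = [1.0, 0.4, -0.7, 0.2]

# The same offset (3) at two different absolute positions.
score_a = dot(rope_rotate(q, 5), rope_rotate(k, 2))
score_b = dot(rope_rotate(q, 105), rope_rotate(k, 102))
```

Under these assumptions, `score_a` and `score_b` agree up to floating-point error, because the attention score between a rotated query and a rotated key depends only on the difference of their positions. Rotation also leaves the vector norm unchanged.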
liked a model about 3 hours ago
SUFE-AIFLM-Lab/Fin-R1
liked a model about 7 hours ago
OnomaAI/Log2char_Orion-14B

Organizations

GEM benchmark; OpenGVLab; BigScience Biomedical Datasets; fast.ai community; LLMs; ONNXConfig for all; DeepGHS; Open-Source AI Meetup; Arabic Machine Learning; Literally Me FRFR Research Society; The Waifu Research Department; Blog-explorers; OpenSky; Falcons.ai; CyberHarem; Tensor Diffusion; ICCV2023; ICML2023; AI Hobbyist; That Time I got Reincarnated as a Hugging Face Organization; ZeroGPU Explorers; Project Fluently; LocalLLaMA; MLX Community; INNOVA AI; Narra; AstraLLMs; 0ai; C4AI Community; Project Fluently LM; Stable Diffusion Community (Unofficial, Non-profit); Hugging Face for Legal; Raye; open/ acc; Data Is Better Together Contributor; void