dame rajee (damerajee)
15 followers · 31 following
damerajee44 · dame-cell
AI & ML interests: None yet
Recent Activity

reacted to Kseniase's post with ❤️ · 1 day ago
8 types of RoPE

Since Transformers are everywhere, it's helpful to understand RoPE (Rotary Position Embedding). Token order matters, and RoPE encodes it by rotating token embeddings based on their position, so the model knows which token comes first, second, and so on. Here are 8 types of RoPE that can be used in different cases:

1. Original RoPE -> https://huggingface.co/papers/2104.09864
Encodes token positions by rotating token embeddings in the complex plane via a position-based rotation matrix, providing the self-attention mechanism with relative positional information.

2. LongRoPE -> https://huggingface.co/papers/2402.13753
Extends the context window of pre-trained LLMs to 2048k tokens by exploiting non-uniformities in positional interpolation with an efficient search.

3. LongRoPE2 -> https://huggingface.co/papers/2502.20082
Extends the effective context window of pre-trained LLMs to the target length, rescaling RoPE guided by "needle-driven" perplexity.

4. Multimodal RoPE (MRoPE) -> https://huggingface.co/papers/2502.13923
Decomposes the positional embedding into 3 components: temporal, height, and width, so that positional features are aligned across modalities: text, images, and videos.

5. Directional RoPE (DRoPE) -> https://huggingface.co/papers/2503.15029
Adds an identity scalar, improving how angles are handled without extra complexity. It helps balance accuracy, speed, and memory usage.

6. VideoRoPE -> https://huggingface.co/papers/2502.05173
Adapts RoPE for video, featuring 3D structure, low-frequency temporal allocation, diagonal layout, and adjustable spacing.

7. VRoPE -> https://huggingface.co/papers/2502.11664
Another RoPE for video, which restructures positional indices and balances encoding for uniform spatial focus.

8. XPos (Extrapolatable Position Embedding) -> https://huggingface.co/papers/2212.10
Introduces an exponential decay factor into the rotation matrix, improving stability on long sequences.
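The rotation described for the original RoPE (item 1) can be sketched in a few lines of NumPy. This is an illustrative implementation, not code from any of the linked papers: each consecutive feature pair of a token embedding is rotated by an angle proportional to the token's position, with per-pair frequencies `base^(-2i/dim)` as in the RoFormer paper. The function name and `base` default are assumptions for the sketch.

```python
import numpy as np

def rope(x, base=10000.0):
    """Apply rotary position embedding to x of shape (seq_len, dim).

    Each consecutive pair of features is rotated in its own 2D plane
    by an angle that grows with the token position, so dot products
    between rotated queries and keys depend only on relative position.
    """
    seq_len, dim = x.shape
    assert dim % 2 == 0, "feature dimension must be even"
    # Per-pair frequencies: theta_i = base^(-2i/dim)
    inv_freq = base ** (-np.arange(0, dim, 2) / dim)
    # Angles: position m times frequency theta_i -> (seq_len, dim/2)
    angles = np.outer(np.arange(seq_len), inv_freq)
    cos, sin = np.cos(angles), np.sin(angles)
    x1, x2 = x[:, 0::2], x[:, 1::2]   # the two halves of each pair
    out = np.empty_like(x)
    out[:, 0::2] = x1 * cos - x2 * sin  # standard 2D rotation
    out[:, 1::2] = x1 * sin + x2 * cos
    return out
```

The key property this gives self-attention: for fixed query and key vectors, the dot product between their rotated versions at positions m and n depends only on the offset n - m, which is what makes the encoding relative.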
reacted to Kseniase's post with 👀 · 1 day ago (same post as above)
reacted to Kseniase's post with 👀 · 1 day ago (same post as above)
damerajee's activity
New activity in BhashaAI/README · 11 days ago
"can i make a non-indic but powerful AI space?" (3 comments) · #1 opened about 2 months ago by Reality123b
commented on 2 papers · 2 months ago
$\text{Transformer}^2$: Self-adaptive LLMs · Paper 2501.06252 · Published Jan 9 · 54 · 7
New activity in cointegrated/SONAR_200_text_encoder · 3 months ago
"can you please do the same for decoder" (1 comment) · #2 opened 3 months ago by damerajee

New activity in argilla/FinePersonas-v0.1 · 6 months ago
"How to make those beautiful banners on the readme?" (1 comment) · #3 opened 6 months ago by damerajee

New activity in sarvamai/sarvam-0.5 · 6 months ago
"Can we perform CPT?" (1 comment) · #6 opened 6 months ago by architsaxena

New activity in Kukedlc/Phi-3-Vision-Win-snap · 10 months ago
"Fine-tuning notebook or script?" · #1 opened 10 months ago by damerajee

New activity in theblackcat102/llava-instruct-mix · 10 months ago
"How was this datasets made ?" · #1 opened 11 months ago by damerajee

New activity in microsoft/Phi-3-vision-128k-instruct · 10 months ago
"Fine-tuning Phi-3-vision on custom dataset fails" (12 comments) · #21 opened 10 months ago by samyak24jain

New activity in lamm-mit/Cephalo-Phi-3-vision-128k-4b-alpha · 10 months ago
"Could you provide finetuning script?" · #1 opened 10 months ago by damerajee

New activity in google/paligemma-3b-mix-224 · 10 months ago
"Does the processor contain apply_chat_template??" (1 comment) · #4 opened 10 months ago by damerajee

New activity in RekaAI/VibeEval · 11 months ago
"How are these dataset made ?" (1 comment) · #3 opened 11 months ago by damerajee

New activity in HuggingFaceH4/llava-instruct-mix-vsft · 11 months ago
"How are the images saved in this dataset?" (13 comments) · #1 opened 11 months ago by Shure-Dev
"Update README.md" (1 comment) · #2 opened 11 months ago by damerajee

New activity in damerajee/pretrained_large.v2 · 11 months ago
"Librarian Bot: Add language metadata for dataset" · #2 opened 11 months ago by librarian-bot

New activity in toshi456/llava-jp-1.3b-v1.1-pretrain · 11 months ago
"How long did it take you to pre-train and how much compute did it cost ?" (2 comments) · #1 opened 11 months ago by damerajee

New activity in vikhyatk/lnqa · 11 months ago
"dataset format a bit weird" (1 comment) · #4 opened 11 months ago by damerajee

New activity in Lin-Chen/ShareGPT4V · 11 months ago
"How to prepare the images?" (2 comments) · #4 opened over 1 year ago by tiancaiye

New activity in damerajee/pretrained_large · 11 months ago
"Librarian Bot: Add language metadata for dataset" · #1 opened 11 months ago by librarian-bot

New activity in visheratin/MC-LLaVA-3b · 11 months ago
"Pre training code" · #9 opened 11 months ago by damerajee