Weronika Stryj
privategeek24
5 followers · 136 following
weronika-stryj-a863721b2
AI & ML interests
Text classification, sentiment analysis, finance risk analysis, text generation, question answering, LLM optimization, cyber risk analysis, speech-to-text
Recent Activity
liked a Space (30 days ago): sentence-transformers/quantized-retrieval
reacted to singhsidhukuldeep's post with 👀 (3 months ago):
While Google's Transformer might have introduced "Attention is all you need," Microsoft and Tsinghua University are here with the DIFF Transformer, stating, "Sparse-Attention is all you need."

The DIFF Transformer outperforms traditional Transformers in scaling properties, requiring only about 65% of the model size or training tokens to achieve comparable performance. The secret sauce? A differential attention mechanism that amplifies focus on relevant context while canceling out noise, leading to sparser and more effective attention patterns.

How?
- It uses two separate softmax attention maps and subtracts them.
- It employs a learnable scalar λ to balance the two maps.
- It applies GroupNorm to each attention head independently.
- It is compatible with FlashAttention for efficient computation.

What do you get?
- Superior long-context modeling (up to 64K tokens).
- Enhanced key information retrieval.
- Reduced hallucination in question-answering and summarization tasks.
- More robust in-context learning, less affected by prompt order.
- Mitigation of activation outliers, opening the door to efficient quantization.

Extensive experiments show DIFF Transformer's advantages across various tasks and model sizes, from 830M to 13.1B parameters. This innovative architecture could be a game-changer for the next generation of LLMs. What are your thoughts on DIFF Transformer's potential impact?
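To make the mechanism in those bullet points concrete, here is a minimal PyTorch sketch of a single differential-attention head. It is not the authors' implementation: the class name and dimensions are illustrative, and λ is kept as a raw learnable scalar for brevity (the paper reparameterizes it).

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DiffAttentionHead(nn.Module):
    """Illustrative single-head differential attention (not the reference code).

    Two softmax attention maps are computed from separate query/key
    projections and subtracted, scaled by a learnable lambda, so that
    attention noise common to both maps cancels out.
    """

    def __init__(self, d_model: int, d_head: int, lambda_init: float = 0.8):
        super().__init__()
        # Double-width projections yield the two query/key groups to subtract.
        self.w_q = nn.Linear(d_model, 2 * d_head, bias=False)
        self.w_k = nn.Linear(d_model, 2 * d_head, bias=False)
        self.w_v = nn.Linear(d_model, d_head, bias=False)
        # Learnable scalar balancing the two attention maps (simplified).
        self.lam = nn.Parameter(torch.tensor(lambda_init))
        # Per-head normalization of the attention output.
        self.norm = nn.GroupNorm(1, d_head)
        self.scale = d_head ** -0.5

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, d_model)
        q1, q2 = self.w_q(x).chunk(2, dim=-1)
        k1, k2 = self.w_k(x).chunk(2, dim=-1)
        v = self.w_v(x)
        a1 = F.softmax(q1 @ k1.transpose(-2, -1) * self.scale, dim=-1)
        a2 = F.softmax(q2 @ k2.transpose(-2, -1) * self.scale, dim=-1)
        # The differential step: spurious attention mass shared by both
        # maps cancels, sharpening focus on the relevant tokens.
        out = (a1 - self.lam * a2) @ v
        # GroupNorm expects (batch, channels, length), hence the transposes.
        return self.norm(out.transpose(1, 2)).transpose(1, 2)

# Quick shape check: a batch of 2 sequences of length 16, model width 64.
head = DiffAttentionHead(d_model=64, d_head=32)
print(head(torch.randn(2, 16, 64)).shape)  # torch.Size([2, 16, 32])
```

Note that a1 − λ·a2 is no longer a probability distribution; giving up that constraint is precisely what lets common-mode noise cancel between the two maps, much like a differential amplifier in circuit design.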
reacted to davidberenstein1957's post with 👍 (3 months ago):
Don't use an LLM when you can use a much cheaper model. The problem is that no one tells you how to actually do it. Just picking a pre-trained model (e.g., BERT) and throwing it at your problem won't work! If you want a small model to perform well on your problem, you need to fine-tune it. And to fine-tune it, you need data. The good news is that you don't need a lot of data but instead high-quality data for your specific problem. In the latest livestream, I showed you guys how to get started with Argilla on the Hub! Hope to see you at the next one. https://www.youtube.com/watch?v=BEe7shiG3rY
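The post's point, fine-tuning a small model on high-quality, task-specific data, can be made concrete with a short sketch using the standard transformers Trainer API. This is not code from the livestream; the CSV file names, the binary label set, and the hyperparameters are hypothetical placeholders.

```python
# Fine-tune a small pre-trained encoder on task-specific labeled data.
# Hypothetical setup: train.csv / test.csv with "text" and "label" columns.
from datasets import load_dataset
from transformers import (
    AutoModelForSequenceClassification,
    AutoTokenizer,
    Trainer,
    TrainingArguments,
)

model_name = "distilbert/distilbert-base-uncased"  # small and cheap to fine-tune
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=2)

dataset = load_dataset("csv", data_files={"train": "train.csv", "test": "test.csv"})

def tokenize(batch):
    # Truncate/pad so every example has the same fixed length.
    return tokenizer(batch["text"], truncation=True, padding="max_length", max_length=128)

dataset = dataset.map(tokenize, batched=True)

args = TrainingArguments(
    output_dir="small-model-finetuned",
    num_train_epochs=3,
    per_device_train_batch_size=16,
    learning_rate=2e-5,
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=dataset["train"],
    eval_dataset=dataset["test"],
)
trainer.train()
```

The emphasis on quality over quantity is why a curation tool like Argilla fits this workflow: the Trainer code above stays the same regardless of how the labeled examples were produced, so the leverage is in the data.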
Organizations: none yet
privategeek24's activity
liked a Space (30 days ago):
- 🔍 Quantized Retrieval: Efficient quantized retrieval over Wikipedia (87 likes; status: build error)
liked 4 models (3 months ago):
- danitamayo/bert-cybersecurity-NER (Token Classification; updated Apr 20, 2023; 40 downloads; 2 likes)
- improz4/LogBert-tokenizer (updated Sep 18, 2023; 1 download)
- cservan/malbert-base-cased-128k (Fill-Mask; updated May 16, 2024; 617 downloads; 1 like)
- alphacep/vosk-model-small-ru (Automatic Speech Recognition; updated Aug 8, 2023; 8 downloads)
liked 4 models (4 months ago):
- hantian/yolo-doclaynet (updated Oct 7, 2024; 25 downloads)
- google/gemma-2-2b-it (Text Generation; updated Aug 27, 2024; 348k downloads; 872 likes)
- jinhybr/OCR-Donut-CORD (Image-to-Text; updated Nov 5, 2022; 333 downloads; 196 likes)
- distilbert/distilbert-base-uncased (Fill-Mask; updated May 6, 2024; 11.4M downloads; 600 likes)
liked 2 models (5 months ago):
- tiiuae/falcon-mamba-7b (Text Generation; updated Dec 17, 2024; 13.5k downloads; 227 likes)
- Rachu/tabnetmodel (Tabular Classification; updated Feb 19, 2024; 3 downloads)
liked a Space (5 months ago):
- ✨ Code generation with 🤗 (236 likes; running)
liked 2 models (5 months ago):
- nvidia/Nemotron-4-340B-Instruct (updated Jun 24, 2024; 73 downloads; 667 likes)
- microsoft/Phi-3.5-MoE-instruct (Text Generation; updated Oct 24, 2024; 56.1k downloads; 548 likes)
liked a Space (5 months ago):
- 🥇 LevelBot (115 likes; running on CPU Upgrade)
liked 5 models (5 months ago):
- sdadas/polish-reranker-large-ranknet (Text Classification; updated Apr 23, 2024; 457 downloads; 2 likes)
- Alibaba-NLP/gte-Qwen2-1.5B-instruct (Sentence Similarity; updated 7 days ago; 164k downloads; 154 likes)
- Qwen/Qwen2-1.5B-Instruct (Text Generation; updated Jun 6, 2024; 157k downloads; 134 likes)
- sdadas/mmlw-retrieval-e5-large (Sentence Similarity; updated Oct 27, 2024; 22 downloads; 3 likes)
- Vanessasml/cyber-risk-llama-3-8b (Text Generation; updated May 7, 2024; 63 downloads; 12 likes)