Lee Junbum PRO

beomi

AI & ML interests

AI/ML GDE. Advancing Low-Resource Language Open Access LLM

Recent Activity

Organizations

Social Post Explorers's profile picture CarbonLover's profile picture

beomi's activity

reacted to ArthurZ's post with 🀝🀯❀️πŸ”₯ about 2 months ago
posted an update 3 months ago
view post
Post
5021
# PyTorch == 2.5.0 Breaks Transformers' SDPAttention!

When you encounter "RuntimeError: cuDNN Frontend error: [cudnn_frontend] Error: No execution plans support the graph."

We can use workaround like this:

torch.backends.cuda.enable_cudnn_sdp(False)


but this slow downs the performance gain from PyTorch 2.5.

Although it is fixed(not "fixed" but default option is turn-off the cuDNN SDPA) at here -- https://github.com/pytorch/pytorch/pull/138587 , but not released yet. (you need to install directly from source)

Fastest way for now : pip install "torch<2.5"

Ref: https://github.com/huggingface/diffusers/issues/9704#issuecomment-2422585273
reacted to danielhanchen's post with πŸ€—β€οΈπŸš€ 6 months ago
view post
Post
3704
Yay we got 500K+ monthly HF downloads on our Unsloth HF repo! :) Super appreciate everyone in the OSS community - and thanks for using Unsloth!!
Β·
reacted to Tonic's post with ❀️ 7 months ago
view post
Post
2491
appreciation post for @osanseviero + huggingface staff ( @reach-vb , @merve , many others many many others) , that fight hard for many weeks / months to fix the releases in many organisations to make it easier for us to test out so many things ... πŸ€—πŸ€—πŸ€— thanks for that folks !
  • 1 reply
Β·
reacted to clem's post with β€οΈπŸš€ 7 months ago
reacted to maywell's post with πŸ‘πŸš€ 9 months ago
view post
Post
8934
πŸ”₯ Transfer model's Chat feature, Context length and Knowledge to another under 1 minute without any train.

Imagine being able to create chat models, expand context, and transfer domain-specific knowledge to models, all within a matter of minutes. Our innovative approach, based on a combination of diff-based techniques and sigmoid ratio calculations, makes this possible.

By considering the diffs between the desired information model (long context or chat) and the base model, as well as the diffs between the base model and the target model, we can efficiently transfer features and expand context without the need for extensive training or resources.

Our method minimizes model degradation and ensures that only the desired information is captured, resulting in high-quality models that can be created with just a single click. Whether you need a chat model, expanded context, or domain-specific knowledge transfer, our approach offers a rapid and effective solution.

In blog post below, we will dive into the details of our method, provide code examples, and showcase the impressive results achieved using our approach. Get ready to revolutionize your model creation process and unlock new possibilities with this powerful technique.

Blog - https://huggingface.co/blog/maywell/llm-feature-transfer
  • 2 replies
Β·
reacted to vikhyatk's post with πŸš€πŸ”₯ 9 months ago
view post
Post
3071
Updated the vikhyatk/lnqa dataset to include images, so you no longer need to separately download them from OpenImages!
posted an update 9 months ago
view post
Post
13733
#TPU #PyTorch #Jax

When You're trying to use PyTorch or Jax on TPU,

for v2/v3/v4:
use tpu-ubuntu2204-base

for v5p:
use v2-alpha-tpuv5

for v5e:
use v2-alpha-tpuv5-lite

You must use these base images for the system to 'boot'.

Previously used tpu-vm-v4-pt-1.13 images might seem to start the VM, but SSH connections do not work.

I thought it was a firewall issue and spent a lot of time on it before realizing it was a problem with the boot image πŸ₯²

https://cloud.google.com/tpu/docs/runtimes#pytorch_and_jax
reacted to tomaarsen's post with πŸ”₯ 9 months ago
view post
Post
3176
πŸš€ Sentence Transformers v2.7.0 is out! Featuring a new loss function, easier Matryoshka model inference & evaluation, CrossEncoder improvements & Intel Gaudi2 Accelerator support. Details:

1️⃣ A new loss function: CachedGISTEmbedLoss
This loss function is a combination of CachedMultipleNegativesRankingLoss and the GISTEmbedLoss, both of which are already excellent. The caching mechanism allows for much higher batch sizes with constant memory usage, which boosts training performance. The GIST part introduces a guide model to guide the in-batch negative sample selection. This prevents false negatives, resulting in a stronger training signal.

2️⃣ Automatic Matryoshka model truncation
Matryoshka models produce embeddings that are still useful after truncation. However, this truncation always had to be done manually, until now! We've added a truncate_dim option to the Sentence Transformer constructor. This also allows truncation when using HuggingFaceEmbeddings from LlamaIndex or LangChain.

3️⃣ Additionally, you can now specify truncate_dim in evaluators to get the performance after truncation. (Hint: it's surprisingly good, even for models not trained with MatryoshkaLoss, and it can speed up e.g. clustering, retrieval, etc.)

4️⃣ CrossEncoder improvements
The CrossEncoder now supports 'push_to_hub' to upload trained reranker models to Hugging Face. Additionally, CrossEncoders now support trust_remote_code to load models with custom modelling code.

5️⃣ Inference on Intel Gaudi2
If you have an Intel Gaudi2 Accelerator, Sentence Transformers now uses it automatically for even faster inference. No changes are necessary to your code, the device is automatically detected!

Check out the release notes for all of the details: https://github.com/UKPLab/sentence-transformers/releases/tag/v2.7.0

I'm very excited for the upcoming releases: I'm making great progress with a notable v3 refactor that should heavily improve the training process for embedding models!
  • 2 replies
Β·
replied to their post 9 months ago
view reply

I'm testing on it, with 32K(claimed at paper) and 1M seq len.
IMG_6332.jpeg
I'm training those models with minipile dataset and for now, it seems minimal continual training let model to adapt 'memory' could be sufficient.(less than <1B tokens)

Train is not finished yet, but after the loss converges then I could test haystack test or inference tests. it won't be take long :)

posted an update 9 months ago
view post
Post
12266
πŸš€ **InfiniTransformer, Gemma/Llama3 based Implementation!** 🌌

> Update @ 2024.04.19: It now supports Llama-3!

> Note: this implementation is unofficial

This implementation is designed to handle virtually infinite context lengths.

Here's the github repo: https://github.com/Beomi/InfiniTransformer

πŸ“„ **Read the original Paper:** https://arxiv.org/abs/2404.07143

## **Focus on Infini-Attention**

- **2 Types of Implementation available:** Attention-layer only implementation / Model & Train-wise implementation
- **Fixed(segment dependent) Memory Usage:** Enables training on larger models and longer sequences without the memory overhead typical of standard Transformer implementations.
- **Infinite Context Capability:** Train with unprecedented sequence lengthsβ€”imagine handling up to 1 million sequence lengths on standard hardware!
- You could train Gemma-2B with 1M sequence length with 2K segmentation size with single H100 GPU.

## **Try InfiniTransformer**

1. **Clone the repository:**
bash git clone https://github.com/Beomi/InfiniTransformer
2. **Install necessary tools:**
bash pip install -r requirements.txt pip install -e git+https://github.com/huggingface/transformers.git@b109257f4f#egg=transformers
3. **Dive Deep into Custom Training:**
- Train with extensive sequence lengths using scripts such as ./train.gemma.infini.noclm.1Mseq.sh.

for more detailed info, please visit Repo: https://github.com/Beomi/InfiniTransformer

Look forward to see your feedbacks! 😊

ps. Training loss plot is here πŸ˜‰
  • 2 replies
Β·
reacted to clefourrier's post with ❀️ 11 months ago