
Vaibhav Srivastav PRO (reach-vb)

AI & ML interests

TTS + LM performance prediction

Recent Activity

liked a dataset about 2 hours ago
amphion/Emilia-Dataset
reacted to lbourdois's post with 🔥 about 4 hours ago
We introduce FAT5 (Flash Attention T5) ⚡ An implementation of T5 in PyTorch with the UL2 objective, optimized for GPGPU for both training and inference thanks to 13 different optimizations. The main one: we designed a CUDA kernel that extends Flash Attention by @tridao with RPE biases and also supports other positional encodings such as RoPE, ALiBi or FIRE. The resulting kernel is 2x faster than an SDPA implementation. We also use Triton kernels to optimize certain parts of the architecture, such as the cross-entropy and RMSNorm layers. The various kernels have been carefully built to be compatible with BF16 and torch.compile, to go even faster and achieve efficient pretraining. All other optimizations are described in a 📝 blog post available on @huggingface 🤗: https://huggingface.co/spaces/CATIE-AQ/FAT5-report. This methodology enabled us, as a proof of concept, to efficiently pretrain a FAT5 with 147M parameters in French in a reasonable time (1,461 hours for 419B tokens), with limited resources (1 A100, i.e. a computational budget of ~€1,900) and a low carbon footprint (13.5 kg CO2 eq). The model's weights are also available on Hugging Face: https://huggingface.co/CATIE-AQ/FAT5-small. It's not very useful in practice: it's a PoC, not an instructed model (that's planned for later). All the code is available on GitHub if you want to pretrain your own model in your own language or for a specific domain: https://github.com/catie-aq/flashT5 ⭐ This was a joint project with @BorisAlbar at hf.co/CATIE-AQ.
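For context on the RPE biases mentioned above: T5-style relative position encoding maps each (query, key) offset to a bucket that indexes a learned bias added to the attention logits before softmax. A minimal scalar Python sketch of that bucketing, simplified from the T5 scheme for illustration (not the FAT5 CUDA kernel itself, which fuses this into the attention computation):

```python
import math

def relative_position_bucket(relative_position: int,
                             bidirectional: bool = True,
                             num_buckets: int = 32,
                             max_distance: int = 128) -> int:
    """Map a (key_pos - query_pos) offset to a bias bucket, T5-style.

    Nearby offsets each get their own bucket; distant offsets share
    logarithmically sized buckets, clamped at max_distance.
    """
    ret = 0
    n = -relative_position
    if bidirectional:
        # Half the buckets for each direction.
        num_buckets //= 2
        if n < 0:
            ret += num_buckets
        n = abs(n)
    else:
        n = max(n, 0)

    max_exact = num_buckets // 2
    if n < max_exact:
        # Small offsets: one bucket per position.
        ret += n
    else:
        # Large offsets: logarithmic bucketing, capped at the last bucket.
        val = max_exact + int(
            math.log(n / max_exact) / math.log(max_distance / max_exact)
            * (num_buckets - max_exact)
        )
        ret += min(val, num_buckets - 1)
    return ret

# The bucket index selects a learned scalar bias per attention head.
print(relative_position_bucket(0))    # → 0  (same position)
print(relative_position_bucket(-1))   # → 1  (key one step before query)
print(relative_position_bucket(200))  # → 31 (far offset, clamped bucket)
```

Because these biases vary per (query, key) pair, they can't be folded into a single additive mask the way vanilla Flash Attention expects, which is why a dedicated kernel is needed.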
reacted to lbourdois's post with ❤️ about 4 hours ago

Organizations

Hugging Face, Notebooks-explorers, Whisper fine-tuning sprint, Hugging Face Course, Whisper Fine-Tuning Event, Kensho, Mozilla Foundation, PolinaOrg, Coqui.ai, Internal Data & Models for Speech Recognition Event, Speech Recognition Community Event Version 2, onnx, Hugging Test Lab, Internal Data, The Team Ten, Huggingface Projects, EuroPython 2022, Whisper Distillation, BigCode, Hugging Face OSS Metrics, Harmonai's Dance Diffusion Community, EuroSciPy 2022, LaLoka Labs, Core ML Projects, meta-private, Blog-explorers, Music Gen Sprint, Hugging Face for Audio, Hugging Face Smol Models Research, Open ASR Leaderboard, test, MusicGen Internal, TTS Eval (OLD), ZeroGPU Explorers, Editing Audio, ggml.ai, LocalLLaMA, gg-hf, Python Italia, Unofficial Mistral Community, Journalists on Hugging Face, Llzama, finding-nemo, diarizers-community, MLX Community, Cartesia, Hugging Face Assignments, IBM Granite, On-device Squad, TTS AGI, Social Post Explorers, Apple CoreNet Models, LM Studio Community, gg-gguf, hsramall, Lina Speech, Dev Mode Explorers, Sweet Dream(Booth)s, private beta for deeplinks, Paris AI Running Club, gg-tt, Kyutai, OuteAI, Hugging Face Discord Community, LLHF, SLLHF, Ratchet Community, Hugging Quants, lbhf, CoreML Scratchpad, blhf, Meta Llama, kmhf, nltpt, nltpt-q, ai4b-hf, Ollama Tools, Spirit LM, qrias, Audio Collabs, Consumer AI Edge Hackathon (Meta, Hugging Face, Pytorch, Scaleway & Unaite), open/ acc, ExecuTorch Community, wut?, DDUF, AI Starter Pack, None yet, Open R1, LiteRT Community (FKA TFLite), MultiLlasa, gg-hf-g, mshf, fluxions-hf, yoso, hf-private-mlx, Bitsandbytes Community

Posts 12

VLMs are going through quite an open revolution, AND at on-device-friendly sizes:

1. Google DeepMind w/ PaliGemma2 - 3B, 10B & 28B: google/paligemma-2-release-67500e1e1dbfdd4dee27ba48

2. OpenGVLabs w/ InternVL 2.5 - 1B, 2B, 4B, 8B, 26B, 38B & 78B: https://huggingface.co/collections/OpenGVLab/internvl-25-673e1019b66e2218f68d7c1c

3. Qwen w/ Qwen 2 VL - 2B, 7B & 72B: Qwen/qwen2-vl-66cee7455501d7126940800d

4. Microsoft w/ FlorenceVL - 3B & 8B: https://huggingface.co/jiuhai

5. Moondream2 w/ 0.5B: https://huggingface.co/vikhyatk/

What a time to be alive! 🔥

Articles 17


Open R1: How to use OlympicCoder locally for coding?