Join the conversation

Join the community of Machine Learners and AI enthusiasts.

Sign Up

All HF Hub posts

reach-vb 
posted an update 2 days ago
view post
Post
1812
Less than two days ago Kyutai Labs open sourced Moshi - an ~7.6B on-device Speech to Speech foundation model and Mimi - SoTA streaming speech codec! 🔥

The release includes:

1. Moshiko & Moshika - Moshi finetuned on synthetic data (CC-BY license) ( kyutai/moshi-v01-release-66eaeaf3302bef6bd9ad7acd)
2. Mimi - Streaiming Audio Codec, processes 24 kHz audio, down to a 12.5 Hz representation with a bandwidth of 1.1 kbps (CC-BY license) ( kyutai/mimi)
3. Model checkpoints & Inference codebase written in Rust (Candle), PyTorch & MLX (Apache license) (https://github.com/kyutai-labs/moshi)

How does Moshi work?

1. Moshi processes two audio streams: one for itself and one for the user, with the user's stream coming from audio input and Moshi's stream generated by the model.

2. Along with these audio streams, Moshi predicts text tokens for its speech, enhancing its generation quality.

3. The model uses a small Depth Transformer for codebook dependencies and a large 7B parameter Temporal Transformer for temporal dependencies.

4. The theoretical latency is 160ms, with a practical latency of around 200ms on an L4 GPU.

Model size & inference:

Moshiko/ka are 7.69B param models

bf16 ~16GB VRAM
8-bit ~8GB VRAM
4-bit ~4GB VRAM

You can run inference via Candle 🦀, PyTorch and MLX - based on your hardware.

The Kyutai team, @adefossez @lmz and team are cracked AF, they're bringing some serious firepower to the open source/ science AI scene, looking forward to what's next! 🐐
  • 1 reply
·
inflatebot 
posted an update 1 day ago
view post
Post
936
!!SEE UPDATE BELOW!!
I don't know who still needs to hear this, but if you're using Mistral Nemo-based models, you might have been using the wrong completions format. This is a signal boost from MarinaraSpaghetti's model card for NemoMix-Unleashed: MarinaraSpaghetti/NemoMix-Unleashed-12B
A lot of people have been working with a version of Nemo that's been reconfigured for ChatML, and while that works great, simply using the right format might be just as effective at correcting weirdness people in the AIRP scene sometimes have with Nemo.

Huge ups to Marinara for pointing this out, and to the MistralAI team member who let her know.

Update: A PR has been merged to SillyTavern Staging with new corrected templates! If you don't want to switch or wait, I put them up on GitHub: https://github.com/inflatebot/SillyTavern-Mistral-Templates

PRs for KoboldCPP's chat adapters and KoboldAI Lite *have been merged* and are coming in their respective releases (probably the next time KoboldCPP updates -- it didn't make it for 1.75.1, but you could just grab 'em from the repo!)
  • 1 reply
·
KingNish 
posted an update 1 day ago
kz919 
posted an update 2 days ago
view post
Post
708
Just for the meme.

But the clear lesson I learnt from building these demos are, the more powerful the underlying base model is, the closer you will get to GPT4o1. CoT is nothing more than simply inducing the latent reasoning capability from the model.

kz919/GPT4-O1-Proximas
tomaarsen 
posted an update 2 days ago
view post
Post
1023
I've just shipped the Sentence Transformers v3.1.1 patch release, fixing the hard negatives mining utility for some models. This utility is extremely useful to get more performance out of your embedding training data.

⛏ Hard negatives are texts that are rather similar to some anchor text (e.g. a query), but are not the correct match. They're difficult for a model to distinguish from the correct answer, often resulting in a stronger model after training.
mine_hard_negatives docs: https://sbert.net/docs/package_reference/util.html#sentence_transformers.util.mine_hard_negatives

🔓 Beyond that, this release removes the numpy<2 restriction from v3.1.0. This was previously required for Windows as not all third-party libraries were updated to support numpy v2. With Sentence Transformers, you can now choose v1 or v2 of numpy.

Check out the full release notes here: https://github.com/UKPLab/sentence-transformers/releases/tag/v3.1.1

I'm looking forward to releasing v3.2, I have some exciting things planned 🚀
MonsterMMORPG 
posted an update about 19 hours ago
view post
Post
513
I have done an extensive multi-GPU FLUX Full Fine Tuning / DreamBooth training experimentation on RunPod by using 2x A100–80 GB GPUs (PCIe) since this was commonly asked of me.

Full article here : https://medium.com/@furkangozukara/multi-gpu-flux-fu

Image 1
Image 1 shows that only first part of installation of Kohya GUI took 30 minutes on a such powerful machine on a very expensive Secure Cloud pod — 3.28 USD per hour
There was also part 2, so just installation took super time
On Massed Compute, it would take like 2–3 minutes
This is why I suggest you to use Massed Compute over RunPod, RunPod machines have terrible hard disk speeds and they are like lottery to get good ones



Image 2, 3 and 4
Image 2 shows speed of our very best config FLUX Fine Tuning training shared below when doing 2x Multi GPU training
https://www.patreon.com/posts/kohya-flux-fine-112099700
Used config name is : Quality_1_27500MB_6_26_Second_IT.json
Image 3 shows VRAM usage of this config when doing 2x Multi GPU training
Image 4 shows the GPUs of the Pod


Image 5 and 6
Image 5 shows speed of our very best config FLUX Fine Tuning training shared below when doing a single GPU training
https://www.patreon.com/posts/kohya-flux-fine-112099700
Used config name is : Quality_1_27500MB_6_26_Second_IT.json
Image 6 shows this setup used VRAM amount


Image 7 and 8
Image 7 shows speed of our very best config FLUX Fine Tuning training shared below when doing a single GPU training and Gradient Checkpointing is disabled
https://www.patreon.com/posts/kohya-flux-fine-112099700
Used config name is : Quality_1_27500MB_6_26_Second_IT.json
Image 8 shows this setup used VRAM amount


....
loztcontrol 
posted an update 1 day ago
view post
Post
624
I am developing a personal project to further support and help people living with Depression and Anxiety. As I suffer mainly from chronic depression I would like to create a tool based on AI that can monitor my moods but first I will collect information about myself, my moods and after collecting at least 6 months of my moods and my writings I will be able to formulate as a kind of recognition when my emotions are “out of control” I mean those states or feelings of emptiness. I think that sometimes not all of us have access to treatments and therapies so I would like to develop in a free way this project that I have just started today. I have already started the code to register events of my moods. I will share with you the updates :D


import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.metrics import accuracy_score, classification_report
import nltk
from nltk.corpus import stopwords
import string
import matplotlib.pyplot as plt
from datetime import datetime

nltk.download('stopwords')

data = {
    'text': [
        "Hoy me siento bien, aunque un poco cansado", 
        "Me siento triste y solo", 
        "Esto es frustrante, todo sale mal", 
        "Estoy nervioso por lo que va a pasar",
        "No puedo con este estrés", 
        "Todo está saliendo bien, me siento optimista", 
        "Siento miedo de lo que pueda suceder", 
        "Hoy fue un día horrible"
    ],
    'emotion': [
        'felicidad', 
        'tristeza', 
        'enojo', 
        'ansiedad', 
        'ansiedad', 
        'felicidad', 
        'miedo', 
        'tristeza'
    ]
}

df = pd.DataFrame(data)

# Función para limpiar el texto
def clean_text(text):

Yes, I speak Spanish :P too
  • 3 replies
·
dylanebert 
posted an update 1 day ago
yeonseok-zeticai 
posted an update 2 days ago
view post
Post
1664
🚀 Revolutionary Method to Convert YOLOv8 to On-Device AI with mobile NPU utilizations

Attention AI developers and engineers!
Discover how ZETIC.MLange can effortlessly transform YOLOv8 models into on-device AI with mobile NPU utilizations. 🎉

💡 We highlight the power of mobile NPU, showing how it outperforms CPU in processing speed. The results speak for themselves—NPU-driven execution is faster, smarter, and more efficient.
* Real-time demos are no easy feat but with ZETIC.MLange, we’ve made it possible!

🎥 Watch the video to see how we’re revolutionizing on-device AI.
: https://youtu.be/LkP3JDTcVN8?si=6Ha5vHA-G7jZE9oq

🌟 Our team has successfully implemented YOLOv8 as on-device AI using ZETIC.MLange. This innovative approach enables high-performance object detection across various manufacturers mobile devices.

🔍 Curious about the details? We've shared a comprehensive guide on our blog. Check it out through the link below!
📚 Blog link: https://zetic.ai/blog/implementing-yolov8-on-device-ai-with-zetic-mlange
tomaarsen 
posted an update 3 days ago
view post
Post
1599
🎉SetFit v1.1.0 is out! Training efficient classifiers on CPU or GPU now uses the Sentence Transformers Trainer, and we resolved a lot of issues caused by updates of third-party libraries (like Transformers). Details:

Training a SetFit classifier model consists of 2 phases:
1. Finetuning a Sentence Transformer embedding model
2. Training a Classifier to map embeddings -> classes

🔌The first phase now uses the SentenceTransformerTrainer that was introduced in the Sentence Transformers v3 update. This brings some immediate upsides like MultiGPU support, without any (intended) breaking changes.

➡️ Beyond that, we softly deprecated the "evaluation_strategy" argument in favor of "eval_strategy" (following a Transformers deprecation), and deprecated Python 3.7. In return, we add official support for Python 3.11 and 3.12.

✨ There's some more minor changes too, like max_steps and eval_max_steps now being a hard limit instead of an approximate one, training/validation losses now logging nicely in Notebooks, and the "device" parameter no longer being ignored in some situations.

Check out the full release notes here: https://github.com/huggingface/setfit/releases/tag/v1.1.0
Or read the documentation: https://huggingface.co/docs/setfit
Or check out the public SetFit models for inspiration: https://huggingface.co/models?library=setfit&sort=created

P.s. the model in the code snippet trained in 1 minute and it can classify ~6000 sentences per second on my GPU.