AI & ML interests

None defined yet.

Recent Activity

blog-explorers's activity

MrDragonFox 
posted an update 1 day ago
yet another audio dataset, pre-classified for events + audio aesthetics

this time for German - 680h sampled from Emilia YODAS

timestamps for ASR training or other fancier things are available as NC in the raw repo

MrDragonFox/DE_Emilia_Yodas_680h

CC BY 4.0, as per Emilia YODAS

raw events / transcriptions are CC BY-NC 4.0

MrDragonFox/DE_Emilia_Yodas_680h_raw_timestamps

in the coming days I should push about 600h of English + some Japanese too, same format
monsoon-nlp 
posted an update 15 days ago
MrDragonFox 
posted an update 19 days ago
made a small emotion-classified test dataset for all the TTS tuners out there

MrDragonFox/Elise

3h total, MIT - single-speaker voice

the dataset is a copy of an existing one; I just added the emotional tags across its 1200 samples - should be good enough to test whether emotional tags stick in your finetune
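Since the emotional tags live in the transcripts, a quick sanity check before finetuning is to pull them out and count them. A minimal sketch, assuming inline tags like `<laugh>` - the actual tag format in MrDragonFox/Elise may differ:

```python
import re

# Hypothetical tag format: emotion markers embedded inline in the transcript,
# e.g. "<laugh> oh wow". Adjust the pattern to the dataset's real convention.
def extract_emotion_tags(text):
    """Return all inline emotion tags found in a transcript string."""
    return re.findall(r"<(\w+)>", text)

print(extract_emotion_tags("<laugh> oh wow <sigh> okay then"))
```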
  • 1 reply
·
chansung 
posted an update 20 days ago
a simple guide to the GRPO recipe in Open-R1, which is built on top of TRL

I think the FastAPI wrapper around vLLM with WeightSyncWorker is a pretty cool feature. Also, there are many predefined reward functions out of the box!
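For context, TRL's GRPO trainer scores each generated completion with reward functions that take the batch of completions and return one float per completion. A toy sketch of that shape (a length-based reward for illustration only, not one of the predefined Open-R1 rewards):

```python
# Toy reward function in the shape TRL's GRPOTrainer accepts: it receives the
# batch of generated completions and returns one float score per completion.
def length_reward(completions, **kwargs):
    """Reward shorter completions (purely illustrative)."""
    return [-float(len(c)) for c in completions]

scores = length_reward(["short answer", "a much, much longer completion"])
# the shorter completion gets the higher (less negative) score
```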
louisbrulenaudet 
posted an update 23 days ago
I’ve just released logfire-callback on PyPI, designed to facilitate monitoring of Hugging Face Transformers training loops using Pydantic Logfire 🤗

The callback will automatically log the training start with configuration parameters, periodic metrics, and training completion ⏱️

Install the package using pip:
pip install logfire-callback

First, ensure you have a Logfire API token and set it as an environment variable:
export LOGFIRE_TOKEN=your_logfire_token

Then use the callback in your training code:
from transformers import Trainer, TrainingArguments
from logfire_callback import LogfireCallback

# Initialize your model, dataset, etc.

training_args = TrainingArguments(
    output_dir="./results",
    num_train_epochs=3,
    # ... other training arguments
)

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset,
    callbacks=[LogfireCallback()]  # Add the Logfire callback here
)

trainer.train()

If you have any feedback, please reach out at @louisbrulenaudet
chansung 
posted an update 26 days ago
Mistral AI Small 3.1 24B is not only free for commercial use but also the best model for a single-GPU deployment.

I packed up all the information you need to know in a single picture. Hope this helps! :)
WaveCut 
posted an update 27 days ago
abhishek 
posted an update 29 days ago
🚀 I'm thrilled to announce the launch of Arcee Conductor, a game-changing platform that's about to revolutionize the way you interact with AI models! 🤖 As the pioneers of small language models (SLMs), we've been working tirelessly to bring you the most exciting innovation in the AI space.
Here's a quick TL;DR of what Arcee Conductor is all about:

🌟 Choice and flexibility: Get access to multiple models, including our powerful SLMs and third-party LLMs, to choose the best one for your specific use case
🤖 Intelligent routing: Our platform evaluates which model is best-suited for each of your queries, ensuring you get the most accurate results
📈 Cost savings: Reduce your AI costs with our affordable SLMs, while still having access to leading LLMs when needed
🚀 Easy to get started: Sign up now and try Arcee Conductor today, with 400 million tokens (a $200 value) on us! 🎁
📊 Proven track record: Our SLMs have already racked up 222K+ downloads on Hugging Face, with customers seeing significant cost savings and improved accuracy

For a limited time, you can get $200 credits to use with Conductor for FREE. Check it out here: https://conductor.arcee.ai
chansung 
posted an update about 1 month ago
Gemma 3 Release in a nutshell
(seems like function calling is not supported, though the announcement said it was)
monsoon-nlp 
posted an update about 1 month ago
Genetic counselors help patients get 🧬 tests and understand their results. They need to study inheritance of several conditions, statistics, and patient care 🤓⚕️. I compiled 225 multiple-choice questions for the ABGC exam into a dataset: monsoon-nlp/genetic-counselor-multiple-choice
Llama 3.1 8B Instruct gets a 51% score.
I'm also creating a dataset of real-world open-ended questions (starting with Reddit) and am open to contributors
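Scoring a model on such a multiple-choice set comes down to comparing predicted answer letters against the key. A toy sketch - the prediction and answer lists here are hypothetical, and the dataset's actual column names are not confirmed:

```python
# Toy multiple-choice scorer; "predictions" and "gold" are hypothetical lists
# of answer letters extracted from model output and from the answer key.
def mcq_accuracy(predictions, gold):
    """Fraction of predicted answer letters matching the gold answers."""
    if len(predictions) != len(gold):
        raise ValueError("prediction/answer length mismatch")
    return sum(p == g for p, g in zip(predictions, gold)) / len(gold)

print(mcq_accuracy(["A", "C", "B", "D"], ["A", "B", "B", "D"]))  # 0.75
```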
Tonic 
posted an update about 1 month ago
🙋🏻‍♂️Hey there folks,

Did you know that you can use ModernBERT to detect model hallucinations?

Check out the demo: Tonic/hallucination-test

See here for a medical-context demo: MultiTransformer/tonic-discharge-guard

check out the model from KRLabs: KRLabsOrg/lettucedect-large-modernbert-en-v1

and the library they kindly open-sourced for it: https://github.com/KRLabsOrg/LettuceDetect

👆🏻if you like this topic please contribute code upstream 🚀
