Mistral AI Game Jam

community
Activity Feed

AI & ML interests

None defined yet.

Recent Activity

Mistral-AI-Game-Jam's activity

Tonicย 
posted an update 1 day ago
view post
Post
232
๐Ÿ™‹๐Ÿปโ€โ™‚๏ธ hey there folks ,

So every bio/med/chem meeting i go to i always the same questions "why are you sharing a gdrive link with me for this?" and "Do you have any plans to publish your model weights and datasets on huggingface?" and finally i got a good answer today which explains everything :

basically there is some kind of government censorship on this (usa, but i'm sure others too) and they are told they are not allowed as it is considered a "dataleak" which is illegal !!!!

this is terrible ! but the good news is that we can do something about it !

so there is this "call for opinions and comments" here from the NIH (usa) , and here we can make our opinion on this topic known : https://osp.od.nih.gov/comment-form-responsibly-developing-and-sharing-generative-artificial-intelligence-tools-using-nih-controlled-access-data/

kindly consider dropping your opinion and thoughts about this censorship of science , and share this post , link or thoughts widely .

Together maybe we can start to share data and model weights appropriately and openly in a good way ๐Ÿ™๐Ÿป๐Ÿš€

cc. @cyrilzakka

Tonicย 
posted an update 11 days ago
view post
Post
2437
๐Ÿ™‹๐Ÿปโ€โ™‚๏ธ Hey there folks ,

Yesterday the world's first "Learn to Vibe Code" application was released .

As vibe coding is the mainstream paradigm , so now the first educational app is there to support it .

You can try it out already :

https://vibe.takara.ai

and of course it's entirely open source, so i already made my issue and feature branch :-) ๐Ÿš€
Jofthomasย 
posted an update 16 days ago
view post
Post
2730
Meet our new agentic model : ๐——๐—ฒ๐˜ƒ๐˜€๐˜๐—ฟ๐—ฎ๐—น

Devstral is an open-source LLM built software engineering tasks built under a collaboration between Mistral AI and All Hands AI ๐Ÿ™Œ.

๐—ž๐—ฒ๐˜† ๐—ณ๐—ฒ๐—ฎ๐˜๐˜‚๐—ฟ๐—ฒ๐˜€ :
โ€ข ๐Ÿค– ๐—”๐—ด๐—ฒ๐—ป๐˜๐˜€ : perfect for Agentic coding
โ€ข ๐Ÿƒ ๐—น๐—ถ๐—ด๐—ต๐˜๐˜„๐—ฒ๐—ถ๐—ด๐—ต๐˜: Devstral is a ๐Ÿฎ๐Ÿฐ๐—• parameter based on Mistral small.
โ€ข ยฉ๏ธ ๐—”๐—ฝ๐—ฎ๐—ฐ๐—ต๐—ฒ ๐Ÿฎ.๐Ÿฌ, meaning fully open-source !
โ€ข ๐Ÿ“„ A ๐Ÿญ๐Ÿฎ๐Ÿด๐—ธ context window.

๐Ÿ“šBlog : https://mistral.ai/news/devstral
โšกAPI : The model is also available on our API under the name ๐—ฑ๐—ฒ๐˜ƒ๐˜€๐˜๐—ฟ๐—ฎ๐—น-๐˜€๐—บ๐—ฎ๐—น๐—น-๐Ÿฎ๐Ÿฑ๐Ÿฌ๐Ÿฑ
๐Ÿค— repo : mistralai/Devstral-Small-2505

Can't wait to see what you will build with it !
  • 1 reply
ยท
MikeDoesย 
posted an update about 1 month ago
view post
Post
1525
PII-Masking-1M Final Day (7/7)! ๐Ÿš€ Today, we unveil 5 NEW Enterprise PII (E-PII) Dataset PREVIEWS!

Standard PII tools often miss sensitive *business* data. That's why we built E-PII previews for the data that powers your operations and compliance needs.

Get a first look (representing 100,000 samples each!) into datasets designed for real-world enterprise security across these categories:

๐Ÿฅ **PHI Preview**: For Healthcare Data
๐Ÿ’ณ **PFI Preview:** For Financial Data
๐Ÿข **PWI Preview:** For Workplace Data
๐Ÿ’ป **PDI Preview:** For Digital Activity Data
๐Ÿ“ **PLI Preview:** For Location Data


That wraps up our #PIIMasking1M 7 days announcement! HUGE thanks for following along and for your engagement.
Explore ALL our releases, including these E-PII previews, in the Ai4Privacy Hugging Face Collection & show some love โค๏ธ if you find them useful!
๐Ÿ”— Visit the Collection:https://huggingface.co/ai4privacy

Let's keep building safer AI, together!
MrDragonFoxย 
posted an update about 1 month ago
view post
Post
2835
as a few of you know - i am working on a rather more elaborate-tts that can produce more interesting sounds in context of rp

early sneak peak is here -

MrDragonFox/mOrpheus_3B-1Base_early_preview-v1-25000

its based on orpheus - but really the model is irrelevant as i focus mostly on data augmentation / prep / pipelineing - its just the way to show progress

should be able to express fine even in a sfw context

probably the last release for a few weeks as i go back to the data pipeline and improve there ..

in the mean time, please do test and report problems or enjoyable generations you found - we have a growing discord community and i love to see what you get out of that early release !

(small colab is provided on the model page if you dont have the gpu to run that your self)
MrDragonFoxย 
posted an update about 2 months ago
view post
Post
3941
yet a other audio datasets pre classified for events + audio aestetics

this time for german - 680h sampled from emilia yodas

timestamps for asr training or other fancier things available as nc in the raw repo

MrDragonFox/DE_Emilia_Yodas_680h

cc by 4.0 as by emilia yodas

raw events / transcriptions are cc by NC 4.0

MrDragonFox/DE_Emilia_Yodas_680h_raw_timestamps

the coming days i should push about 600h english + some japanese too same format
MikeDoesย 
posted an update 2 months ago
MrDragonFoxย 
posted an update 2 months ago
view post
Post
2118
did a small emotive classified test dataset for all the tts tuners out there

MrDragonFox/Elise

3h total mit - single speaker voice

dataset is a copy of an existing one just added the emotional tags over 1200 samples - should be good enough to test if emotional tags stick in your finetune
  • 1 reply
ยท
MikeDoesย 
posted an update 2 months ago
view post
Post
2788
๐Ÿš€ We are quite excited to announce the Ai4Privacy Python library! ๐ŸŽ‰

pip install ai4privacy to anonymize short english text with OpenPII Masking 500k labels

๐Ÿ“Š Day 5/7 of PII Masking 1M announcements complete! โฐ
MikeDoesย 
posted an update 2 months ago
MikeDoesย 
posted an update 3 months ago
view post
Post
1728
๐Ÿ“Š 99%+ PII Masking Precision in English Straight to Your Browser! ๐Ÿš€

ai4privacy/general-english-anonymiser-openpii-500k

Hard Facts:
๐Ÿ–ฅ๏ธ Runs in-browserโ€”blazing fast, no server latency
๐Ÿ‘ Open-source, MIT-licensed (even for commercial use)
๐Ÿ“ˆ Full metrics on Hugging Face dataset and model pages

Day 3 out 7 of PII-Masking-1M Announcements Complete!
*Accuracies reported from the new OpenPII-500k dataset

#DataPrivacy #AI #OpenSource
MikeDoesย 
posted an update 3 months ago
view post
Post
2104
#PII Masking Tech that does not **** around!

We are happy to release the OpenPII English Anonymiser โ€”the most powerful open-source tool for redacting sensitive info from English text.

Fine-tuned Modernbert on 5.7 million+ PII examples, itโ€™s clocking 99%+ accuracy across emails, dates, social numbers, and more!

Why itโ€™s a big deal:
โœ… Top-tier precision: 100% for passport numbers, 99.96% for emails*.
โœ… Totally free: MIT license for personal or commercial use.
โœ… No secrets: Full metrics shared on Hugging Face.

#AI #OpenSource #DataSecurity @huggingface

Day 2 out 7 of PII-Masking-1M Announcements Complete!

*Accuracies reported from the new OpenPII-500k dataset

ai4privacy/llama-ai4privacy-english-anonymiser-openpii
MikeDoesย 
posted an update 3 months ago
view post
Post
2710
๐Ÿš€ Ai4Privacy Team is excited to unveil PII-Masking-1M, our most significant release yet! ๐ŸŽ‰

This publication series ๐Ÿ“ฆ includes datasets ๐Ÿ“Š, models ๐Ÿค–, and applications โš™๏ธ to advance PII masking with AI systems ๐Ÿ›ก๏ธ

Starting on Monday with daily posts at 7 PM CET โฐ
Tonicย 
posted an update 3 months ago
view post
Post
1585
๐Ÿ™‹๐Ÿปโ€โ™‚๏ธHey there folks,

Did you know that you can use ModernBERT to detect model hallucinations ?

Check out the Demo : Tonic/hallucination-test

See here for Medical Context Demo : MultiTransformer/tonic-discharge-guard

check out the model from KRLabs : KRLabsOrg/lettucedect-large-modernbert-en-v1

and the library they kindly open sourced for it : https://github.com/KRLabsOrg/LettuceDetect

๐Ÿ‘†๐Ÿปif you like this topic please contribute code upstream ๐Ÿš€

  • 2 replies
ยท
Tonicย 
posted an update 3 months ago
view post
Post
844
Powered by KRLabsOrg/lettucedect-large-modernbert-en-v1 from KRLabsOrg.

Detect hallucinations in answers based on context and questions using ModernBERT with 8192-token context support!

### Model Details
- **Model Name**: [lettucedect-large-modernbert-en-v1]( KRLabsOrg/lettucedect-large-modernbert-en-v1)
- **Organization**: [KRLabsOrg]( KRLabsOrg )
- **Github**: [https://github.com/KRLabsOrg/LettuceDetect](https://github.com/KRLabsOrg/LettuceDetect)
- **Architecture**: ModernBERT (Large) with extended context support up to 8192 tokens
- **Task**: Token Classification / Hallucination Detection
- **Training Dataset**: [RagTruth]( wandb/RAGTruth-processed)
- **Language**: English
- **Capabilities**: Detects hallucinated spans in answers, provides confidence scores, and calculates average confidence across detected spans.

LettuceDetect excels at processing long documents to determine if an answer aligns with the provided context, making it a powerful tool for ensuring factual accuracy.
ngxsonย 
posted an update 3 months ago
view post
Post
4107
A comprehensive matrix for which format should you use.

Read more on my blog post: https://huggingface.co/blog/ngxson/common-ai-model-formats

| Hardware        | GGUF      | PyTorch                | Safetensors              | ONNX  |
|-----------------|-----------|------------------------|--------------------------|-------|
| CPU             | โœ… (best) | ๐ŸŸก                      | ๐ŸŸก                       | โœ…    |
| GPU             | โœ…        | โœ…                      | โœ…                       | โœ…    |
| Mobile          | โœ…        | ๐ŸŸก (via executorch)     | โŒ                       | โœ…    |
| Apple silicon   | โœ…        | ๐ŸŸก                      | โœ… (via MLX framework)   | โœ…    |
  • 1 reply
ยท