We're working on something, more details soon-ish 🤫
Buttercream
Korakoe
AI & ML interests
GAN, Conversational transformers, Diffusers
Recent Activity
liked
a Space
about 1 month ago
jordand/echo-tts-preview
liked
a model
about 1 month ago
mrfakename/granite-tts-1b
liked
a dataset
about 1 month ago
facebook/omnilingual-asr-corpus
Organizations
replied to
their
post
4 months ago
reacted to
hexgrad's
post
about 1 year ago
Post
1486
@Respair
just dropped Tsukasa: a frontier TTS model for Japanese
Respair/Tsukasa_Speech
It's expressive, punches way above its weight class, and supports voice cloning. Go check it out!
(Unmute the audio sample below after hitting play)
reacted to
charlesdedampierre's
post with 🔥
over 1 year ago
Post
4205
Please check out the Open Source AI Network: we mapped the top 500 HF users based on their followers' profiles.
The map can be found here: bunkalab/mapping_the_OS_community
reacted to
cdminix's
post
over 1 year ago
Post
2272
Since new TTS (text-to-speech) systems come out what feels like every day, and it's currently hard to compare them, my latest project has focused on doing just that.
I was inspired by the TTS-AGI/TTS-Arena (definitely check it out if you haven't), which compares recent TTS systems using crowdsourced A/B testing.
I wanted to see if we could also do a similar evaluation with objective metrics, and it's now available here:
ttsds/benchmark
Anyone can submit a new TTS model, and I hope this provides a way to see which areas models perform well or poorly in.
The paper with all the details is available here: https://arxiv.org/abs/2407.12707
reacted to
anakin87's
post with ❤️
over 1 year ago
Post
1055
How to alter the behavior of a language model without fine-tuning or prompting? Say hello to yo-Llama 🦙!
Model anakin87/yo-Llama-3-8B-Instruct
This experiment steers Llama-3-8B-Instruct to respond in a rap style.
How? Amplifying the rap direction in the activation space.
What sparked this idea?
Lately, I've gotten interested in the mechanistic interpretability of LLMs.
💡 A recent paper, "Refusal in Language Models Is Mediated by a Single Direction," showed how to find the refusal direction in the activation space of Chat Language Models and either erase or amplify it.
A clever jailbreak method for open weights models.
Then, @failspy took it a step further by modifying the models to amplify different traits, such as making a model seem grumpy or irritable.
How did I create yo-Llama?
(notebook in the HF repository, heavily inspired by Failspy's work)
1️⃣ Load the Llama-3-8B-Instruct model.
2️⃣ Load 1024 examples from Alpaca (instruction dataset).
3️⃣ Prepare a system prompt to make the original model act like a rapper.
4️⃣ Run inference on the examples, with and without the system prompt, and cache the activations.
5️⃣ Compute the rap feature directions (one for each layer) from the activations.
6️⃣ Apply the feature directions one by one, checking the results on some examples.
7️⃣ Pick the best-performing feature direction.
8️⃣ Apply this feature direction and voilà!
yo-Llama-3-8B-Instruct is born! 🥳🎶
This was a fun experiment.
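For anyone curious what steps 4️⃣-8️⃣ look like in code, here is a minimal, illustrative sketch (not the notebook from the repository): it computes a difference-of-means "rap direction" at one layer from a couple of stand-in prompts and adds it back into the residual stream with a forward hook. The model ID, layer index, steering strength, and example prompts are all assumptions chosen for illustration.

```python
# Illustrative sketch of difference-of-means activation steering, NOT the original notebook.
# MODEL_ID, LAYER, ALPHA, and the prompts below are assumptions.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "meta-llama/Meta-Llama-3-8B-Instruct"  # assumed; any chat model with a chat template works
LAYER = 14    # which decoder layer to steer (illustrative)
ALPHA = 4.0   # steering strength (illustrative)

tok = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(MODEL_ID, torch_dtype=torch.bfloat16, device_map="auto")

def mean_last_token_state(prompts, system=None):
    """Mean hidden state of the last prompt token at the output of decoder layer LAYER."""
    states = []
    for p in prompts:
        msgs = ([{"role": "system", "content": system}] if system else []) + [{"role": "user", "content": p}]
        ids = tok.apply_chat_template(msgs, add_generation_prompt=True, return_tensors="pt").to(model.device)
        with torch.no_grad():
            hs = model(ids, output_hidden_states=True).hidden_states  # tuple: embeddings + one entry per layer
        states.append(hs[LAYER + 1][0, -1].float())
    return torch.stack(states).mean(dim=0)

# Stand-in for the 1024 Alpaca instructions used in the post.
instructions = ["Explain photosynthesis.", "Give three tips for learning the guitar."]
rap_prompt = "You are a rapper. Answer every question as rap verse."

# "Rap direction" = mean activation with the rap system prompt minus mean activation without it.
rap_dir = mean_last_token_state(instructions, system=rap_prompt) - mean_last_token_state(instructions)
rap_dir = rap_dir / rap_dir.norm()

# Add the direction to the residual stream at the chosen layer during generation.
def steer(module, inputs, output):
    hidden = output[0]
    return (hidden + ALPHA * rap_dir.to(hidden.dtype).to(hidden.device),) + tuple(output[1:])

handle = model.model.layers[LAYER].register_forward_hook(steer)
ids = tok.apply_chat_template([{"role": "user", "content": "What is gravity?"}],
                              add_generation_prompt=True, return_tensors="pt").to(model.device)
out = model.generate(ids, max_new_tokens=80, do_sample=False)
print(tok.decode(out[0], skip_special_tokens=True))
handle.remove()  # remove the hook to restore the original behavior
```

Sweeping LAYER and ALPHA while eyeballing a few generations is the manual version of steps 6️⃣-7️⃣.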
Resources
Refusal in Language Models Is Mediated by a Single Direction - https://arxiv.org/abs/2406.11717
Uncensor any LLM with abliteration: great practical blog post by @mlabonne https://huggingface.co/blog/mlabonne/abliteration
Practical materials by @failspy
- abliterator library https://github.com/FailSpy/abliterator
- Llama-MopeyMule-3-8B-Instruct model (+ notebook) failspy/Llama-3-8B-Instruct-MopeyMule
replied to
their
post
over 1 year ago
Hey! Glad you appreciate the model! The final release Space has been up for a while (and has some small tricks to mitigate trailing artefacts); here's some audio to compare!
And here's the link to the release version of the model's Space: https://huggingface.co/spaces/ShoukanLabs/Vokan
posted
an
update
over 1 year ago
Post
3101
I've published several older versions of Vokan! Sometimes, they may sound more natural, but less like the target speaker.
Please check 'em out!
Korakoe/Vokan-V0.5
ShoukanLabs/Vokan
reacted to
Jaward's
post with 🔥
over 1 year ago
Post
2338
All You Need To Know About Apple Intelligence Architecture And Models!!
One key challenge with running LLMs on-device is balancing compute, performance, and model size. Apple Intelligence solves this by using small, specialized chunks (adapters) of the on-device foundation model when needed.
For compute, they engineered a new framework that uses LoRA adapters of rank 16, allowing a mixed 2-bit and 4-bit configuration that averages 3.5 bits per weight while achieving the same performance as the uncompressed models.
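(For intuition: quantizing roughly a quarter of the weight blocks to 2 bits and the rest to 4 bits would average 0.25 · 2 + 0.75 · 4 = 3.5 bits per weight; the per-operation mix is what the Talaria tool mentioned below helps choose.)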
With the help of an open-source model latency and power analysis tool (Talaria), they were able to optimize the bit-rate selection for each operation. This, along with activation and embedding quantization plus efficient key-value caching, achieves up to 30 tokens/sec on an iPhone 15 Pro.
When the model is prompted (e.g., to rewrite an email in the Mail app), the app draws on the App Intents toolbox, which routes the prompt to the adapter specialized for writing; the model then responds through the same pipeline with a real-time update of the text being rewritten.
The coolest feature of these models is their ability to adapt and dynamically specialize to users' everyday activities. For this, they adapt the attention matrices, the attention projection matrix, and the fully connected layers in the point-wise feed-forward networks for a suitable set of the decoding layers of the transformer architecture.
For tasks that require more capable models, the architecture falls back to larger server models running on Private Cloud Compute infrastructure, which delivers a state-of-the-art secure and verifiable privacy experience.
More on Private Cloud Compute: https://developer.apple.com/videos/play/wwdc2024/102/
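As a rough illustration of the adapter idea (not Apple's actual stack), here is how a rank-16 LoRA adapter covering the attention and point-wise feed-forward projections can be attached to an open model with PEFT; the base model name and lora_alpha are assumptions.

```python
# Rough illustration of the per-task adapter idea with PEFT, NOT Apple's implementation.
# Base model and lora_alpha are assumptions; r=16 mirrors the rank mentioned above.
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

base = AutoModelForCausalLM.from_pretrained("HuggingFaceTB/SmolLM2-135M-Instruct")  # small stand-in model

writing_adapter = LoraConfig(
    r=16,                # rank-16 adapter, as in the post
    lora_alpha=32,       # scaling factor (assumption)
    target_modules=[
        "q_proj", "k_proj", "v_proj", "o_proj",   # attention / projection matrices
        "gate_proj", "up_proj", "down_proj",      # point-wise feed-forward layers
    ],
    task_type="CAUSAL_LM",
)

model = get_peft_model(base, writing_adapter)
model.print_trainable_parameters()  # only the small adapter trains; the compressed base stays frozen
```

In Apple's setup, one such adapter per task (e.g., the writing adapter in the Mail example above) sits on top of the shared compressed base model.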
reacted to
mrfakename's
post with ❤️
almost 2 years ago
Post
Today, I'm thrilled to release a project I've been working on for the past couple weeks in collaboration with Hugging Face: the TTS Arena.
The TTS Arena, inspired by LMSys's Chatbot Arena, allows you to enter text which will be synthesized by two SOTA models. You can then vote on which model generated a better sample. The results will be published on a publicly-accessible leaderboard.
We've added several open access models, including Pheme, MetaVoice, XTTS, OpenVoice, & WhisperSpeech. It also includes the proprietary ElevenLabs model.
If you have any questions, suggestions, or feedback, please don't hesitate to DM me on X (https://twitter.com/realmrfakename) or open a discussion in the Space. More details coming soon!
Try it out: TTS-AGI/TTS-Arena