We're working on something, more details soon-ish 🤫
Buttercream
Korakoe
AI & ML interests
GAN, Conversational transformers, Diffusers
Recent Activity
liked
a Space
about 1 month ago
jordand/echo-tts-preview
liked
a model
about 1 month ago
mrfakename/granite-tts-1b
liked
a dataset
about 1 month ago
facebook/omnilingual-asr-corpus
Organizations
replied to
their
post
4 months ago
reacted to
hexgrad's
post
about 1 year ago
Post
1486
@Respair
just dropped Tsukasa: a frontier TTS model for Japanese
Respair/Tsukasa_Speech
It's expressive, punches way above its weight class, and supports voice cloning. Go check it out!
(Unmute the audio sample below after hitting play)
reacted to
charlesdedampierre's
post with 🔥
over 1 year ago
Post
4205
Please check out the Open Source AI Network: we mapped the top 500 HF users based on their followers' profiles.
The map can be found here: bunkalab/mapping_the_OS_community
reacted to
cdminix's
post
over 1 year ago
Post
2272
Since new TTS (text-to-speech) systems come out what feels like every day, and it's currently hard to compare them, my latest project has focused on doing just that.
I was inspired by the TTS-AGI/TTS-Arena (definitely check it out if you haven't), which compares recent TTS systems using crowdsourced A/B testing.
I wanted to see if we could also do a similar evaluation with objective metrics, and it's now available here:
ttsds/benchmark
Anyone can submit a new TTS model, and I hope this provides a way to see which areas models perform well or poorly in.
The paper with all the details is available here: https://arxiv.org/abs/2407.12707
reacted to
anakin87's
post with ❤️
over 1 year ago
Post
1055
How to alter the behavior of a language model without fine-tuning or prompting? Say hello to yo-Llama 🦙!
Model anakin87/yo-Llama-3-8B-Instruct
This experiment steers Llama-3-8B-Instruct to respond in a rap style.
How? Amplifying the rap direction in the activation space.
What sparked this idea?
Lately, I've gotten interested in the mechanistic interpretability of LLMs.
💡 A recent paper, "Refusal in Language Models Is Mediated by a Single Direction," showed how to find the refusal direction in the activation space of Chat Language Models and either erase or amplify it.
A clever jailbreak method for open weights models.
Then, @failspy took it a step further by modifying the models to amplify different traits, such as making a model seem grumpy or irritable.
How did I create yo-Llama?
(notebook in the HF repository, heavily inspired by Failspy's work)
1️⃣ Load the Llama-3-8B-Instruct model.
2️⃣ Load 1024 examples from Alpaca (instruction dataset).
3️⃣ Prepare a system prompt to make the original model act like a rapper.
4️⃣ Run inference on the examples, with and without the system prompt, and cache the activations.
5️⃣ Compute the rap feature directions (one for each layer) from the activations.
6️⃣ Apply the feature directions one by one, checking the results on some examples.
7️⃣ Pick the best-performing feature direction.
8️⃣ Apply this feature direction and voilà!
yo-Llama-3-8B-Instruct is born! 🥳🎶
This was a fun experiment.
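For anyone curious what steps 4️⃣-8️⃣ look like in code, here is a minimal, illustrative sketch (not the notebook from the repository): it computes a difference-of-means "rap direction" at one layer from a couple of stand-in prompts and adds it back into the residual stream with a forward hook. The model ID, layer index, steering strength, and example prompts are all assumptions chosen for illustration.

```python
# Illustrative sketch of difference-of-means activation steering, NOT the original notebook.
# MODEL_ID, LAYER, ALPHA, and the prompts below are assumptions.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "meta-llama/Meta-Llama-3-8B-Instruct"  # assumed; any chat model with a chat template works
LAYER = 14    # which decoder layer to steer (illustrative)
ALPHA = 4.0   # steering strength (illustrative)

tok = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(MODEL_ID, torch_dtype=torch.bfloat16, device_map="auto")

def mean_last_token_state(prompts, system=None):
    """Mean hidden state of the last prompt token at the output of decoder layer LAYER."""
    states = []
    for p in prompts:
        msgs = ([{"role": "system", "content": system}] if system else []) + [{"role": "user", "content": p}]
        ids = tok.apply_chat_template(msgs, add_generation_prompt=True, return_tensors="pt").to(model.device)
        with torch.no_grad():
            hs = model(ids, output_hidden_states=True).hidden_states  # tuple: embeddings + one entry per layer
        states.append(hs[LAYER + 1][0, -1].float())
    return torch.stack(states).mean(dim=0)

# Stand-in for the 1024 Alpaca instructions used in the post.
instructions = ["Explain photosynthesis.", "Give three tips for learning the guitar."]
rap_prompt = "You are a rapper. Answer every question as rap verse."

# "Rap direction" = mean activation with the rap system prompt minus mean activation without it.
rap_dir = mean_last_token_state(instructions, system=rap_prompt) - mean_last_token_state(instructions)
rap_dir = rap_dir / rap_dir.norm()

# Add the direction to the residual stream at the chosen layer during generation.
def steer(module, inputs, output):
    hidden = output[0]
    return (hidden + ALPHA * rap_dir.to(hidden.dtype).to(hidden.device),) + tuple(output[1:])

handle = model.model.layers[LAYER].register_forward_hook(steer)
ids = tok.apply_chat_template([{"role": "user", "content": "What is gravity?"}],
                              add_generation_prompt=True, return_tensors="pt").to(model.device)
out = model.generate(ids, max_new_tokens=80, do_sample=False)
print(tok.decode(out[0], skip_special_tokens=True))
handle.remove()  # remove the hook to restore the original behavior
```

Sweeping LAYER and ALPHA while eyeballing a few generations is the manual version of steps 6️⃣-7️⃣.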
Resources
Refusal in Language Models Is Mediated by a Single Direction - https://arxiv.org/abs/2406.11717
Uncensor any LLM with abliteration: great practical blog post by @mlabonne https://huggingface.co/blog/mlabonne/abliteration
Practical materials by @failspy
- abliterator library https://github.com/FailSpy/abliterator
- Llama-MopeyMule-3-8B-Instruct model (+ notebook) failspy/Llama-3-8B-Instruct-MopeyMule
replied to
their
post
over 1 year ago
Hey! Glad you appreciate the model! The final release Space has been up for a while (and has some small tricks to mitigate trailing artefacts); here's some audio to compare!
And here's the link to the release version of the model's Space: https://huggingface.co/spaces/ShoukanLabs/Vokan
posted
an
update
over 1 year ago
Post
3101
I've published several older versions of Vokan! Sometimes, they may sound more natural, but less like the target speaker.
Please check 'em out!
Korakoe/Vokan-V0.5
ShoukanLabs/Vokan
reacted to
Jaward's
post with 🔥
over 1 year ago
Post
2338
All You Need To Know About Apple Intelligence Architecture And Models!!
One key challenge with running LLMs on-device is balancing compute, performance, and model size. Apple Intelligence solves this by using small, specialized chunks (adapters) of the on-device foundation model when needed.
For compute, they engineered a new framework that uses LoRA adapters of rank 16, allowing a mixed 2-bit and 4-bit configuration that averages 3.5 bits per weight while achieving the same performance as the uncompressed models.
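(For intuition: quantizing roughly a quarter of the weight blocks to 2 bits and the rest to 4 bits would average 0.25 · 2 + 0.75 · 4 = 3.5 bits per weight; the per-operation mix is what the Talaria tool mentioned below helps choose.)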
With the help of an open-source model latency and power analysis tool (Talaria), they were able to optimize the bit-rate selection for each operation. This, along with activation and embedding quantization plus efficient key-value caching, achieves up to 30 tokens/sec on an iPhone 15 Pro.
When the model is prompted (e.g., to rewrite an email in the Mail app), the app draws on the App Intents toolbox, which routes the prompt to the adapter specialized for writing; the model then responds through the same pipeline with a real-time update of the text being rewritten.
The coolest feature of these models is their ability to adapt and dynamically specialize to users' everyday activities. For this, they adapt the attention matrices, the attention projection matrix, and the fully connected layers in the point-wise feed-forward networks for a suitable set of the decoding layers of the transformer architecture.
For tasks that require more capable models, the architecture falls back to larger server models running on Private Cloud Compute infrastructure, which delivers a state-of-the-art secure and verifiable privacy experience.
More on Private Cloud Compute: https://developer.apple.com/videos/play/wwdc2024/102/
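As a rough illustration of the adapter idea (not Apple's actual stack), here is how a rank-16 LoRA adapter covering the attention and point-wise feed-forward projections can be attached to an open model with PEFT; the base model name and lora_alpha are assumptions.

```python
# Rough illustration of the per-task adapter idea with PEFT, NOT Apple's implementation.
# Base model and lora_alpha are assumptions; r=16 mirrors the rank mentioned above.
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

base = AutoModelForCausalLM.from_pretrained("HuggingFaceTB/SmolLM2-135M-Instruct")  # small stand-in model

writing_adapter = LoraConfig(
    r=16,                # rank-16 adapter, as in the post
    lora_alpha=32,       # scaling factor (assumption)
    target_modules=[
        "q_proj", "k_proj", "v_proj", "o_proj",   # attention / projection matrices
        "gate_proj", "up_proj", "down_proj",      # point-wise feed-forward layers
    ],
    task_type="CAUSAL_LM",
)

model = get_peft_model(base, writing_adapter)
model.print_trainable_parameters()  # only the small adapter trains; the compressed base stays frozen
```

In Apple's setup, one such adapter per task (e.g., the writing adapter in the Mail example above) sits on top of the shared compressed base model.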
reacted to
mrfakename's
post with ❤️
almost 2 years ago
Post
Today, I'm thrilled to release a project I've been working on for the past couple weeks in collaboration with Hugging Face: the TTS Arena.
The TTS Arena, inspired by LMSys's Chatbot Arena, allows you to enter text which will be synthesized by two SOTA models. You can then vote on which model generated a better sample. The results will be published on a publicly-accessible leaderboard.
We've added several open access models, including Pheme, MetaVoice, XTTS, OpenVoice, & WhisperSpeech. It also includes the proprietary ElevenLabs model.
If you have any questions, suggestions, or feedback, please don't hesitate to DM me on X (https://twitter.com/realmrfakename) or open a discussion in the Space. More details coming soon!
Try it out: TTS-AGI/TTS-Arena