Matt Valoatto PRO

mvaloatto

AI & ML interests

Image classification, image feature extraction, text classification, design, art, tech, science. 🤗 since 2016.

Recent Activity

Organizations

AI FILMS · lora concepts library · Stable Diffusion Dreambooth Concepts Library · huggingPartyParis · Spaces Playground · Social Post Explorers · Top Contributors: Space Likes · Top Contributors: Dataset Downloads · Top Contributors: Model Downloads · Top Contributors: Profile Followers · Tamis AI · Hugging Face Discord Community

mvaloatto's activity

reacted to clem's post with ❤️ 26 days ago
reacted to victor's post with 🔥 9 months ago
The hype is real: a mysterious gpt2-chatbot model has appeared on the LLM Arena Leaderboard 👀.
It seems to be at least on par with the top performing models (closed and open).

To try it out: https://chat.lmsys.org/ -> then click on the Direct Chat tab and select gpt2-chatbot.

Take your bet, what do you think it is?
reacted to clem's post with 🤗 10 months ago
Introducing gretelai/synthetic_text_to_sql by https://huggingface.co/gretelai

It stands as the largest and most diverse synthetic Text-to-SQL dataset available to date.

The dataset includes:

- 105,851 records partitioned into 100,000 train and 5,851 test records
- ~23M total tokens, including ~12M SQL tokens
- Coverage across 100 distinct domains/verticals
- Comprehensive array of SQL tasks: data definition, retrieval, manipulation, analytics & reporting
- Wide range of SQL complexity levels, including subqueries, single joins, multiple joins, aggregations, window functions, set operations
- Database context, including table and view create statements
- Natural language explanations of what the SQL query is doing
- Contextual tags to optimize model training

Blogpost: https://gretel.ai/blog/synthetic-text-to-sql-dataset
Dataset: gretelai/synthetic_text_to_sql
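
If you want to poke at it, here's a minimal sketch using the `datasets` library (field names are not documented here, so inspect the schema before relying on them):

```python
# Minimal sketch, assuming the `datasets` library is installed.
from datasets import load_dataset

ds = load_dataset("gretelai/synthetic_text_to_sql", split="train")
print(len(ds))           # expected ~100,000 training records
print(ds.column_names)   # inspect the real schema before relying on field names
print(ds[0])             # one natural-language / SQL pair with its context
```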
reacted to sayakpaul's post with 🔥 11 months ago
We released 🧨 Diffusers 0.27.0, and it's a versatile release 💫

Among other things, we shipped:

* Stable Cascade
* Playground v2.5 and EDM-style training
* EDM-formulated schedulers
* Trajectory Consistency Distillation for accelerated sampling
* A new guide on merging LoRAs
* A new image editing pipeline -- LEDITS++

Check out the release notes to catch everything that went into the release
https://github.com/huggingface/diffusers/releases/tag/v0.27.0

Thanks to everyone who contributed to the release 🤗
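
As a taste of the accelerated-sampling work, here's a hedged sketch of few-step TCD inference on SDXL, following the pattern from the Diffusers docs (the model and LoRA ids are the publicly published ones and may change):

```python
# Hedged sketch of few-step sampling with TCDScheduler (diffusers >= 0.27).
# "h1t/TCD-SDXL-LoRA" is the LoRA published by the TCD authors.
import torch
from diffusers import StableDiffusionXLPipeline, TCDScheduler

pipe = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16
).to("cuda")
pipe.scheduler = TCDScheduler.from_config(pipe.scheduler.config)
pipe.load_lora_weights("h1t/TCD-SDXL-LoRA")
pipe.fuse_lora()

image = pipe(
    "a photo of a corgi in a field",
    num_inference_steps=4,  # TCD targets very few steps
    guidance_scale=0.0,     # distilled samplers typically skip CFG
    eta=0.3,                # TCD's stochasticity parameter
).images[0]
image.save("corgi.png")
```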
replied to their post 11 months ago

Yes, time will tell! Still, good news for the open AI ecosystem 👍

posted an update 11 months ago
reacted to osanseviero's post with 👍 11 months ago
Diaries of Open Source. Part 3! OS goes to the moon!

💻 OpenCodeInterpreter, a family of very powerful code generation models
Models: m-a-p/opencodeinterpreter-65d312f6f88da990a64da456
Paper: OpenCodeInterpreter: Integrating Code Generation with Execution and Refinement (2402.14658)
Demo m-a-p/OpenCodeInterpreter_demo

🔷🔶Zephyr 7B Gemma, Gemma fine-tuned with the Zephyr recipe
Model: HuggingFaceH4/zephyr-7b-gemma-v0.1
Demo: HuggingFaceH4/zephyr-7b-gemma-chat
GH Repo: https://github.com/huggingface/alignment-handbook

🪆The MixedBread folks released a 2D Matryoshka text embedding model, which means you can dynamically change the embedding size and layer counts
Model: mixedbread-ai/mxbai-embed-2d-large-v1
Release blog post: https://www.mixedbread.ai/blog/mxbai-embed-2d-large-v1
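
A minimal sketch of the Matryoshka side of this (embed once, truncate dimensions, re-normalize); the "2D" part also drops transformer layers, which would need model surgery not shown here:

```python
# Sketch: keep only the first k dimensions of a Matryoshka embedding.
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("mixedbread-ai/mxbai-embed-2d-large-v1")
full = model.encode(["matryoshka dolls nest inside each other"])

k = 256                                                 # target embedding size
small = full[:, :k]
small = small / np.linalg.norm(small, axis=1, keepdims=True)  # re-normalize
print(full.shape, "->", small.shape)
```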

🐋Microsoft released Orca Math, which includes 200K grade school math problems
Dataset: microsoft/orca-math-word-problems-200k

🥷IBM silently released Merlinite, a cool model trained on Mixtral-generated synthetic data using a novel LAB method ibm/merlinite-7b

🌚 Moondream2 - a small vision language model to run on-device!
Model: vikhyatk/moondream2
Demo: vikhyatk/moondream2

🏙️CityDreamer: 3D City Generation
Demo: hzxie/city-dreamer
Repo: https://github.com/hzxie/city-dreamer
Model: hzxie/city-dreamer

🌏ML in all languages
Sailor, a family of South-East Asian language models sail/sailor-language-models-65e19a749f978976f1959825
Samvaad dataset, which includes 140k QA pairs in Hindi, Bengali, Marathi, Tamil, Telugu, Oriya, Punjabi, and Gujarati GenVRadmin/Samvaad-Mixed-Language-2

You can see the previous part at https://huggingface.co/posts/osanseviero/674644082063278
reacted to Xenova's post with 👍 11 months ago
Introducing the 🤗 Transformers.js WebGPU Embedding Benchmark! ⚡️
👉 Xenova/webgpu-embedding-benchmark 👈

On my device, I was able to achieve a 64.04x speedup over WASM! 🤯 How much does WebGPU speed up ML models running locally in your browser? Try it out and share your results! 🚀
reacted to Tonic's post with 👍 11 months ago
Last day on Spaces of the Week,
and we made it to last place on trending.
I really thought it couldn't get any better, but I'm crying! 😭

The thing I like most about ZeroGPU (import spaces) is that I don't have to constantly check whether someone decided to test if I have hard character limits, and it reloads the application flawlessly.

Drop a like on my Spaces here:
Spaces of the Week : https://huggingface.co/spaces/tonic/starcoder2
9 other ZeroGPU demos : https://huggingface.co/tonic
reacted to akhaliq's post with ❤️ 11 months ago
PixArt-Σ

Weak-to-Strong Training of Diffusion Transformer for 4K Text-to-Image Generation

PixArt-Σ: Weak-to-Strong Training of Diffusion Transformer for 4K Text-to-Image Generation (2403.04692)

In this paper, we introduce PixArt-Σ, a Diffusion Transformer (DiT) model capable of directly generating images at 4K resolution. PixArt-Σ represents a significant advancement over its predecessor, PixArt-α, offering images of markedly higher fidelity and improved alignment with text prompts. A key feature of PixArt-Σ is its training efficiency. Leveraging the foundational pre-training of PixArt-α, it evolves from the "weaker" baseline to a "stronger" model by incorporating higher quality data, a process we term "weak-to-strong training". The advancements in PixArt-Σ are twofold: (1) High-Quality Training Data: PixArt-Σ incorporates superior-quality image data, paired with more precise and detailed image captions. (2) Efficient Token Compression: we propose a novel attention module within the DiT framework that compresses both keys and values, significantly improving efficiency and facilitating ultra-high-resolution image generation. Thanks to these improvements, PixArt-Σ achieves superior image quality and user prompt adherence capabilities with a significantly smaller model size (0.6B parameters) than existing text-to-image diffusion models, such as SDXL (2.6B parameters) and SD Cascade (5.1B parameters). Moreover, PixArt-Σ's capability to generate 4K images supports the creation of high-resolution posters and wallpapers, efficiently bolstering the production of high-quality visual content in industries such as film and gaming.
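
The token-compression idea is easy to sketch. This is not the authors' module, just a toy PyTorch illustration of attention that average-pools its keys and values to cut cost:

```python
# Toy sketch: self-attention whose keys/values come from a downsampled token
# grid, so attention cost scales with the compressed KV count, not n.
import torch
import torch.nn as nn
import torch.nn.functional as F

class KVCompressedAttention(nn.Module):
    def __init__(self, dim, heads=8, kv_stride=2):
        super().__init__()
        self.heads, self.kv_stride = heads, kv_stride
        self.to_q = nn.Linear(dim, dim)
        self.to_kv = nn.Linear(dim, dim * 2)
        self.out = nn.Linear(dim, dim)

    def forward(self, x, grid_hw):
        b, n, d = x.shape
        h, w = grid_hw                         # spatial layout of the n tokens
        q = self.to_q(x)
        kv = x.transpose(1, 2).reshape(b, d, h, w)
        kv = F.avg_pool2d(kv, self.kv_stride)  # fewer KV tokens => cheaper attention
        kv = kv.flatten(2).transpose(1, 2)
        k, v = self.to_kv(kv).chunk(2, dim=-1)

        def split(t):  # (b, tokens, d) -> (b, heads, tokens, d/heads)
            return t.reshape(b, -1, self.heads, d // self.heads).transpose(1, 2)

        out = F.scaled_dot_product_attention(split(q), split(k), split(v))
        return self.out(out.transpose(1, 2).reshape(b, n, d))

# Example: a 32x32 latent token grid with dim 64.
attn = KVCompressedAttention(64)
x = torch.randn(2, 32 * 32, 64)
print(attn(x, (32, 32)).shape)  # torch.Size([2, 1024, 64])
```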


reacted to vladbogo's post with 👍 11 months ago
"Multi-LoRA Composition for Image Generation" introduces two new approaches for combining multiple visual elements in text-to-image generation using Low-Rank Adaptations (LoRAs)! 🎨

Key Points:
* Proposes two methods - LoRA Switch and LoRA Composite - that activate/combine LoRAs during the denoising process rather than merging weights
* LoRA Switch cycles through different LoRAs at each step, while LoRA Composite averages guidance from all LoRAs simultaneously
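
Here's a rough diffusers-based sketch of the LoRA Switch idea, not the paper's implementation; the adapter repo names are hypothetical placeholders:

```python
# Rough sketch of LoRA Switch: instead of merging weights, alternate which
# LoRA adapter is active at each denoising step via a step-end callback.
# "user/lora-a" and "user/lora-b" are hypothetical placeholder repos.
import torch
from diffusers import StableDiffusionXLPipeline

pipe = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16
).to("cuda")
pipe.load_lora_weights("user/lora-a", adapter_name="a")
pipe.load_lora_weights("user/lora-b", adapter_name="b")

def switch_lora(pipeline, step, timestep, callback_kwargs):
    # Cycle through the adapters: a, b, a, b, ...
    pipeline.set_adapters("a" if step % 2 == 0 else "b")
    return callback_kwargs

image = pipe(
    "a watercolor corgi wearing sunglasses",
    num_inference_steps=30,
    callback_on_step_end=switch_lora,
).images[0]
```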

Paper: Multi-LoRA Composition for Image Generation (2402.16843)
Project page: https://maszhongming.github.io/Multi-LoRA-Composition

Congrats to the authors for their work!
reacted to clefourrier's post with 👍 11 months ago
🔥 New multimodal leaderboard on the hub: ConTextual!

Many situations require models to parse images containing text: maps, web pages, real world pictures, memes, ... 🖼️
So how do you evaluate performance on this task?

The ConTextual team introduced a brand new dataset of instructions and images, to test LMM (large multimodal models) reasoning capabilities, and an associated leaderboard (with a private test set).

This is super exciting imo because it has the potential to be a good benchmark both for multimodal models and for assistants' vision capabilities, thanks to the instructions in the dataset.

Congrats to @rohan598 , @hbXNov , @kaiweichang and @violetpeng !!

Learn more in the blog: https://huggingface.co/blog/leaderboard-contextual
Leaderboard: ucla-contextual/contextual_leaderboard
reacted to osanseviero's post with 👍 11 months ago
Diaries of Open Source. Part 2. Open Source is going brrrrr

🚀The European Space Agency releases Major TOM, an Earth observation dataset covering half the Earth. The dataset has 2.5 trillion pixels! Congrats @aliFrancis and @mikonvergence !
Dataset: Major-TOM/Core-S2L2A
Viewer: Major-TOM/MajorTOM-Core-Viewer

🍞Re-ranking models by MixedBreadAI, with very high quality, Apache 2 license, and easy to use!
Models: https://huggingface.co/models?other=reranker&sort=trending&search=mixedbread-ai
Blog: https://www.mixedbread.ai/blog/mxbai-rerank-v1

🧊StabilityAI and TripoAI release TripoSR, a super-fast MIT-licensed image-to-3D model!
Model: stabilityai/TripoSR
Demo: stabilityai/TripoSR

🤝Together AI and HazyResearch release Based
Models and datasets: hazyresearch/based-65d77fb76f9c813c8b94339c
GH repo: https://github.com/HazyResearch/based

🌊LaVague: an open-source pipeline to turn natural language into browser actions! It can run locally with HuggingFaceH4/zephyr-7b-gemma-v0.1
Read more about it at https://huggingface.co/posts/dhuynh95/717319217106504

🏆Berkeley Function-Calling Leaderboard
Read about it: https://gorilla.cs.berkeley.edu/blogs/8_berkeley_function_calling_leaderboard.html
Leaderboard: https://gorilla.cs.berkeley.edu/leaderboard.html

🐬Sailor-Chat: chat models built on top of OpenOrca and @sarahooker's CohereForAI Aya project. They can be used for South-East Asian languages such as Indonesian, Thai, Vietnamese, Malay, and Lao!
Models: sail/sailor-language-models-65e19a749f978976f1959825
Demo: https://huggingface.co/spaces/sail/Sailor-7B-Chat

🤗Arabic-OpenHermes-2.5: OpenHermes dataset translated to Arabic 2A2I/Arabic-OpenHermes-2.5

See the previous part here https://huggingface.co/posts/osanseviero/622788932781684
reacted to andrewyng's post with 👍 11 months ago
DeepLearning.AI just announced a new short course: Open Source Models with Hugging Face 🤗, taught by Hugging Face's own Maria Khalusova, Marc Sun and Younes Belkada!

As many of you already know, Hugging Face has been a game changer by letting developers quickly grab any of hundreds of thousands of already-trained open source models to assemble into new applications. This course teaches you best practices for building this way, including how to search and choose among models.

You'll learn to use the Transformers library and walk through multiple models for text, audio, and image processing, including zero-shot image segmentation, zero-shot audio classification, and speech recognition. You'll also learn to use multimodal models for visual question answering, image search, and image captioning. Finally, you’ll learn how to demo what you build locally, on the cloud, or via an API using Gradio and Hugging Face Spaces.
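
For a flavor of what that looks like in practice, here's a hedged sketch combining a Transformers pipeline with a minimal Gradio demo (the model choice is illustrative):

```python
# Hedged sketch: a speech-recognition pipeline wrapped in a Gradio demo.
import gradio as gr
from transformers import pipeline

asr = pipeline("automatic-speech-recognition", model="openai/whisper-tiny")

def transcribe(audio_path: str) -> str:
    return asr(audio_path)["text"]

demo = gr.Interface(fn=transcribe, inputs=gr.Audio(type="filepath"), outputs="text")
demo.launch()  # on a Space, this serves the demo publicly
```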

Thank you very much to Hugging Face's wonderful team for working with us on this.

You can sign up for the course here: https://www.deeplearning.ai/short-courses/open-source-models-hugging-face/
reacted to osanseviero's post with ❤️ 11 months ago
Diaries of Open Source. Part 1.

What a week! Here are some of the exciting Open Source releases of the week!

1. BigCode releases The Stack v2 and StarCoder 2
Resources in https://huggingface.co/posts/loubnabnl/596860170283496
Blog https://huggingface.co/blog/starcoder2
Collection: bigcode/starcoder2-65de6da6e87db3383572be1a

2. Playground v2.5, a very powerful new text-to-image model
Model: playgroundai/playground-v2.5-1024px-aesthetic
Demo: playgroundai/playground-v2.5
Blog: https://playground.com/blog/playground-v2-5

3. Evo: DNA foundation models
Blog: https://arcinstitute.org/news/blog/evo
Models: togethercomputer/evo-1-131k-base

4. OpenHermesPreferences: a dataset of ~1 million AI Preferences argilla/OpenHermesPreferences

5. SpeechBrain 1.0: a toolkit with hundreds of recipes and pretrained models for audio-related tasks, such as speech recognition, diarization, and enhancement. New major release!
HF repos: https://huggingface.co/speechbrain
Website: https://speechbrain.github.io/

6. Tower: a suite of Llama-based multilingual translation models Unbabel/tower-659eaedfe36e6dd29eb1805c

7. AllenAI releases OLMo-7B-Instruct
allenai/olmo-suite-65aeaae8fe5b6b2122b46778

8. DIBT - A crowdsourced effort to human-rate prompts. Its 10k prompts dataset is released: https://huggingface.co/datasets/DIBT/10k_prompts_ranked

9. ChatMusician: A Llama 2 fine-tuned model for music generation m-a-p/ChatMusician

10. Bonito, a model that converts data into synthetic instruction datasets
GitHub: https://github.com/BatsResearch/bonito
Model: BatsResearch/bonito-v1
Paper: Learning to Generate Instruction Tuning Datasets for Zero-Shot Task Adaptation (2402.18334)
reacted to akhaliq's post with ❤️ 11 months ago
VisionLLaMA

A Unified LLaMA Interface for Vision Tasks

VisionLLaMA: A Unified LLaMA Interface for Vision Tasks (2403.00522)

Large language models are built on top of a transformer-based architecture to process textual inputs. For example, LLaMA stands out among many open-source implementations. Can the same transformer be used to process 2D images? In this paper, we answer this question by unveiling a LLaMA-like vision transformer in plain and pyramid forms, termed VisionLLaMA, which is tailored for this purpose. VisionLLaMA is a unified and generic modelling framework for solving most vision tasks. We extensively evaluate its effectiveness using typical pre-training paradigms in a good portion of downstream tasks of image perception and especially image generation. In many cases, VisionLLaMA has exhibited substantial gains over previous state-of-the-art vision transformers. We believe that VisionLLaMA can serve as a strong new baseline model for vision generation and understanding.
reacted to aliFrancis's post with 🤗 11 months ago
🗺 Major TOM: Expandable Datasets for Earth Observation

🚨 RECORD-BREAKING EO DATASET: the largest ever ML-ready Sentinel-2 dataset! It covers almost every single point on Earth captured by the Copernicus Sentinel-2 satellite. @mikonvergence and I are thrilled to finally announce the release of Major-TOM/Core-S2L2A and Major-TOM/Core-S2L1C

🌍 About half of the entire planet is covered. That's 2,245,886 patches of 1068 x 1068 pixels, available in both L1C and L2A. At 10 m resolution, we've got 256 million square km with over 2.5 trillion pixels. It's all yours with a few lines of code. See the paper linked below 🔽 for more info!
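
"A few lines of code" can look like this sketch, which streams the dataset so you don't pull 2.5 trillion pixels at once (column names are assumptions; inspect the first record for the real schema):

```python
# Sketch: stream Major TOM instead of downloading it wholesale.
from datasets import load_dataset

ds = load_dataset("Major-TOM/Core-S2L2A", split="train", streaming=True)
first = next(iter(ds))
print(first.keys())  # inspect the actual columns
```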

🧱 And this is just the beginning. We are currently preparing more datasets from different satellites for the Major TOM org. TOM stands for Terrestrial Observation Metaset - a simple set of rules for building an ecosystem of ML-ready EO datasets, which can be seamlessly combined as if they were Lego bricks.

🚴‍♀️ Want to take the dataset for a spin? We have a viewer app on Spaces that lets you go anywhere on Earth and shows you the data, if it's available: Major-TOM/MajorTOM-Core-Viewer

📰 Preprint paper: Major TOM: Expandable Datasets for Earth Observation (2402.12095)
💻 Colab example: https://colab.research.google.com/github/ESA-PhiLab/Major-TOM/blob/main/03-Filtering-in-Colab.ipynb

Thank you to the amazing 🤗Hugging Face team for the support on this one! @osanseviero @lhoestq @BrigitteTousi
reacted to multimodalart's post with 👍 11 months ago
The Stable Diffusion 3 research paper broken down, including some overlooked details! 📝

Model
📏 2 base model variants mentioned: 2B and 8B sizes

📐 New architecture in all abstraction levels:
- 🔽 UNet; ⬆️ Multimodal Diffusion Transformer, bye cross attention 👋
- 🆕 Rectified flows for the diffusion process (toy sketch after this section)
- 🧩 Still a Latent Diffusion Model

📄 3 text-encoders: 2 CLIPs, one T5-XXL; plug-and-play: removing the larger one maintains competitiveness

🗃️ Dataset was deduplicated with SSCD which helped with memorization (no more details about the dataset tho)
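
(Toy sketch of the rectified-flow objective mentioned above; not SD3's actual training code.)

```python
# Toy rectified-flow training step: interpolate linearly between data x0 and
# noise, and regress the constant velocity (noise - x0) along that line.
# `model` is any network taking (x_t, t); image-shaped tensors assumed.
import torch
import torch.nn.functional as F

def rectified_flow_loss(model, x0):
    noise = torch.randn_like(x0)
    t = torch.rand(x0.shape[0], 1, 1, 1, device=x0.device)  # t ~ U(0, 1)
    x_t = (1 - t) * x0 + t * noise   # straight-line interpolation
    v_target = noise - x0            # constant velocity transporting x0 to noise
    v_pred = model(x_t, t.flatten())
    return F.mse_loss(v_pred, v_target)
```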

Variants
🔁 A DPO fine-tuned model showed great improvement in prompt understanding and aesthetics
✏️ An Instruct Edit 2B model was trained, and learned how to do text-replacement

Results
✅ State of the art in automated evals for composition and prompt understanding
✅ Best win rate in human preference evaluation for prompt understanding, aesthetics and typography (missing some details on how many participants and the design of the experiment)

Paper: https://stabilityai-public-packages.s3.us-west-2.amazonaws.com/Stable+Diffusion+3+Paper.pdf
reacted to julien-c's post with 👍 11 months ago
What if you could casually access your remote GPU in HF Spaces from the comfort of your local VSCode 🤯
reacted to chiphuyen's post with 🤗 11 months ago
It feels awkward having my first post sharing my stuff, but this is a weekend project that I really enjoyed working on. I'd love to meet more people interested in random ideas like this.

A hard part of building AI applications is choosing which model to use. What if we don’t have to? What if we can predict the best model for any prompt?

Predictive human preference aims to predict which model users might prefer for a specific query.

https://huyenchip.com/2024/02/28/predictive-human-preference.html

One use case is model routing. If we know in advance that for a prompt, users will prefer Claude Instant’s response over GPT-4, and Claude Instant is cheaper/faster than GPT-4, we can route this prompt to Claude Instant. Model routing has the potential to increase response quality while reducing costs and latency.
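
As a toy illustration of the routing idea (nothing from the post's implementation; `predict_preference` here is a hypothetical stand-in heuristic):

```python
# Toy router: a hypothetical preference predictor gates which model to call.
def predict_preference(prompt: str) -> float:
    """Stand-in heuristic: probability users would prefer the strong model's
    answer over the weak model's for this prompt."""
    return 0.9 if len(prompt.split()) > 30 else 0.2

def route(prompt: str) -> str:
    if predict_preference(prompt) < 0.5:
        return "claude-instant"  # cheaper/faster, likely good enough
    return "gpt-4"               # stronger model for harder prompts

print(route("hello, how are you?"))  # -> claude-instant
```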

One pattern is that for simple prompts, weak models can do (nearly) as well as strong models. For more challenging prompts, however, users are more likely to prefer stronger models. Here's a visualization of predicted human preference for an easy prompt ("hello, how are you?") and a challenging prompt ("Explain why Planck length …").

Preference predictors make it possible to create leaderboards unique to any prompt and domain.