Linoy Tsaban

linoyts

AI & ML interests

None yet

Recent Activity

Organizations

Hugging Face, 🧨Diffusers, Hugging Face Internal Testing Organization, Huggingface Projects, Snap Research, Weizmann Institute of Science, Editing Images, leditsplusplus, Latent Consistency, Editing Audio, Women on Hugging Face, +RAIN film festival, diffusers-internal-dev, rnri-inversion, Snapchat Inc., Latent Explorers, open/ acc, RF Inversion, FlowEdit, CRINGE, Réflexion IA, IP Composer, Inference Endpoints Images

linoyts's activity

reacted to abidlabs's post with ❤️ 2 days ago
Post
2974
HOW TO ADD MCP SUPPORT TO ANY 🤗 SPACE

Gradio now supports MCP! If you want to convert an existing Space, like this one hexgrad/Kokoro-TTS, so that you can use it with Claude Desktop / Cursor / Cline / TinyAgents / or any LLM that supports MCP, here's all you need to do:

1. Duplicate the Space (in the Settings Tab)
2. Upgrade the Gradio sdk_version to 5.28 (in the README.md)
3. Set mcp_server=True in launch()
4. (Optionally) add docstrings to the function so that the LLM knows how to use it, like this:

def generate(text, speed=1):
    """
    Convert text to speech audio.

    Parameters:
        text (str): The input text to be converted to speech.
        speed (float, optional): Playback speed of the generated speech.
    """
    ...  # the Space's existing function body goes here


That's it! Now your LLM will be able to talk to you 🤯
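
Steps 3 and 4 above can be sketched end to end as follows. This is a minimal illustration, not the code of the Kokoro-TTS Space: the body of `generate` is a placeholder that returns a string instead of audio, and `build_demo` and `main` are hypothetical helper names. It assumes a Gradio version with MCP support is installed (5.28 per the post).

```python
def generate(text: str, speed: float = 1.0) -> str:
    """
    Convert text to speech audio.

    Parameters:
        text (str): The input text to be converted to speech.
        speed (float, optional): Playback speed of the generated speech.
    """
    # Placeholder body (assumption): a real Space would call its TTS model here.
    return f"[speech at {speed}x] {text}"


def build_demo():
    # Imported lazily so generate() can be exercised without Gradio installed.
    import gradio as gr

    return gr.Interface(
        fn=generate,
        inputs=[gr.Textbox(label="text"), gr.Number(value=1, label="speed")],
        outputs=gr.Textbox(label="result"),
    )


def main():
    # mcp_server=True is the launch() flag named in step 3.
    build_demo().launch(mcp_server=True)
```

Calling `main()` starts the app; the docstring on `generate` is what step 4 refers to when it says the LLM learns how to use the function.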
reacted to ginipick's post with 👍 2 days ago
Post
2814
🎨 Renoir Studio: Impressionist Masterpieces Reborn Through AI ✨

🌟 Experience Renoir's Magical Brushstrokes with AI!

🔗 Try it now: ginigen/flux-lora-renoir
🔗 Model page: openfree/pierre-auguste-renoir
🔗 Collection: openfree/painting-art-ai-681453484ec15ef5978bbeb1

Hello, AI art enthusiasts! 💖
Today I'm introducing a special model - Pierre-Auguste Renoir Studio. Create your own beautiful artwork in the style of the 19th century French Impressionist master! 🖼️
✨ Why Renoir's Style?
Renoir is famous for his luminous colors and soft brushstrokes. His works feature:

🌞 Warm sunshine and dancing light
👨‍👩‍👧‍👦 The beauty of everyday life and joyful moments
🌸 Vibrant nature and portraits of beautiful women
🎭 Lively Parisian social gatherings and outdoor scenes

🔬 Technical Features
This model was developed as a flux-based learning model trained on a curated collection of high-resolution masterpieces from renowned global artists. The LoRA fine-tuning process leveraged exceptional quality open-access imagery released by prestigious institutions including the Art Institute of Chicago. The resulting model demonstrates remarkable capability in capturing the nuanced artistic techniques and stylistic elements across diverse historical art movements! 🧠💫
🚀 How to Use

1. Describe your desired scene in the prompt box
2. Add the "renoir" keyword at the end (this is the trigger keyword!)
3. Click the 'Generate' button
4. Enjoy your ideas reborn in Renoir's style!

💡 Recommended Prompt Examples

"Elegant ladies enjoying a picnic in a sunlit garden, wearing pastel dresses and hats renoir"
"People boating by a riverbank, light reflecting on water, warmth of summer renoir"
"Paris cafe terrace, people chatting over coffee, evening sunset renoir"
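
The prompt convention above (scene description first, trigger keyword last) can be sketched as a small helper. `with_trigger` is a hypothetical name introduced for illustration; only the "renoir" keyword itself comes from the post.

```python
TRIGGER = "renoir"  # trigger keyword stated in the post


def with_trigger(scene: str, trigger: str = TRIGGER) -> str:
    """Return the scene description with the trigger keyword appended once."""
    scene = scene.strip()
    # Avoid doubling the keyword if the prompt already ends with it.
    if scene.lower().split()[-1:] == [trigger.lower()]:
        return scene
    return f"{scene} {trigger}".strip()


print(with_trigger("Paris cafe terrace, people chatting over coffee, evening sunset"))
```

The result is the string to paste into the prompt box before clicking 'Generate'.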

🌈 Now It's Your Turn!
#AI #Renoir #ArtificialIntelligence #HuggingFace #FLUX #LoRA
reacted to sanaka87's post with 🔥 2 days ago
Post
2243
🚀 Excited to Share Our Latest Work: In-Context Edit: Enabling Instructional Image Editing with In-Context Generation in Large Scale Diffusion Transformer～

🎨 Daily Paper:
In-Context Edit: Enabling Instructional Image Editing with In-Context Generation in Large Scale Diffusion Transformer (2504.20690)

🔓 Code is now open source!
🔥 Huggingface DEMO: RiverZ/ICEdit
🌐 Project Website: https://river-zhang.github.io/ICEdit-gh-pages/
🏠 GitHub Repository: https://github.com/River-Zhang/ICEdit/blob/main/scripts/gradio_demo.py
🤗 Huggingface: sanaka87/ICEdit-MoE-LoRA
📄 arxiv Paper: In-Context Edit: Enabling Instructional Image Editing with In-Context Generation in Large Scale Diffusion Transformer (2504.20690)

🔥 Why it's cool:
- Achieves high-quality, multi-task image editing.
- Uses only 1% of the training parameters and 0.1% of the training data compared to existing methods, making it extremely efficient.
- Beats several commercial models on background preservation, ID control, and consistency.
- Open-source, low-cost, faster, and stronger: think of it as the "DeepSeek of image editing" 👀

We also implemented a Gradio demo app, available directly in our GitHub repo! And we made a flashy demo video, happy to send it your way!
  • 1 reply
reacted to jasoncorkill's post with 🚀 6 days ago
Post
5467
🚀 Building Better Evaluations: 32K Image Annotations Now Available

Today, we're releasing an expanded version: 32K images annotated with 3.7M responses from over 300K individuals, completed in under two weeks using the Rapidata Python API.

Rapidata/text-2-image-Rich-Human-Feedback-32k

A few months ago, we published one of our most liked datasets, with 13K images based on the @data-is-better-together dataset, following Google's research on "Rich Human Feedback for Text-to-Image Generation" (https://arxiv.org/abs/2312.10240). It collected over 1.5M responses from 150K+ participants.

Rapidata/text-2-image-Rich-Human-Feedback

In the examples below, users highlighted words from prompts that were not correctly depicted in the generated images. Higher word scores indicate more frequent issues. If an image captured the prompt accurately, users could select [No_mistakes].
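
One way such word scores could be derived from raw selections is sketched below. The post does not give the scoring formula or the dataset schema, so the normalization (fraction of annotators who flagged a word), the function name `word_scores`, and the input layout are all assumptions; only the `[No_mistakes]` option comes from the post.

```python
from collections import Counter

NO_MISTAKES = "[No_mistakes]"  # option named in the post


def word_scores(prompt: str, selections: list[list[str]]) -> dict[str, float]:
    """Fraction of annotators who flagged each prompt word as not depicted."""
    words = prompt.split()
    counts = Counter()
    for picked in selections:
        # An annotator who chose [No_mistakes] flags no words.
        if NO_MISTAKES in picked:
            continue
        # Count each word at most once per annotator.
        counts.update(set(picked) & set(words))
    total = len(selections)
    return {w: (counts[w] / total if total else 0.0) for w in words}


scores = word_scores(
    "a red cat on a blue sofa",
    [["red"], ["red", "sofa"], [NO_MISTAKES], ["blue"]],
)
print(scores)
```

Under this assumed scheme a higher score means more annotators flagged the word, which matches the post's reading that higher scores indicate more frequent issues.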

We're continuing to work on large-scale human feedback and model evaluation. If you're working on related research and need large, high-quality annotations, feel free to get in touch: [email protected].
reacted to AdinaY's post with 🔥 6 days ago
Post
5016
Kimi-Audio 🚀🎧 an OPEN audio foundation model released by Moonshot AI
moonshotai/Kimi-Audio-7B-Instruct
✨ 7B
✨ 13M+ hours of pretraining data
✨ Novel hybrid input architecture
✨ Universal audio capabilities (ASR, AQA, AAC, SER, SEC/ASC, end-to-end conversation)
reacted to samihalawa's post with 🔥 10 days ago
Post
2405
SkyReels-V2 INFINITE VIDEO🔥♾️🎬 UNLIMITED duration video generation model by Skywork.

> “Finally is here. An Open-Source model that achieves what we all have waiting for: Infinite Length Videos.” 😮

Skywork R1V: Pioneering Multimodal Reasoning with Chain-of-Thought (2504.05599)

Model: Skywork/SkyReels-V2-T2V-14B-720P

✨ 1.3B & 14B
✨ Generates infinite length videos using Diffusion Forcing with diffusion models + autoregressive methods
reacted to victor's post with 👍 10 days ago
Post
2899
DIA TTS is just amazing - please share your funniest gens (here is mine) 😂
nari-labs/Dia-1.6B
reacted to AdinaY's post with 🔥 11 days ago
Post
3451
MAGI-1 🪄 the autoregressive diffusion video model, released by Sand AI

sand-ai/MAGI-1

✨ 24B with Apache 2.0
✨ Strong temporal consistency
✨ Benchmark-topping performance
  • 1 reply
posted an update 12 days ago
reacted to fdaudens's post with 🤯 24 days ago
Post
4101
🎨 Designers, meet OmniSVG! This new model helps you create professional vector graphics from text/images, generate editable SVGs from icons to detailed characters, convert rasters to vectors, maintain style consistency with references, and integrate into your workflow.

@OmniSVG
  • 2 replies
reacted to ajibawa-2023's post with 🔥 24 days ago
Post
3956
Hi All, I recently released two audio datasets, both generated from my earlier dataset: ajibawa-2023/Children-Stories-Collection

First Audio Dataset: https://huggingface.co/datasets/ajibawa-2023/Audio-Children-Stories-Collection-Large has 5600++ stories in .mp3 format.

Second Audio Dataset: https://huggingface.co/datasets/ajibawa-2023/Audio-Children-Stories-Collection has 600 stories in .mp3 format.
reacted to AdinaY's post with 🔥 30 days ago
reacted to seawolf2357's post with 🔥 about 1 month ago
Post
8209
🎨 Ghibli-Style Image Generation with Multilingual Text Integration: FLUX.1 Hugging Face Edition 🌐✨

Hello creators! Today I'm introducing a special image generator that combines the beautiful aesthetics of Studio Ghibli with multilingual text integration! 😍

seawolf2357/Ghibli-Multilingual-Text-rendering

✨ Key Features

Ghibli-Style Image Generation - High-quality animation-style images based on FLUX.1
Multilingual Text Rendering - Support for Korean, Japanese, English, and all languages! 🇰🇷🇯🇵🇬🇧
Automatic Image Editing with Simple Prompts - Just input your desired text and you're done!
Two Stylistic Variations Provided - Get two different results from a single prompt
Full Hugging Face Spaces Support - Deploy and share instantly!

🚀 How Does It Work?

Enter a prompt describing your desired image (e.g., "a cat sitting by the window")
Input the text you want to add (any language works!)
Select the text position, size, and color
Two different versions are automatically generated!

💯 Advantages of This Model

No Tedious Post-Editing Needed - Text is perfectly integrated during generation
Natural Text Integration - Text automatically adjusts to match the image style
Perfect Multilingual Support - Any language renders beautifully!
User-Friendly Interface - Easily adjust text size, position, and color
One-Click Hugging Face Deployment - Use immediately without complex setup

🎭 Use Cases

Creating multilingual greeting cards
Animation-style social media content
Ghibli-inspired posters or banners
Character images with dialogue in various languages
Sharing with the community through Hugging Face Spaces

This project leverages the FLUX.1 model hosted on Hugging Face to open new possibilities for seamlessly integrating high-quality Ghibli-style images with multilingual text using just prompts! 🌈
Try it now and create your own artistic masterpieces! 🎨✨

#GhibliStyle #MultilingualSupport #AIImageGeneration #TextRendering #FLUX #HuggingFace
reacted to ZhiyuanthePony's post with 🤗 about 1 month ago
Post
2586
🎉 Thrilled to share our #CVPR2025 accepted work:
Progressive Rendering Distillation: Adapting Stable Diffusion for Instant Text-to-Mesh Generation without 3D Data (2503.21694)

🔥 Key Innovations:
1️⃣ First to adapt SD for direct textured mesh generation (1-2s inference)
2️⃣ Novel teacher-student framework leveraging multi-view diffusion models ([MVDream](https://arxiv.org/abs/2308.16512) & [RichDreamer](https://arxiv.org/abs/2311.16918))
3️⃣ Parameter-efficient tuning - only +2.6% params over base SD
4️⃣ 3D data-free training liberates model from dataset constraints

💡 Why it matters:
→ A novel 3D-Data-Free paradigm
→ Outperforms data-driven methods on creative concept generation
→ Unlocks web-scale text corpus for 3D content creation

🌐 Project: https://theericma.github.io/TriplaneTurbo/
🎮 Demo: ZhiyuanthePony/TriplaneTurbo
💻 Code: https://github.com/theEricMa/TriplaneTurbo
reacted to prithivMLmods's post with 👍 about 1 month ago
Post
2632
Dropping Downstream tasks using newly initialized parameters and weights ([classifier.bias & weights]) support domain-specific image classification. Based on siglip2-base-patch16-224 and DomainNet (single-domain, multi-source adaptation), with Fashion-MNIST & More for experimental testing. 🧤☄️

Fashion-Mnist : prithivMLmods/Fashion-Mnist-SigLIP2
Mnist-Digits : prithivMLmods/Mnist-Digits-SigLIP2
Multisource-121 : prithivMLmods/Multisource-121-DomainNet
Painting-126 : prithivMLmods/Painting-126-DomainNet
Sketch-126 : prithivMLmods/Sketch-126-DomainNet
Clipart-126 : prithivMLmods/Clipart-126-DomainNet

Models are trained with different parameter settings for experimental purposes only, with the intent of further development. Refer to the model page below for instructions on running it with Transformers 🤗.

Collection : prithivMLmods/domainnet-0324-67e0e3c934c03cc40c6c8782

Citations : SigLIP 2: Multilingual Vision-Language Encoders with Improved Semantic Understanding, Localization, and Dense Features https://arxiv.org/pdf/2502.14786 & Moment Matching for Multi-Source Domain Adaptation : https://arxiv.org/pdf/1812.01754

reacted to Yehor's post with 👍 about 2 months ago
Post
2878
Published a stable version of Ukrainian Text-to-Speech library on GitHub and PyPI.

Features:

- Multi-speaker model: 2 female (Tetiana, Lada) + 1 male (Mykyta) voices;
- Fine-grained control over speech parameters, including duration, fundamental frequency (F0), and energy;
- High-fidelity speech generation using the RAD-TTS++ acoustic model;
- Fast vocoding using Vocos;
- Synthesizes long sentences effectively;
- Supports a sampling rate of 44.1 kHz;
- Tested on Linux environments and Windows/WSL;
- Python API (requires Python 3.9 or later);
- CUDA-enabled for GPU acceleration.

Repository: https://github.com/egorsmkv/tts_uk
reacted to freddyaboulton's post with 🚀 2 months ago
Post
3269
Getting WebRTC and Websockets right in Python is very tricky. If you've tried to wrap an LLM in a real-time audio layer then you know what I'm talking about.

That's where FastRTC comes in! It makes WebRTC and Websocket streams super easy with minimal code and overhead.

Check out our org: hf.co/fastrtc
reacted to burtenshaw's post with 🔥 2 months ago
Post
6430
Now the Hugging Face agent course is getting real! With frameworks like smolagents, LlamaIndex, and LangChain.

🔗 Follow the org for updates: agents-course

This week we are releasing the first framework unit in the course and it's on smolagents. This is what the unit covers:

- why should you use smolagents vs another library?
- how to build agents that use code
- how to build multi-agent systems
- how to use vision language models for browser use

The team has been working flat out on this for a few weeks. Led by @sergiopaniego and supported by smolagents author @m-ric.
reacted to AdinaY's post with โค๏ธ 2 months ago
view post
Post
4241
๐Ÿš€ StepFun้˜ถ่ทƒๆ˜Ÿ่พฐ is making BIG open moves!

Last year, their GOT-OCR 2.0 took the community by storm ๐Ÿ”ฅbut many didnโ€™t know they were also building some amazing models. Now, theyโ€™ve just dropped something huge on the hub!

๐Ÿ“บ Step-Video-T2V: a 30B bilingual open video model that generates 204 frames (8-10s) at 540P resolution with high information density & consistency.
stepfun-ai/stepvideo-t2v
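
As a quick sanity check on the numbers quoted above, 204 frames over 8 to 10 seconds implies a frame rate of roughly 20 to 25 frames per second. The model's actual frame rate is not stated in the post; the sketch below only does the arithmetic, and `implied_fps` is a hypothetical helper name.

```python
FRAMES = 204                      # frame count quoted in the post
MIN_SECONDS, MAX_SECONDS = 8, 10  # clip duration range quoted in the post


def implied_fps(frames: int, seconds: float) -> float:
    """Frames per second implied by a frame count and a clip duration."""
    return frames / seconds


fps_high = implied_fps(FRAMES, MIN_SECONDS)  # shortest clip gives the highest rate
fps_low = implied_fps(FRAMES, MAX_SECONDS)   # longest clip gives the lowest rate
print(f"{fps_low:.1f} to {fps_high:.1f} frames per second")
```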

🔊 Step-Audio-TTS-3B: a TTS trained with the LLM-Chat paradigm on a large synthetic dataset, capable of generating RAP & Humming
stepfun-ai/step-audio-67b33accf45735bb21131b0b
reacted to davidberenstein1957's post with 🤗 3 months ago