I'm collecting llama-bench results for inference with Llama 3.1 8B Q4 and Q8 reference models on various GPUs. The results are averages of 5 executions. The systems vary (different motherboards and CPUs ... but that probably has little effect on the inference performance).
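For context, here is a minimal sketch of this kind of run (not my exact script), assuming llama.cpp's llama-bench CLI and placeholder GGUF file names:

```python
# Hedged sketch: wraps llama.cpp's llama-bench CLI; the GGUF file names are placeholders
# for the actual Q4/Q8 reference models.
import subprocess

for gguf in ["llama-3.1-8b-instruct-q4_k_m.gguf", "llama-3.1-8b-instruct-q8_0.gguf"]:
    # -r 5: average over 5 runs; -ngl 99: offload all layers to the GPU; -o csv: machine-readable output
    subprocess.run(["./llama-bench", "-m", gguf, "-r", "5", "-ngl", "99", "-o", "csv"], check=True)
```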
I shared my view on Qwen vs DeepSeek (student vs genius), but I forgot to mention this: they are neighbors in the same city. https://en.wikipedia.org/wiki/Hangzhou
Started fine-tuning Gemma 3 using an evolutionary approach. It is not the worst model according to the AHA leaderboard, and it is one of the smartest according to lmarena.ai. My objective is to make it based, anti-woke, wise, beneficial, and then some.
Several GPUs are fine-tuning it at the same time, each using a different dataset and QLoRA, and the successful runs are merged later. Compared to plain LoRA, this allows faster training and also reduces overfitting, because the merge operation heals overfitting. The risk is that the 4-bit quantization may make the models dumber. But I am not looking for sheer IQ. Too much mind is a problem anyway :)
Has anyone tried parallel QLoRA and merging before?
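For concreteness, here is a minimal sketch of the idea using transformers/peft/bitsandbytes; the model id, adapter paths, and merge weights are placeholders, and the actual training loop is omitted:

```python
# Minimal sketch of parallel QLoRA + merge; ids, paths, and weights are placeholders.
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, PeftModel, get_peft_model

base_id = "google/gemma-3-4b-it"  # placeholder; any causal LM works

# QLoRA: load the base model in 4-bit
bnb = BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_compute_dtype=torch.bfloat16)
base = AutoModelForCausalLM.from_pretrained(base_id, quantization_config=bnb, device_map="auto")

# On each GPU/worker: attach a fresh LoRA adapter and train on that worker's dataset
lora_cfg = LoraConfig(r=16, lora_alpha=32, target_modules=["q_proj", "v_proj"], task_type="CAUSAL_LM")
worker_model = get_peft_model(base, lora_cfg)
# ... train on this worker's dataset, then worker_model.save_pretrained("adapter_gpu0")

# Merge step (typically a separate process): reload the base and combine the saved adapters
base = AutoModelForCausalLM.from_pretrained(base_id, quantization_config=bnb, device_map="auto")
merged = PeftModel.from_pretrained(base, "adapter_gpu0", adapter_name="gpu0")
merged.load_adapter("adapter_gpu1", adapter_name="gpu1")
merged.add_weighted_adapter(
    adapters=["gpu0", "gpu1"], weights=[0.5, 0.5],
    adapter_name="merged", combination_type="linear",
)
merged.set_adapter("merged")
```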
I also automated the dataset selection, the benchmarking, and the convergence toward the objectives (the fitness function, the reward). It is basically trying to get a higher score on the AHA Leaderboard as fast as possible with a diverse set of organisms that "evolve by training".
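Schematically, the loop looks something like this (illustrative Python only, not the actual pipeline; training, scoring, and merging are stubbed out):

```python
# Illustrative sketch of the evolutionary loop: organisms train on their own datasets,
# the fitness function is an AHA-style benchmark score, and survivors are merged.
import random

def train_round(organism, dataset):
    # Placeholder for one QLoRA training round of `organism` on `dataset`.
    organism["rounds"] += 1

def aha_score(organism):
    # Placeholder for the fitness function: benchmark against the AHA-style objective.
    return random.random() + 0.01 * organism["rounds"]

def merge(a, b):
    # Placeholder for merging two successful adapters (e.g. weight averaging).
    return {"rounds": (a["rounds"] + b["rounds"]) // 2}

def evolve(pop_size=8, rounds=10, keep=4, datasets=("ds_a", "ds_b", "ds_c")):
    population = [{"rounds": 0} for _ in range(pop_size)]
    for _ in range(rounds):
        for organism in population:                       # each worker trains on its own dataset
            train_round(organism, random.choice(datasets))
        population.sort(key=aha_score, reverse=True)      # benchmark = fitness
        survivors = population[:keep]                     # keep the strongest organisms
        children = [merge(*random.sample(survivors, 2))   # refill the population via merges
                    for _ in range(pop_size - keep)]
        population = survivors + children
    return population[0]

best = evolve()
```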
I want to release some cool stuff when I have the time:
- how an answer to a single question changes over time, with each training round or day
- a chart showing AHA alignment over training rounds
We open-sourced the pruna package, which can be easily installed with pip install pruna :) It makes it easy to compress and evaluate AI models, including transformers and diffusers.
With open-sourcing, people can now inspect and contribute to the code. Beyond the code, we provide a detailed README, tutorials, benchmarks, and documentation to make compression, evaluation, and saving/loading/serving of AI models transparent.
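As a quick taste, here is a hedged usage sketch based on the README; the SmashConfig keys and algorithm names below are assumptions to check against the current docs:

```python
# Hedged sketch of compressing a diffusers pipeline with pruna; the config key/value
# ("cacher" / "deepcache") should be verified against the documentation.
from diffusers import StableDiffusionPipeline
from pruna import SmashConfig, smash

base_model = StableDiffusionPipeline.from_pretrained("CompVis/stable-diffusion-v1-4")

smash_config = SmashConfig()
smash_config["cacher"] = "deepcache"  # pick a compression algorithm; see the docs for available options

smashed_model = smash(model=base_model, smash_config=smash_config)
smashed_model("a photo of a cat").images[0]  # use it like the original pipeline
```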
Happy to share it with you and always interested in collecting your feedback :)
This time, I mapped and contributed more than 100 swimming pools around my wife's hometown to https://www.openstreetmap.org. It only took about 20 min to find them all (+ ~3 min of verification) on a free Colab GPU.
We are happy to release the OpenPII English Anonymiser: the most powerful open-source tool for redacting sensitive info from English text.
Fine-tuned ModernBERT on 5.7 million+ PII examples, and it's clocking 99%+ accuracy across emails, dates, social numbers, and more!
Why it's a big deal:
- Top-tier precision: 100% for passport numbers, 99.96% for emails*.
- Totally free: MIT license for personal or commercial use.
- No secrets: full metrics shared on Hugging Face.
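For illustration, here is a minimal sketch of redacting text with a token-classification model via the transformers pipeline; the model id below is a placeholder for the actual checkpoint on our Hugging Face page:

```python
# Hedged sketch of PII redaction with a token-classification model; the model id is a placeholder.
from transformers import pipeline

pii = pipeline(
    "token-classification",
    model="your-org/openpii-english-anonymiser",  # placeholder id
    aggregation_strategy="simple",
)

text = "Contact Jane Doe at jane.doe@example.com before 2025-05-01."
# Replace detected spans from the end so character offsets stay valid
for ent in sorted(pii(text), key=lambda e: e["start"], reverse=True):
    text = text[:ent["start"]] + f"[{ent['entity_group']}]" + text[ent["end"]:]
print(text)
```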
Nvidia brings Blue (a Star Wars-style droid) to life: supercute, with flawless dexterity and a droid voice. It's the result of their collaborative research with Google DeepMind and Disney, revealed as part of their new open-source physics engine for robotics simulation, Newton, which enables robots to learn how to complete complex tasks with greater precision.
Introducing OneSQL-v0.1, our first text-to-SQL model based on Qwen2.5-Coder. This model achieved an EX score of 63.33 on the BIRD leaderboard (https://bird-bench.github.io/).
My goal is to make OneSQL the most usable open-weights model for text-to-SQL. I'm currently working on best practices to help users use this model the right way and avoid pitfalls. After that, I plan to train the next version to push for a higher EX score.
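In the meantime, here is a hedged sketch of how one might prompt such a model with transformers; the model id and prompt template below are placeholders, so check the model card for the recommended format:

```python
# Hedged sketch of text-to-SQL generation; model id and prompt format are placeholders.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "onesql/OneSQL-v0.1"  # placeholder id
tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

schema = "CREATE TABLE employees(id INTEGER, name TEXT, salary REAL);"
question = "Who are the three highest-paid employees?"
prompt = f"-- Schema:\n{schema}\n-- Question: {question}\n-- SQL:\n"

inputs = tok(prompt, return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=128)
print(tok.decode(out[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```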
Enjoy this model and feel free to share comments/questions!
At Rapidata, we compared DeepL with LLMs like DeepSeek-R1, Llama, and Mixtral for translation quality, using feedback from over 51,000 native speakers. Despite its cost, DeepL's performance makes it a valuable investment, especially in critical applications where translation quality is paramount. Now we can say that Europe is about more than imposing regulations.
Our dataset, based on these comparisons, is now available on Hugging Face. This might be useful for anyone working on AI translation or language model evaluation.
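As a rough sketch of how to pull it in with the datasets library (the dataset id and split below are placeholders; the real listing is on our Hugging Face page):

```python
# Hedged sketch, assuming the standard datasets library; the dataset id is a placeholder.
from datasets import load_dataset

ds = load_dataset("Rapidata/translation-preferences")  # placeholder id
print(ds)
print(ds["train"][0])  # e.g. source text, candidate translations, and native-speaker preferences
```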