AI & ML interests

🚀 Creators of Spaces with the most cumulative all-time likes at the end of each month (users only, no orgs)

multimodalart 
posted an update 10 days ago
Self-Forcing, a real-time video model by @adobe distilled from Wan 2.1, is out, and they open-sourced it 🐐

I've built a live, real-time demo on Spaces 📹💨

multimodalart/self-forcing
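
If you'd rather poke at the Space programmatically than through the UI, a minimal sketch with gradio_client looks roughly like this. Only the Space id comes from the post; the endpoint name and arguments are assumptions you would confirm with view_api() first.

```python
# Minimal sketch: inspect and call the Self-Forcing Space via gradio_client.
# Only the Space id comes from the post above; the endpoint name and its
# arguments are hypothetical - run view_api() first to see the real signature.
from gradio_client import Client

client = Client("multimodalart/self-forcing")

# Print the Space's actual API (endpoint names, parameters, return types).
client.view_api()

# Hypothetical call - replace api_name/kwargs with what view_api() reports.
# result = client.predict("a corgi surfing a wave", api_name="/generate")
# print(result)
```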
jbilcke-hf 
posted an update 22 days ago
Hi everyone,

I've seen some unsuccessful attempts at running Wan2GP inside a Hugging Face Space, which is a shame as it is a great Gradio app!

So here is a fork that you can use, with some instructions on how to do this:

jbilcke-hf/Wan2GP_you_must_clone_this_space_to_use_it#1

Note: some things like persistent models/storage/custom LoRAs might not work fully out of the box. If you need those, you might have to dig into the Wan2GP codebase and see how to tweak the storage folder. Happy hacking!
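
For the storage part, here is a minimal sketch of the kind of tweak involved, assuming your Space has the persistent storage add-on (which Hugging Face mounts at /data). The OUTPUT_DIR name is hypothetical; the real setting to change lives somewhere in the Wan2GP code or config.

```python
# Sketch: point caches and outputs at the Space's persistent volume (/data),
# which Hugging Face mounts when the persistent storage add-on is enabled.
# OUTPUT_DIR is a hypothetical name - the actual setting is in the Wan2GP codebase.
import os

os.environ["HF_HOME"] = "/data/huggingface"  # model downloads survive restarts
OUTPUT_DIR = "/data/outputs"                 # generated videos / custom LoRAs
os.makedirs(OUTPUT_DIR, exist_ok=True)
```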

fffiloni 
posted an update 5 months ago
Explain like I'm 5: the latest take from @thomwolf on X about Dario's essay on DeepSeek:

→ Open-source AI is like a big cookbook that everyone can read and improve. Instead of a few chefs keeping their recipes secret, anyone can cook, test, and invent new things.

If only one company controls AI, everything stops if they have a problem—like when the internet goes down. With open-source, many people can help, making sure it keeps running smoothly.

AI isn’t just a race between two countries; it’s a team effort around the world. By sharing, we move faster and create safer technology for everyone.

🤗
akhaliq 
posted an update 6 months ago
Google drops Gemini 2.0 Flash Thinking

a new experimental model that unlocks stronger reasoning capabilities and shows its thoughts. The model plans (with its thoughts visible), can solve complex problems at Flash speeds, and more

now available in anychat, try it out: https://huggingface.co/spaces/akhaliq/anychat
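
If you want to call it directly rather than through anychat, a minimal sketch with the google-generativeai SDK looks roughly like this; the experimental model id string is an assumption (these ids rotate), so check Google AI Studio for the current one.

```python
# Minimal sketch: call Gemini 2.0 Flash Thinking via the google-generativeai SDK.
# The model id is an assumption (experimental ids change); check Google AI Studio.
import google.generativeai as genai

genai.configure(api_key="YOUR_GOOGLE_API_KEY")  # placeholder
model = genai.GenerativeModel("gemini-2.0-flash-thinking-exp")

response = model.generate_content(
    "A farmer has 17 sheep; all but 9 run away. How many are left?"
)
# Depending on SDK version, the model's thoughts may appear as a separate part
# in response.candidates[0].content.parts rather than in response.text.
print(response.text)
```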
jbilcke-hf 
posted an update 6 months ago
Doing some testing with HunyuanVideo on the Hugging Face Inference Endpoints 🤗

prompt: "a Shiba Inu is acting as a DJ, he wears sunglasses and is mixing and scratching with vinyl discs at a Ibiza sunny sand beach party"

1280x720, 22 steps, 121 frames

There are still some things to iron out regarding speed and memory usage; right now it takes 20 min on an A100 (see attached charts)

but you can check it out here:

https://huggingface.co/jbilcke-hf/HunyuanVideo-for-InferenceEndpoints

There are various things I want to try, like the 100% diffusers version and other models (LTX-Video, ...)
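
For reference, calling a deployed endpoint like this usually boils down to a POST with the generation parameters. The payload below mirrors the settings from the post, but the exact field names depend on the custom handler in the repo, and the endpoint URL and token are placeholders, so treat it as a sketch.

```python
# Sketch: call a HunyuanVideo Inference Endpoint with the settings from the post.
# The URL/token are placeholders and the payload schema depends on the custom
# handler.py in jbilcke-hf/HunyuanVideo-for-InferenceEndpoints - adjust to match.
import requests

ENDPOINT_URL = "https://<your-endpoint>.endpoints.huggingface.cloud"  # placeholder
HF_TOKEN = "hf_..."  # placeholder

payload = {
    "inputs": "a Shiba Inu is acting as a DJ, he wears sunglasses and is mixing "
              "and scratching with vinyl discs at a Ibiza sunny sand beach party",
    "parameters": {
        "width": 1280,
        "height": 720,
        "num_inference_steps": 22,
        "num_frames": 121,
    },
}

resp = requests.post(
    ENDPOINT_URL,
    headers={"Authorization": f"Bearer {HF_TOKEN}"},
    json=payload,
    timeout=1800,  # the post reports ~20 min on an A100
)
resp.raise_for_status()

# The handler may return raw video bytes or base64 - adjust accordingly.
with open("shiba_dj.mp4", "wb") as f:
    f.write(resp.content)
```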
fffiloni 
posted an update 9 months ago
Visionary Walter Murch (editor for Francis Ford Coppola), in 1999:

“ So let's suppose a technical apotheosis some time in the middle of the 21st century, when it somehow becomes possible for one person to make an entire feature film, with virtual actors. Would this be a good thing?

If the history of oil painting is any guide, the broadest answer would be yes, with the obvious caution to keep a wary eye on the destabilizing effect of following too intently a hermetically personal vision. One need only look at the unraveling of painting or classical music in the 20th century to see the risks.

Let's go even further, and force the issue to its ultimate conclusion by supposing the diabolical invention of a black box that could directly convert a single person's thoughts into a viewable cinematic reality. You would attach a series of electrodes to various points on your skull and simply think the film into existence.

And since we are time-traveling, let us present this hypothetical invention as a Faustian bargain to the future filmmakers of the 21st century. If this box were offered by some mysterious cloaked figure in exchange for your eternal soul, would you take it?

The kind of filmmakers who would accept, even leap, at the offer are driven by the desire to see their own vision on screen in as pure a form as possible. They accept present levels of collaboration as the evil necessary to achieve this vision. Alfred Hitchcock, I imagine, would be one of them, judging from his description of the creative process: "The film is already made in my head before we start shooting."”

Read "A Digital Cinema of the Mind? Could Be" by Walter Murch: https://archive.nytimes.com/www.nytimes.com/library/film/050299future-film.html

jbilcke-hf 
posted an update 12 months ago
I am working to add automatic speech bubbles to the AI Comic Factory!

This is not finished yet - there are a few things to tweak - but so far results are pretty promising!

(it's available through a "beta" toggle)
akhaliq 
posted an update about 1 year ago
Phased Consistency Model

Phased Consistency Model (2405.18407)

The consistency model (CM) has recently made significant progress in accelerating the generation of diffusion models. However, its application to high-resolution, text-conditioned image generation in the latent space (a.k.a., LCM) remains unsatisfactory. In this paper, we identify three key flaws in the current design of LCM. We investigate the reasons behind these limitations and propose the Phased Consistency Model (PCM), which generalizes the design space and addresses all identified limitations. Our evaluations demonstrate that PCM significantly outperforms LCM across 1-16 step generation settings. While PCM is specifically designed for multi-step refinement, it achieves 1-step generation results superior or comparable to those of previous state-of-the-art methods designed specifically for 1-step generation. Furthermore, we show that PCM's methodology is versatile and applicable to video generation, enabling us to train a state-of-the-art few-step text-to-video generator.
fffiloni 
posted an update about 1 year ago
🇫🇷
What impact is AI having on the film, audiovisual, and video game industries?
A forward-looking study for industry professionals
— CNC & BearingPoint | 09/04/2024

While Artificial Intelligence (AI) has long been used in the film, audiovisual, and video game sectors, the new applications of generative AI are upending our view of what a machine is capable of and carry an unprecedented potential for transformation. The quality of what they produce is impressive and, as a result, sparks many debates, between expectations and apprehensions.

The CNC has therefore decided to launch a new AI Observatory in order to better understand how AI is used and its real impact on the image industry. As part of this Observatory, the CNC wanted to draw up a first assessment by mapping the current and potential uses of AI at each stage of the creation and distribution of a work, identifying the associated opportunities and risks, particularly in terms of professions and employment. This CNC / BearingPoint study presented its main findings on March 6, during the CNC event "Creating, producing, distributing in the age of artificial intelligence."

The CNC is publishing the expanded version of its mapping of AI uses in the film, audiovisual, and video game industries.

Link to the full mapping: https://www.cnc.fr/documents/36995/2097582/Cartographie+des+usages+IA_rapport+complet.pdf/96532829-747e-b85e-c74b-af313072cab7?t=1712309387891
akhaliq 
posted an update about 1 year ago
Chameleon

Mixed-Modal Early-Fusion Foundation Models

Chameleon: Mixed-Modal Early-Fusion Foundation Models (2405.09818)

We present Chameleon, a family of early-fusion token-based mixed-modal models capable of understanding and generating images and text in any arbitrary sequence. We outline a stable training approach from inception, an alignment recipe, and an architectural parameterization tailored for the early-fusion, token-based, mixed-modal setting. The models are evaluated on a comprehensive range of tasks, including visual question answering, image captioning, text generation, image generation, and long-form mixed modal generation. Chameleon demonstrates broad and general capabilities, including state-of-the-art performance in image captioning tasks, outperforms Llama-2 in text-only tasks while being competitive with models such as Mixtral 8x7B and Gemini-Pro, and performs non-trivial image generation, all in a single model. It also matches or exceeds the performance of much larger models, including Gemini Pro and GPT-4V, according to human judgments on a new long-form mixed-modal generation evaluation, where either the prompt or outputs contain mixed sequences of both images and text. Chameleon marks a significant step forward in a unified modeling of full multimodal documents.
multimodalart 
posted an update about 1 year ago
The first open Stable Diffusion 3-like architecture model is JUST out 💣 - but it is not SD3! 🤔

It is Tencent-Hunyuan/HunyuanDiT by Tencent, a 1.5B parameter DiT (diffusion transformer) text-to-image model 🖼️✨, trained with multilingual CLIP + multilingual T5 text encoders for English 🤝 Chinese understanding

Try it out by yourself here ▶️ https://huggingface.co/spaces/multimodalart/HunyuanDiT
(a bit too slow as the model is chunky and the research code isn't super optimized for inference speed yet)

In the paper, they claim it is the SOTA open-source model based on human preference evaluation!
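
If you'd rather run it locally than wait on the Space, here is a minimal sketch with diffusers, assuming the diffusers-format checkpoint Tencent-Hunyuan/HunyuanDiT-Diffusers and a CUDA GPU with enough VRAM for the DiT plus its two text encoders.

```python
# Minimal sketch: run HunyuanDiT locally via diffusers instead of the Space.
# Assumes the diffusers-format checkpoint Tencent-Hunyuan/HunyuanDiT-Diffusers
# and a CUDA GPU; prompts can be English or Chinese thanks to the bilingual encoders.
import torch
from diffusers import HunyuanDiTPipeline

pipe = HunyuanDiTPipeline.from_pretrained(
    "Tencent-Hunyuan/HunyuanDiT-Diffusers", torch_dtype=torch.float16
).to("cuda")

image = pipe(prompt="一只戴着墨镜的柴犬在海滩上打碟").images[0]  # "a Shiba Inu DJ in sunglasses on a beach"
image.save("hunyuan_dit_sample.png")
```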