Apply for community grant: Academic project (gpu)
Creating a realistic animatable avatar from a single static portrait remains challenging. Existing approaches often struggle to capture subtle facial expressions, the associated global body movements, and the dynamic background. To address these limitations, we propose a novel framework that leverages a pretrained video diffusion transformer model to generate high-fidelity, coherent talking portraits with controllable motion dynamics. At the core of our work is a dual-stage audio-visual alignment strategy. In the first stage, we employ a clip-level training scheme to establish coherent global motion by aligning audio-driven dynamics across the entire scene, including the reference portrait, contextual objects, and background. In the second stage, we refine lip movements at the frame level using a lip-tracing mask, ensuring precise synchronization with audio signals. To preserve identity without compromising motion flexibility, we replace the commonly used reference network with a facial-focused cross-attention module that effectively maintains facial consistency throughout the video. Furthermore, we integrate a motion intensity modulation module that explicitly controls expression and body motion intensity, enabling controllable manipulation of portrait movements beyond mere lip motion. Extensive experimental results show that our proposed approach achieves higher quality, with better realism, coherence, motion intensity, and identity preservation. Our project page: https://fantasy-amap.github.io/fantasy-talking/.
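For anyone curious what the facial-focused cross-attention described above might look like, here is a minimal PyTorch sketch. It assumes the reference face is encoded into a token sequence by some image encoder; all class names, shapes, and the single-head attention are illustrative assumptions, not the authors' released code:

```python
# Minimal sketch of a facial-focused cross-attention block. Assumption:
# face_tokens come from a frozen encoder applied to the cropped face region
# of the reference portrait. Not the authors' implementation.
import torch
import torch.nn as nn

class FaceCrossAttention(nn.Module):
    def __init__(self, dim: int, face_dim: int):
        super().__init__()
        self.to_q = nn.Linear(dim, dim, bias=False)       # queries from video latent tokens
        self.to_k = nn.Linear(face_dim, dim, bias=False)  # keys from face identity tokens
        self.to_v = nn.Linear(face_dim, dim, bias=False)  # values from face identity tokens
        self.proj = nn.Linear(dim, dim)
        self.scale = dim ** -0.5

    def forward(self, video_tokens: torch.Tensor, face_tokens: torch.Tensor) -> torch.Tensor:
        # video_tokens: (B, N, dim) tokens inside the diffusion transformer
        # face_tokens:  (B, M, face_dim) identity features from the reference face
        q = self.to_q(video_tokens)
        k = self.to_k(face_tokens)
        v = self.to_v(face_tokens)
        attn = torch.softmax((q @ k.transpose(-2, -1)) * self.scale, dim=-1)
        # Residual connection: the backbone keeps driving motion, while the
        # face branch only injects identity information.
        return video_tokens + self.proj(attn @ v)
```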
Hi @Chao8Chao, we've assigned ZeroGPU to this Space. Please check the compatibility and usage sections of this page so your Space can run on ZeroGPU.
Has anybody been able to run this, even using a predefined sample image and audio? I haven't so far, neither in this Space nor in my own private Spaces. It appears to require at least an A100 and a very long runtime. You will easily run out of available credits before it finishes, even with a PRO subscription.
EDIT: A pop-up says that at least 1200 s (20 min) of compute is needed with ZeroGPU, which is not very convenient...
OK, so I got it to run and got results.
It took 850 seconds to render the example and produce 4 seconds of video, and the results are garbage at best: it is literally the image moving its mouth up and down in the hope that the audio matches, and not even half of the audio made it into the video.
If I used Wan I2V and just played the audio in the background, I would get better results.
I uploaded my own image/audio pair and got a timeout error after 700 seconds. I agree that the current compute is insufficient. It is probably best to run this on a cloned private Space with self-funded compute, or locally; a sketch of the cloning route follows.
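For anyone going that route, here is a minimal sketch using `huggingface_hub`. Caveats: the Space id below is a guess based on the project name (copy the real one from this page's URL), and the hardware tier is only inferred from the A100 comment above; paid hardware bills while the Space is awake.

```python
# Sketch: duplicate the Space privately with self-funded hardware.
# duplicate_space() is a real huggingface_hub API; the Space id and the
# hardware tier here are assumptions -- substitute your own.
from huggingface_hub import duplicate_space

repo = duplicate_space(
    "fantasy-amap/fantasy-talking",  # guessed id -- use the actual Space id
    private=True,
    hardware="a100-large",  # guessed tier, based on the A100 comment above
    sleep_time=300,         # auto-sleep after 5 min idle to limit billing
)
print(repo)  # URL of the new private Space
```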
I, too, get the pop-up saying that at least 1200 s (20 min) of compute is needed with ZeroGPU, and it then redirects me to subscribe to a PRO plan, which is not what I want. I thought anyone could test the Space, but it seems either the Space or its config is broken. Please fix it; this seems like a fun project!