ByteDance drops OmniHuman 🔥 This is peak SOTA performance: flawless natural gestures with perfect lip sync and facial expressions. This is the second time they've released SOTA-level talking heads, only this time with hands and body motion. Project: https://omnihuman-lab.github.io/
🧠 Special Features:
• 📄 Support for PDF/text files up to 2MB
• 🎯 Precise context understanding
• ⚡ Fast response time
• 🔒 Secure file handling
Full source code available - ready to integrate into your projects!
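The file constraints above are easy to enforce client-side before uploading. Below is a hedged sketch of such a pre-upload check; the function name and error messages are my own illustration, not taken from the released source.

```python
# Hypothetical pre-upload validation matching the stated limits:
# PDF/plain-text files no larger than 2 MB. Names are illustrative.
from pathlib import Path

MAX_BYTES = 2 * 1024 * 1024          # 2 MB limit from the feature list
ALLOWED_SUFFIXES = {".pdf", ".txt"}  # PDF/text support

def validate_upload(path: str) -> Path:
    p = Path(path)
    if p.suffix.lower() not in ALLOWED_SUFFIXES:
        raise ValueError(f"unsupported file type: {p.suffix}")
    if p.stat().st_size > MAX_BYTES:
        raise ValueError(f"file exceeds 2 MB: {p.stat().st_size} bytes")
    return p
```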
Finally, an open-source AI that turns your lyrics into full songs is here: meet YuE! Unlike other tools that only create short clips, YuE can generate entire songs (up to 5 minutes) with vocals, melody, and instruments all working together. Let's go!
📢 For those who wish to apply DeepSeek-R1 to tabular / streaming data using a prompt schema (CoT), OpenRouter hosts an API for accessing it: https://openrouter.ai/deepseek/deepseek-r1
📺 The attached screenshot shows how to quick-start the demo, in which you can test your schema for LLM responses. It first asks you to type each parameter needed to complete the request (plain text in this example).
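For reference, here is a minimal sketch of the same idea in Python: calling deepseek/deepseek-r1 through OpenRouter's OpenAI-compatible endpoint with a parameterized CoT prompt schema. The schema fields and example values are my own illustration, not the demo's actual schema.

```python
# Minimal sketch: query DeepSeek-R1 via OpenRouter's OpenAI-compatible API.
# Assumes the `openai` package and an OPENROUTER_API_KEY environment variable.
import os
from openai import OpenAI

client = OpenAI(
    base_url="https://openrouter.ai/api/v1",
    api_key=os.environ["OPENROUTER_API_KEY"],
)

# Illustrative CoT-style prompt schema with named parameters.
schema = "Text: {text}\nThink step by step, then answer: {question}"

response = client.chat.completions.create(
    model="deepseek/deepseek-r1",
    messages=[{
        "role": "user",
        "content": schema.format(
            text="The service was slow but the food was great.",
            question="What is the overall sentiment?",
        ),
    }],
)
print(response.choices[0].message.content)
```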
📌 To apply it to JSONL/CSV data, you can pass the file via the --src shell parameter.
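I don't know the demo's full CLI beyond --src, so here is a hedged Python equivalent of that batching: read each CSV row, fill the schema, and append the model's reply to a JSONL file. The column names and file paths are assumptions.

```python
# Hedged sketch of batch-processing a CSV through DeepSeek-R1 on OpenRouter.
# Column names ("text", "question") and file paths are illustrative only.
import csv
import json
import os

from openai import OpenAI

client = OpenAI(base_url="https://openrouter.ai/api/v1",
                api_key=os.environ["OPENROUTER_API_KEY"])
schema = "Text: {text}\nThink step by step, then answer: {question}"

with open("input.csv", newline="", encoding="utf-8") as src, \
     open("responses.jsonl", "w", encoding="utf-8") as dst:
    for row in csv.DictReader(src):
        reply = client.chat.completions.create(
            model="deepseek/deepseek-r1",
            messages=[{"role": "user", "content": schema.format(**row)}],
        )
        dst.write(json.dumps({"input": row,
                              "output": reply.choices[0].message.content}) + "\n")
```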
⏳ As for latency, OpenRouter is relatively slow for me, at roughly 30-40 seconds per request.
We are reproducing the full DeepSeek R1 data and training pipeline so everybody can use their recipe. Instead of doing it in secret, we can do it together in the open!
🧪 Step 1: Replicate the R1-Distill models by distilling a high-quality reasoning corpus from DeepSeek-R1 (see the corpus-collection sketch after this list).
🧠 Step 2: Replicate the pure RL pipeline that DeepSeek used to create R1-Zero. This will involve curating new, large-scale datasets for math, reasoning, and code (see the reward sketch below).
🔥 Step 3: Show we can go from base model -> SFT -> RL via multi-stage training (see the staging sketch below).
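For Step 1, here is a hedged sketch of what collecting a distillation corpus could look like: send prompts to DeepSeek-R1 (here via OpenRouter's OpenAI-compatible API) and store the full traces as JSONL for later SFT. The prompt list, endpoint choice, and file format are my assumptions, not the project's actual pipeline.

```python
# Hedged sketch for Step 1: collect DeepSeek-R1 reasoning traces into a
# JSONL corpus for supervised fine-tuning. Endpoint and format are assumed.
import json
import os

from openai import OpenAI

client = OpenAI(base_url="https://openrouter.ai/api/v1",
                api_key=os.environ["OPENROUTER_API_KEY"])

prompts = ["What is 17 * 24? Reason step by step."]  # stand-in for a real set

with open("r1_distill_corpus.jsonl", "w", encoding="utf-8") as f:
    for prompt in prompts:
        reply = client.chat.completions.create(
            model="deepseek/deepseek-r1",
            messages=[{"role": "user", "content": prompt}],
        )
        f.write(json.dumps({"prompt": prompt,
                            "completion": reply.choices[0].message.content}) + "\n")
```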
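For Step 2, R1-Zero's RL reportedly relied on verifiable, rule-based rewards rather than a learned reward model. Below is a minimal sketch of one such reward for math: extract the final \boxed{} answer and compare it to the reference. The exact reward design used by DeepSeek (and by any replication) may differ.

```python
# Minimal sketch of a rule-based accuracy reward for math RL:
# extract the last \boxed{...} answer and compare with the reference.
import re

def boxed_answer(text: str) -> str | None:
    matches = re.findall(r"\\boxed\{([^{}]*)\}", text)
    return matches[-1].strip() if matches else None

def accuracy_reward(completion: str, reference: str) -> float:
    pred = boxed_answer(completion)
    return 1.0 if pred is not None and pred == reference else 0.0

# Quick check on a toy completion.
assert accuracy_reward(r"17 * 24 = 408, so the answer is \boxed{408}.", "408") == 1.0
```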
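For Step 3, this sketch shows only the staged control flow; run_sft and run_rl are hypothetical stand-ins for real trainers (e.g., TRL-style SFT and GRPO trainers), not actual APIs.

```python
# Hedged sketch of Step 3's staging: base model -> SFT -> RL.
# `run_sft` and `run_rl` are hypothetical placeholders, not real APIs.

def run_sft(base_model: str, corpus_path: str) -> str:
    ...  # hypothetical: supervised fine-tuning, returns a checkpoint path

def run_rl(checkpoint: str, reward_fn, prompts_path: str) -> str:
    ...  # hypothetical: GRPO-style RL against a verifiable reward

base = "my-base-model"                               # placeholder model id
sft_ckpt = run_sft(base, "r1_distill_corpus.jsonl")  # stage 1: SFT
final = run_rl(sft_ckpt, accuracy_reward, "math_prompts.jsonl")  # stage 2: RL
# `accuracy_reward` is the rule-based reward sketched for Step 2 above.
```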