ZP2HF's activity

dn6 posted an update 9 months ago
Sharing for anyone using Diffusers' from_single_file loading who is affected by the Runway SD 1.5 issue.

If you have runwayml/stable-diffusion-v1-5 saved locally in your HF cache, then loading single-file checkpoints in the following way should still work.

from diffusers import StableDiffusionPipeline

# Config inference falls back to the locally cached runwayml/stable-diffusion-v1-5 repo
pipe = StableDiffusionPipeline.from_single_file("<url or path to single file checkpoint>")


If you do not have the model repo saved in your cache, then automatically inferring the pipeline config will not work, since the reference repo runwayml/stable-diffusion-v1-5 no longer exists.

You can pass an alternative SD 1.5 repo id to configure your pipeline instead.

from diffusers import StableDiffusionPipeline

# Any compatible SD 1.5 repo id works as the `config` source
pipe = StableDiffusionPipeline.from_single_file(
    "<url or path to single file checkpoint>",
    config="Lykon/DreamShaper",
)


We're working on resolving the issue ASAP.
xianbao posted an update 9 months ago
With the open-weight release of CogVideoX-5B from THUDM (the GLM team), the video generation model field (how about calling it VGM?) has officially become the next booming "LLM"

What does the landscape look like? What other video generation models are out there? The collection below is all you need.

xianbao/video-generation-models-66c350163c74f60f5c412af6

The video above was generated by @a-r-r-o-w with CogVideoX-5B, a nice outlook for the field!
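
If you want to try it locally, here is a minimal sketch using the CogVideoXPipeline integration in Diffusers (the prompt and generation settings are illustrative assumptions, not official recommendations):

import torch
from diffusers import CogVideoXPipeline
from diffusers.utils import export_to_video

# Load the open-weight 5B checkpoint in bf16 to ease GPU memory pressure
pipe = CogVideoXPipeline.from_pretrained("THUDM/CogVideoX-5b", torch_dtype=torch.bfloat16).to("cuda")

# Generate a short clip and write it to disk
frames = pipe(prompt="A panda playing guitar in a bamboo forest", guidance_scale=6.0).frames[0]
export_to_video(frames, "output.mp4", fps=8)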
xianbao posted an update about 1 year ago
Why Apache 2.0 Matters for LLMs πŸ€”

@01AI_Yi recently switched from a permissive & commercially friendly custom license to Apache 2.0. And the community loved it! 🚀

@JustinLin610 also ran a poll on model licensing, and the majority voted for Apache 2.0.

Why is it a Big Deal? ⬇️

πŸ“š Legal Simplicity: Custom licenses need costly & time-consuming legal review. Apache 2.0 is well-known & easier for legal teams to handle.

πŸ‘©β€πŸ’» Developer-Friendly: Legal docs are a pain for devs! Apache 2.0 is well-known and tech-friendly, making it easier for non-native developers to understand the implications too.

πŸ”— Easier Integration: Apache 2.0 is compatible with many other licenses, simplifying tasks like model merging with models of different licensing requirements.

🚫 No Permission Needed: Custom licenses often require explicit permission and extra paperwork like filling out forms, creating barriers. Apache 2.0 removes this hurdle, letting devs focus on innovation.

There are a lot of interesting discussions under @JustinLin610's poll (https://x.com/JustinLin610/status/1793559737482764375), which inspired this thread.

Any other thoughts? Let me know ^^
xianbao posted an update about 1 year ago
DeepSeekV2 is a big deal, and not only because of its significant improvements to both key components of the Transformer: the attention layer and the FFN layer.

It has also completely disrupted the Chinese LLM market, forcing competitors to drop their prices to 1% of the original.

---

There are two key components in the Transformer architecture: the self-attention layer, which captures relationships between tokens in context, and the Feed-Forward Network (FFN) layer, which stores knowledge.

DeepSeek V2 introduces optimizations to both:

The attention layer normally uses a KV cache to avoid recomputing past tokens, but the cache consumes significant GPU RAM, limiting how many requests can be served concurrently. DeepSeek V2 introduces Multi-head Latent Attention (MLA), which caches only a small latent representation per token, resulting in substantial RAM savings.
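
A toy sketch of the caching idea (the dimensions are illustrative assumptions, not DeepSeek's actual hyperparameters):

import torch.nn as nn

class LatentKV(nn.Module):
    # Cache one small latent per token instead of full K and V tensors
    def __init__(self, d_model=4096, d_latent=512):
        super().__init__()
        self.down = nn.Linear(d_model, d_latent, bias=False)  # compress
        self.up_k = nn.Linear(d_latent, d_model, bias=False)  # re-expand to K
        self.up_v = nn.Linear(d_latent, d_model, bias=False)  # re-expand to V

    def forward(self, h):
        latent = self.down(h)  # only this goes into the KV cache
        return self.up_k(latent), self.up_v(latent)

Caching `latent` instead of K and V cuts per-token cache memory from 2 × d_model to d_latent, roughly a 16x saving with these toy numbers.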

DeepSeek V2 also uses 162 experts instead of the usual 8, as in Mixtral. Segmenting experts at finer granularity allows higher specialization and more accurate knowledge acquisition, and activating only a small subset of experts for each token keeps processing efficient.
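
A minimal sketch of top-k routing over many small experts (the expert count and top-k here are illustrative, not DeepSeek's exact routing scheme):

import torch
import torch.nn as nn

class TopKRouter(nn.Module):
    # Score all experts, keep only the top_k per token
    def __init__(self, d_model=1024, n_experts=162, top_k=6):
        super().__init__()
        self.gate = nn.Linear(d_model, n_experts, bias=False)
        self.top_k = top_k

    def forward(self, x):  # x: (tokens, d_model)
        scores = self.gate(x).softmax(dim=-1)
        weights, expert_idx = scores.topk(self.top_k, dim=-1)
        # renormalize so the selected experts' weights sum to 1
        return weights / weights.sum(dim=-1, keepdim=True), expert_idx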

It disrupted the market by dropping API prices to $0.14 per 1M tokens. This dramatic reduction forced competitors like GLM, Ernie, and Qwen to follow suit, lowering their prices to 1% of their original offerings. Users can now access these APIs at 1/35th the cost of GPT-4o.
multimodalart posted an update about 1 year ago
The first open Stable Diffusion 3-like architecture model is JUST out πŸ’£ - but it is not SD3! πŸ€”

It is Tencent-Hunyuan/HunyuanDiT by Tencent, a 1.5B-parameter DiT (diffusion transformer) text-to-image model 🖼️✨, trained with multi-lingual CLIP + multi-lingual T5 text-encoders for English 🤝 Chinese understanding

Try it out by yourself here ▢️ https://huggingface.co/spaces/multimodalart/HunyuanDiT
(a bit slow for now, as the model is chunky and the research code isn't yet optimized for inference speed)

In the paper they claim SOTA among open-source models based on human preference evaluation!
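
Diffusers has since added a HunyuanDiTPipeline for local use; a minimal sketch, assuming the Diffusers-format checkpoint (the prompt is just an example):

import torch
from diffusers import HunyuanDiTPipeline

pipe = HunyuanDiTPipeline.from_pretrained(
    "Tencent-Hunyuan/HunyuanDiT-Diffusers", torch_dtype=torch.float16
).to("cuda")

# Bilingual text encoders: English and Chinese prompts both work
image = pipe(prompt="An astronaut riding a horse").images[0]
image.save("astronaut.png")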
multimodalart posted an update over 1 year ago
The Stable Diffusion 3 research paper broken down, including some overlooked details! πŸ“

Model
πŸ“ 2 base model variants mentioned: 2B and 8B sizes

πŸ“ New architecture in all abstraction levels:
- πŸ”½ UNet; ⬆️ Multimodal Diffusion Transformer, bye cross attention πŸ‘‹
- πŸ†• Rectified flows for the diffusion process
- 🧩 Still a Latent Diffusion Model
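
Since rectified flow is the main training-objective change, here is a toy sketch of the objective (a simplified version under my own notation, not the paper's exact timestep weighting):

import torch

def rectified_flow_loss(model, x0, t):
    # t: tensor broadcastable over x0, e.g. shape (batch, 1, 1, 1), values in [0, 1]
    eps = torch.randn_like(x0)
    xt = (1 - t) * x0 + t * eps       # straight-line path from data (t=0) to noise (t=1)
    target_v = eps - x0               # constant velocity along that path
    pred_v = model(xt, t)             # model predicts the velocity
    return ((pred_v - target_v) ** 2).mean()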

πŸ“„ 3 text-encoders: 2 CLIPs, one T5-XXL; plug-and-play: removing the larger one maintains competitiveness

πŸ—ƒοΈ Dataset was deduplicated with SSCD which helped with memorization (no more details about the dataset tho)

Variants
πŸ” A DPO fine-tuned model showed great improvement in prompt understanding and aesthetics
✏️ An Instruct Edit 2B model was trained, and learned how to do text-replacement

Results
βœ… State of the art in automated evals for composition and prompt understanding
βœ… Best win rate in human preference evaluation for prompt understanding, aesthetics and typography (missing some details on how many participants and the design of the experiment)

Paper: https://stabilityai-public-packages.s3.us-west-2.amazonaws.com/Stable+Diffusion+3+Paper.pdf
xianbao posted an update over 1 year ago
Welcome Bunny! A family of lightweight but powerful multimodal models from BAAI

With detailed work on dataset curation, the Bunny-3B model, built upon SigLIP and Phi-2, achieves performance on par with 13B models.

Model: BAAI/bunny-phi-2-siglip-lora

xianbao posted an update over 1 year ago
There appears to be a huge misunderstanding regarding the licensing requirements for open-sourced Chinese-speaking LLMs on @huggingface.

I initially shared this misconception too, but after conducting some research, I came up with the list below.

Very impressive!

xianbao posted an update over 1 year ago
A vision LLM for #edgecomputing?

@openbmb, who previously open-sourced the UltraFeedback dataset, released a series of eco-friendly yet powerful LLMs:

- MiniCPM: 2B model that competes with Mistral-7B

- MiniCPM-V: 3B vision LLM on edge!
multimodalart posted an update over 1 year ago
It seems February started with a fully open source AI renaissance 🌟

Models released with fully open datasets, training code, and weights ✅

LLM - allenai/olmo-suite-65aeaae8fe5b6b2122b46778 🧠
Embedding - nomic-ai/nomic-embed-text-v1 📚 (SOTA!)
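
As a quick taste of the embedding model, a minimal sketch using sentence-transformers (the task prefixes follow the model card; the sentences are made up):

from sentence_transformers import SentenceTransformer

# nomic-embed-text-v1 ships custom modeling code, hence trust_remote_code
model = SentenceTransformer("nomic-ai/nomic-embed-text-v1", trust_remote_code=True)

# The model expects task prefixes such as "search_document:" / "search_query:"
embeddings = model.encode([
    "search_document: Open weights, open data, open training code.",
    "search_query: fully open source embedding models",
])
print(embeddings.shape)  # (2, 768)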

And it's literally February 1st - can't wait to see what else the community will bring πŸ‘€