David Leon

DavidLeon

https://www.linkedin.com/in/daweileng/

AI & ML interests

AIGC & LMM

Recent Activity

upvoted a paper about 1 month ago

RzenEmbed: Towards Comprehensive Multimodal Retrieval

Paper • 2510.27350 • Published Oct 31 • 1

upvoted a paper about 2 months ago

EVTAR: End-to-End Try on with Additional Unpaired Visual Reference

Paper • 2511.00956 • Published Nov 2 • 4

New activity in qihoo360/RzenEmbed about 2 months ago

Update README.md

#1 opened about 2 months ago by

commented a paper 2 months ago

FG-CLIP 2: A Bilingual Fine-grained Vision-Language Alignment Model

Paper • 2510.10921 • Published Oct 13 • 10

liked a model 2 months ago

qihoo360/fg-clip2-base

Zero-Shot Image Classification • 0.4B • Updated Nov 6 • 10.6k • 22

upvoted a collection 2 months ago

FG-CLIP 2

FG-CLIP 2 is the foundation model for fine-grained vision-language understanding in both English and Chinese. • 10 items • Updated Nov 6 • 5

upvoted a paper 2 months ago

FG-CLIP 2: A Bilingual Fine-grained Vision-Language Alignment Model

Paper • 2510.10921 • Published Oct 13 • 10

liked a Space 3 months ago

MMEB Leaderboard

The massive multimodal embedding benchmark

authored a paper 5 months ago

Prompt as Knowledge Bank: Boost Vision-language model via Structural Representation for zero-shot medical detection

Paper • 2502.16223 • Published Feb 22

liked a model 5 months ago

qihoo360/fg-clip-base

Zero-Shot Image Classification • 0.2B • Updated Oct 9 • 1.53k • 10

upvoted a collection 5 months ago

FG-CLIP

New generation of CLIP with strong fine grained discrimination capability • 6 items • Updated Oct 15 • 4

commented a paper 8 months ago

FG-CLIP: Fine-Grained Visual and Textual Alignment

Paper • 2505.05071 • Published May 8 • 18

upvoted a paper 8 months ago

FG-CLIP: Fine-Grained Visual and Textual Alignment

Paper • 2505.05071 • Published May 8 • 18

authored a paper 9 months ago

RelaCtrl: Relevance-Guided Efficient Control for Diffusion Transformers

Paper • 2502.14377 • Published Feb 20 • 12

authored 3 papers about 1 year ago

Bridge Diffusion Model: bridge non-English language-native text-to-image diffusion model with English communities

Paper • 2309.00952 • Published Sep 2, 2023

FancyVideo: Towards Dynamic and Consistent Video Generation via Cross-frame Textual Guidance

Paper • 2408.08189 • Published Aug 15, 2024 • 17

Qihoo-T2X: An Efficiency-Focused Diffusion Transformer via Proxy Tokens for Text-to-Any-Task

Paper • 2409.04005 • Published Sep 6, 2024 • 19

upvoted a paper over 1 year ago

Qihoo-T2X: An Efficiency-Focused Diffusion Transformer via Proxy Tokens for Text-to-Any-Task

Paper • 2409.04005 • Published Sep 6, 2024 • 19

replied to HugoLaurencon's post over 1 year ago

Would you share the total training cost info? The training of IDEFICS2-8B used "approximately 1.5 billion images and 225 billion text tokens", which is quite huge for an 8B-sized LMM.