jiakai

real-jiakai

AI & ML interests

LLM && Smart QA

Recent Activity

liked a model about 8 hours ago
ASLP-lab/DiffRhythm-base
liked a model about 8 hours ago
alibaba-pai/Wan2.1-Fun-1.3B-Control
reacted to tonywu71's post with 🔥 about 12 hours ago
ColPali: A new approach to efficient and intelligent document retrieval 🚀

Our latest research paper, "ColPali: Efficient Document Retrieval with Vision Language Models," introduces a groundbreaking approach to large-scale visual document analysis. By leveraging Vision Language Models (VLMs), we have created a new framework for document retrieval that's both powerful and efficient.

Key Insights:
💡 ColPali combines ColBERT's multi-vector strategy with VLMs' document understanding capabilities
⚙️ ColPali is based on PaliGemma-3B (SigLIP, Gemma-2B) + a linear projection layer and is trained to maximize the similarity between the document and the query embeddings
📊 The Vision Document Retrieval benchmark (ViDoRe) is a challenging dataset that spans various industry topics and aims at matching real-life retrieval scenarios
🏆 ColPali outperforms existing models on all datasets in ViDoRe (average NDCG@5 of 81.3% vs 67.0% for the best baseline model)
⚡ ColPali is faster at document embedding compared to traditional PDF parser pipelines, making ColPali viable for industrial use
🔍 ColPali is highly interpretable thanks to patch-based similarity maps

Dive deeper into ColPali and explore our resources:
📑 Full paper: arxiv.org/abs/2407.01449
🛠️ Datasets, model weights, evaluation code, leaderboard, demos: huggingface.co/vidore

Shoutout to my amazing co-authors Manuel Faysse (@manu) and Hugues Sibille (@HugSib). We are grateful for the invaluable feedback from Bilel Omrani, Gautier Viaud, Celine Hudelot, and Pierre Colombo. This work is sponsored by ILLUIN Technology. ✨

Organizations

Guangxi Minzu University

real-jiakai's activity

upvoted an article about 12 hours ago
Article

PaliGemma – Google's Cutting-Edge Open Vision Language Model

• 244
upvoted an article 5 days ago
Article

The New and Fresh analytics in Inference Endpoints

• 17
upvoted 2 articles 7 days ago
Article

Open R1: How to use OlympicCoder locally for coding?

• 50
Article

AI Policy: 🤗 Response to the White House AI Action Plan RFI

• 21
upvoted an article 7 days ago
Article

NVIDIA's GTC 2025 Announcement for Physical AI Developers: New Open Models and Datasets

• 29