Hugging Face
Models
Datasets
Spaces
Posts
Docs
Enterprise
Pricing
Log In
Sign Up
3
18
Jithin James
jithinjames
Follow
21world's profile picture
1 follower
ยท
2 following
https://jithinjk.github.io/
jithinrocs
jithinjk
AI & ML interests
None yet
Recent Activity
liked
a Space
about 1 month ago
agents-course/First_agent_template
reacted
to
merve
's
post
with ๐ฅ
6 months ago
If you have documents that do not only have text and you're doing retrieval or RAG (using OCR and LLMs), give it up and give ColPali and vision language models a try ๐ค Why? Documents consist of multiple modalities: layout, table, text, chart, images. Document processing pipelines often consist of multiple models and they're immensely brittle and slow. ๐ฅฒ How? ColPali is a ColBERT-like document retrieval model built on PaliGemma, it operates over image patches directly, and indexing takes far less time with more accuracy. You can use it for retrieval, and if you want to do retrieval augmented generation, find the closest document, and do not process it, give it directly to a VLM like Qwen2-VL (as image input) and give your text query. ๐ค This is much faster + you do not lose out on any information + much easier to maintain too! ๐ฅณ Multimodal RAG https://huggingface.co/collections/merve/multimodal-rag-66d97602e781122aae0a5139 ๐ฌ Document AI (made it way before, for folks who want structured input/output and can fine-tune a model) https://huggingface.co/collections/merve/awesome-document-ai-65ef1cdc2e97ef9cc85c898e ๐
View all activity
Organizations
jithinjames
's activity
All
Models
Datasets
Spaces
Papers
Collections
Community
Posts
Upvotes
Likes
Articles
upvoted
an
article
6 months ago
view article
Article
MTEB: Massive Text Embedding Benchmark
Oct 19, 2022
โข
67