AI & ML interests

Computer Vision

Recent Activity

Add project page link

#1 opened about 22 hours ago by
nielsr
merve 
posted an update 3 days ago
view post
Post
385
Dataset Viewer for PDFs just landed on Hugging Face 📖🤗 you can now preview all the PDFs easier than before!

on top of this, there's PdfFolder format to load the PDF datasets quicker 💨
> to use it, your dataset should follow a directory format like folder/train/doc1.pdf, folder/train/doc1.pdf
> if you want to include bounding boxes, labels etc. you can keep them in a metadata.csv file in the same folder 🤝

read document dataset docs https://huggingface.co/docs/datasets/main/en/document_dataset
check all the document datasets here https://huggingface.co/datasets?modality=modality:document&sort=trending 📖
merve 
posted an update 4 days ago
view post
Post
525
we've merged LightGlue keypoint matcher to Hugging Face transformers! it allows commercial use when paired with an open-source keypoint detector 🙏🏻

it works very well, try it yourself: ETH-CVG/LightGlue

here's an in-the-wild test with two images of the same place ⤵️
  • 1 reply
·
merve 
posted an update 5 days ago
view post
Post
4240
Release picks of the past week is here! Find more models, datasets, Spaces here merve/june-20-releases-68594824d1f4dfa61aee3433

🖼️ VLMs/OCR
> moonshotai/Kimi-VL-A3B-Thinking-2506 is a powerful reasoning vision LM, 3B active params, smarter with less tokens, supports long documents, videos 👏 (OS)
> nanonets/Nanonets-OCR-s is 3.75B params OCR model based on Qwen2.5VL-3B-Instruct (OS)

💬 LLMs
> moonshotai/Kimi-Dev-72B is a strong coding model based on Qwen2.5-72B (OS)
> Mistral released mistralai/Mistral-Small-3.2-24B-Instruct-2506, an update to their former model with better function calling & instruction following (OS)

🗣️ Audio
> Google released google/magenta-realtime, real time music generation & audio synthesis (cc-by-4)
> kyutai released new speech-to-text models that come in 1B & 2B ( kyutai/stt-1b-en_fr, stt-2b-en_fr) with 0.5s and 2.5s delay

3D
> Tencent released tencent/Hunyuan3D-2.1 an image-to-3D model (see below)
merve 
posted an update 6 days ago
merve 
posted an update 8 days ago