huggingPartyParis

community

https://partiful.com/e/oWOMGoPxB5D37qw5F8yN

Activity Feed Request to join this org

AI & ML interests

None defined yet.

Recent Activity

anton-l authored a paper about 2 months ago

SmolVLM: Redefining small and efficient multimodal models

hamedrahimi authored a paper 3 months ago

Contextualized Topic Coherence Metrics

hamedrahimi authored a paper 3 months ago

Demographic User Modeling for Social Robotics with Multimodal Pre-trained Models

View all activity

HuggingPartyParis's activity

afmck

authored a paper 2 months ago

Command A: An Enterprise-Ready Large Language Model

Paper • 2504.00698 • Published Apr 1 • 26

MarcBotet

authored a paper 2 months ago

Semantic Library Adaptation: LoRA Retrieval and Fusion for Open-Vocabulary Semantic Segmentation

Paper • 2503.21780 • Published Mar 27 • 8

osanseviero

authored a paper 2 months ago

Gemma 3 Technical Report

Paper • 2503.19786 • Published Mar 25 • 51

Xssama

authored a paper 8 months ago

Zero-shot Model-based Reinforcement Learning using Large Language Models

Paper • 2410.11711 • Published Oct 15, 2024 • 9

devendrachaplot

authored a paper 8 months ago

Pixtral 12B

Paper • 2410.07073 • Published Oct 9, 2024 • 66

Xssama

authored a paper 8 months ago

Large Language Models as Markov Chains

Paper • 2410.02724 • Published Oct 3, 2024 • 34

ragavsachdeva

authored a paper 9 months ago

The Manga Whisperer: Automatically Generating Transcriptions for Comics

Paper • 2401.10224 • Published Jan 18, 2024 • 2

ananthu-aniraj

authored a paper 9 months ago

PDiscoFormer: Relaxing Part Discovery Constraints with Vision Transformers

Paper • 2407.04538 • Published Jul 5, 2024 • 2

HugoLaurencon

authored a paper 9 months ago

Building and better understanding vision-language models: insights and future directions

Paper • 2408.12637 • Published Aug 22, 2024 • 131

Sharvareev

authored a paper 10 months ago

DC3DO: Diffusion Classifier for 3D Objects

Paper • 2408.06693 • Published Aug 13, 2024 • 11

ragavsachdeva

authored a paper 10 months ago

Tails Tell Tales: Chapter-Wide Manga Transcriptions with Character Names

Paper • 2408.00298 • Published Aug 1, 2024 • 11

HugoLaurencon

authored a paper about 1 year ago

What matters when building vision-language models?

Paper • 2405.02246 • Published May 3, 2024 • 104

HugoLaurencon

posted an update about 1 year ago

Post

3055

We release Idefics2-chatty, the chatbot-optimized version of Idefics2: HuggingFaceM4/idefics2-8b-chatty

Idefics2-chatty is better at following instructions and following Chain-of-Thoughts reasoning.

Moreover, we also release a paper, containing a lot of findings on how to build an efficient and performant Vision-Language Model: What matters when building vision-language models? (2405.02246)

How are you going to use the model, or what data are you going to fine-tune it on?

5 replies

faaabian

authored a paper about 1 year ago

Better & Faster Large Language Models via Multi-token Prediction

Paper • 2404.19737 • Published Apr 30, 2024 • 78

HugoLaurencon

posted an update about 1 year ago

Post

2562

Idefics2 is trained mostly on OBELICS, our open interleaved image-text document dataset.

Training on interleaved data is crucial to reaching high performance on VQA tasks, taking an arbitrary number of images as input, and doing in-context learning.

Dataset: HuggingFaceM4/OBELICS
Nomic visualization: https://atlas.nomic.ai/map/f2fba2aa-3647-4f49-a0f3-9347daeee499/ee4a84bd-f125-4bcc-a683-1b4e231cb10f
Link to OBELICS thread: https://twitter.com/HugoLaurencon/status/1694005892839006301

HugoLaurencon

posted an update about 1 year ago

Post

3007

The Cauldron is a massive collection of 50 high-quality datasets, all converted to the user/assistant format, and ready to use to fine-tune any Vision Language Model.

The Cauldron covers a wide range of tasks, including general visual question answering, counting, captioning, text transcription, document understanding, chart/figure understanding, table understanding, visual reasoning, geometry, spotting differences between 2 images or converting a screenshot to a code.

HuggingFaceM4/the_cauldron

helboukkouri

authored a paper about 1 year ago

CharacterBERT: Reconciling ELMo and BERT for Word-Level Open-Vocabulary Representations From Characters

Paper • 2010.10392 • Published Oct 20, 2020 • 1

osanseviero

posted an update about 1 year ago

Post

13769

Diaries of Open Source. Part 15 🤗

🕵️‍♀️Idefics 2 is out, a multimodal open-source model with very nice capabilities
Models, demo, and datasets: HuggingFaceM4/idefics2-661d1971b7c50831dd3ce0fe
Blog: https://hf.co/blog/idefics2

💾Snowflake released snowflake-arctic-embed, a family of powerful small embedding models
Model: Snowflake/snowflake-arctic-embed-m
Blog: https://www.snowflake.com/blog/introducing-snowflake-arctic-embed-snowflakes-state-of-the-art-text-embedding-family-of-models/

✨Pile-T5, EleutherAI's T5 model trained on 2T tokens
Blog: https://blog.eleuther.ai/pile-t5/
Models: EleutherAI/pile-t5-65a76a0d0022dd270b385a66
GitHub: https://github.com/EleutherAI/improved-t5

🤖CodeQwen1.5-7B base and chat models. Models trained on 3T tokens strong benchmark results for code generation, editing and SQL
Blog post: https://qwenlm.github.io/blog/codeqwen1.5/
Demo: Qwen/CodeQwen1.5-7b-Chat-demo
Models: Qwen/CodeQwen1.5-7B and Qwen/CodeQwen1.5-7B-Chat

Misc
🦉 DocOwl1.5: Unified Stucture Learning for OCR-free Document Understanding mPLUG/DocOwl
👀Cerule - a tiny Vision LM model Tensoic/Cerule-v0.1
ChemLLM - a LLM for chemistry and molecule science ⚗️https://hf.co/AI4Chem/ChemLLM-7B-Chat-1.5-DPO
Distil Whisper Large
📝New pdf/OCR datasets with 19 samples pixparse/pdf-document-ocr-datasets-660701430b0346f97c4bc628
🔥Gretel AI high quality text-to-sql synthetic dataset gretelai/synthetic_text_to_sql

5 replies

HugoLaurencon

posted an update about 1 year ago

Post

3111

We release Idefics2-8B, a foundation vision language model with SOTA results for its size on many benchmarks.

For Idefics2, we adopted a simple architecture:
-Images are fed to a vision encoder, then to a modality projection to match the input dimension of the LLM, and finally to a perceiver resampler for efficient pooling.
-Interleaved image-text data are then passed to the LLM.

During the pre-training:
-The modality projection and perceiver resampler weights are newly initialized.
-We start with pre-trained models for the vision encoder and the LLM, and continue the training with LoRA.
-In total, we see 1.5T images!

We pre-train on 3 types of data, all publicly available:
-Interleaved image-text documents: our dataset OBELICS HuggingFaceM4/OBELICS
-Image caption pairs: only synthetic captions!
-PDF documents: IDL and PDFA

We kept the aspect ratio of the images with the Patch n' Pack strategy, with a resolution of up to 980x980.
At inference, it's also more efficient for lower-resolution images.

For the SFT, we build The Cauldron, a collection of 50 high-quality datasets in the user/assistant format.
It is a ready-to-use dataset for the fine-tuning of any VLM.
HuggingFaceM4/the_cauldron

Most current models, like LLaVA-NeXT, encode images with an excessive number of tokens, like 2880.
Instead, we put a focus on being efficient at inference by training on a mix of images encoded with 64 tokens, and 320 tokens.
The result is that we perform favorably compared to the best models in our size class, while being efficient at inference.

osanseviero

posted an update about 1 year ago

Post

11303

Diaries of Open Source. Part 14 🤗

🔥CohereForAI releases Command R+, an open 104B model with:
- Tool usage capabilities
- Specialized in RAGs
- Multilingual
It's one of the first models to surpass GPT-4 in the lmsys arena, check it out!
Model: https://hf.co/CohereForAI/c4ai-command-r-plus
Official demo: https://hf.co/spaces/CohereForAI/c4ai-command-r-plus
Quantized: https://hf.co/CohereForAI/c4ai-command-r-plus-4bit

🎉Google releases a new version of their Gemma instruct models, with improved quality, nicer to converse, and a fancier RL algorithm. The model is similar to Llama 2 70B in the Chat Arena!
Models: google/gemma-release-65d5efbccdbb8c4202ec078b
Try it out in HuggingChat https://hf.co/chat/models/google/gemma-1.1-7b-it

🪄VoiceCraft, a speech editing and TTS SOTA open model
Paper: VoiceCraft: Zero-Shot Speech Editing and Text-to-Speech in the Wild (2403.16973)
Model: pyp1/VoiceCraft

💻Google released CodeGemma, a family of code generation, completion, and chat models
Blog post: https://hf.co/blog/codegemma
Models: google/codegemma-release-66152ac7b683e2667abdee11
Report: https://storage.googleapis.com/deepmind-media/gemma/codegemma_report.pdf

Misc models:
🦖T-Rex2, a very powerful object detection model for many applications https://github.com/IDEA-Research/T-Rex
👀 CT-RATE : A 3D dataset paired with text reports ibrahimhamamci/CT-RATE
🐙Octopus v2: a Gemma-based model trained for Android API - extremely fast, better than Llama+RAG, great results NexaAIDev/Octopus-v2

2 replies

AI & ML interests

Recent Activity

Team members 962

HuggingPartyParis's activity