Frank Sommers PRO

fsommers

fsommers's activity

upvoted a collection 1 day ago

D-FINE

State-of-the-art real-time object detection model with Apache 2.0 licence • 15 items • Updated 2 days ago • 45

upvoted a paper 9 days ago

BitNet v2: Native 4-bit Activations with Hadamard Transformation for 1-bit LLMs

Paper • 2504.18415 • Published 12 days ago • 41

upvoted a paper 15 days ago

Many-Shot In-Context Learning in Multimodal Foundation Models

Paper • 2405.09798 • Published May 16, 2024 • 33

upvoted a paper 28 days ago

SmolVLM: Redefining small and efficient multimodal models

Paper • 2504.05299 • Published about 1 month ago • 180

upvoted an article about 1 month ago

Tool Use, Unified

Article • Aug 12, 2024 • 102

upvoted 2 papers about 1 month ago

MDocAgent: A Multi-Modal Multi-Agent Framework for Document Understanding

Paper • 2503.13964 • Published Mar 18 • 19

LLaVA-Grounding: Grounded Visual Chat with Large Multimodal Models

Paper • 2312.02949 • Published Dec 5, 2023 • 15

upvoted 2 papers about 2 months ago

TULIP: Towards Unified Language-Image Pretraining

Paper • 2503.15485 • Published Mar 19 • 48

Aligning Multimodal LLM with Human Preference: A Survey

Paper • 2503.14504 • Published Mar 18 • 23

upvoted a collection about 2 months ago

Gemma 3 Release

24 items • Updated 20 days ago • 356

upvoted 2 papers 2 months ago

NitiBench: A Comprehensive Studies of LLM Frameworks Capabilities for Thai Legal Question Answering

Paper • 2502.10868 • Published Feb 15 • 2

ViDoRAG: Visual Document Retrieval-Augmented Generation via Dynamic Iterative Reasoning Agents

Paper • 2502.18017 • Published Feb 25 • 20

upvoted 2 articles 2 months ago

SmolVLM2: Bringing Video Understanding to Every Device

Article • Feb 20 • 243

SigLIP 2: A better multilingual vision language encoder

Article • Feb 21 • 161

upvoted a paper 2 months ago

Executable Code Actions Elicit Better LLM Agents

Paper • 2402.01030 • Published Feb 1, 2024 • 132

upvoted 2 papers 3 months ago

Scalable Vision Language Model Training via High Quality Data Curation

Paper • 2501.05952 • Published Jan 10 • 2

Qwen2.5-VL Technical Report

Paper • 2502.13923 • Published Feb 19 • 186

upvoted 2 collections 3 months ago

ColQwen2 Models

Pre-trained checkpoints for the ColQwen2 model. • 4 items • Updated Jan 23 • 4

Qwen2.5-VL

Vision-language model series based on Qwen2.5 • 11 items • Updated 9 days ago • 463

upvoted an article 3 months ago

We now support VLMs in smolagents!

Article • Jan 24 • 101