Phan Hoang

phanhoang

AI & ML interests

None yet

Organizations

None yet

phanhoang's activity

upvoted a paper 6 months ago

Florence-VL: Enhancing Vision-Language Models with Generative Vision Encoder and Depth-Breadth Fusion

Paper • 2412.04424 • Published Dec 5, 2024 • 64

upvoted an article 7 months ago

ColFlor: Towards BERT-Size Vision-Language Document Retrieval Models

Article • Oct 18, 2024 • 20

upvoted an article 8 months ago

Visually Multilingual: Introducing mcdse-2b

Article • Oct 27, 2024 • 41

upvoted a collection 8 months ago

DocLayout-YOLO

Dataset and model for DocLayout-YOLO • 10 items • Updated Jan 14 • 17

upvoted a paper 8 months ago

Loong: Generating Minute-level Long Videos with Autoregressive Language Models

Paper • 2410.02757 • Published Oct 3, 2024 • 38

upvoted a collection 8 months ago

📑Trending Papers - September 9⃣️

10 items • Updated Mar 28 • 9

upvoted 3 collections 9 months ago

Emu3

Emu3: Next-Token Prediction is All You Need • 7 items • Updated Feb 13 • 73

Molmo

Artifacts for open multimodal language models. • 5 items • Updated Apr 30 • 305

Qwen2.5

Qwen2.5 language models, including pretrained and instruction-tuned models of 7 sizes, including 0.5B, 1.5B, 3B, 7B, 14B, 32B, and 72B. • 46 items • Updated Apr 28 • 618

upvoted 2 papers 9 months ago

Qwen2-VL: Enhancing Vision-Language Model's Perception of the World at Any Resolution

Paper • 2409.12191 • Published Sep 18, 2024 • 78

General OCR Theory: Towards OCR-2.0 via a Unified End-to-end Model

Paper • 2409.01704 • Published Sep 3, 2024 • 85

upvoted an article 9 months ago

Making LLMs lighter with AutoGPTQ and transformers

Article • By … and 5 others • Aug 23, 2023 • 54

upvoted 2 collections 9 months ago

Awesome Document AI

A collection of open-source document AI 📄 📝 📈 • 27 items • Updated Mar 11, 2024 • 80

Qwen2-VL

Vision-language model series based on Qwen2 • 16 items • Updated Apr 28 • 218

upvoted an article 9 months ago

Fine-tune Llama 3 with ORPO

Article • Apr 22, 2024 • 237

upvoted a paper 9 months ago

Let Me Speak Freely? A Study on the Impact of Format Restrictions on Performance of Large Language Models

Paper • 2408.02442 • Published Aug 5, 2024 • 21

upvoted 2 collections 10 months ago

Function Calling Dataset

7 items • Updated Dec 5, 2023 • 8

Papers I want to read

Papers in my to-read list • 259 items • Updated Jan 10 • 31

upvoted 2 articles 10 months ago

Tool Use, Unified

Article • Aug 12, 2024 • 107

Introducing TextImage Augmentation for Document Images

Article • By … and 2 others • Aug 6, 2024 • 33