Token-Efficient Long Video Understanding for Multimodal LLMs Paper • 2503.04130 • Published Mar 6 • 93
Critic-V: VLM Critics Help Catch VLM Errors in Multimodal Reasoning (2411.18203) Critic-V has been accepted by CVPR 2025! Bonus: VRI-160K has been uploaded: di-zhang-fdu/R1-Vision-Reasoning-Instructions
FB-BEV: BEV Representation from Forward-Backward View Transformations Paper • 2308.02236 • Published Aug 4, 2023
Panoptic SegFormer: Delving Deeper into Panoptic Segmentation with Transformers Paper • 2109.03814 • Published Sep 8, 2021
Efficient Deformable ConvNets: Rethinking Dynamic and Sparse Operator for Vision Applications Paper • 2401.06197 • Published Jan 11, 2024 • 1
DriveMLM: Aligning Multi-Modal Large Language Models with Behavioral Planning States for Autonomous Driving Paper • 2312.09245 • Published Dec 14, 2023
Video Mamba Suite: State Space Model as a Versatile Alternative for Video Understanding Paper • 2403.09626 • Published Mar 14, 2024 • 15
InternImage: Exploring Large-Scale Vision Foundation Models with Deformable Convolutions Paper • 2211.05778 • Published Nov 10, 2022
Driving with InternVL: Oustanding Champion in the Track on Driving with Language of the Autonomous Grand Challenge at CVPR 2024 Paper • 2412.07247 • Published Dec 10, 2024
Eagle 2: Building Post-Training Data Strategies from Scratch for Frontier Vision-Language Models Paper • 2501.14818 • Published Jan 20 • 4
GameFactory: Creating New Games with Generative Interactive Videos Paper • 2501.08325 • Published Jan 14 • 66
ConceptMaster: Multi-Concept Video Customization on Diffusion Transformer Models Without Test-Time Tuning Paper • 2501.04698 • Published Jan 8 • 15
News! The ChemVLM code is now open source: https://github.com/AI4Chem/ChemVlm
StyleMaster: Stylize Your Video with Artistic Generation and Translation Paper • 2412.07744 • Published Dec 10, 2024 • 19
SynCamMaster: Synchronizing Multi-Camera Video Generation from Diverse Viewpoints Paper • 2412.07760 • Published Dec 10, 2024 • 56
3DTrajMaster: Mastering 3D Trajectory for Multi-Entity Motion in Video Generation Paper • 2412.07759 • Published Dec 10, 2024 • 18
ChemVLM has been accepted by AAAI 2025! Seeing and Understanding: Bridging Vision with Chemical Knowledge Via ChemVLM (2408.07246) Try chatting with it: AI4Chem/ChemVLM-26B-1-2
The first version of LLaMA-O1 has been uploaded to HF now! Here we come!
Supervised: SimpleBerry/LLaMA-O1-Supervised-1129
Base (Pretrain): SimpleBerry/LLaMA-O1-Base-1127
Supervised Finetune Dataset: SimpleBerry/OpenLongCoT-SFT
Pretraining Dataset: SimpleBerry/OpenLongCoT-Pretrain-1202
RLHF is on the way! View our GitHub repo: https://github.com/SimpleBerry/LLaMA-O1
Our ongoing related research:
Accessing GPT-4 level Mathematical Olympiad Solutions via Monte Carlo Tree Self-refine with LLaMa-3 8B (2406.07394)
LLaMA-Berry: Pairwise Optimization for O1-like Olympiad-Level Mathematical Reasoning (2410.02884)
Critic-V: VLM Critics Help Catch VLM Errors in Multimodal Reasoning (2411.18203)
@AdinaY @akhaliq @jwu323
GGUF: https://huggingface.co/Lyte/LLaMA-O1-Supervised-1129-Q4_K_M-GGUF
Online demo (CPU-only): SimpleBerry/LLaMA-O1-Supervised-1129-Demo