Blending ML models, no game to feign.
I'm Notebookin', for the sake of the machine learning game.
Notebookin' with precision, a dev's savoir-faire,
Streamlining ML, 'cause I'm your Fresh Notebook & Tech extraordinaire!
We recently uploaded to the Hub the latest and most powerful version of our SimpleMath SFT dataset. Today we are happy to present SimpleMath DPO Pairs, which further improves the mathematical capabilities of LLMs.
Our first results show clear improvements on GSM8k, MATHQA, ARC, TQA, MMLU and BBH. Feel free to experiment and generate your own dataset, as we also provide the code to generate the pairs synthetically.
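For readers who want a feel for what synthetic preference pairs for simple math can look like, here is a minimal sketch. It is not the released generation code: the function names `make_pair` and `make_dataset` are hypothetical, and the `prompt` / `chosen` / `rejected` field names are an assumption based on the layout commonly used for DPO training, not a confirmed schema of SimpleMath DPO Pairs.

```python
import random


def make_pair(rng):
    """Build one hypothetical preference record for a two-operand arithmetic prompt."""
    a, b = rng.randint(1, 99), rng.randint(1, 99)
    op = rng.choice(["+", "-", "*"])
    correct = {"+": a + b, "-": a - b, "*": a * b}[op]
    # Rejected answer: the correct value shifted by a small non-zero offset,
    # so "chosen" and "rejected" can never be equal.
    offset = rng.choice([-10, -2, -1, 1, 2, 10])
    return {
        "prompt": f"What is {a} {op} {b}?",
        "chosen": str(correct),
        "rejected": str(correct + offset),
    }


def make_dataset(n, seed=0):
    """Generate n records reproducibly from a fixed seed."""
    rng = random.Random(seed)
    return [make_pair(rng) for _ in range(n)]


if __name__ == "__main__":
    for record in make_dataset(3):
        print(record)
```

Using a seeded `random.Random` keeps the output reproducible, which makes it easier to compare runs when you change the prompt template or the way wrong answers are produced.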
2️⃣ Advancing Flamingo with InfiMM 🔥 Building upon the foundation of Flamingo, we introduce the InfiMM model series. InfiMM is a reproduction of Flamingo, enhanced with stronger Large Language Models (LLMs) such as LLaMA2-13B, Vicuna-13B, and Zephyr-7B. We've meticulously filtered pre-training data and fine-tuned instructions, resulting in superior performance on recent benchmarks like MMMU, InfiMM-Eval, MM-Vet, and more. Explore the power of InfiMM on Hugging Face: Infi-MM/infimm-zephyr
3️⃣ Exploring Multimodal Instruction Fine-tuning 🖼️ Visual Instruction Fine-tuning (IFT) is crucial for aligning MLLMs' output with user intentions. Our research identified challenges with models trained on the LLaVA-mix-665k dataset, particularly in multi-round dialog settings. To address this, we've created a new IFT dataset with high-quality, diverse instruction annotations and images sourced exclusively from the COCO dataset. Our experiments demonstrate that when fine-tuned with this dataset, MLLMs excel in open-ended evaluation benchmarks for both single-round and multi-round dialog settings. Dive into the details in our paper: COCO is "ALL" You Need for Visual Instruction Fine-tuning (2401.08968)
Depth Anything is trained on 1.5M labeled images and 62M+ unlabeled images jointly, providing the most capable Monocular Depth Estimation (MDE) foundation models with the following features:
zero-shot relative depth estimation, better than MiDaS v3.1 (BEiTL-512)
zero-shot metric depth estimation, better than ZoeDepth
optimal in-domain fine-tuning and evaluation on NYUv2 and KITTI
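To make "relative depth" concrete: a relative depth prediction is only defined up to an unknown scale and shift, so comparisons against ground truth usually align the prediction first and then compute error metrics. The sketch below is an illustration of that idea, not the Depth Anything evaluation code. The function names are hypothetical, the inputs are flat lists of positive values rather than images, and the alignment is done directly on depth for simplicity (published protocols often align in disparity space instead).

```python
def align_scale_shift(pred, target):
    """Closed-form least-squares fit of s, t minimizing sum((s * p + t - g) ** 2)."""
    n = len(pred)
    mean_p = sum(pred) / n
    mean_g = sum(target) / n
    var_p = sum((p - mean_p) ** 2 for p in pred)
    if var_p == 0:
        raise ValueError("prediction is constant; scale is undefined")
    cov = sum((p - mean_p) * (g - mean_g) for p, g in zip(pred, target))
    s = cov / var_p
    t = mean_g - s * mean_p
    return s, t


def abs_rel(pred, target):
    """Mean absolute relative error: mean(|p - g| / g)."""
    return sum(abs(p - g) / g for p, g in zip(pred, target)) / len(target)


def delta1(pred, target):
    """Fraction of values with max(p / g, g / p) < 1.25."""
    hits = sum(1 for p, g in zip(pred, target) if max(p / g, g / p) < 1.25)
    return hits / len(target)


if __name__ == "__main__":
    target = [1.0, 2.0, 3.0, 4.0]       # toy ground-truth depths
    pred = [0.25, 0.75, 1.25, 1.75]     # same structure, different scale and shift
    s, t = align_scale_shift(pred, target)
    aligned = [s * p + t for p in pred]
    print(s, t, abs_rel(aligned, target), delta1(aligned, target))
```

In this toy case the prediction is an exact affine transform of the ground truth, so after alignment the error vanishes. A metric depth model, by contrast, would be scored without the alignment step.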