Xavier Manuel Mountain

XaevrM

https://github.com/Perceptive-Focus-EN/Notebooks_Hugging_Face

AI & ML interests

Blending ML models, no game to feign. I'm Notebookin', for the sake of the machine learning game. Notebookin' with precision, a dev's savoir-faire, Streamlining ML, 'cause I'm your Fresh Notebook & Tech extraordinaire!

Recent Activity

updated a collection about 2 months ago

models

updated a collection 5 months ago

models

updated a collection about 2 months ago

models

Collection

4 items • Updated May 24 • 1

updated a collection 5 months ago

models

Collection

4 items • Updated May 24 • 1

upvoted a collection 6 months ago

models

Collection

4 items • Updated May 24 • 1

liked a dataset over 1 year ago

Karzan/tts-dataset

Viewer • Updated Mar 7, 2024 • 53 • 13 • 1

replied to fblgit's post over 1 year ago

Could you perhaps make one with all the top machine learning algorithms with different use-case flows?

This would be great to add as a function to call for the GenAIs since algos play a big role in how things are perceived and executed.

reacted to fblgit's post with ❤️ over 1 year ago

Post

Presenting: SimpleMath

We recently uploaded to the Hub our latest and most powerful version of the SimpleMath SFT dataset.
Today we are happy to present SimpleMath DPO Pairs, which further improve the mathematical capabilities of LLMs.

Our first results show clear improvements on GSM8K, MATHQA, ARC, TQA, MMLU and BBH. Feel free to experiment and generate your own dataset, as we also provide the code to generate them synthetically.

fblgit/simple-math
fblgit/simple-math-DPO
fblgit/UNA-34BeagleSimpleMath-32K-v1

2 replies

reacted to xiaotianhan's post with ❤️ over 1 year ago

Post

Thrilled to share some of our recent work in the field of Multimodal Large Language Models (MLLMs).

1️⃣ A Survey on Multimodal Reasoning 📚
Are you curious about the reasoning abilities of MLLMs? In our latest survey, we delve into the world of multimodal reasoning. We comprehensively review existing evaluation protocols, categorize the frontiers of MLLMs, explore recent trends in their applications for reasoning-intensive tasks, and discuss current practices and future directions. For an in-depth exploration, check out our paper: Exploring the Reasoning Abilities of Multimodal Large Language Models (MLLMs): A Comprehensive Survey on Emerging Trends in Multimodal Reasoning (2401.06805)

2️⃣ Advancing Flamingo with InfiMM 🔥
Building upon the foundation of Flamingo, we introduce the InfiMM model series. InfiMM is a reproduction of Flamingo, enhanced with stronger Large Language Models (LLMs) such as LLaMA2-13B, Vicuna-13B, and Zephyr-7B. We've meticulously filtered pre-training data and fine-tuned instructions, resulting in superior performance on recent benchmarks like MMMU, InfiMM-Eval, MM-Vet, and more. Explore the power of InfiMM on Hugging Face: Infi-MM/infimm-zephyr

3️⃣ Exploring Multimodal Instruction Fine-tuning 🖼️
Visual Instruction Fine-tuning (IFT) is crucial for aligning MLLMs' output with user intentions. Our research identified challenges with models trained on the LLaVA-mix-665k dataset, particularly in multi-round dialog settings. To address this, we've created a new IFT dataset with high-quality, diverse instruction annotations and images sourced exclusively from the COCO dataset. Our experiments demonstrate that when fine-tuned with this dataset, MLLMs excel in open-ended evaluation benchmarks for both single-round and multi-round dialog settings. Dive into the details in our paper: COCO is "ALL" You Need for Visual Instruction Fine-tuning (2401.08968)

Stay tuned for more exciting developments.
Special thanks to all our collaborators: @Ye27 @wwyssh @Yongfei @Yi-Qi638 @xudonglin @KhalilMrini @lllliuhhhhggg @Borise @Hongxia

liked 2 models over 1 year ago

facebook/mms-tts

Text-to-Speech • Updated Jul 25, 2023 • 167

ControlNet-1-1-preview/control_v11p_sd15_lineart

Updated Apr 14, 2023 • 14.1k • 29

updated 2 Spaces over 1 year ago

Livebook

📓

Livebook

📓

liked a Space over 1 year ago

945

ReplaceAnything

📚

Replace objects in images with new content

reacted to akhaliq's post with 👍 over 1 year ago

Post

Depth Anything: Unleashing the Power of Large-Scale Unlabeled Data

paper page: Depth Anything: Unleashing the Power of Large-Scale Unlabeled Data (2401.10891)
demo: LiheYoung/Depth-Anything

Depth Anything is trained on 1.5M labeled images and 62M+ unlabeled images jointly, providing the most capable Monocular Depth Estimation (MDE) foundation models with the following features:

zero-shot relative depth estimation, better than MiDaS v3.1 (BEiTL-512)

zero-shot metric depth estimation, better than ZoeDepth

optimal in-domain fine-tuning and evaluation on NYUv2 and KITTI