More like sir cartier cash carti yung carti king vamp carti baby boi guapo
alkinun
AtAndDev
AI & ML interests
LLMs, Alignment, Merging, Unsloth, DPO, SFT, ORPO, SPIN...
Recent Activity
liked a model 29 minutes ago: NousResearch/Genstruct-7B
Organizations
AtAndDev's activity

reacted to merve's post with 🔥
3 days ago
New GUI model by Salesforce AI & the University of Hong Kong: Jedi
tianbaoxiexxx/Jedi · xlangai/Jedi-7B-1080p 🤗
Based on Qwen2.5-VL, with an Apache 2.0 license
Prompt it with the screenshot below → select "find more"
L playlist sorry

reacted to attackerElvies's post with 🤗
3 days ago
HALOOO MY COMMUNITY

reacted to mlabonne's post with ❤️🔥
9 days ago
✂️ AutoAbliteration
I made a Colab notebook to automatically abliterate models.
It's quite general, so you can do interesting stuff like blocking a given language in the model outputs.
💻 Colab: https://colab.research.google.com/drive/1RmLv-pCMBBsQGXQIM8yF-OdCNyoylUR1?usp=sharing
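For readers new to the technique: abliteration usually works by estimating a single behavior direction (e.g. a "refusal" direction) in the residual stream and removing it from the weights. Below is a minimal sketch of that general recipe with toy tensors and assumed shapes; it is not the notebook's actual code.

```python
# Illustrative sketch of weight-space abliteration (assumptions, not the notebook's code):
# 1) collect residual-stream activations for two contrasting prompt sets,
# 2) take the difference of their means as the behavior direction,
# 3) project that direction out of a weight matrix that writes into the residual stream.
import torch

def behavior_direction(acts_a: torch.Tensor, acts_b: torch.Tensor) -> torch.Tensor:
    """acts_*: [n_samples, hidden_dim] activations from the two prompt sets."""
    d = acts_a.mean(dim=0) - acts_b.mean(dim=0)
    return d / d.norm()

def ablate_weight(W: torch.Tensor, direction: torch.Tensor) -> torch.Tensor:
    """Remove the component of W's output that writes along `direction`.
    W: [hidden_dim, d_in], i.e. a matrix whose output lands in the residual stream."""
    proj = torch.outer(direction, direction)   # [hidden, hidden]
    return W - proj @ W                        # (I - d d^T) W

# Toy shapes only, to show the math; real activations come from running the model
# on prompts that do / don't trigger the behavior.
hidden, d_in, n = 64, 32, 128
acts_block = torch.randn(n, hidden)   # activations on prompts whose behavior you want to remove
acts_keep = torch.randn(n, hidden)    # activations on neutral prompts
d = behavior_direction(acts_block, acts_keep)
W_o = torch.randn(hidden, d_in)       # e.g. an attention output projection
W_o_abliterated = ablate_weight(W_o, d)
```

In practice the ablation is applied to every matrix that writes into the residual stream (attention output and MLP down-projections), which is what makes the edit stick without further fine-tuning.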

reacted to codelion's post with 🔥
9 days ago
Introducing AutoThink: Adaptive reasoning for LLMs that improves performance by 43% on reasoning benchmarks!
Instead of using fixed thinking budgets, AutoThink:
- Classifies query complexity (HIGH/LOW) using adaptive classification
- Dynamically allocates thinking tokens based on complexity
- Uses steering vectors derived from Pivotal Token Search to guide reasoning patterns
Results on DeepSeek-R1-Distill-Qwen-1.5B:
- GPQA-Diamond: 31.06% vs 21.72% baseline (+9.34 points)
- MMLU-Pro: 26.38% vs 25.58% baseline (+0.8 points)
- Uses fewer tokens than baseline approaches
Works with any local reasoning model - DeepSeek, Qwen, Llama, custom models. The technique combines our research on Pivotal Token Search (PTS) implementation and adaptive classification frameworks.
Paper: AutoThink: efficient inference for reasoning LLMs
https://papers.ssrn.com/sol3/papers.cfm?abstract_id=5253327
Code and examples:
https://github.com/codelion/optillm/tree/main/optillm/autothink
PTS implementation and technical details:
https://github.com/codelion/pts
https://huggingface.co/blog/codelion/pts
Adaptive classifier framework:
https://github.com/codelion/adaptive-classifier
Would love to hear your thoughts on adaptive resource allocation for LLM reasoning! Have you experimented with similar approaches?
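The real implementation lives in the optillm/autothink repo linked above. Purely to illustrate the core routing idea (classify complexity, then budget thinking tokens; the steering-vector component is omitted), here is a hypothetical sketch; the classifier below is a made-up stand-in, not the adaptive classifier from the post.

```python
# Hypothetical illustration of adaptive thinking-token budgets (not the optillm code).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B"
tok = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(MODEL_ID, torch_dtype=torch.bfloat16, device_map="auto")

def classify_complexity(query: str) -> str:
    """Stand-in for the adaptive classifier: route long or proof-like questions to HIGH."""
    return "HIGH" if len(query.split()) > 30 or "prove" in query.lower() else "LOW"

def answer(query: str) -> str:
    # Allocate the generation budget (and hence thinking tokens) based on complexity.
    budget = 4096 if classify_complexity(query) == "HIGH" else 512
    messages = [{"role": "user", "content": query}]
    inputs = tok.apply_chat_template(
        messages, add_generation_prompt=True, return_tensors="pt"
    ).to(model.device)
    out = model.generate(inputs, max_new_tokens=budget, do_sample=True, temperature=0.6)
    return tok.decode(out[0, inputs.shape[1]:], skip_special_tokens=True)

print(answer("What is 17 * 24?"))
```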

reacted to merve's post
10 days ago
emerging trend: models that can understand image + text and generate image + text
don't miss out ⬇️
> MMaDA: single 8B diffusion model aligned with CoT (reasoning!) + UniGRPO (Gen-Verse/MMaDA)
> BAGEL: 7B MoT model based on Qwen2.5, SigLIP-so-400M, Flux VAE (ByteDance-Seed/BAGEL)
both by ByteDance! 😱
I keep track of all any-input → any-output models here: https://huggingface.co/collections/merve/any-to-any-models-6822042ee8eb7fb5e38f9b62

reacted to ProCreations's post
10 days ago

reacted to m-ric's post with 🔥
10 days ago
A new research paper from KAIST builds on smolagents to push the boundaries of distillation 🥳
➡️ "Distilling LLM Agent into Small Models with Retrieval and Code Tools" teaches that, when trying to distil reasoning capability from a strong LLM ("teacher") into a smaller one ("student"), it's much better to use agent traces than CoT traces.
Advantages are:
1. Improved generalization
Intuitively, this is because your agent can encounter more "surprising" results by interacting with its environment: for example, a web search called by the LLM teacher in agent mode can return results that the LLM teacher would not have generated in CoT.
2. Reduced hallucinations
The trace won't hallucinate tool call outputs!
Thank you @akseljoonas for mentioning this paper!
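To make the agent-trace vs. CoT-trace distinction concrete, here is a hypothetical sketch (not the paper's actual data format) of how the two kinds of distillation examples might be serialized for SFT. The key difference is that the agent trace interleaves tool calls with real observations the teacher could not have hallucinated.

```python
# Hypothetical shapes of distillation data (illustrative only; role names and
# trace format are assumptions, not the paper's schema).
question = "Who won the 2023 Turing Award, and what are they known for?"

# (a) CoT distillation: only the teacher's free-form reasoning.
cot_example = {
    "messages": [
        {"role": "user", "content": question},
        {"role": "assistant", "content": "Let me think step by step... <final answer>"},
    ]
}

# (b) Agent-trace distillation: reasoning interleaved with tool calls and the
# grounded observations returned by the environment (e.g. a web search tool).
agent_example = {
    "messages": [
        {"role": "user", "content": question},
        {"role": "assistant",
         "content": 'Thought: I should look this up.\nAction: web_search("2023 Turing Award winner")'},
        {"role": "tool",
         "content": "Avi Wigderson won the 2023 ACM A.M. Turing Award ..."},  # real observation, not hallucinated
        {"role": "assistant",
         "content": "Based on the search result: Avi Wigderson, known for work on computational complexity and randomness."},
    ]
}

# The student is then fine-tuned on many agent_example traces instead of cot_example ones.
```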

reacted to AdinaY's post with ❤️
11 days ago
Orsta 🔥 vision language models trained with V-Triune, a unified reinforcement learning system by MiniMax AI
One-RL-to-See-Them-All/one-rl-to-see-them-all-6833d27abce23898b2f9815a
✨ 7B & 32B with MIT license
✨ Masters 8 visual tasks: math, science QA, charts, puzzles, object detection, grounding, OCR, and counting
✨ Uses Dynamic IoU rewards for better visual understanding
✨ Strong performance in visual reasoning and perception
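The post doesn't spell out the reward itself, so purely as an illustration of what an IoU-based reward with a dynamically tightening threshold could look like (my assumption, not the exact V-Triune rule):

```python
# Illustrative sketch of an IoU reward whose acceptance threshold tightens over training.
# This is a guess at the general idea, not the Dynamic IoU reward used for Orsta.

def iou(box_a, box_b):
    """Boxes are (x1, y1, x2, y2) in pixels."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter + 1e-9)

def dynamic_iou_reward(pred_box, gt_box, step, total_steps,
                       start_thresh=0.5, end_thresh=0.95):
    """Full reward once IoU clears a threshold that rises as training progresses,
    otherwise the raw IoU as a dense signal."""
    thresh = start_thresh + (end_thresh - start_thresh) * (step / total_steps)
    value = iou(pred_box, gt_box)
    return 1.0 if value >= thresh else value

print(dynamic_iou_reward((10, 10, 50, 50), (12, 8, 48, 52), step=100, total_steps=1000))
```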

reacted to clem's post with 🤗
11 days ago
It's just become easier to share your apps on the biggest AI app store (aka HF Spaces) for unlimited storage, more visibility, and community interactions.
Just pick a React, Svelte, or Vue template when you create your Space, or add
app_build_command: npm run build
and
app_file: build/index.html
to your README's YAML block.
Or follow this link: https://huggingface.co/new-space?sdk=static
Let's build!

reacted to Kseniase's post
11 days ago
12 Types of JEPA
JEPA, or Joint Embedding Predictive Architecture, is an approach to building AI models introduced by Yann LeCun. It differs from generative transformers by predicting the representation of a missing or future part of the input, rather than the next token or pixel. This encourages conceptual understanding, not just low-level pattern matching, so JEPA is a way of teaching AI to reason more abstractly.
Here are 12 types of JEPA you should know about:
1. I-JEPA -> Self-Supervised Learning from Images with a Joint-Embedding Predictive Architecture (2301.08243)
A non-generative, self-supervised learning framework for images. It masks parts of an image and predicts the representations of the masked parts.
2. MC-JEPA -> MC-JEPA: A Joint-Embedding Predictive Architecture for Self-Supervised Learning of Motion and Content Features (2307.12698)
Simultaneously interprets video data - dynamic elements (motion) and static details (content) - using a shared encoder.
3. V-JEPA -> Revisiting Feature Prediction for Learning Visual Representations from Video (2404.08471)
Presents vision models trained by predicting future video features, without pretrained image encoders, text, negative sampling, or reconstruction.
4. UI-JEPA -> UI-JEPA: Towards Active Perception of User Intent through Onscreen User Activity (2409.04081)
Masks unlabeled UI sequences to learn abstract embeddings, then adds a fine-tuned LLM decoder for intent prediction.
5. Audio-based JEPA (A-JEPA) -> A-JEPA: Joint-Embedding Predictive Architecture Can Listen (2311.15830)
Masks spectrogram patches with a curriculum, encodes them, and predicts hidden representations.
6. S-JEPA -> S-JEPA: towards seamless cross-dataset transfer through dynamic spatial attention (2403.11772)
Signal-JEPA is used in EEG analysis. It adds a spatial block-masking scheme and three lightweight downstream classifiers.
7. TI-JEPA -> TI-JEPA: An Innovative Energy-based Joint Embedding Strategy for Text-Image Multimodal Systems (2503.06380)
Text-Image JEPA uses self-supervised, energy-based pre-training to map text and images into a shared embedding space, improving cross-modal transfer to downstream tasks.
Find more types below 👇
Also, explore the basics of JEPA in our article: https://www.turingpost.com/p/jepa
If you liked it, subscribe to the Turing Post: https://www.turingpost.com/subscribe
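As a rough illustration of the training loop the post describes, here is a simplified sketch in the spirit of I-JEPA (my simplification, not any specific paper's recipe): predict the embedding of a masked region from the visible context, with the target encoder kept as an EMA copy of the context encoder.

```python
# Minimal JEPA-style training step (simplified sketch; real JEPA variants also
# condition the predictor on the target's position and use ViT backbones).
import copy
import torch
import torch.nn as nn

dim = 256
context_encoder = nn.Sequential(nn.Linear(dim, dim), nn.GELU(), nn.Linear(dim, dim))
target_encoder = copy.deepcopy(context_encoder)   # EMA copy, never trained by gradients
for p in target_encoder.parameters():
    p.requires_grad_(False)
predictor = nn.Sequential(nn.Linear(dim, dim), nn.GELU(), nn.Linear(dim, dim))
opt = torch.optim.AdamW(list(context_encoder.parameters()) + list(predictor.parameters()), lr=1e-4)

def train_step(context_patches, target_patches, ema=0.996):
    # Embed the visible context and predict the representation of the masked target region.
    pred = predictor(context_encoder(context_patches))
    with torch.no_grad():
        target = target_encoder(target_patches)   # a representation, not pixels or tokens
    loss = nn.functional.mse_loss(pred, target)   # the loss lives in embedding space
    opt.zero_grad()
    loss.backward()
    opt.step()
    # EMA update of the target encoder toward the context encoder.
    with torch.no_grad():
        for p_t, p_c in zip(target_encoder.parameters(), context_encoder.parameters()):
            p_t.mul_(ema).add_(p_c, alpha=1 - ema)
    return loss.item()

# Toy batch: 8 "patch embeddings" for the context and for the masked target region.
print(train_step(torch.randn(8, dim), torch.randn(8, dim)))
```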