arcee-train (Arcee Training Org)

bartowski

posted an update 14 days ago

Post

8774

Was going to post this on /r/LocalLLaMa, but apparently it's without moderation at this time :')

bartowski/mistralai_Mistral-Small-3.2-24B-Instruct-2506-GGUF

Was able to use previous mistral chat templates, some hints from Qwen templates, and Claude to piece together a seemingly working chat template, tested it with llama.cpp server and got perfect results, though lmstudio still seems to be struggling for some reason (don't know how to specify a jinja file there)

Outlined the details of the script and results in my llama.cpp PR to add the jinja template:

https://github.com/ggml-org/llama.cpp/pull/14349

Start server with a command like this:

./llama-server -m /models/mistralai_Mistral-Small-3.2-24B-Instruct-2506-Q4_K_M.gguf --jinja --chat-template-file /models/Mistral-Small-3.2-24B-Instruct-2506.jinja

and it should be perfect! Hoping it'll work for ALL tools if lmstudio gets an update or something, not just llama.cpp, but very happy to see it works flawlessly in llama.cpp

In the meantime, will try to open a PR to minja to make the strftime work, but no promises :)

chargoddard

authored 4 papers 27 days ago

Domain Adaptation of Llama3-70B-Instruct through Continual Pre-Training and Model Merging: A Comprehensive Evaluation

Paper • 2406.14971 • Published Jun 21, 2024

Merging in a Bottle: Differentiable Adaptive Merging (DAM) and the Path from Averaging to Automation

Paper • 2410.08371 • Published Oct 10, 2024 • 2

INTELLECT-1 Technical Report

Paper • 2412.01152 • Published Dec 2, 2024 • 2

Training-Free Tokenizer Transplantation via Orthogonal Matching Pursuit

Paper • 2506.06607 • Published Jun 7 • 2

samsja

authored 4 papers about 1 month ago

INTELLECT-1 Technical Report

Paper • 2412.01152 • Published Dec 2, 2024 • 2

METAGENE-1: Metagenomic Foundation Model for Pandemic Monitoring

Paper • 2501.02045 • Published Jan 3 • 21

INTELLECT-2: A Reasoning Model Trained Through Globally Decentralized Reinforcement Learning

Paper • 2505.07291 • Published May 12 • 13

TOPLOC: A Locality Sensitive Hashing Scheme for Trustless Verifiable Inference

Paper • 2501.16007 • Published Jan 27 • 1

bartowski

posted an update 3 months ago

Post

38552

Access requests enabled for latest GLM models

While a fix is being implemented (https://github.com/ggml-org/llama.cpp/pull/12957) I want to leave the models up for visibility and continued discussion, but want to prevent accidental downloads of known broken models (even though there are settings that could fix it at runtime for now)

With this goal, I've enabled access requests. I don't really want your data, so I'm sorry that I don't think there's a way around that? But that's what I'm gonna do for now, and I'll remove the gate when a fix is up and verified and I have a chance to re-convert and quantize!

Hope you don't mind in the mean time :D

1 reply

·

abhishek

posted an update 4 months ago

Post

3994

🚀 I'm thrilled to announce the launch of Arcee Conductor, a game-changing platform that's about to revolutionize the way you interact with AI models! 🤖 As the pioneers of small language models (SLMs), we've been working tirelessly to bring you the most exciting innovation in the AI space.
Here's a quick TL;DR of what Arcee Conductor is all about:

🌟 Choice and flexibility: Get access to multiple models, including our powerful SLMs and third-party LLMs, to choose the best one for your specific use case
🤖 Intelligent routing: Our platform evaluates which model is best-suited for each of your queries, ensuring you get the most accurate results
📈 Cost savings: Reduce your AI costs with our affordable SLMs, while still having access to leading LLMs when needed
🚀 Easy to get started: Sign up now and try Arcee Conductor today, with 400 million tokens (a $200 value) on us! 🎁
📊 Proven track record: Our SLMs have already racked up 222K+ downloads on Hugging Face, with customers seeing significant cost savings and improved accuracy

For a limited time, you can get $200 credits to use with Conductor for FREE. Check it out here: https://conductor.arcee.ai

3 replies

·

bartowski

posted an update 6 months ago

Post

73247

Switching to author_model-name

I posted a poll on twitter, and others have mentioned the interest in me using the convention of including the author name in the model path when I upload.

It has a couple advantages, first and foremost of course is ensuring clarity of who uploaded the original model (did Qwen upload Qwen2.6? Or did someone fine tune Qwen2.5 and named it 2.6 for fun?)

The second thing is that it avoids collisions, so if multiple people upload the same model and I try to quant them both, I would normally end up colliding and being unable to upload both

I'll be implementing the change next week, there are just two final details I'm unsure about:

First, should the files also inherit the author's name?

Second, what to do in the case that the author name + model name pushes us past the character limit?

Haven't yet decided how to handle either case, so feedback is welcome, but also just providing this as a "heads up"

5 replies

·

bartowski

posted an update 7 months ago

Post

80417

Looks like Q4_0_N_M file types are going away

Before you panic, there's a new "preferred" method which is online (I prefer the term on-the-fly) repacking, so if you download Q4_0 and your setup can benefit from repacking the weights into interleaved rows (what Q4_0_4_4 was doing), it will do that automatically and give you similar performance (minor losses I think due to using intrinsics instead of assembly, but intrinsics are more maintainable)

You can see the reference PR here:

https://github.com/ggerganov/llama.cpp/pull/10446

So if you update your llama.cpp past that point, you won't be able to run Q4_0_4_4 (unless they add backwards compatibility back), but Q4_0 should be the same speeds (though it may currently be bugged on some platforms)

As such, I'll stop making those newer model formats soon, probably end of this week unless something changes, but you should be safe to download and Q4_0 quants and use those !

Also IQ4_NL supports repacking though not in as many shapes yet, but should get a respectable speed up on ARM chips, PR for that can be found here: https://github.com/ggerganov/llama.cpp/pull/10541

Remember, these are not meant for Apple silicon since those use the GPU and don't benefit from the repacking of weights

17 replies

·

bartowski

posted an update 7 months ago

Post

16523

Old mixtral model quants may be broken!

Recently Slaren over on llama.cpp refactored the model loader - in a way that's super awesome and very powerful - but with it came breaking of support for "split tensor MoE models", which applies to older mixtral models

You may have seen my upload of one such older mixtral model, ondurbin/bagel-dpo-8x7b-v0.2, and with the newest changes it seems to be able to run without issue

If you happen to run into issues with any other old mixtral models, drop a link here and I'll try to remake them with the new changes so that we can continue enjoying them :)

2 replies

·

abhishek

posted an update 7 months ago

Post

2551

🎉 SUPER BLACK FRIDAY DEAL 🎉

Train almost any model on a variety of tasks such as llm finetuning, text classification/regression, summarization, question answering, image classification/regression, object detection, tabular data, etc for FREE using AutoTrain locally. 🔥
https://github.com/huggingface/autotrain-advanced

samsja

authored a paper 8 months ago

OpenDiLoCo: An Open-Source Framework for Globally Distributed Low-Communication Training

Paper • 2407.07852 • Published Jul 10, 2024

abhishek

posted an update 8 months ago

Post

6023

INTRODUCING Hugging Face AutoTrain Client 🔥
Fine-tuning models got even easier!!!!
Now you can fine-tune SOTA models on all compatible dataset-model pairs on Hugging Face Hub using Python on Hugging Face Servers. Choose from a number of GPU flavors, millions of models and dataset pairs and 10+ tasks 🤗

To try, install autotrain-advanced using pip. You can ignore dependencies and install without --no-deps and then you'd need to install some dependencies by hand.

"pip install autotrain-advanced"

Github repo: https://github.com/huggingface/autotrain-advanced

6 replies

·

abhishek

authored a paper 9 months ago

AutoTrain: No-code training for state-of-the-art models

Paper • 2410.15735 • Published Oct 21, 2024 • 60

abhishek

posted an update 9 months ago

Post

4426

AutoTrain: No-code training for state-of-the-art models (2410.15735)

bartowski

posted an update 9 months ago

Post

23930

In regards to the latest mistral model and GGUFs for it:

Yes, they may be subpar and may require changes to llama.cpp to support the interleaved sliding window

Yes, I got excited when a conversion worked and released them ASAP

That said, generation seems to work right now and seems to mimic the output from spaces that are running the original model

I have appended -TEST to the model names in an attempt to indicate that they are not final or perfect, but if people still feel mislead and that it's not the right thing to do, please post (civilly) below your thoughts, I will highly consider pulling the conversions if that's what people think is best. After all, that's what I'm here for, in service to you all !

6 replies

·

Arcee Training Org

AI & ML interests

Recent Activity

Domain Adaptation of Llama3-70B-Instruct through Continual Pre-Training and Model Merging: A Comprehensive Evaluation

Merging in a Bottle: Differentiable Adaptive Merging (DAM) and the Path from Averaging to Automation

INTELLECT-1 Technical Report

Training-Free Tokenizer Transplantation via Orthogonal Matching Pursuit

INTELLECT-1 Technical Report

METAGENE-1: Metagenomic Foundation Model for Pandemic Monitoring

INTELLECT-2: A Reasoning Model Trained Through Globally Decentralized Reinforcement Learning

TOPLOC: A Locality Sensitive Hashing Scheme for Trustless Verifiable Inference

OpenDiLoCo: An Open-Source Framework for Globally Distributed Low-Communication Training

AutoTrain: No-code training for state-of-the-art models

AI & ML interests

Recent Activity

Team members 29

arcee-train's activity