---
language:
- en
license: cc-by-4.0
tags:
- model_hub_mixin
- pytorch_model_hub_mixin
pipeline_tag: feature-extraction
---
# ARC-Encoder models
This page hosts `ARC8-Encoder_Mistral`, one of three released versions of pretrained ARC-Encoders. The architectures and training methods are described in the paper *ARC-Encoder: learning compressed text representations for large language models*, available [here](https://arxiv.org/abs/2510.20535).
Code: [ARC-Encoder repository](https://github.com/kyutai-labs/ARC-Encoder)
## Model Details
All the encoders released here use a [Llama3.2-3B](https://github.com/meta-llama/llama-cookbook) base backbone and are trained on web-crawled data filtered with [Dactory](https://github.com/kyutai-labs/dactory). The release consists of two ARC-Encoders, each trained specifically for a single decoder, and one trained for both decoders at the same time:
- `ARC8-Encoder_Llama`, trained on 2.6B tokens specifically for the [Llama3.1-8B](https://github.com/meta-llama/llama-cookbook) base decoder, with a pooling factor of 8.
- `ARC8-Encoder_Mistral`, trained on 2.6B tokens specifically for the [Mistral-7B](https://www.mistralai.com/news/announcing-mistral-7b/) base decoder, with a pooling factor of 8.
- `ARC8-Encoder_multi`, trained by sampling between the two decoders above, with a pooling factor of 8.
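If you only need the raw checkpoint files of one variant (independently of the codebase helper shown in the Usage section below), a minimal sketch with `huggingface_hub` is given here. The repo id is an assumption inferred from the model name on this card and may need adjusting:

```python
# A hedged sketch: the repo id is an assumption based on the model name on
# this card; adjust it if the weights live under a different repository.
from huggingface_hub import snapshot_download

local_dir = snapshot_download(
    repo_id="kyutai/ARC8-Encoder_Mistral",  # assumed repo id
    token=None,  # or pass your Hugging Face access token if required
)
print(f"Checkpoint files downloaded to: {local_dir}")
```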
### Uses
As described in the [paper](https://arxiv.org/abs/2510.20535), the pretrained ARC-Encoders can be fine-tuned to perform various downstream tasks.
You can also adapt an ARC-Encoder to a new pooling factor (PF) by fine-tuning it on the desired PF.
For optimal results, we recommend fine-tuning toward a lower PF than the one used during pretraining.
To reproduce the results presented in the paper, you can use our released fine-tuning dataset, [ARC_finetuning](https://huggingface.co/datasets/kyutai/ARC_finetuning).
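As a minimal sketch, the fine-tuning data can be inspected with the `datasets` library; the `train` split name is an assumption about the dataset layout:

```python
# A minimal sketch for loading the released fine-tuning dataset.
# The split name "train" is an assumption about the dataset layout.
from datasets import load_dataset

finetuning_data = load_dataset("kyutai/ARC_finetuning", split="train")
print(finetuning_data[0])  # inspect a single fine-tuning example
```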
### Licensing
ARC-Encoders are licensed under the CC-BY 4.0 license.
Terms of use: As the released models are pretrained from a Llama3.2-3B backbone, ARC-Encoders are also subject to the Llama Terms of Use, available at [Llama license](https://www.llama.com/license/).
## Usage
To load the pre-trained ARC-Encoders, use the following code snippet from the [ARC-Encoder repository](https://github.com/kyutai-labs/ARC-Encoder):
```python
from embed_llm.models.augmented_model import load_and_save_released_models

# Pick one of: "ARC8_Encoder_multi", "ARC8_Encoder_Llama", "ARC8_Encoder_Mistral"
load_and_save_released_models("ARC8_Encoder_Mistral", hf_token="<HF_TOKEN>")  # replace <HF_TOKEN> with your token
```
***Remark:*** This code snippet downloads the model from Hugging Face and then creates the appropriate folders at `<TMP_PATH>`, containing the checkpoint and the additional files needed for fine-tuning or evaluation with the `ARC-Encoder` codebase. To reduce disk usage, you can then delete the model from your Hugging Face cache.
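The sketch below shows one way to do that cache cleanup programmatically with `huggingface_hub`; the repo id is an assumption based on this card and should match the repository you actually downloaded from:

```python
# A hedged sketch for freeing the Hugging Face cache once the checkpoint
# has been copied to <TMP_PATH>. The repo id below is an assumption.
from huggingface_hub import scan_cache_dir

REPO_ID = "kyutai/ARC8-Encoder_Mistral"  # assumed repo id

cache_info = scan_cache_dir()
revisions = [
    rev.commit_hash
    for repo in cache_info.repos
    if repo.repo_id == REPO_ID
    for rev in repo.revisions
]
if revisions:
    strategy = cache_info.delete_revisions(*revisions)
    print(f"Freeing {strategy.expected_freed_size} bytes")
    strategy.execute()
```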
## Citations
If you use one of these models, please cite:
```bibtex
@misc{pilchen2025arcencoderlearningcompressedtext,
      title={ARC-Encoder: learning compressed text representations for large language models},
      author={Hippolyte Pilchen and Edouard Grave and Patrick Pérez},
      year={2025},
      eprint={2510.20535},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2510.20535},
}
```