Commit 0ce0ddb · 1 parent: bd86e2b
Push model using huggingface_hub.

Files changed:
- README.md: +6 -38
- config.json: +0 -5
- model.safetensors: +3 -0
README.md
CHANGED
@@ -1,41 +1,9 @@
 ---
-
-
-
+tags:
+- model_hub_mixin
+- pytorch_model_hub_mixin
 ---
 
-#
-
-
-
-## Model Details
-
-All the encoders released here are trained on web crawl data filtered with [Dactory](https://github.com/kyutai-labs/dactory), starting from a [Llama3.2-3B](https://github.com/meta-llama/llama-cookbook) base backbone. The release consists of two ARC-Encoders, each trained specifically for a single decoder, and one trained for two decoders at the same time:
-- `ARC8-Encoder_Llama`, trained on 2.6B tokens specifically for the [Llama3.1-8B](https://github.com/meta-llama/llama-cookbook) base model, with a pooling factor of 8.
-- `ARC8-Encoder_Mistral`, trained on 2.6B tokens specifically for the [Mistral-7B](https://github.com/mistralai/mistral-finetune?tab=readme-ov-file) base model, with a pooling factor of 8.
-- `ARC8-Encoder_multi`, trained by sampling between the two decoders, with a pooling factor of 8.
-
-### Uses
-
-As described in the [paper](https://github.com/kyutai-labs/ARC-Encoder/blob/main/ARC_Encoder_preprint.pdf), the pretrained ARC-Encoders can be fine-tuned to perform various downstream tasks.
-You can also adapt an ARC-Encoder to a new pooling factor (PF) by fine-tuning it on the desired PF.
-For optimal results, we recommend fine-tuning toward a lower PF than the one used during pretraining.
-To reproduce the results presented in the paper, you can use our released fine-tuning dataset, [ARC_finetuning](https://huggingface.co/datasets/kyutai/ARC_finetuning).
-
-### Licensing
-
-ARC-Encoders are licensed under the CC-BY 4.0 license.
-
-Terms of use: as the released models are pretrained from a Llama3.2-3B backbone, ARC-Encoders are subject to the Llama Terms of Use found at the [Llama license](https://www.llama.com/license/).
-
-## Citations
-
-If you use one of these models, please cite:
-
-```bibtex
-@techreport{pilchen2025arc_encoder,
-  title={ARC-Encoder: learning compressed text representations for large language models},
-  author={Pilchen, Hippolyte and Grave, Edouard and P{\'e}rez, Patrick},
-  year={2025}
-}
-```
+This model has been pushed to the Hub using the [PytorchModelHubMixin](https://huggingface.co/docs/huggingface_hub/package_reference/mixins#huggingface_hub.PyTorchModelHubMixin) integration:
+- Library: [More Information Needed]
+- Docs: [More Information Needed]
config.json
CHANGED
@@ -12,7 +12,6 @@
 -8
 ],
 "cont_tok": true,
-"memory_tokens": 0,
 "n_truncated_layers": 2,
 "pooling_module": {
 "pool_type": "mean_pooled_queries",
@@ -25,19 +24,15 @@
 "empty_init": 1,
 "llms": [],
 "model_args": {
-"_sliding_window": null,
 "dim": 3072,
 "head_dim": 128,
 "hidden_dim": 8192,
 "max_batch_size": 1,
-"model_type": "transformer",
 "n_heads": 24,
 "n_kv_heads": 8,
 "n_layers": 28,
-"non_parametric_norm": false,
 "norm_eps": "1e-05",
 "rope_theta": 500000.0,
-"sliding_window": null,
 "vocab_size": 128256
 }
 }
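The five pruned keys can be checked programmatically. The sketch below assumes the keys sit where the hunk context suggests (`"memory_tokens"` beside `"cont_tok"`, the other four under `"model_args"`, with `"model_args"` reachable from the config root); the repo id is illustrative.

```python
# Sketch: fetch the updated config.json and confirm the pruned keys are
# gone. Key placement is inferred from the hunk context above; repo id
# is illustrative.
import json
from huggingface_hub import hf_hub_download

path = hf_hub_download(repo_id="kyutai/ARC8-Encoder_Llama", filename="config.json")
with open(path) as f:
    config = json.load(f)

assert "memory_tokens" not in config
for key in ("_sliding_window", "model_type", "non_parametric_norm", "sliding_window"):
    assert key not in config.get("model_args", {})

# The surviving context lines pin the backbone shape: 28 layers,
# dim 3072, vocab 128256 (the Llama3.2-3B geometry).
print(config["model_args"]["dim"], config["model_args"]["n_layers"])
```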
model.safetensors
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:0982da4a9f4a6ee1e38f085502769443d91242c3f681968c64b9c32af92d506e
+size 12104403784