Commit bd86e2b · Update README.md
Parent(s): 702cad8
README.md CHANGED
@@ -11,9 +11,9 @@ language:
 ## Model Details
 
 All the encoders released here use a [Llama3.2-3B](https://github.com/meta-llama/llama-cookbook) base backbone and are trained on web crawl filtered with [Dactory](https://github.com/kyutai-labs/dactory). The release consists of two ARC-Encoders each trained specifically for a single decoder, plus one trained for both decoders at the same time:
-- `ARC8-Encoder_Llama`, trained on
-- `ARC8-Encoder_Mistral`, trained on
-- `ARC8-Encoder_multi`, trained by sampling among the two decoders
+- `ARC8-Encoder_Llama`, trained on 2.6B tokens specifically for the [Llama3.1-8B](https://github.com/meta-llama/llama-cookbook) base decoder, with a pooling factor of 8.
+- `ARC8-Encoder_Mistral`, trained on 2.6B tokens specifically for the [Mistral-7B](https://github.com/mistralai/mistral-finetune?tab=readme-ov-file) base decoder, with a pooling factor of 8.
+- `ARC8-Encoder_multi`, trained by sampling among the two decoders, with a pooling factor of 8.
 
 ### Uses
 
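To try one of the variants listed in the diff, the checkpoint first has to be fetched from the Hub. Below is a minimal sketch using `huggingface_hub.snapshot_download`; the repo id is an assumption used purely for illustration, not a confirmed repository name, so substitute the actual model id of the variant you want.

```python
from huggingface_hub import snapshot_download

# Assumed repo id, for illustration only; replace with the real Hugging Face
# repository of the variant you need (ARC8-Encoder_Llama, ARC8-Encoder_Mistral
# or ARC8-Encoder_multi).
REPO_ID = "kyutai/ARC8-Encoder_Llama"

# Download all checkpoint files into the local Hugging Face cache and
# return the local directory containing them.
local_dir = snapshot_download(repo_id=REPO_ID)
print(f"Encoder checkpoint available at: {local_dir}")
```

The downloaded directory can then be loaded with the ARC-Encoder code base referenced in the README; the exact loading entry point is not shown in this diff.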