Safetensors · llama
Pclanglais committed 931e55c (verified) · 1 parent: 9891392

Update README.md

Files changed (1): README.md (+21 −6)

README.md CHANGED
@@ -12,19 +12,34 @@ language:
  - nl
  - pl
  ---
- **Pleias-Pico-Preview** is an early preview of a 360 million parameter base model trained by Pleias on Common Corpus.

  Like all the base and specialized models from Pleias, Pleias-Pico-Preview has only been trained on open data out of copyright (public domain) or under a permissive license.

  ## Description
- Pleias-Pico-Preview is a transformer base model, entirely pretrained from scratch, using an architecture similar to Llama/GPT-Neox for easier deployment/inference.

- Pleias-Pico-Preview has demonstrated unusual abilities for multilingual generation in its size range. Fully supported languages include English, French, Spanish, German, Italian, Dutch, Latin and Portuguese.
  ## Training
- Pleias-Pico-Preview was trained at Jean Zay on 64 h100s for 46 hours. Training was done on a filtered and enhanced version of Common Corpus with XX tokens.

  ## Update
- Pleias-Pico-Preview is currently released as an early preview. The model will undergo several more round of post-training to enhance reasoning capacities and fine-tunability, and better prepare the model for a generalist instruct version.

- Pleias-Pico-Preview can be used for continuous pretraining and full-fine-tuning and specialized versions have been successfully trained for RAG retrieval, translation or OCR correction. Give the small size of the model we do not recommend fine-tuning methods based on LORA.
  - nl
  - pl
  ---
+ **Pleias-360m-Preview** is an early preview of a 360 million parameter base model trained by Pleias on Common Corpus.

  Like all the base and specialized models from Pleias, Pleias-360m-Preview has only been trained on open data out of copyright (public domain) or under a permissive license.

  ## Description
+ Pleias-360m-Preview is a transformer base model, entirely pretrained from scratch, using an architecture similar to Llama/GPT-NeoX for easier deployment/inference.

+ It includes the following features, which apply to any responsibly trained variant:
+ * Only trained on open data under a permissive license and in compliance with the European AI Act. By design, all Pleias models are unable to output copyrighted content.
+ * Extensive multilingual support for the main European languages.
+ * A new tokenizer designed for enhanced document processing tasks and better multilingual support.
+ * Extremely low level of toxicity and problematic content.
+
+ Pleias-360m-Preview has demonstrated unusual abilities for multilingual generation in its size range. Fully supported languages include English, French, Spanish, German, Italian, Dutch, Latin and Portuguese.
+
+ ## Recommended use
+ As a base model, Pleias-360m-Preview is only able to run continuation prompts.
+
+ Text generation currently supports a range of creative writing tasks in multiple European languages. For more consistent results we recommend using a low or zero temperature with a slight repetition penalty (1.1-1.2).
+
+ Pleias-360m-Preview has been successfully adapted through continuous pretraining and full fine-tuning for document processing tasks such as RAG, translation or OCR correction. Given the small size of the model, we do not recommend fine-tuning methods based on LoRA.
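The sampling recommendation above can be sketched in plain Python. This is the standard CTRL-style rescaling that a `repetition_penalty` of 1.1-1.2 applies in common inference libraries; the function name and the plain list of logits are illustrative, not the Pleias or transformers API:

```python
def apply_repetition_penalty(logits, generated_ids, penalty=1.1):
    """Discourage tokens that were already generated (CTRL-style penalty).

    A penalty of 1.0 is a no-op; 1.1-1.2 gives the slight discouragement
    recommended above. Positive logits are divided by the penalty and
    negative logits multiplied by it, so repeats always become less likely.
    """
    out = list(logits)
    for tok in set(generated_ids):
        if out[tok] > 0:
            out[tok] /= penalty
        else:
            out[tok] *= penalty
    return out

# A token that already appeared gets pushed down, whatever its sign,
# while unseen tokens keep their original scores.
scores = apply_repetition_penalty([2.2, -1.0, 0.5], generated_ids=[0, 1])
```

With temperature at or near zero, decoding then amounts to greedily picking the highest of these adjusted scores at each step.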

  ## Training
+ Pleias-360m-Preview was fully pretrained at Jean Zay on 64 H100 GPUs for 46 hours with Nanotron, the pretraining library from Hugging Face. We provide the complete settings as a YAML file as part of our release.
+
+ The training schedule includes 518,000 steps (batch size 1,024) on a filtered and enhanced version of Common Corpus (1,086,324,736,000 tokens).
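The figures above are mutually consistent: assuming the usual relation tokens = steps × batch size × sequence length, the quoted token count implies a 2,048-token sequence length. That context size is our inference from the arithmetic, not something the card states:

```python
steps = 518_000                    # training steps quoted above
batch_size = 1_024                 # sequences per step quoted above
total_tokens = 1_086_324_736_000   # Common Corpus token count quoted above

# Implied tokens per sequence, assuming total = steps * batch * seq_len.
seq_len = total_tokens // (steps * batch_size)
assert steps * batch_size * seq_len == total_tokens  # divides exactly
print(seq_len)  # 2048
```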

  ## Update
+ Pleias-360m-Preview is currently released as an early preview.
+
+ The model will undergo several more rounds of post-training to enhance reasoning capacities and fine-tunability, as well as in anticipation of a generalist instruct version.