Commit 3bcd940
Parent(s): 1648537

Update readme and doc from the 80b repo
README.md CHANGED

@@ -305,11 +305,15 @@ Similarly to the base IDEFICS models, we performed checkpoint selection to stop
 
 ## Hardware
 
-The IDEFICS models were trained on an AWS SageMaker cluster
+The IDEFICS models were trained on an AWS SageMaker cluster with 8x80GB A100 GPU nodes and an EFA network.
+
+- IDEFICS-80B took ~28 days of training on 64 nodes (512 GPUs).
+- IDEFICS-80b-instruct was fine-tuned from the base model for ~3 days on 48 nodes (384 GPUs).
+
 
 ## Software
 
-The training software is built on top of HuggingFace Transformers + Accelerate, and DeepSpeed ZeRO-3 for training, and [WebDataset](https://github.com/webdataset/webdataset) for data loading.
+The training software is built on top of HuggingFace Transformers + Accelerate, with [DeepSpeed ZeRO-3](https://github.com/microsoft/DeepSpeed) for training and [WebDataset](https://github.com/webdataset/webdataset) for data loading.
 
 
 # Bias, Risks, and Limitations
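For readers of the updated Software section, here is a minimal sketch of how the three named pieces (Transformers + Accelerate, DeepSpeed ZeRO-3, WebDataset) typically fit together. It is an illustration under stated assumptions, not the actual IDEFICS training code (which is not part of this commit): the S3 shard pattern, optimizer, learning rate, batch size, and the use of the public `HuggingFaceM4/idefics-9b` checkpoint are all placeholders.

```python
# Sketch only: every name marked "assumed" below is a placeholder,
# not part of the real IDEFICS training setup.
import torch
import webdataset as wds
from torch.utils.data import DataLoader
from accelerate import Accelerator
from accelerate.utils import DeepSpeedPlugin
from transformers import AutoProcessor, IdeficsForVisionText2Text

# ZeRO stage 3 shards parameters, gradients, and optimizer state across
# GPUs, which is how an 80B-parameter model fits on 80GB A100 nodes.
accelerator = Accelerator(
    deepspeed_plugin=DeepSpeedPlugin(zero_stage=3, gradient_accumulation_steps=1)
)

# The public 9B checkpoint is used here purely for illustration.
processor = AutoProcessor.from_pretrained("HuggingFaceM4/idefics-9b")
model = IdeficsForVisionText2Text.from_pretrained("HuggingFaceM4/idefics-9b")

# WebDataset streams (image, text) samples straight out of tar shards,
# so training data never has to be staged on local disk.
# The bucket path and shard range below are assumed.
dataset = (
    wds.WebDataset("pipe:aws s3 cp s3://my-bucket/shard-{000000..000099}.tar -")
    .decode("pil")            # decode images to PIL.Image
    .to_tuple("jpg", "txt")   # yield (image, caption) pairs
    .batched(8)               # assumed per-GPU batch size
)
loader = DataLoader(dataset, batch_size=None)  # batching is done by WebDataset

optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)  # assumed hyperparameters
model, optimizer, loader = accelerator.prepare(model, optimizer, loader)
```

A script like this would be started with `accelerate launch` on every node; ZeRO-3 then partitions the model state across all participating GPUs, e.g. the 512 GPUs cited above for the base 80B run.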