OutOfMemoryError: CUDA out of memory.
I have two GPUs:
[0] NVIDIA GeForce RTX 3090
[1] NVIDIA GeForce RTX 3090
but when I try to load the model
from transformers import AutoModelForCausalLM

model_name = 'meta-llama/Meta-Llama-3-8B'
model = AutoModelForCausalLM.from_pretrained(model_name, token=access_token)
I get an out-of-memory error.
Did you check CUDA availability and that the model is properly loaded onto the GPU?
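For example, a quick check with PyTorch (just a sketch; it assumes torch is installed and model is the object loaded above):

import torch

print(torch.cuda.is_available())   # should print True
print(torch.cuda.device_count())   # should report both GPUs

# from_pretrained without device_map keeps the weights on the CPU,
# so also check where the parameters actually ended up:
print(next(model.parameters()).device)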
I'm having the same problem.
System: two RTX 4090Ti
Which model? The 8b?
llama3.2 3b and llama3 8b
Try the nvidia-smi command. There you can see whether the GPU RAM is actually being used.
I have used this command. The output just shows the out-of-memory condition. I don't know how to resolve it, but I know the model should not need that much memory.
Did you try lower-precision weights, maybe float8 or lower?
I'm trying to fine-tune the model. Could you explain the approach more clearly? I'm new to large models, so I don't quite understand what you mean.
You have a few options to test:
Reduce the batch size.
Reduce the precision, e.g. half precision (FP16) instead of single precision (FP32); you can go even lower. For scale, an 8B model in FP32 needs roughly 32 GB just for the weights, which is already more than one 24 GB card holds.
Both of these reduce the memory load.
You can also lower the maximum sequence length (seq_len) to reduce how much is fed to the model at once. See the sketch below.
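A minimal sketch of the first two options (assuming transformers and accelerate are installed, with model_name and access_token as in the first post; the output path is a placeholder):

import torch
from transformers import AutoModelForCausalLM, TrainingArguments

# Load the weights in half precision (~2 bytes per parameter instead of 4)
# and let accelerate place them across both GPUs.
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    token=access_token,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

# Keep the per-device batch small; gradient accumulation preserves the
# effective batch size without the memory cost of a large batch.
training_args = TrainingArguments(
    output_dir="out",               # placeholder path
    per_device_train_batch_size=1,
    gradient_accumulation_steps=8,
    bf16=True,
)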
Do you mean modifying these configuration parameters?
--model_name_or_path llama3_8b
--tokenizer_name_or_path llama3_8b
--dataset_dir
--per_device_train_batch_size 1
--per_device_eval_batch_size 1
--do_train 1
--do_eval 1
--seed 42
--bf16 1
--num_train_epochs 3
--lr_scheduler_type cosine
--learning_rate 1e-4
--warmup_ratio 0.05
--weight_decay 0.1
--logging_strategy steps
--logging_steps 10
--save_strategy steps
--save_total_limit 3
--evaluation_strategy steps
--eval_steps 100
--save_steps 200
--gradient_accumulation_steps 8
--preprocessing_num_workers 8
--max_seq_length 1024
--output_dir
--overwrite_output_dir 1
--ddp_timeout 30000
--logging_first_step True
--lora_rank 64
--lora_alpha 128
--trainable "q_proj,v_proj,k_proj,o_proj,gate_proj,down_proj,up_proj"
--lora_dropout 0.05
--modules_to_save "embed_tokens,lm_head"
--torch_dtype bfloat16
--validation_file
--load_in_kbits 16
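For reference, those LoRA flags correspond roughly to this PEFT setup (a sketch only; it assumes the training script uses the peft library and that model is the base model loaded in half precision):

from peft import LoraConfig, get_peft_model

lora_config = LoraConfig(
    r=64,
    lora_alpha=128,
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj", "k_proj", "o_proj",
                    "gate_proj", "down_proj", "up_proj"],
    modules_to_save=["embed_tokens", "lm_head"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()

One thing to watch: modules_to_save keeps full trainable copies of embed_tokens and lm_head, which on an 8B model adds roughly a billion trainable parameters plus their optimizer state. If memory stays tight, it may be worth dropping that option or lowering --load_in_kbits from 16 to 8 or 4.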