Can somebody train models on my dataset? Enderchef/ICONN-1-BasicChat-Data-SuperLite

#918
by Enderchef - opened

Enderchef/ICONN-1-BasicChat-Data-SuperLite

Enderchef changed discussion title from Can somebody train models on my dataset? to Can somebody train models on my dataset? Enderchef/ICONN-1-BasicChat-Data-SuperLite

I guess I could, but with only 190 rows the dataset is really short. I could bump the learning rate by a lot and overtrain it by doing something like 6 epochs. What model do you want me to train on it?

Llama 3 or 4 please! Thank you so much! I can increase the rows if you need.

I think I will train it on meta-llama/Llama-3.1-8B-Instruct. I will likely do so late this evening, as I'm currently training a different model. Regarding the rows, it would likely make sense if you increased them. I think around 500 rows is the minimum that gives high-quality results, but I can try with your current dataset and see what happens if I just bump the learning rate and epochs.

Okay. I updated the dataset so it now has 500 rows. Take as much time as you need to train the model.
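(For anyone following along, a quick sanity check of the row count with the datasets library; a sketch only, assuming the default "train" split and that the columns match the system/input/output fields mapped in the training config further down.)

```python
# Sketch: confirm the dataset size and columns before training.
# Assumes the default "train" split; column names are expected to match the
# system / input / output fields referenced in the axolotl config below.
from datasets import load_dataset

ds = load_dataset("Enderchef/ICONN-1-BasicChat-Data-SuperLite", split="train")
print(ds.num_rows)       # should now report roughly 500 rows
print(ds.column_names)
```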

It's now training using the following axolotl configuration. The training will take around 2 hours.

base_model: /dpool/Meta-Llama-3.1-8B-Instruct-abliterated
model_type: LlamaForCausalLM
tokenizer_type: AutoTokenizer

load_in_8bit: false
load_in_4bit: false

datasets:
  - path: /apool/axolotl/0001.parquet
    chat_template: llama3
    type:
      system_prompt: ""
      field_system: system
      field_instruction: input
      field_output: output
dataset_prepared_path:
val_set_size: 0.05
output_dir: ./outputs/lora-out

adapter: lora
lora_model_dir:

sequence_len: 4096
sample_packing: false
pad_to_sequence_len: true

lora_r: 16
lora_alpha: 32
lora_dropout: 0.05
lora_target_linear: true

gradient_accumulation_steps: 4
micro_batch_size: 2
num_epochs: 8
optimizer: adamw_torch_fused
lr_scheduler: cosine
learning_rate: 0.00001

bf16: auto
tf32: false

gradient_checkpointing: true
gradient_checkpointing_kwargs:
  use_reentrant: true
resume_from_checkpoint:
logging_steps: 1
flash_attention: true

warmup_steps: 10
evals_per_epoch: 4
saves_per_epoch: 1
weight_decay: 0.0
fsdp:
  - full_shard
  - auto_wrap
fsdp_config:
  fsdp_limit_all_gathers: true
  fsdp_sync_module_states: true
  fsdp_offload_params: true
  fsdp_use_orig_params: false
  fsdp_cpu_ram_efficient_loading: true
  fsdp_auto_wrap_policy: TRANSFORMER_BASED_WRAP
  fsdp_transformer_layer_cls_to_wrap: LlamaDecoderLayer
  fsdp_state_dict_type: FULL_STATE_DICT
  fsdp_sharding_strategy: FULL_SHARD
special_tokens:
  pad_token: <|end_of_text|>

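Once training finishes, the adapter in ./outputs/lora-out can be folded back into the base model to produce standalone SafeTensors weights. Here is a minimal sketch with PEFT, assuming the paths from the config above ("./outputs/merged" is just an example output directory):

```python
# Sketch: merge the trained LoRA adapter into the base model. Paths are taken
# from the axolotl config above; "./outputs/merged" is an arbitrary example.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_path = "/dpool/Meta-Llama-3.1-8B-Instruct-abliterated"  # base_model
adapter_path = "./outputs/lora-out"                          # output_dir

base = AutoModelForCausalLM.from_pretrained(base_path, torch_dtype=torch.bfloat16)
model = PeftModel.from_pretrained(base, adapter_path)

merged = model.merge_and_unload()            # fold LoRA weights into the base
merged.save_pretrained("./outputs/merged", safe_serialization=True)
AutoTokenizer.from_pretrained(base_path).save_pretrained("./outputs/merged")
```
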
Which of the two should I use for a chatbot? Also, which files should I download to run it locally? I also saw that picklescan marked it as unsafe.

I already answered you in https://huggingface.co/nicoboss/Meta-Llama-3.1-8B-Instruct-abliterated-ICONN-1-BasicChat-Lora/discussions/1 and https://huggingface.co/nicoboss/Meta-Llama-3.1-8B-Instruct-abliterated-ICONN-1-BasicChat-Lora/discussions/2

Just use the SafeTensors model or the soon-to-be-created GGUF.
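If you prefer to stay in Python rather than a GGUF runtime, here is a rough sketch of chatting with the LoRA repo linked above via transformers + PEFT. The base model ID is an assumption (ideally use the same abliterated base the adapter was trained on); adjust as needed:

```python
# Sketch: run the chat model locally. The adapter repo is the one linked above;
# base_id is an assumption and should match the base the adapter was trained on.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "meta-llama/Llama-3.1-8B-Instruct"  # assumption, see note above
adapter_id = "nicoboss/Meta-Llama-3.1-8B-Instruct-abliterated-ICONN-1-BasicChat-Lora"

tokenizer = AutoTokenizer.from_pretrained(base_id)
model = AutoModelForCausalLM.from_pretrained(
    base_id, torch_dtype=torch.bfloat16, device_map="auto"
)
model = PeftModel.from_pretrained(model, adapter_id)

messages = [{"role": "user", "content": "Hello! What can you do?"}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
output = model.generate(input_ids, max_new_tokens=128)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```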
