Can somebody train models on my dataset? Enderchef/ICONN-1-BasicChat-Data-SuperLite
I guess I could, but with only 190 rows the dataset is really short. I could bump the learning rate by a lot and overtrain it by doing something like 6 epochs. What model do you want me to train on it?
Llama 3 or 4 please! Thank you so much! I can increase the rows if you need.
I think I will use meta-llama/Llama-3.1-8B-Instruct. I will likely do so late in the evening, as I'm currently training a different model. Regarding the rows, it would make sense to increase them. I think around 500 rows is the minimum that gives high-quality results, but I can try with your current dataset and see what happens if I just bump the learning rate and epochs.
Okay, I updated the dataset to 500 rows. Take as much time as you need to train the model.
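If you want to double-check the count yourself, here is a quick sketch with the datasets library (assuming the default `train` split; the system/input/output columns match the fields used in training below):

```python
from datasets import load_dataset

# Load the dataset straight from the Hugging Face Hub.
# Assumes the default "train" split; adjust if the repo uses another split name.
ds = load_dataset("Enderchef/ICONN-1-BasicChat-Data-SuperLite", split="train")

print(len(ds))          # should now report 500 rows
print(ds.column_names)  # expect the system / input / output fields
```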
It's now training using the following axolotl configuration. The training will take around 2 hours.
base_model: /dpool/Meta-Llama-3.1-8B-Instruct-abliterated
model_type: LlamaForCausalLM
tokenizer_type: AutoTokenizer

load_in_8bit: false
load_in_4bit: false

datasets:
  - path: /apool/axolotl/0001.parquet
    chat_template: llama3
    type:
      system_prompt: ""
      field_system: system
      field_instruction: input
      field_output: output
dataset_prepared_path:
val_set_size: 0.05
output_dir: ./outputs/lora-out

adapter: lora
lora_model_dir:

sequence_len: 4096
sample_packing: false
pad_to_sequence_len: true

lora_r: 16
lora_alpha: 32
lora_dropout: 0.05
lora_target_linear: true

gradient_accumulation_steps: 4
micro_batch_size: 2
num_epochs: 8
optimizer: adamw_torch_fused
lr_scheduler: cosine
learning_rate: 0.00001

bf16: auto
tf32: false

gradient_checkpointing: true
gradient_checkpointing_kwargs:
  use_reentrant: true
resume_from_checkpoint:
logging_steps: 1
flash_attention: true

warmup_steps: 10
evals_per_epoch: 4
saves_per_epoch: 1
weight_decay: 0.0
fsdp:
  - full_shard
  - auto_wrap
fsdp_config:
  fsdp_limit_all_gathers: true
  fsdp_sync_module_states: true
  fsdp_offload_params: true
  fsdp_use_orig_params: false
  fsdp_cpu_ram_efficient_loading: true
  fsdp_auto_wrap_policy: TRANSFORMER_BASED_WRAP
  fsdp_transformer_layer_cls_to_wrap: LlamaDecoderLayer
  fsdp_state_dict_type: FULL_STATE_DICT
  fsdp_sharding_strategy: FULL_SHARD
special_tokens:
  pad_token: <|end_of_text|>
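For reference, a rough sketch of what these settings work out to in optimizer steps, assuming a single GPU (with FSDP across several GPUs, multiply the effective batch size by the world size):

```python
# Back-of-the-envelope step count for the config above (values copied from it).
rows = 500                       # dataset size after the update
val_set_size = 0.05
micro_batch_size = 2
gradient_accumulation_steps = 4
num_epochs = 8

train_rows = int(rows * (1 - val_set_size))                        # 475
effective_batch = micro_batch_size * gradient_accumulation_steps   # 8 per GPU
steps_per_epoch = train_rows // effective_batch                    # 59
print(steps_per_epoch * num_epochs)                                # ~472 steps total
```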
Your model is ready!
- SafeTensors: https://huggingface.co/nicoboss/Meta-Llama-3.1-8B-Instruct-abliterated-ICONN-1-BasicChat
- Lora: https://huggingface.co/nicoboss/Meta-Llama-3.1-8B-Instruct-abliterated-ICONN-1-BasicChat-Lora
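If you ever want to use the Lora repo directly, the adapter has to be loaded on top of the base model first; here is a minimal sketch with peft, where the base model path is a placeholder you would replace with your own copy of the abliterated base:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

# Placeholder path: point this at your local copy of the abliterated base model.
base_id = "path/to/Meta-Llama-3.1-8B-Instruct-abliterated"

base = AutoModelForCausalLM.from_pretrained(
    base_id, torch_dtype=torch.bfloat16, device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(base_id)

# Apply the trained LoRA adapter on top of the base weights.
model = PeftModel.from_pretrained(
    base, "nicoboss/Meta-Llama-3.1-8B-Instruct-abliterated-ICONN-1-BasicChat-Lora"
)
```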
I also queued it so we'll soon have GGUFs of it:
You can follow the progress at http://hf.tst.eu/status.html or regularly check the model
summary page at https://hf.tst.eu/model#Meta-Llama-3.1-8B-Instruct-abliterated-ICONN-1-BasicChat-GGUF for quants to appear.
Which of the two should I use for a chatbot? Also, which files should I download to run it locally? And I saw that picklescan marked it as unsafe.
I already answered you in https://huggingface.co/nicoboss/Meta-Llama-3.1-8B-Instruct-abliterated-ICONN-1-BasicChat-Lora/discussions/1 and https://huggingface.co/nicoboss/Meta-Llama-3.1-8B-Instruct-abliterated-ICONN-1-BasicChat-Lora/discussions/2
Just use the SafeTensors model or the soon-to-be-created GGUF; neither format uses Python pickle, so the picklescan warning doesn't apply to them.
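For the SafeTensors repo, a minimal chat sketch with transformers could look like this (the dtype and device settings are assumptions; adjust them to your hardware):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "nicoboss/Meta-Llama-3.1-8B-Instruct-abliterated-ICONN-1-BasicChat"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

# Build a chat prompt with the model's built-in Llama 3 chat template.
messages = [{"role": "user", "content": "Hello! Who are you?"}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(inputs, max_new_tokens=256)
print(tokenizer.decode(output[0][inputs.shape[-1]:], skip_special_tokens=True))
```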