---
base_model: diwank/cryptgpt
tags:
  - axolotl
  - generated_from_trainer
model-index:
  - name: cryptgpt
    results: []
---

Built with Axolotl

See axolotl config

axolotl version: `0.4.1`

```yaml
# See:
# - https://github.com/karpathy/nanoGPT/blob/master/config/train_gpt2.py#L1
# - https://github.com/OpenAccess-AI-Collective/axolotl/blob/main/examples/tiny-llama/pretrain.yml#L14
# - https://github.com/karpathy/nanoGPT/blob/master/train.py#L35

base_model: diwank/cryptgpt
hub_model_id: diwank/cryptgpt

model_type: GPT2LMHeadModel
tokenizer_type: AutoTokenizer
trust_remote_code: true  # required for CryptGPTTokenizer
resize_token_embeddings_to_32x: true
output_dir: ./outputs/model-out

datasets:
  - path: diwank/encrypted-openwebtext
    type: completion

dataset_prepared_path: ./cryptgpt-prepared-dataset
val_set_size: 0.04
shuffle_merged_datasets: false

sequence_len: 1024
pad_to_sequence_len: true
sample_packing: false
pretrain_multipack_attn: false
train_on_inputs: true

gradient_accumulation_steps: 1
micro_batch_size: 64
optimizer: adamw_bnb_8bit
adam_beta1: 0.9
adam_beta2: 0.95
seed: 42

lr_scheduler: cosine
learning_rate: 6e-4
cosine_min_lr_ratio: 0.1  # min: 6e-5
weight_decay: 0.1

bf16: auto
tf32: true
flash_attention: true
torch_compile: true
gradient_checkpointing: false
deepspeed: deepspeed_configs/zero2.json

max_steps: 1200000
eval_steps: 12000
save_steps: 12000
auto_resume_from_checkpoints: true
logging_steps: 1
eval_max_new_tokens: 128
eval_causal_lm_metrics:
  - sacrebleu

wandb_project: cryptgpt-0.1
wandb_name: cryptgpt-run-07
```
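
The config above requires `trust_remote_code: true` for the custom CryptGPTTokenizer, so loading the published checkpoint needs the same flag. A minimal usage sketch (assuming the standard `transformers` auto classes resolve the tokenizer and GPT2LMHeadModel weights at `diwank/cryptgpt`; not an official example):

```python
# Hypothetical loading sketch -- names follow the config (hub_model_id: diwank/cryptgpt).
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "diwank/cryptgpt"

# trust_remote_code=True pulls in the custom CryptGPTTokenizer, as noted in the config.
tokenizer = AutoTokenizer.from_pretrained(repo_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(repo_id)

# Generate a short continuation. Note: the model was trained on encrypted
# OpenWebText, so a plain-text prompt is only a placeholder here.
inputs = tokenizer("example prompt", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```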

# cryptgpt

This model is a fine-tuned version of diwank/cryptgpt, trained on the diwank/encrypted-openwebtext dataset. It achieves the following results on the evaluation set:

- Loss: 2.2717

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:

- learning_rate: 0.0006
- train_batch_size: 64
- eval_batch_size: 64
- seed: 42
- distributed_type: multi-GPU
- num_devices: 8
- total_train_batch_size: 512
- total_eval_batch_size: 512
- optimizer: Adam with betas=(0.9,0.95) and epsilon=1e-08
- lr_scheduler_type: cosine (see the sketch after this list)
- lr_scheduler_warmup_steps: 100
- training_steps: 40912
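
The schedule combines a 100-step warmup with cosine decay; the config's `cosine_min_lr_ratio: 0.1` floors the learning rate at 6e-5. A minimal sketch of that shape (illustrative only, assuming linear warmup and decay over the reported training_steps; not Axolotl's exact scheduler):

```python
import math

# Illustrative cosine-with-warmup curve using the values reported above.
def lr_at(step, base_lr=6e-4, warmup_steps=100, max_steps=40912, min_ratio=0.1):
    if step < warmup_steps:
        return base_lr * step / warmup_steps  # linear warmup
    progress = (step - warmup_steps) / max(1, max_steps - warmup_steps)
    cosine = 0.5 * (1 + math.cos(math.pi * progress))  # decays from 1 to 0
    return base_lr * (min_ratio + (1 - min_ratio) * cosine)

print(lr_at(0), lr_at(100), lr_at(40912))  # ~0.0, 6e-4, 6e-5
```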

### Training results

| Training Loss | Epoch  | Step  | Validation Loss |
|:-------------:|:------:|:-----:|:---------------:|
| 10.9453       | 0.0000 | 1     | 10.9383         |
| 3.0117        | 0.2933 | 12000 | 2.8623          |
| 2.5234        | 0.5866 | 24000 | 2.4040          |
| 2.3398        | 0.8799 | 36000 | 2.2717          |
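
For intuition, a language-model cross-entropy loss converts to perplexity via exp(loss) (assuming the reported values are mean token-level losses in nats):

```python
import math

# Validation losses from the table above; perplexity = exp(loss).
for step, loss in [(1, 10.9383), (12000, 2.8623), (24000, 2.4040), (36000, 2.2717)]:
    print(f"step {step:>5}: val loss {loss:.4f} -> perplexity {math.exp(loss):.1f}")
```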

### Framework versions

- Transformers 4.41.1
- PyTorch 2.1.2+cu118
- Datasets 2.19.1
- Tokenizers 0.19.1