Built with Axolotl

See the axolotl config used for this run (axolotl version: 0.4.1); a sketch of how one training record is rendered under its prompt format follows the config.

adapter: lora
auto_resume_from_checkpoints: false
base_model: fxmarty/tiny-random-GemmaForCausalLM
bf16: auto
chat_template: llama3
dataset_prepared_path: null
dataset_processes: 6
datasets:
- data_files:
  - 8cfdb1f2cec27bcb_train_data.json
  ds_type: json
  format: custom
  path: /workspace/input_data/8cfdb1f2cec27bcb_train_data.json
  type:
    field_input: mzs
    field_instruction: formula
    field_output: smiles
    format: '{instruction} {input}'
    no_input_format: '{instruction}'
    system_format: '{system}'
    system_prompt: ''
debug: null
deepspeed: null
early_stopping_patience: 3
eval_max_new_tokens: 128
eval_steps: 200
eval_table_size: null
evals_per_epoch: null
flash_attention: true
fp16: false
fsdp: null
fsdp_config: null
gradient_accumulation_steps: 2
gradient_checkpointing: true
group_by_length: false
hub_model_id: error577/f59415d5-3f46-42ac-8615-c9ed0b877c86
hub_repo: null
hub_strategy: checkpoint
hub_token: null
learning_rate: 0.0002
load_in_4bit: false
load_in_8bit: false
local_rank: null
logging_steps: 1
lora_alpha: 64
lora_dropout: 0.1
lora_fan_in_fan_out: null
lora_model_dir: null
lora_r: 32
lora_target_linear: true
lr_scheduler: cosine
max_grad_norm: 1.0
max_steps: null
micro_batch_size: 5
mlflow_experiment_name: /tmp/8cfdb1f2cec27bcb_train_data.json
model_type: AutoModelForCausalLM
num_epochs: 3
optimizer: adamw_bnb_8bit
output_dir: miner_id_24
pad_to_sequence_len: true
resume_from_checkpoint: null
s2_attention: null
sample_packing: false
save_steps: 200
sequence_len: 256
strict: false
tf32: false
tokenizer_type: AutoTokenizer
train_on_inputs: false
trust_remote_code: true
val_set_size: 0.005
wandb_entity: null
wandb_mode: online
wandb_name: 466b54c0-5c4c-4a4c-8e95-f119cf998ff0
wandb_project: Gradients-On-Demand
wandb_run: your_name
wandb_runid: 466b54c0-5c4c-4a4c-8e95-f119cf998ff0
warmup_steps: 30
weight_decay: 0.0
xformers_attention: null
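
Per the datasets.type block above, each JSON record maps formula to the instruction, mzs to the input, and smiles to the output, joined as '{instruction} {input}'. The following is a minimal sketch of that rendering with hypothetical field values; axolotl's actual templating and tokenization may differ in detail.

# Hedged sketch of how one JSON record is rendered under the custom prompt
# format above. Field values are hypothetical (caffeine), for illustration only.
def render_example(record: dict) -> tuple[str, str]:
    instruction = record["formula"]      # field_instruction
    model_input = record.get("mzs")      # field_input
    output = record["smiles"]            # field_output
    if model_input:                      # format: '{instruction} {input}'
        prompt = f"{instruction} {model_input}"
    else:                                # no_input_format: '{instruction}'
        prompt = instruction
    return prompt, output

example = {
    "formula": "C8H10N4O2",
    "mzs": "195.0877, 138.0662",
    "smiles": "CN1C=NC2=C1C(=O)N(C(=O)N2C)C",
}
prompt, completion = render_example(example)
print(prompt)       # -> "C8H10N4O2 195.0877, 138.0662"
print(completion)   # -> target SMILES string

Since train_on_inputs is false, the loss is computed only on the output (SMILES) tokens, not on the prompt.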

f59415d5-3f46-42ac-8615-c9ed0b877c86

This model is a LoRA adapter fine-tuned from fxmarty/tiny-random-GemmaForCausalLM on the custom JSON dataset listed in the config above (8cfdb1f2cec27bcb_train_data.json). It achieves the following results on the evaluation set:

  • Loss: 12.1446

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 0.0002
  • train_batch_size: 5
  • eval_batch_size: 5
  • seed: 42
  • gradient_accumulation_steps: 2
  • total_train_batch_size: 10
  • optimizer: 8-bit AdamW (adamw_bnb_8bit) with betas=(0.9, 0.999), epsilon=1e-08, and no additional optimizer arguments (an equivalent setup is sketched after this list)
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_steps: 30
  • num_epochs: 3
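
An approximately equivalent optimizer and learning-rate schedule could be constructed as sketched below. This is an illustrative sketch only, assuming bitsandbytes and transformers are installed; the actual run was driven by axolotl's trainer, and the total step count here is a placeholder.

import torch
import bitsandbytes as bnb
from transformers import get_cosine_schedule_with_warmup

# Stand-in module; in the real run this is the LoRA-wrapped Gemma model.
model = torch.nn.Linear(8, 8)

# adamw_bnb_8bit with the hyperparameters listed above.
optimizer = bnb.optim.AdamW8bit(
    model.parameters(),
    lr=2e-4,
    betas=(0.9, 0.999),
    eps=1e-8,
    weight_decay=0.0,
)

# Cosine decay with 30 warmup steps; the total step count is a placeholder
# (the results table below stops at step 17400).
scheduler = get_cosine_schedule_with_warmup(
    optimizer,
    num_warmup_steps=30,
    num_training_steps=17_400,
)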

Training results

Training Loss Epoch Step Validation Loss
12.4711 0.0000 1 12.4787
12.2458 0.0087 200 12.2547
12.2186 0.0175 400 12.2168
12.2108 0.0262 600 12.2062
12.2047 0.0349 800 12.2000
12.2109 0.0436 1000 12.1962
12.2102 0.0524 1200 12.1915
12.205 0.0611 1400 12.1876
12.2112 0.0698 1600 12.1857
12.2105 0.0785 1800 12.1828
12.201 0.0873 2000 12.1808
12.1987 0.0960 2200 12.1781
12.1744 0.1047 2400 12.1741
12.1621 0.1135 2600 12.1698
12.1653 0.1222 2800 12.1665
12.1946 0.1309 3000 12.1653
12.1671 0.1396 3200 12.1644
12.1668 0.1484 3400 12.1638
12.1695 0.1571 3600 12.1632
12.1686 0.1658 3800 12.1626
12.1645 0.1746 4000 12.1626
12.1577 0.1833 4200 12.1610
12.16 0.1920 4400 12.1604
12.1869 0.2007 4600 12.1599
12.1604 0.2095 4800 12.1592
12.1863 0.2182 5000 12.1584
12.1862 0.2269 5200 12.1578
12.1725 0.2356 5400 12.1575
12.175 0.2444 5600 12.1571
12.176 0.2531 5800 12.1568
12.1537 0.2618 6000 12.1565
12.1648 0.2706 6200 12.1565
12.1619 0.2793 6400 12.1558
12.1525 0.2880 6600 12.1559
12.1509 0.2967 6800 12.1556
12.1541 0.3055 7000 12.1554
12.1641 0.3142 7200 12.1550
12.1652 0.3229 7400 12.1546
12.151 0.3317 7600 12.1545
12.1712 0.3404 7800 12.1547
12.1711 0.3491 8000 12.1544
12.1633 0.3578 8200 12.1543
12.1359 0.3666 8400 12.1541
12.1583 0.3753 8600 12.1532
12.1671 0.3840 8800 12.1532
12.151 0.3927 9000 12.1528
12.18 0.4015 9200 12.1523
12.165 0.4102 9400 12.1521
12.1582 0.4189 9600 12.1520
12.1574 0.4277 9800 12.1520
12.1565 0.4364 10000 12.1517
12.1704 0.4451 10200 12.1513
12.1493 0.4538 10400 12.1509
12.145 0.4626 10600 12.1503
12.1691 0.4713 10800 12.1500
12.1707 0.4800 11000 12.1499
12.1365 0.4888 11200 12.1492
12.1606 0.4975 11400 12.1492
12.1536 0.5062 11600 12.1488
12.1682 0.5149 11800 12.1485
12.1495 0.5237 12000 12.1484
12.153 0.5324 12200 12.1479
12.1401 0.5411 12400 12.1481
12.1544 0.5498 12600 12.1475
12.1771 0.5586 12800 12.1471
12.1821 0.5673 13000 12.1469
12.1471 0.5760 13200 12.1468
12.1544 0.5848 13400 12.1466
12.1588 0.5935 13600 12.1465
12.1316 0.6022 13800 12.1464
12.1473 0.6109 14000 12.1461
12.1784 0.6197 14200 12.1458
12.1317 0.6284 14400 12.1457
12.1707 0.6371 14600 12.1457
12.1673 0.6459 14800 12.1458
12.1294 0.6546 15000 12.1456
12.1368 0.6633 15200 12.1455
12.1495 0.6720 15400 12.1452
12.1463 0.6808 15600 12.1455
12.1472 0.6895 15800 12.1451
12.1705 0.6982 16000 12.1451
12.1373 0.7069 16200 12.1449
12.1503 0.7157 16400 12.1450
12.1322 0.7244 16600 12.1449
12.1579 0.7331 16800 12.1446
12.1375 0.7419 17000 12.1450
12.1522 0.7506 17200 12.1449
12.1554 0.7593 17400 12.1446

Framework versions

  • PEFT 0.13.2
  • Transformers 4.46.0
  • Pytorch 2.5.0+cu124
  • Datasets 3.0.1
  • Tokenizers 0.20.1
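
With these versions (or compatible ones) installed, the adapter can be loaded on top of the base model roughly as follows. This is a minimal sketch; given the tiny random base model and the evaluation loss above, the generations are not expected to be chemically meaningful.

from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "fxmarty/tiny-random-GemmaForCausalLM"
adapter_id = "error577/f59415d5-3f46-42ac-8615-c9ed0b877c86"

tokenizer = AutoTokenizer.from_pretrained(base_id, trust_remote_code=True)
base_model = AutoModelForCausalLM.from_pretrained(base_id, trust_remote_code=True)

# Attach the LoRA adapter trained in this run.
model = PeftModel.from_pretrained(base_model, adapter_id)
model.eval()

# Prompt mirrors the hypothetical formula/mzs record shown earlier.
inputs = tokenizer("C8H10N4O2 195.0877, 138.0662", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))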