---
license: cc
base_model: Lambent/cosmoem-4x1b
tags:
- generated_from_trainer
model-index:
- name: lisa-out
  results: []
---

Intuitively, it seemed like LISA training should suit a MoE pretty well, though I don't know how well calibrated my intuitions are. An interesting thing about this one is that it looks like it hadn't finished converging by the end of one epoch. Still more to learn.

Nous capabilities:

| Model |AGIEval|GPT4All|TruthfulQA|Bigbench|Average|
|-----------------------------------------------------------------------------|------:|------:|---------:|-------:|------:|
|[CosMoEAlpacaLisa-4x1b](https://huggingface.co/Lambent/CosMoEAlpacaLisa-4x1b)| 23.44| 48.13| 41.13| 29.95| 35.66|

Comparisons:

| Model |AGIEval|GPT4All|TruthfulQA|Bigbench|Average|
|---------------------------------------------------------------------------------|------:|------:|---------:|-------:|------:|
|[CosMoE-AlpacaLight-v0.6](https://huggingface.co/Lambent/CosMoE-AlpacaLight-v0.6)| 23.3| 52.15| 38.57| 29.01| 35.76|

| Model |AGIEval|GPT4All|TruthfulQA|Bigbench|Average|
|-------------------------------------------------------------------------------|------:|------:|---------:|-------:|------:|
|[CosmoAlpacaLisa-0.3-1b](https://huggingface.co/Lambent/CosmoAlpacaLisa-0.3-1b)| 23.79| 51.61| 40.25| 29.97| 36.41|

| Model |AGIEval|GPT4All|TruthfulQA|Bigbench|Average|
|-------------------------------------------------------------------------|------:|------:|---------:|-------:|------:|
|[CosmoAlpacaLight-1b](https://huggingface.co/Lambent/CosmoAlpacaLight-1b)| 24.28| 51.31| 40.33| 29.47| 36.35|

| Model |AGIEval|GPT4All|TruthfulQA|Bigbench|Average|
|---------------------------------------------------------|------:|------:|---------:|-------:|------:|
|[cosmo-1b](https://huggingface.co/HuggingFaceTB/cosmo-1b)| 22.97| 52.01| 38.02| 28.73| 35.43|

[Built with Axolotl](https://github.com/OpenAccess-AI-Collective/axolotl)
<details><summary>See axolotl config</summary>

axolotl version: `0.4.0`
```yaml
base_model: Lambent/cosmoem-4x1b
model_type: AutoModelForCausalLM
tokenizer_type: LlamaTokenizer
trust_remote_code: true

load_in_8bit: false
load_in_4bit: false
strict: false

datasets:
  - path: vicgalle/alpaca-gpt4
    type: alpaca
dataset_prepared_path: prepared-alpaca
val_set_size: 0.05
output_dir: ./lisa-out

sequence_len: 2048
sample_packing: true
eval_sample_packing: false
pad_to_sequence_len: true

lisa_n_layers: 4
lisa_step_interval: 10
lisa_layers_attribute: model.layers

adapter:
lora_model_dir:
lora_r:
lora_alpha:
lora_dropout:
lora_target_linear:
lora_fan_in_fan_out:

wandb_project: CosMoE-AlpacaLisa
wandb_entity:
wandb_watch:
wandb_name:
wandb_log_model:

gradient_accumulation_steps: 1
micro_batch_size: 1
num_epochs: 1
optimizer: adamw_bnb_8bit
lr_scheduler: cosine
learning_rate: 0.0005

train_on_inputs: false
group_by_length: false
bf16: auto
fp16:
tf32: false

gradient_checkpointing: true
early_stopping_patience:
resume_from_checkpoint:
local_rank:
logging_steps: 1
xformers_attention:
flash_attention: true

loss_watchdog_threshold: 3.0
loss_watchdog_patience: 3

warmup_steps: 20
evals_per_epoch: 4
saves_per_epoch: 1
debug:
deepspeed:
weight_decay: 0.002
fsdp:
fsdp_config:
special_tokens:
```

</details>
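For intuition, here is a minimal sketch of what the `lisa_n_layers`, `lisa_step_interval`, and `lisa_layers_attribute` settings above correspond to, assuming the usual LISA scheme of periodically resampling which decoder layers are trainable. This is an illustration only, not axolotl's implementation, and the modules are toy stand-ins.

```python
# Illustrative sketch (assumed LISA behavior, not axolotl's code): every
# `step_interval` optimizer steps, unfreeze `n_layers` randomly chosen layers
# and freeze the rest.
import random

import torch.nn as nn


def resample_lisa_layers(layers: nn.ModuleList, n_layers: int):
    """Freeze every layer, then unfreeze `n_layers` randomly chosen ones."""
    active = random.sample(range(len(layers)), n_layers)
    for idx, layer in enumerate(layers):
        trainable = idx in active
        for param in layer.parameters():
            param.requires_grad = trainable
    return active


# Toy stand-in for the module list named by `lisa_layers_attribute` (model.layers).
layers = nn.ModuleList([nn.Linear(16, 16) for _ in range(8)])

step_interval = 10  # lisa_step_interval
n_layers = 4        # lisa_n_layers

for step in range(30):
    if step % step_interval == 0:
        active = resample_lisa_layers(layers, n_layers)
        print(f"step {step}: unfrozen layers {sorted(active)}")
    # forward / backward / optimizer.step() would happen here
```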

# lisa-out

This model is a fine-tuned version of [Lambent/cosmoem-4x1b](https://huggingface.co/Lambent/cosmoem-4x1b) on the [vicgalle/alpaca-gpt4](https://huggingface.co/datasets/vicgalle/alpaca-gpt4) dataset.
It achieves the following results on the evaluation set:
- Loss: 1.2588

## Model description

A LISA fine-tune of the 4x1b mixture-of-experts model Lambent/cosmoem-4x1b.

## Intended uses & limitations

More information needed

## Training and evaluation data

Trained on vicgalle/alpaca-gpt4 with a 5% split held out for validation (see the axolotl config above).

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 0.0005
- train_batch_size: 1
- eval_batch_size: 1
- seed: 42
- optimizer: 8-bit AdamW (adamw_bnb_8bit) with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 20
- num_epochs: 1

### Training results

| Training Loss | Epoch | Step | Validation Loss |
|:-------------:|:-----:|:----:|:---------------:|
| 1.197         | 0.0   | 1    | 1.5990          |
| 1.4959        | 0.25  | 1383 | 1.4359          |
| 1.6549        | 0.5   | 2766 | 1.3353          |
| 1.3571        | 0.75  | 4149 | 1.2588          |

### Framework versions

- Transformers 4.40.0.dev0
- Pytorch 2.1.2+cu118
- Datasets 2.18.0
- Tokenizers 0.15.0
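## Quick inference sketch

The card doesn't prescribe an inference recipe, so this is a minimal, hedged sketch: it assumes the Hub id `Lambent/CosMoEAlpacaLisa-4x1b` from the table above and an Alpaca-style prompt (the training data, vicgalle/alpaca-gpt4, uses the alpaca format); the generation settings are arbitrary.

```python
# Hedged usage sketch: load the model and generate from an Alpaca-style prompt.
# Repo id, prompt template, and sampling settings are assumptions, not values
# specified by this card. device_map="auto" requires the accelerate package.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Lambent/CosMoEAlpacaLisa-4x1b"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    trust_remote_code=True,  # the axolotl config sets trust_remote_code: true
    device_map="auto",
)

prompt = (
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request.\n\n"
    "### Instruction:\nExplain what a mixture-of-experts model is.\n\n"
    "### Response:\n"
)

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=200, do_sample=True, temperature=0.7)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```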