---
library_name: transformers
tags: []
---
# Model Card for Model ID
The Mistral-7B-v0.1 Large Language Model (LLM) is a pretrained generative text model with 7 billion parameters.
This model was finetuned with PEFT (LoRA adapters) and NEFTune for robustness.
## Model Details
### Model Description
This model is a finetuned version of [mistralai/Mistral-7B-v0.1](
## Training Details
### Training Data
This model was finetuned on the [kaist-ai/CoT-Collection](
### Training Procedure
This model was trained with the SFT trainer and the [NEFTune]( method.
(According to the paper, NEFTune adds noise to the embedding vectors during training.)
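
To make the noise step concrete, here is a minimal pure-Python sketch of the perturbation described in the NEFTune paper: uniform noise in [-1, 1], scaled by alpha / sqrt(L * d), where L is the sequence length and d is the embedding dimension. The function name `add_neftune_noise` and the value `alpha=5.0` are illustrative assumptions; this card does not state which noise scale was used, and the real training run would apply the noise to tensors inside the trainer rather than to Python lists.

```python
import math
import random


def add_neftune_noise(embeddings, alpha=5.0, rng=None):
    """Return a noisy copy of a (seq_len x dim) embedding matrix.

    `alpha` is a hypothetical value; the model card does not report it.
    """
    rng = rng or random.Random()
    seq_len = len(embeddings)
    dim = len(embeddings[0])
    # Scaling from the NEFTune paper: alpha / sqrt(L * d)
    scale = alpha / math.sqrt(seq_len * dim)
    # Uniform(-1, 1) noise times `scale` equals Uniform(-scale, scale)
    return [[x + rng.uniform(-scale, scale) for x in row] for row in embeddings]


# Toy input: 4 tokens with 8-dimensional embeddings, all zeros
toy = [[0.0] * 8 for _ in range(4)]
noisy = add_neftune_noise(toy, alpha=5.0, rng=random.Random(0))
```

The noise is only added during training; at inference time the embeddings are used unchanged.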
#### Training Hyperparameters
- lora alpha: 16
- lora r: 64
- lora dropout: 0.05
- max sequence length: 4096
- learning rate: 2e-4
- max_grad_norm: 0.3
- weight_decay: 0.001
- gradient checkpoint: True
- optim: paged_adamw_32bit
- use_bf16: True
- use_4bit: True
- use_nested_quant: False
- bnb_4bit_compute_dtype: float16
- bnb_4bit_quant_type: nf4
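
As a rough guide to how these values might be wired up, the sketch below groups them into keyword-argument dictionaries for `peft.LoraConfig`, `transformers.BitsAndBytesConfig`, and `transformers.TrainingArguments`. The mapping from the names above to library parameters (for example `use_nested_quant` to `bnb_4bit_use_double_quant`) is an assumption, not something this card confirms, and the original training script is not shown here.

```python
# Assumed mapping of the listed hyperparameters to library keyword arguments.
# Each dict could be unpacked into the named class, e.g. LoraConfig(**lora_kwargs).

# peft.LoraConfig
lora_kwargs = {
    "r": 64,
    "lora_alpha": 16,
    "lora_dropout": 0.05,
    "task_type": "CAUSAL_LM",  # assumption: causal language modeling
}

# transformers.BitsAndBytesConfig
bnb_kwargs = {
    "load_in_4bit": True,                # use_4bit
    "bnb_4bit_quant_type": "nf4",
    "bnb_4bit_compute_dtype": "float16",
    "bnb_4bit_use_double_quant": False,  # use_nested_quant (assumed mapping)
}

# transformers.TrainingArguments
training_kwargs = {
    "learning_rate": 2e-4,
    "max_grad_norm": 0.3,
    "weight_decay": 0.001,
    "gradient_checkpointing": True,
    "optim": "paged_adamw_32bit",
    "bf16": True,                        # use_bf16
}

# Passed to the SFT trainer; the parameter name differs across TRL versions.
max_sequence_length = 4096
```

Keeping the three groups separate mirrors how the libraries split responsibilities: adapter shape, quantized loading, and optimizer settings.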