# ctrltokyo/llm_prompt_mask_fill_model
This model is a fine-tuned version of distilbert-base-uncased on the code_instructions_120k dataset. It achieves the following results on the evaluation set:
- Train Loss: 2.1215
- Validation Loss: 1.5672
- Epoch: 0
## Model description
It's just distilbert-base-uncased with some fine-tuning.
## Intended uses & limitations
This model could be used for live autocompletion of prompts in a coding-specific chatbot. It is not suited to completing code itself: it was fine-tuned on natural-language prompts, not source code, so don't expect it to work there.
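Since the model is a DistilBERT mask-fill checkpoint, it can be queried through the standard `fill-mask` pipeline. A minimal usage sketch (the prompt is illustrative only):

```python
from transformers import pipeline

# Load the fine-tuned checkpoint from the Hugging Face Hub.
fill = pipeline("fill-mask", model="ctrltokyo/llm_prompt_mask_fill_model")

# DistilBERT uses [MASK] as its mask token; the example prompt below is made up.
for candidate in fill("Write a python function to [MASK] a list of integers."):
    print(f"{candidate['token_str']!r}  (score={candidate['score']:.3f})")
```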
## Training and evaluation data
The model was evaluated on a held-out 5% split of the training data; no further evaluation has been performed at this point. Training ran on an NVIDIA V100.
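A sketch of how such a 95/5 split could be produced with the `datasets` library. The exact Hub path for code_instructions_120k is an assumption here and may need a namespace prefix:

```python
from datasets import load_dataset

# NOTE: "code_instructions_120k" is used as a placeholder repo id; the actual
# Hub path of the dataset may differ.
dataset = load_dataset("code_instructions_120k")["train"]

# Hold out 5% of the training data for evaluation.
splits = dataset.train_test_split(test_size=0.05, seed=42)
train_data, eval_data = splits["train"], splits["test"]
print(len(train_data), len(eval_data))
```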
## Training procedure
### Training hyperparameters
The following hyperparameters were used during training:
- optimizer: {'inner_optimizer': {'class_name': 'AdamWeightDecay', 'config': {'name': 'AdamWeightDecay', 'learning_rate': {'class_name': 'WarmUp', 'config': {'initial_learning_rate': 2e-05, 'decay_schedule_fn': {'class_name': 'PolynomialDecay', 'config': {'initial_learning_rate': 2e-05, 'decay_steps': 108, 'end_learning_rate': 0.0, 'power': 1.0, 'cycle': False, 'name': None}, 'passive_serialization': True}, 'warmup_steps': 1000, 'power': 1.0, 'name': None}}, 'decay': 0.0, 'beta_1': 0.9, 'beta_2': 0.999, 'epsilon': 1e-08, 'amsgrad': False, 'weight_decay_rate': 0.01}}, 'dynamic': True, 'initial_scale': 32768.0, 'dynamic_growth_steps': 2000}
- training_precision: mixed_float16
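The serialized config above corresponds to an AdamWeightDecay optimizer with a warmup-then-linear-decay learning-rate schedule, run under mixed float16 with dynamic loss scaling. A minimal sketch of an equivalent setup, assuming the standard TensorFlow helpers in `transformers`:

```python
import tensorflow as tf
from transformers import create_optimizer

# Mixed-precision policy matching training_precision: mixed_float16.
# Under this policy, Keras wraps the optimizer in a dynamic LossScaleOptimizer
# at compile time, which matches the dynamic loss-scaling fields in the config.
tf.keras.mixed_precision.set_global_policy("mixed_float16")

# AdamWeightDecay (weight_decay_rate=0.01) with a linear (polynomial, power=1.0)
# decay over 108 steps, 1000 warmup steps, and an initial learning rate of 2e-5.
optimizer, lr_schedule = create_optimizer(
    init_lr=2e-5,
    num_train_steps=108,
    num_warmup_steps=1000,
    weight_decay_rate=0.01,
)
```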
### Training results
| Train Loss | Validation Loss | Epoch |
|:----------:|:---------------:|:-----:|
| 2.1215     | 1.5672          | 0     |
### Framework versions
- Transformers 4.31.0
- TensorFlow 2.12.0
- Datasets 2.14.1
- Tokenizers 0.13.3