|
--- |
|
language: en |
|
tags: |
|
- llm |
|
- text-generation |
|
- pytorch |
|
license: mit |
|
datasets: |
|
- custom |
|
metrics: |
|
- accuracy |
|
- f1 |
|
- bleu |
|
- rouge |
|
base_model: |
|
- google-bert/bert-base-uncased |
|
pipeline_tag: text-classification |
|
library_name: transformers |
|
--- |
|
|
|
# JKU-G3-LLM-v2 |
|
|
|
## Model Description |
|
|
|
JKU-G3-LLM-v2 is a language model developed by the ASER team at JKU, fine-tuned from `google-bert/bert-base-uncased`. This second version builds on the previous release with a revised training setup and a broader set of evaluation metrics (accuracy, F1, BLEU, and ROUGE).
|
|
|
## Training Details |
|
|
|
### Hyperparameters |
|
- Epochs: 5 |
|
- Training Samples: Unknown |
|
- Batch Size: Unknown |
|
- Learning Rate: Unknown |
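
For reference, a minimal `TrainingArguments` sketch consistent with the values above. Only the epoch count is documented; the batch size and learning rate shown here are placeholders, not the settings actually used:

```python
from transformers import TrainingArguments

# Hypothetical reconstruction of the training configuration.
training_args = TrainingArguments(
    output_dir="jku-g3-llm-v2",
    num_train_epochs=5,              # documented above
    per_device_train_batch_size=8,   # placeholder: actual value unknown
    learning_rate=5e-5,              # placeholder: actual value unknown
)
```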
|
|
|
### Training Results |
|
| Epoch | Training Loss | Validation Loss | |
|
|-------|---------------|-----------------| |
|
| 1 | 1.404300 | 1.422856 | |
|
| 2 | 0.975000 | 1.363832 | |
|
| 3 | 0.608100 | 1.515084 | |
|
| 4 | 0.359300 | 1.794247 | |
|
| 5 | 0.221200 | 2.057410 | |
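
The divergence between training and validation loss is easier to see when plotted; a short matplotlib sketch using the values from the table above:

```python
import matplotlib.pyplot as plt

# Loss values copied from the training results table
epochs = [1, 2, 3, 4, 5]
train_loss = [1.4043, 0.9750, 0.6081, 0.3593, 0.2212]
val_loss = [1.422856, 1.363832, 1.515084, 1.794247, 2.057410]

plt.plot(epochs, train_loss, marker="o", label="Training loss")
plt.plot(epochs, val_loss, marker="o", label="Validation loss")
plt.xlabel("Epoch")
plt.ylabel("Loss")
plt.title("JKU-G3-LLM-v2 loss curves")
plt.legend()
plt.show()
```

Validation loss reaches its minimum at epoch 2 and rises afterwards while training loss keeps falling, which is the overfitting pattern noted under Limitations below.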
|
|
|
### Performance Metrics |
|
- Final Training Loss: 0.7827 |
|
- Accuracy: 0.6 |
|
- F1 Score: 0.6 |
|
- BLEU: 0.0

  - Precisions (1- to 4-gram): [0.667, 0.286, 0.2, 0.0]

  - Brevity Penalty: 1.0

  - Length Ratio: 1.125

  - Translation Length: 9

  - Reference Length: 8

- ROUGE:

  - ROUGE-1: 0.583

  - ROUGE-2: 0.286

  - ROUGE-L: 0.583
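
The BLEU breakdown above (precisions, brevity penalty, length ratio, translation/reference length) matches the output format of the Hugging Face `evaluate` library's `bleu` metric, so the scores can plausibly be reproduced along these lines. The strings below are illustrative only, not the actual evaluation data:

```python
# pip install evaluate rouge_score
import evaluate

predictions = ["the model generated this example sentence"]   # placeholder outputs
references = [["the model produced this example sentence"]]   # placeholder references

bleu = evaluate.load("bleu")
rouge = evaluate.load("rouge")

print(bleu.compute(predictions=predictions, references=references))
# -> dict with 'bleu', 'precisions', 'brevity_penalty', 'length_ratio',
#    'translation_length', 'reference_length'
print(rouge.compute(predictions=predictions, references=[r[0] for r in references]))
# -> dict with 'rouge1', 'rouge2', 'rougeL', 'rougeLsum'
```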
|
|
|
### Computational Details |
|
- Total FLOS: 1.3368e+16 |
|
- Training Runtime: 5,246.06 seconds (~1.46 hours) |
|
- Training Samples/Second: 13.003 |
|
- Training Steps/Second: 1.626 |
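
As a rough sanity check (assuming the throughput figures count individual examples and optimizer steps across all 5 epochs), the runtime and rates above imply approximately the following totals; these are derived estimates, not documented values:

```python
runtime_s = 5246.06
samples_per_s = 13.003
steps_per_s = 1.626
epochs = 5

total_samples = samples_per_s * runtime_s       # ~68,200 examples processed in total
total_steps = steps_per_s * runtime_s           # ~8,530 optimizer steps
samples_per_epoch = total_samples / epochs      # ~13,600 examples per epoch
effective_batch = samples_per_s / steps_per_s   # ~8 examples per step

print(round(total_samples), round(total_steps),
      round(samples_per_epoch), round(effective_batch))
```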
|
|
|
## Intended Uses & Limitations |
|
|
|
This model is intended for research purposes and text generation tasks. Users should be aware of the following limitations: |
|
|
|
1. The model shows clear signs of overfitting: training loss decreases steadily while validation loss rises after epoch 2 (see the training results table above)

2. The low BLEU score (0.0) and modest accuracy/F1 (0.6) indicate substantial room for improvement in output quality
|
3. The model may inherit biases present in the training data |
|
|
|
## How to Use |
|
|
|
```python |
|
from transformers import AutoModelForCausalLM, AutoTokenizer |
|
|
|
# Load the fine-tuned model and its tokenizer from the Hub
model = AutoModelForCausalLM.from_pretrained("b-aser/jku-g3-llm-v2")
tokenizer = AutoTokenizer.from_pretrained("b-aser/jku-g3-llm-v2")

# Tokenize the prompt, generate a continuation, and decode it
inputs = tokenizer("Your input text here", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
|
``` |
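
Note that the card metadata lists `pipeline_tag: text-classification` with `google-bert/bert-base-uncased` as the base model, so the checkpoint may also be usable as a classifier. A hedged sketch, assuming the checkpoint includes a sequence-classification head (if it does not, use the generation example above):

```python
from transformers import pipeline

# Assumption: the checkpoint carries a sequence-classification head,
# as suggested by the card's pipeline tag. Verify before relying on it.
classifier = pipeline("text-classification", model="b-aser/jku-g3-llm-v2")
print(classifier("Your input text here"))
```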
|
|
|
|
|
|
|
|
|
|
## Contact Information
|
For questions, feedback, or collaboration opportunities, please contact: |
|
|
|
- Maintainer: Aser T. Alemu

- Email: [email protected]

- GitHub: github.com/b-aser