---
model-index:
  - name: LORA - Child-Friendly Language Model
    results:
      - task:
          name: AI2 ARC (Easy)
          type: multiple-choice
        dataset:
          name: ai2_arc_easy
          type: ai2_arc
        metrics:
          - name: Accuracy
            type: accuracy
            value: 0.854
          - name: Normalized Accuracy
            type: acc_norm
            value: 0.861
      - task:
          name: AI2 ARC (Challenge)
          type: multiple-choice
        dataset:
          name: ai2_arc_challenge
          type: ai2_arc
        metrics:
          - name: Accuracy
            type: accuracy
            value: 0.625
          - name: Normalized Accuracy
            type: acc_norm
            value: 0.662
      - task:
          name: Winogrande
          type: multiple-choice
        dataset:
          name: winogrande
          type: winogrande
        metrics:
          - name: Accuracy
            type: accuracy
            value: 0.762
license: mit
library_name: transformers
tags:
  - mistral
  - instruct
  - children
  - educational
  - fine-tuned
  - open-source
  - child-friendly
  - german
  - story-telling
  - safe-language
  - NLP
  - language-model
  - ai4kids
inference:
  parameters:
    max_new_tokens: 512
    temperature: 0.7
    top_p: 0.9
    do_sample: true
model_creator: Oscar Finance / HeyQQ GmbH
model_name: LORA - Child-Friendly Mistral
model_type: causal-language-model
base_model: mistralai/Mistral-Small-24B-Instruct-2501
language:
  - de
finetuned_from: mistralai/Mistral-Small-24B-Instruct-2501
pipeline_tag: text-generation
model_description: >
  LORA is a fine-tuned version of the Mistral Small 24B Instruct model optimized
  for child-friendly language generation in educational contexts.
  Built by Oscar Finance, LORA focuses on safe, age-appropriate, and engaging
  storytelling tailored for children aged 6 to 12.

  The model was trained using a curated blend of high-quality open datasets and
  manually created narratives that reflect pedagogical goals.

  Its outputs are filtered for emotional safety, simplicity, and positive tone,
  making it suitable for use in apps like Oscar, der Geldfuchs.
training:
  data_preprocessing: |
    All data underwent strict filtering and alignment to ensure:
    - Age-appropriateness (CEFR A1–A2 level)
    - Positive tone and moral guidance
    - Gender-balanced characters and inclusive language
  training_regime: bf16 mixed precision
  compute_resources:
    type: NVIDIA H100 x2
    memory: 2 x 80 GB HBM3
    vram_required: 47.14 GB
    throughput: ~7.1k tokens/sec
  finetuning_method: supervised fine-tuning (SFT)
  training_time: ~100 hours
intended_use: >
  LORA is intended to be used as a child-safe story and dialogue generator for
  educational applications, story-based learning platforms, and interactive
  learning tools.
  It is especially suited for German-language projects but supports English as well.
limitations: >
  While LORA is designed for child-friendly outputs, it may still produce text
  that requires human moderation in edge cases.
  It is not suitable for legal, medical, or critical advisory tasks.
ethical_considerations: |
  - Promotes safe, inclusive, and non-violent content
  - Avoids biased, gendered, or culturally insensitive material
  - Designed with educational partners and child psychologists
authors:
  - name: Dima Rubanov, Matthias Neumayer, Andreas Schaubmaier
    affiliation: HeyQQ GmbH
    contact: [email protected]
  - name: Oscar Stories Team
references:
  - name: Oscar Stories
    url: https://oscarstories.com
model_card_contact:
  email: [email protected]
  organization: HeyQQ GmbH
---

# Model Card for oscarstories/lorastral24b_0604

## Model Description

`lorastral24b_0604` is a fine-tuned version of the Mistral Small 24B Instruct model, optimized for generating safe, engaging, and educational stories for children aged 6–12. Developed by HeyQQ GmbH for the Oscar Stories platform, LORA uses age-aligned prompts and safe-language training data to produce structured narratives suitable for primary education.

It supports German and is tailored for storytelling applications with moral, cognitive, and pedagogical goals.

### Story Prompt Framework

LORA uses a structured system- and user-prompt strategy to ensure consistency and simplicity:

* **System Prompt** (German; it instructs the model to act as a children's storyteller and to write stories of exactly three paragraphs for school grades 1–4, ending with "ENDE."):
  `Du bist ein Geschichtenerzähler für Kinder. Du schreibst kurze Geschichten mit genau 3 Absätzen. Die Geschichten sind für Kinder der Schulstufen 1-4. Der erste Absatz ist der Anfang der Geschichte, der dritte Absatz ist der Schluss. Beende die Geschichte mit 'ENDE.'`

* **User Prompt Template** (German; it requests a story about a named character with a given interest, asks the model to explain a target concept, and states the grade level):
  `Schreibe eine Geschichte über {article} namens {name}. {pronoun} liebt {interest}. Erkläre den Begriff {topic}. ############### Schulstufe {age_group}`
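
For illustration, here is a minimal sketch of how these prompts can be assembled and rendered with the tokenizer's chat template. It assumes the fine-tuned checkpoint keeps the base model's chat template; the slot values (character, interest, topic, grade level) are examples, not fixed choices:

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("oscarstories/lorastral24b_0604")

SYSTEM_PROMPT = (
    "Du bist ein Geschichtenerzähler für Kinder. Du schreibst kurze Geschichten "
    "mit genau 3 Absätzen. Die Geschichten sind für Kinder der Schulstufen 1-4. "
    "Der erste Absatz ist der Anfang der Geschichte, der dritte Absatz ist der Schluss. "
    "Beende die Geschichte mit 'ENDE.'"
)

USER_TEMPLATE = (
    "Schreibe eine Geschichte über {article} namens {name}. {pronoun} liebt {interest}. "
    "Erkläre den Begriff {topic}. ############### Schulstufe {age_group}"
)

# Example slot values: a girl named Emma who loves horses; explain "Demokratie"
# at grade level 3-4 (matches the example in "How to Get Started" below).
user_prompt = USER_TEMPLATE.format(
    article="das Mädchen",
    name="Emma",
    pronoun="Sie",
    interest="Pferde",
    topic="Demokratie",
    age_group="3-4",
)

messages = [
    {"role": "system", "content": SYSTEM_PROMPT},
    {"role": "user", "content": user_prompt},
]

# Render the conversation to a single prompt string ready for generation.
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
print(prompt)
```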

## Model Details

* **Developed by**: HeyQQ GmbH (Oscar Finance)
* **License**: MIT
* **Model type**: Transformer-based causal language model
* **Base model**: `mistralai/Mistral-Small-24B-Instruct-2501`
* **Languages**: German

## Uses

### Direct Use

* Educational storytelling for primary school (grades 1–4)

### Out-of-Scope Use

* Legal, medical, or high-stakes content generation
* Unmoderated or open-ended chat applications
* Use with user-defined prompts without constraints

## Evaluation

### Benchmarks

The model was evaluated on standard zero-shot multiple-choice benchmarks using the Language Model Evaluation Harness (lm-eval-harness) [Gao et al., 2024]:

| Task          | Accuracy | ± SE     | Normalized Accuracy | ± SE     |
| ------------- | -------- | -------- | ------------------- | -------- |
| ARC-Easy      | 85.40 %  | ± 0.72 % | 86.11 %             | ± 0.71 % |
| ARC-Challenge | 62.54 %  | ± 1.41 % | 66.21 %             | ± 1.38 % |
| Winogrande    | 76.24 %  | ± 1.20 % | –                   | –        |
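
Results in this form can be reproduced with the evaluation harness. The sketch below uses the harness's Python entry point; the harness version (0.4+), dtype, and batch size are assumptions, not a record of the original evaluation setup:

```python
import lm_eval  # EleutherAI lm-evaluation-harness, v0.4+ assumed

# Zero-shot evaluation on the three benchmarks reported above.
results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=oscarstories/lorastral24b_0604,dtype=bfloat16",
    tasks=["arc_easy", "arc_challenge", "winogrande"],
    num_fewshot=0,
    batch_size=8,  # assumption; adjust to the available GPU memory
)

# Accuracy (and normalized accuracy, where defined) per task.
for task, metrics in results["results"].items():
    print(task, metrics)
```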

### Readability (per age group)

* **Flesch Reading Ease (German)**:

  * Grades 1–2: 77.99
  * Grades 3–4: 75.80
  * Grades 5–6: 76.84

  → All levels exceed the 70-point threshold, ideal for CEFR A1–A2 comprehension.

* **Wiener Sachtextformel**:

  * Grades 1–2: 3.13
  * Grades 3–4: 3.42
  * Grades 5–6: 3.28

  → Well below the threshold of 5, indicating age-appropriate readability.

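Both metrics can be computed with off-the-shelf tooling. Here is a minimal sketch using the `textstat` package; the assumption that textstat's German setting and Wiener Sachtextformel variant 1 correspond to the figures above is ours, not stated by the evaluation:

```python
import textstat

textstat.set_lang("de")  # German (Amstad) variant of Flesch Reading Ease

story = "Emma liebt Pferde. Jeden Tag besucht sie den Reiterhof. ... ENDE."  # toy example

# Flesch Reading Ease (German): higher is easier; the card targets values above 70.
flesch_de = textstat.flesch_reading_ease(story)

# Wiener Sachtextformel: lower is easier, roughly the required school grade.
# Variant 1 of the formula is assumed here.
wiener = textstat.wiener_sachtextformel(story, variant=1)

print(f"Flesch (DE): {flesch_de:.2f} | Wiener Sachtextformel: {wiener:.2f}")
```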

### Bias Analysis

* **GenBit Score** (target < 0.70–0.90; see the sketch below):

  * Grades 1–2: 0.55
  * Grades 3–4: 0.54
  * Grades 5–6: 0.50

  → Indicates low systemic gender bias.

* **Female-to-Male Representation**:

  * Grades 1–2: 59.26 % female
  * Grades 3–4: 52.98 % female
  * Grades 5–6: 65.82 % female

  → Slight female overrepresentation; it is contextually explainable and remains within acceptable bounds.

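The GenBit scores above refer to Microsoft's gender-bias metric. The snippet below is a rough sketch of how such scores can be computed with the `genbit` package; the import path, constructor arguments, and method names follow the package documentation as we understand it and may differ between versions:

```python
from genbit.genbit_metrics import GenBitMetrics  # pip install genbit (interface assumed)

# Generated stories for one age group (toy examples).
stories = [
    "Emma liebt Pferde. Sie besucht jeden Tag den Reiterhof.",
    "Ben baut eine Rakete. Er träumt vom Weltall.",
]

# German language code; window, weight, and cutoff values are assumed defaults.
genbit = GenBitMetrics(language_code="de", context_window=5,
                       distance_weight=0.95, percentile_cutoff=80)
genbit.add_data(stories, tokenized=False)

# The returned dictionary includes the aggregate GenBit score and per-word statistics.
metrics = genbit.get_metrics(output_statistics=True, output_word_list=False)
print(metrics)
```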

### Summary

The **LORA** model meets or exceeds all **Minimum Performance Requirements (MPR)** across:

* **Readability**: Suitable for ages 6–12 based on Flesch and Wiener metrics.
* **Fairness**: Low gender bias with near-balanced representation.
* **Robustness**: Stable lexical and statistical behavior.

Evaluation results validate the model’s effectiveness as an educational language model for children, supporting safe and inclusive content generation.

## Recommendations

* Content is optimized for German-language learning environments.
* Outputs should still be reviewed in production systems targeting children.
* The prompt structure should be followed for consistent output quality.

## How to Get Started with the Model

```python
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline

tokenizer = AutoTokenizer.from_pretrained("oscarstories/lorastral24b_0604")
# bf16 weights and automatic device placement (requires `accelerate`); the 24B model is impractical in fp32.
model = AutoModelForCausalLM.from_pretrained(
    "oscarstories/lorastral24b_0604", torch_dtype="auto", device_map="auto"
)
generator = pipeline("text-generation", model=model, tokenizer=tokenizer)

# "Write a story about the girl named Emma. She loves horses. Explain the term Demokratie (democracy). ############### School grades 3-4"
prompt = "Schreibe eine Geschichte über das Mädchen namens Emma. Sie liebt Pferde. Erkläre den Begriff Demokratie. ############### Schulstufe 3-4"
output = generator(prompt, max_new_tokens=200, temperature=0.7, top_p=0.9, do_sample=True)
print(output[0]["generated_text"])
```

## Training Details

### Training Data

Our dataset is a carefully curated collection of high-quality, child-appropriate content designed specifically for developing language models for young audiences. This section outlines our data collection process, cleaning pipeline, and GDPR compliance measures.

Our corpus consists of two main public sources:

#### 1. [Klexikon](https://klexikon.zum.de/)

* **Type**: German-language children's encyclopedia
* **Audience**: Children aged 6–12
* **License**: Creative Commons Attribution-ShareAlike 4.0 (CC BY-SA 4.0)

Klexikon is a well-established German online encyclopedia created specifically for children. Unlike open-edit platforms, Klexikon enforces strict editorial guidelines and article reviews to ensure all content is suitable for young readers. Articles are written in simple, educationally aligned language and cover a wide range of school-relevant topics.

Klexikon is publicly recognized as a reliable educational resource, including endorsements by the German Federal Ministry of Family Affairs.

#### 2. [KiwiThek](https://kiwithek.wien/)

* **Type**: Austrian children's encyclopedia and learning wiki
* **Audience**: Primary school learners and German language beginners
* **License**: Creative Commons Attribution-ShareAlike 4.0 (CC BY-SA 4.0), unless otherwise noted

KiwiThek is a project of the Wiener Bildungsserver (Vienna Education Network). It features articles in both standard German and “Deutsch einfach”, a simplified German variant tailored for children and language learners. This approach enhances accessibility and makes the content especially suitable for AI systems focused on inclusive learning.

### Preprocessing

* Vocabulary simplification guided by readability metrics
* Positive tone and emotional safety checks
* Balanced character/gender representation

### Hyperparameters & Setup

* **Precision**: bf16 mixed
* **Hardware**: 2 × NVIDIA H100 (80 GB HBM3 each)
* **VRAM used**: 47.14 GB
* **Throughput**: ~7,100 tokens/sec
* **Training duration**: ~100 GPU hours

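The original training code is not part of this repository. Purely as an illustration of what a bf16 supervised fine-tuning run of this kind can look like, here is a minimal sketch using TRL's `SFTTrainer`; the dataset path, batch size, learning rate, and epoch count are placeholders, not the values used for LORA:

```python
from datasets import load_dataset
from trl import SFTConfig, SFTTrainer

# Placeholder: the curated child-friendly corpus is not public.
# The dataset is assumed to expose one training example per row in a "text" column.
train_ds = load_dataset("json", data_files="stories_train.jsonl", split="train")

config = SFTConfig(
    output_dir="lorastral24b-sft",
    bf16=True,                       # bf16 mixed precision, as reported above
    per_device_train_batch_size=1,   # placeholder values; the real hyperparameters
    gradient_accumulation_steps=16,  # are not published
    learning_rate=1e-5,
    num_train_epochs=2,
    logging_steps=10,
)

trainer = SFTTrainer(
    model="mistralai/Mistral-Small-24B-Instruct-2501",  # base model named in this card
    args=config,
    train_dataset=train_ds,
)
trainer.train()
```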

## Environmental Impact

* Training used ~100 GPU hours on 2 × H100 GPUs (bf16 mixed precision)
* Throughput-optimized to reduce idle cycles per token

## Authors and Contact

* **Lead Authors**: Dima Rubanov, Matthias Neumayer, Andreas Schaubmaier
* **Organization**: HeyQQ GmbH
* **Contact**: [[email protected]](mailto:[email protected])
* **Project Website**: [https://oscarstories.com](https://oscarstories.com)