Swabian-German Translation Model (DPO-Enhanced)

This model is a fine-tuned version of Llama 3.1 8B for bidirectional translation between Standard German and the Swabian dialect, enhanced through Direct Preference Optimization (DPO).

Model Details

  • Base Model: Llama 3.1 8B
  • Training Method: Two-stage fine-tuning (SFT + DPO)
  • Training Data: 12,000+ word-pair translations with contextual sentences
  • Hardware Requirements: Compatible with single-GPU setups thanks to QLoRA; see the 4-bit loading sketch below
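
For GPUs with limited memory, the model can also be loaded in 4-bit precision at inference time. The snippet below is a minimal sketch assuming the bitsandbytes integration in transformers; the repository name is the same placeholder used in the Usage section.

from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
import torch

# 4-bit quantization configuration (requires bitsandbytes)
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)

model_name = "your-username/swabian-translator-dpo"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    quantization_config=bnb_config,
    device_map="auto",
)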

Intended Use

  • Translating between Standard German and Swabian dialect
  • Understanding and preserving regional linguistic variations
  • Educational purposes for language learners

Usage

Basic Translation

from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

# Load model and tokenizer (half precision, placed on the available GPU automatically)
model_name = "your-username/swabian-translator-dpo"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.float16,
    device_map="auto",
)

# Translate in either direction via an instruction prompt
def translate(text, direction="to_german"):
    if direction == "to_german":
        prompt = f"Übersetze ins Hochdeutsche: {text}"
    else:
        prompt = f"Übersetze ins Schwäbische: {text}"

    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    outputs = model.generate(**inputs, max_new_tokens=100)
    # Decode only the newly generated tokens, not the echoed prompt
    return tokenizer.decode(
        outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True
    )

# Example usage: Swabian to Standard German
swabian_text = "Du hosch ja a blaus Mol am Arm!"
german_translation = translate(swabian_text, "to_german")
print(german_translation)  # Expected: "Du hast ja einen Bluterguss am Arm!"
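
The same helper handles the reverse direction: any direction value other than "to_german" selects the German-to-Swabian prompt. The expected output below mirrors one of the example pairs listed in the next section.

# Example usage: Standard German to Swabian
german_text = "Ich habe keine Zeit"
swabian_translation = translate(german_text, "to_swabian")
print(swabian_translation)  # Expected: "I han koi Zeit"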

Translation Examples

Swabian to German:

Input: "I han koi Zeit"
Output: "Ich habe keine Zeit"

Input: "Des goht et"
Output: "Das geht nicht"

Input: "Wo bisch du her komma?"
Output: "Woher kommst du?"

German to Swabian:

Input: "Ich verstehe das nicht"
Output: "I versteh des et"

Input: "Das schmeckt sehr gut"
Output: "Des schmeckt arg guat"

Model Architecture & Training

Training Process

  1. Initial Dataset Preparation

    • Base dataset: 12,000+ word pairs from the Schwäbisch-Schwätza dictionary
    • Context enhancement using LLM-generated sentences
    • Manual verification and cleanup
  2. SFT (Supervised Fine-Tuning)

    • QLoRA implementation for efficient training
    • 2 epochs on the complete dataset
    • Loss convergence at ~0.8
  3. DPO (Direct Preference Optimization)

    • 300 carefully curated preference pairs
    • 3 epochs of preference learning
    • Focus on natural and accurate translations (a training sketch follows this list)
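
The DPO stage can be reproduced along the following lines. This is a minimal sketch assuming TRL's DPOTrainer on top of the SFT checkpoint (the model and tokenizer objects from the previous stage) and a preference dataset with prompt/chosen/rejected columns; the file name and all hyperparameters other than the epoch count are illustrative, not the exact training configuration.

from datasets import load_dataset
from trl import DPOConfig, DPOTrainer

# Preference pairs with "prompt", "chosen" and "rejected" fields (file name is illustrative)
preference_data = load_dataset("json", data_files="dpo_preference_pairs.json", split="train")

dpo_args = DPOConfig(
    output_dir="swabian-translator-dpo",
    num_train_epochs=3,             # 3 epochs of preference learning
    per_device_train_batch_size=2,  # illustrative
    learning_rate=5e-6,             # illustrative
    beta=0.1,                       # strength of the preference constraint
)

trainer = DPOTrainer(
    model=model,            # SFT checkpoint from stage 2
    ref_model=None,         # with LoRA adapters, TRL derives the reference model implicitly
    args=dpo_args,
    train_dataset=preference_data,
    tokenizer=tokenizer,
)
trainer.train()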

Technical Implementation

  • Quantized training using QLoRA
  • 4-bit precision for efficient resource usage
  • Training framework: UnslothAI (see the setup sketch after this list)
  • Single GPU training (~16GB VRAM required)
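
The QLoRA setup for the SFT stage looks roughly as follows. This is a minimal sketch assuming Unsloth's FastLanguageModel API; the base-model repository, LoRA rank and target modules are illustrative choices, not the documented training configuration.

from unsloth import FastLanguageModel

# Load the base model in 4-bit precision (QLoRA)
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/Meta-Llama-3.1-8B",  # illustrative base checkpoint
    max_seq_length=512,
    load_in_4bit=True,
)

# Attach LoRA adapters so only a small fraction of the weights is trained
model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    lora_alpha=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    use_gradient_checkpointing=True,
)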

Limitations and Considerations

  1. Dialect Variations

    • Swabian varies significantly by region
    • Model focuses on common/standard Swabian expressions
    • May not capture all local variations
  2. Translation Quality

    • Best performance on common phrases and expressions
    • May struggle with very colloquial or context-dependent translations
    • Not recommended for official or legal translations
  3. Technical Limitations

    • Input length limited to 512 tokens (see the truncation sketch after this list)
    • Generation speed affected by quantization
    • Memory requirements: ~8GB RAM minimum
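
Because of the 512-token input limit, over-long inputs are best truncated explicitly at the tokenizer level. A minimal sketch, reusing the tokenizer and model from the Usage section (very_long_swabian_text is a placeholder):

# Truncate inputs that would exceed the 512-token limit before generating
prompt = "Übersetze ins Hochdeutsche: " + very_long_swabian_text
inputs = tokenizer(prompt, return_tensors="pt", truncation=True, max_length=512).to(model.device)
outputs = model.generate(**inputs, max_new_tokens=100)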

Community and Contributions

We welcome community contributions to improve the model:

  • Additional training data
  • Regional variant documentation
  • Bug reports and fixes
  • Performance improvements

Please submit issues or pull requests through the Hugging Face repository.

Citation and Attribution

@misc{swabian-german-translator,
  author = {[Your Name]},
  title = {Swabian-German Translation Model},
  year = {2024},
  publisher = {Hugging Face},
  journal = {Hugging Face Model Hub}
}

License

This project is licensed under the MIT License - see the LICENSE file for details.

Acknowledgments
