---
language: es
license: llama3.3
library_name: peft
tags:
- llama
- llama-3.3
- peft
- lora
- qlora
- conversational
- spanish
- workplace-safety
- violence-prevention
- chat
- instruction-tuning
base_model: meta-llama/Llama-3.3-70B-Instruct
datasets:
- bertin-project/alpaca-spanish
model-index:
- name: neurona
  results: []
---

# Neurona - a Spanish workplace violence prevention and sexual harassment support model

**Neurona** is a specialized fine-tuned version of Meta's `Llama-3.3-70B-Instruct` model, designed for Spanish-language conversations about workplace violence prevention and sexual harassment support. This PEFT (LoRA) adapter provides empathetic, professional, and informative responses to users seeking guidance and support in workplace safety situations.

The adapter was fine-tuned with QLoRA on an NVIDIA H100 GPU using a curated dataset of workplace violence prevention conversations.

The fine-tuning scripts are available in the [llama3-3-70b-finetuning repository](https://github.com/juanmvsa/llama3-3-70b-finetuning?tab=readme-ov-file).

## Model Details

- **Model Type:** PEFT LoRA Adapter
- **Base Model:** `meta-llama/Llama-3.3-70B-Instruct`
- **Fine-tuning Method:** QLoRA (4-bit Quantized Low-Rank Adaptation)
- **Language:** Spanish (es)
- **Domain:** Workplace safety, violence prevention, and sexual harassment support
- **License:** Llama 3.3 Community License
- **Parameters:** LoRA adapter (~150M trainable parameters)

## Intended Use

This model is intended to be used as a conversational AI assistant to provide:
- Educational information about workplace violence and harassment.
- Guidance on reporting procedures and seeking help.
- Empathetic support for individuals in difficult workplace situations.

### Out-of-Scope Use
This model is **not** a substitute for professional legal, psychological, or crisis intervention services. It should not be used for:
- Providing legal advice.
- Medical or psychological diagnosis.
- Emergency or crisis situations.

## How to Use

### Requirements

```bash
pip install transformers torch peft bitsandbytes accelerate
```

### Basic Usage

This is a PEFT LoRA adapter that must be loaded on top of the base Llama 3.3 70B Instruct model:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import PeftModel

# Base model and adapter repositories
base_model_id = "meta-llama/Llama-3.3-70B-Instruct"
adapter_model_id = "juanmvs/neurona"  # adapter repository listed under Contact

# 4-bit NF4 quantization for memory efficiency (requires bitsandbytes)
quantization_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

# Load base model and tokenizer
tokenizer = AutoTokenizer.from_pretrained(base_model_id)
base_model = AutoModelForCausalLM.from_pretrained(
    base_model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    quantization_config=quantization_config,
)

# Load PEFT adapter on top of the quantized base model
model = PeftModel.from_pretrained(base_model, adapter_model_id)

# Specialized system prompt for workplace violence prevention (kept in Spanish, the model's language)
system_prompt = """Eres un asistente especializado en prevención de violencia laboral y acoso sexual en el entorno de trabajo. Tu objetivo es proporcionar apoyo empático, información precisa y recursos específicos a personas que puedan estar experimentando situaciones difíciles en su lugar de trabajo.

IMPORTANTE: Siempre mantén un tono profesional pero cálido, valida las emociones del usuario, y proporciona información práctica basada en protocolos establecidos."""

# Example conversation
messages = [
    {"role": "system", "content": system_prompt},
    {"role": "user", "content": "Creo que estoy sufriendo acoso laboral, ¿qué puedo hacer?"},
]

# Format the conversation with the Llama chat template and move it to the model's device
input_ids = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_tensors="pt"
).to(model.device)

outputs = model.generate(
    input_ids,
    max_new_tokens=512,
    eos_token_id=tokenizer.eos_token_id,
    do_sample=True,
    temperature=0.6,
    top_p=0.9,
)

# Decode only the newly generated tokens, skipping the prompt
response = outputs[0][input_ids.shape[-1]:]
print(tokenizer.decode(response, skip_special_tokens=True))
```

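
For multi-turn chat, the `messages` list passed to `apply_chat_template` can carry earlier turns as well. The sketch below shows one way to assemble it; `build_messages` and the sample replies are hypothetical helpers for illustration, not part of this repository.

```python
def build_messages(system_prompt, history, user_message):
    """Build a chat-template message list from prior turns.

    history is a list of (user_text, assistant_text) tuples, oldest first.
    """
    messages = [{"role": "system", "content": system_prompt}]
    for user_text, assistant_text in history:
        messages.append({"role": "user", "content": user_text})
        messages.append({"role": "assistant", "content": assistant_text})
    # The new user turn always comes last so the model answers it
    messages.append({"role": "user", "content": user_message})
    return messages


# Hypothetical earlier exchange, followed by a new question
history = [
    ("Creo que estoy sufriendo acoso laboral.", "Lamento que estés pasando por esto."),
]
messages = build_messages("Eres un asistente especializado.", history, "¿Dónde puedo denunciar?")
print([m["role"] for m in messages])
```

The resulting list can replace `messages` in the example above without other changes.
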
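### Memory Requirements

| Configuration | GPU Memory | RAM | Storage |
|---------------|------------|-----|---------|
| **4-bit quantized** | ~40GB+ VRAM | 16GB+ | 150GB+ |
| **Full precision (bf16)** | ~140GB+ VRAM | 64GB+ | 150GB+ |

These figures are approximate. The weights of a 70B-parameter model occupy about 35GB at 4 bits and about 140GB in bfloat16, before activations and the KV cache. The base checkpoint is downloaded in bfloat16 even when it is quantized at load time, so the storage requirement is the same in both configurations.
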
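
As a rough cross-check on memory figures, weight storage can be estimated from the parameter count and the bit width per parameter. This is a minimal sketch: it assumes a round 70 billion parameters and ignores quantization constants, activations, and the KV cache, so real usage will be higher. `weight_memory_gb` is a hypothetical helper name.

```python
def weight_memory_gb(num_params: float, bits_per_param: float) -> float:
    """Approximate memory for model weights alone, in GB (1 GB = 1e9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9


NUM_PARAMS = 70e9  # assumed round figure for the 70B base model

print(weight_memory_gb(NUM_PARAMS, 4))   # 4-bit quantized weights
print(weight_memory_gb(NUM_PARAMS, 16))  # bfloat16 weights
```
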
### Hardware Recommendations

- **Recommended:** A100 80GB or H100 (single GPU, with 4-bit quantization)
- **Minimum:** RTX 3090, RTX 4090, or V100 in a multi-GPU setup (with 4-bit quantization); a single 24GB or 32GB card cannot hold the roughly 35GB of 4-bit weights
- **CPU inference:** Possible but very slow (system RAM must hold the full set of weights)

### Inference Script

This repository includes a comprehensive inference script (`inference.py`) that supports:
- Interactive model selection between base Llama 3.3 70B and Neurona
- Side-by-side comparison of model responses
- Single inference mode and interactive chat mode
- Automatic quantization and memory optimization

Usage examples:
```bash
# Interactive model selection
python inference.py --interactive --token your_hf_token

# Direct comparison mode
python inference.py --interactive --single --prompt "¿Qué hacer ante acoso laboral?" --token your_hf_token

# Neurona model only
python inference.py --model meta-llama/Llama-3.3-70B-Instruct --token your_hf_token
```

## Training Data

- **Training Set:** A custom dataset of 48 Spanish instruction-response pairs focused on workplace violence prevention.
- **Validation Set:** 1000 samples from the `bertin-project/alpaca-spanish` dataset to ensure general conversational quality.

The training data was carefully curated to include empathetic, professional, and relevant responses for the target domain.

## Training Procedure

### Fine-tuning with QLoRA
The model was fine-tuned using 4-bit NormalFloat (NF4) quantization and LoRA.

- **LoRA `r`:** 128
- **LoRA `alpha`:** 32
- **LoRA `dropout`:** 0.05
- **Target Modules:** `q_proj`, `k_proj`, `v_proj`, `o_proj`, `gate_proj`, `up_proj`, `down_proj`, `embed_tokens`, `lm_head`

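
To illustrate how `r` and `alpha` interact, the sketch below applies a LoRA update in plain Python. It assumes the standard LoRA formulation, in which the frozen weight `W` is combined with a low-rank update scaled by `alpha / r`; the toy matrices and the helper names (`lora_scaling`, `matvec`, `lora_forward`) are hypothetical and unrelated to the training script.

```python
def lora_scaling(alpha: float, r: int) -> float:
    """Scaling factor applied to the low-rank update (alpha / r)."""
    return alpha / r


def matvec(M, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m_ij * v_j for m_ij, v_j in zip(row, v)) for row in M]


def lora_forward(x, W, A, B, alpha, r):
    """h = W x + (alpha / r) * B (A x); W stays frozen, only A and B are trained."""
    base = matvec(W, x)
    update = matvec(B, matvec(A, x))
    s = lora_scaling(alpha, r)
    return [b + s * u for b, u in zip(base, update)]


# Values from the list above: the update is scaled by 32 / 128
print(lora_scaling(alpha=32, r=128))

# Toy 2x2 layer with a rank-1 adapter (hypothetical numbers)
W = [[1.0, 0.0], [0.0, 1.0]]  # frozen base weight
A = [[1.0, 1.0]]              # shape (r=1, d_in=2)
B = [[0.5], [0.0]]            # shape (d_out=2, r=1)
x = [2.0, 3.0]
h = lora_forward(x, W, A, B, alpha=2.0, r=1)
print(h)
```

With `r` = 128 and `alpha` = 32 the scaling factor is below 1, so the learned update is damped relative to the base weights.
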
### Hyperparameters
- **Learning Rate:** 1e-4
- **Scheduler:** Cosine
- **Epochs:** 3
- **Per-Device Batch Size:** 1 (optimized for H100)
- **Gradient Accumulation Steps:** 32 (effective batch size: 32)
- **Warmup Steps:** 100
- **Weight Decay:** 0.01
- **Gradient Clipping:** 0.5

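
The effective batch size and the shape of a cosine schedule with warmup can be sketched as follows. This is an illustration under stated assumptions: a single GPU, linear warmup followed by cosine decay to zero, and a hypothetical `total_steps` of 1000 chosen only to make the curve visible. It is not necessarily the exact scheduler used by the training script, and the function names are invented for this example.

```python
import math


def effective_batch_size(per_device: int, grad_accum: int, num_gpus: int = 1) -> int:
    """Samples contributing to each optimizer step."""
    return per_device * grad_accum * num_gpus


def lr_at_step(step: int, total_steps: int, base_lr: float = 1e-4, warmup_steps: int = 100) -> float:
    """Linear warmup to base_lr, then cosine decay to zero."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


print(effective_batch_size(per_device=1, grad_accum=32))

TOTAL_STEPS = 1000  # hypothetical run length, for illustration only
for step in (0, 50, 100, 550, 1000):
    print(step, lr_at_step(step, TOTAL_STEPS))
```
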
### Hardware and Software
- **GPU:** NVIDIA H100 PCIe (79.6GB effective memory)
- **Software:** PyTorch 2.4.0, TRL, PEFT, bitsandbytes, accelerate

## Evaluation

### Training Metrics
| Metric | Value |
|---|---|
| **Training Loss** | 1.7418 |
| **Mean Token Accuracy** | 63.63% |
| **Entropy** | 1.1294 |
| **Training Runtime** | 224 seconds (3.73 minutes) |
| **Total FLOPs** | 2.33 × 10¹⁶ |
| **Total Tokens Processed** | 54,621 |
| **Samples per Second** | 0.429 |
| **Global Steps** | 3 |

### Conversation Quality
A multi-dimensional evaluation framework was used to assess conversation quality, with a composite score of **0.73** (target > 0.65).

| Metric | Score |
|---|---|
| **Empathy Score** | 0.67 |
| **Domain Relevance** | 0.81 |
| **Professional Tone** | 0.74 |

## Limitations & Ethical Considerations

### Model Limitations
- **Domain Specificity:** Optimized for Spanish workplace violence prevention; may not perform well on general tasks
- **Data Coverage:** Based on 32 training examples; may not cover all workplace situation nuances
- **Cultural Context:** Designed for Spanish-speaking workplace environments
- **Response Length:** Optimized for conversational responses, not long-form content

### Ethical Guidelines
- **Not Professional Services:** This model provides educational information only, not legal or psychological advice
- **Crisis Situations:** For immediate danger, contact emergency services (112 in Spain, 911 in the US)
- **Privacy:** Users should not share sensitive personal information
- **Bias Awareness:** Responses may reflect biases present in the training data
- **Human Oversight:** Human review is recommended for critical workplace decisions

### Safety Considerations
- **Emergency Situations:** Always prioritize professional emergency services
- **Legal Matters:** Consult qualified employment lawyers for legal advice
- **Mental Health:** Seek licensed mental health professionals for psychological support
- **Workplace Policies:** Follow your organization's specific HR protocols

## Citation

If you use this model in your research or applications, please cite it as:

```bibtex
@misc{neurona-2025,
  author = {Juan MVS},
  title = {Neurona: Spanish Workplace Violence Prevention Chatbot},
  year = {2025},
  publisher = {Hugging Face},
  journal = {Hugging Face Hub},
  howpublished = {\url{https://huggingface.co/juanmvs/neurona}}
}
```

## Acknowledgments

- **Base Model:** Meta AI for Llama 3.3 70B Instruct
- **Framework:** Hugging Face Transformers and PEFT libraries
- **Training Infrastructure:** NVIDIA H100 GPU
- **Validation Dataset:** Bertin Project for Spanish Alpaca dataset

## Project Structure

This is a complete fine-tuning project that includes:
- **Training Script:** `finetune_llama33_70b.py` - Comprehensive QLoRA training pipeline
- **Inference Script:** `inference.py` - Interactive inference and model comparison
- **Upload Script:** `upload_to_hf.py` - Hugging Face model upload utility
- **Configuration:** `pyproject.toml` - Complete dependency and project configuration
- **Training Data:** `ft_data.json` - 48 curated Spanish workplace safety conversations

### Key Dependencies
- PyTorch 2.4.0 with CUDA 12.1 support
- Transformers ≥4.45.0 for Llama 3.3 compatibility
- PEFT ≥0.12.0 for LoRA implementation
- TRL ≥0.11.0 for supervised fine-tuning
- BitsAndBytes ≥0.43.0 for 4-bit quantization
- Weights & Biases for experiment tracking

## Contact

For questions about this model or collaboration opportunities:
- **email:** [email protected]
- **Model Repository:** [juanmvs/neurona](https://huggingface.co/juanmvs/neurona)

---

**⚠️ Disclaimer:** This AI model is for educational and informational purposes only. For workplace violence situations requiring immediate intervention, please contact appropriate emergency services, HR departments, or professional counselors.