	---
	language: es
	license: llama3.3
	library_name: peft
	tags:
	- llama
	- llama-3.3
	- peft
	- lora
	- qlora
	- conversational
	- spanish
	- workplace-safety
	- violence-prevention
	- chat
	- instruction-tuning
	base_model: meta-llama/Llama-3.3-70B-Instruct
	datasets:
	- bertin-project/alpaca-spanish
	model-index:
	- name: neurona
	  results: []
	---

	# Neurona: a Spanish workplace violence prevention and sexual harassment support model

	Neurona is a specialized fine-tuned version of Meta's `Llama-3.3-70B-Instruct` model, designed for Spanish-language conversations about workplace violence prevention and sexual harassment support. This PEFT (LoRA) adapter provides empathetic, professional, and informative responses to users seeking guidance and support in workplace safety situations.

	The adapter was fine-tuned with QLoRA on an NVIDIA H100 GPU, using a curated dataset of workplace violence prevention conversations.

	The fine-tuning scripts are available in the [llama3-3-70b-finetuning repository](https://github.com/juanmvsa/llama3-3-70b-finetuning?tab=readme-ov-file).

	## Model Details

	- Model Type: PEFT LoRA Adapter
	- Base Model: `meta-llama/Llama-3.3-70B-Instruct`
	- Fine-tuning Method: QLoRA (4-bit Quantized Low-Rank Adaptation)
	- Language: Spanish (es)
	- Domain: Workplace safety, violence prevention, and sexual harassment support
	- License: Llama 3.3 Community License
	- Parameters: LoRA adapter (~150M trainable parameters)

	## Intended Use

	This model is intended to be used as a conversational AI assistant to provide:
	- Educational information about workplace violence and harassment.
	- Guidance on reporting procedures and seeking help.
	- Empathetic support for individuals in difficult workplace situations.

	### Out-of-Scope Use
	This model is not a substitute for professional legal, psychological, or crisis intervention services. It should not be used for:
	- Providing legal advice.
	- Medical or psychological diagnosis.
	- Emergency or crisis situations.

	## How to Use

	### Requirements

	```bash
	pip install transformers torch peft bitsandbytes accelerate
	```

	### Basic Usage

	This is a PEFT LoRA adapter that must be loaded on top of the base Llama 3.3 70B Instruct model:

	```python
	import torch
	from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
	from peft import PeftModel

	# Base model and adapter repositories
	base_model_id = "meta-llama/Llama-3.3-70B-Instruct"
	adapter_model_id = "juanmvs/neurona"

	# 4-bit NF4 quantization for memory efficiency (same scheme as QLoRA training)
	bnb_config = BitsAndBytesConfig(
	    load_in_4bit=True,
	    bnb_4bit_quant_type="nf4",
	    bnb_4bit_compute_dtype=torch.bfloat16,
	)

	# Load base model and tokenizer
	tokenizer = AutoTokenizer.from_pretrained(base_model_id)
	base_model = AutoModelForCausalLM.from_pretrained(
	    base_model_id,
	    quantization_config=bnb_config,
	    torch_dtype=torch.bfloat16,
	    device_map="auto",
	)

	# Load PEFT adapter on top of the quantized base model
	model = PeftModel.from_pretrained(base_model, adapter_model_id)
	model.eval()

	# Specialized system prompt for workplace violence prevention
	system_prompt = """Eres un asistente especializado en prevención de violencia laboral y acoso sexual en el entorno de trabajo. Tu objetivo es proporcionar apoyo empático, información precisa y recursos específicos a personas que puedan estar experimentando situaciones difíciles en su lugar de trabajo.

	IMPORTANTE: Siempre mantén un tono profesional pero cálido, valida las emociones del usuario, y proporciona información práctica basada en protocolos establecidos."""

	# Example conversation
	messages = [
	    {"role": "system", "content": system_prompt},
	    {"role": "user", "content": "Creo que estoy sufriendo acoso laboral, ¿qué puedo hacer?"},
	]

	# Apply the Llama chat template; return_dict=True also yields the attention mask
	inputs = tokenizer.apply_chat_template(
	    messages,
	    add_generation_prompt=True,
	    return_tensors="pt",
	    return_dict=True,
	).to(model.device)

	outputs = model.generate(
	    **inputs,
	    max_new_tokens=512,
	    eos_token_id=tokenizer.eos_token_id,
	    pad_token_id=tokenizer.eos_token_id,
	    do_sample=True,
	    temperature=0.6,
	    top_p=0.9,
	)

	# Decode only the newly generated tokens, skipping the prompt
	response = outputs[0][inputs["input_ids"].shape[-1]:]
	print(tokenizer.decode(response, skip_special_tokens=True))
	```

	### Memory Requirements

	| Configuration | GPU Memory | RAM | Storage |
	|---------------|------------|-----|---------|
	| 4-bit quantized | 40GB+ VRAM | 16GB+ | 150GB+ (the full base checkpoint is downloaded, then quantized at load time) |
	| Full precision (bfloat16) | 140GB+ VRAM | 64GB+ | 150GB+ |
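Weight memory can be roughly estimated as parameter count times bytes per parameter. The sketch below assumes about 70 billion parameters and ignores activations, the KV cache, the adapter weights, and quantization overhead, so real usage will be higher. `estimate_weight_memory_gb` is a hypothetical helper written for this illustration, not part of the repository.

```python
def estimate_weight_memory_gb(num_params: float, bits_per_param: float) -> float:
    """Approximate memory needed to hold the model weights, in GB (10^9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9


NUM_PARAMS = 70e9  # assumption: ~70B parameters for Llama 3.3 70B

# Compare the weight footprint at different precisions
for label, bits in [("4-bit (NF4)", 4), ("bfloat16", 16), ("float32", 32)]:
    print(f"{label}: ~{estimate_weight_memory_gb(NUM_PARAMS, bits):.0f} GB")
```

The estimate is a lower bound: leave headroom on top of it for generation-time buffers.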

	### Hardware Recommendations

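	- Recommended: A100 80GB or H100 80GB (single GPU, with 4-bit quantization)
	- Minimum: about 48GB of combined VRAM, for example two RTX 3090 or RTX 4090 cards sharded with `device_map="auto"` (with 4-bit quantization)
	- CPU inference: Possible but very slow; system RAM must hold the full model weights, which is well above 32GB for a 70B model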

	### Inference Script

	This repository includes a comprehensive inference script (`inference.py`) that supports:
	- Interactive model selection between base Llama 3.3 70B and Neurona
	- Side-by-side comparison of model responses
	- Single inference mode and interactive chat mode
	- Automatic quantization and memory optimization

	Usage examples:
	```bash
	# Interactive model selection
	python inference.py --interactive --token your_hf_token

	# Direct comparison mode
	python inference.py --interactive --single --prompt "¿Qué hacer ante acoso laboral?" --token your_hf_token

	# Neurona model only
	python inference.py --model meta-llama/Llama-3.3-70B-Instruct --token your_hf_token
	```

	## Training Data

	- Training Set: A custom dataset of 48 Spanish instruction-response pairs focused on workplace violence prevention.
	- Validation Set: 1000 samples from the `bertin-project/alpaca-spanish` dataset to ensure general conversational quality.

	The training data was carefully curated to include empathetic, professional, and relevant responses for the target domain.
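For illustration, an instruction-response pair can be mapped to the chat format expected by the tokenizer's chat template. This is a minimal sketch that assumes Alpaca-style field names (`instruction`, `input`, `output`); the actual schema of `ft_data.json` is not documented here, and `to_chat_messages` is a hypothetical helper.

```python
def to_chat_messages(record: dict, system_prompt: str = "") -> list:
    """Convert one Alpaca-style record into a list of chat messages."""
    user_content = record["instruction"]
    # Alpaca-style records may carry optional extra context in "input"
    if record.get("input"):
        user_content += "\n\n" + record["input"]

    messages = []
    if system_prompt:
        messages.append({"role": "system", "content": system_prompt})
    messages.append({"role": "user", "content": user_content})
    messages.append({"role": "assistant", "content": record["output"]})
    return messages


# Placeholder record for illustration only; not taken from the training data
example = {
    "instruction": "¿Qué es el acoso laboral?",
    "input": "",
    "output": "<respuesta del asistente>",
}
for message in to_chat_messages(example, system_prompt="<system prompt>"):
    print(message["role"], "->", message["content"])
```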

	## Training Procedure

	### Fine-tuning with QLoRA
	The model was fine-tuned using 4-bit NormalFloat (NF4) quantization and LoRA.

	- LoRA `r`: 128
	- LoRA `alpha`: 32
	- LoRA `dropout`: 0.05
	- Target Modules: `q_proj`, `k_proj`, `v_proj`, `o_proj`, `gate_proj`, `up_proj`, `down_proj`, `embed_tokens`, `lm_head`
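To show how the rank and the target modules determine adapter size, the sketch below counts LoRA parameters for the modules listed above. The layer shapes are assumptions taken from the public Llama 3 70B architecture (hidden size 8192, MLP size 28672, 8 key/value heads of dimension 128, 80 layers, vocabulary 128256) and are not stated in this card. It also assumes `embed_tokens` and `lm_head` receive ordinary LoRA factors rather than being trained in full.

```python
# Assumed Llama 3.3 70B shapes (not stated in this card)
HIDDEN = 8192
INTERMEDIATE = 28672
KV_DIM = 1024  # 8 key/value heads x 128 head dimension
VOCAB = 128256
NUM_LAYERS = 80


def lora_params(d_in: int, d_out: int, r: int) -> int:
    """LoRA adds two factors per adapted weight: A (r x d_in) and B (d_out x r)."""
    return r * (d_in + d_out)


def adapter_size(r: int) -> int:
    """Total LoRA parameters for the target modules listed in this section."""
    per_layer_shapes = {
        "q_proj": (HIDDEN, HIDDEN),
        "k_proj": (HIDDEN, KV_DIM),
        "v_proj": (HIDDEN, KV_DIM),
        "o_proj": (HIDDEN, HIDDEN),
        "gate_proj": (HIDDEN, INTERMEDIATE),
        "up_proj": (HIDDEN, INTERMEDIATE),
        "down_proj": (INTERMEDIATE, HIDDEN),
    }
    layer_total = sum(lora_params(i, o, r) for i, o in per_layer_shapes.values())
    # embed_tokens and lm_head appear once each, outside the repeated layers
    shared = lora_params(VOCAB, HIDDEN, r) + lora_params(HIDDEN, VOCAB, r)
    return NUM_LAYERS * layer_total + shared


r, lora_alpha = 128, 32
print(f"LoRA parameters at r={r}: {adapter_size(r):,}")
print(f"LoRA scaling factor (alpha / r): {lora_alpha / r}")
```

The count is linear in `r`, so halving the rank halves the adapter size under these assumptions.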

	### Hyperparameters
	- Learning Rate: 1e-4
	- Scheduler: Cosine
	- Epochs: 3
	- Per-Device Batch Size: 1 (optimized for H100)
	- Gradient Accumulation Steps: 32 (effective batch size: 32)
	- Warmup Steps: 100
	- Weight Decay: 0.01
	- Gradient Clipping: 0.5
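The learning rate, warmup, and cosine scheduler settings above combine into a schedule of the following general shape. This is a minimal sketch of linear warmup followed by cosine decay; whether the training script computes the schedule exactly this way is an assumption, and `total_steps=1000` is an illustrative value rather than the step count of this run.

```python
import math


def lr_at_step(step: int, base_lr: float, warmup_steps: int, total_steps: int) -> float:
    """Learning rate after `step` optimizer steps: linear warmup, then cosine decay to zero."""
    if step < warmup_steps:
        # Linear ramp from 0 up to base_lr over the warmup phase
        return base_lr * step / max(1, warmup_steps)
    # Fraction of the post-warmup phase that has elapsed, in [0, 1]
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


# Learning rate and warmup from the list above; total_steps is illustrative
for step in (0, 50, 100, 550, 1000):
    lr = lr_at_step(step, base_lr=1e-4, warmup_steps=100, total_steps=1000)
    print(f"step {step:4d}: lr = {lr:.2e}")
```

Note that the peak learning rate is only reached once the number of optimizer steps equals the warmup length.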

	### Hardware and Software
	- GPU: NVIDIA H100 PCIe (79.6GB effective memory)
	- Software: PyTorch 2.4.0, TRL, PEFT, bitsandbytes, accelerate

	## Evaluation

	### Training Metrics
	| Metric | Value |
	|---|---|
	| Training Loss | 1.7418 |
	| Mean Token Accuracy | 63.63% |
	| Entropy | 1.1294 |
	| Training Runtime | 224 seconds (3.73 minutes) |
	| Total FLOPs | 2.33 × 10¹⁶ |
	| Total Tokens Processed | 54,621 |
	| Samples per Second | 0.429 |
	| Global Steps | 3 |
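The throughput figures in this table can be related to the batch settings in the Hyperparameters section with simple arithmetic. The sketch below uses only the reported values and assumes a single GPU, so that the effective batch size is the per-device batch size times the gradient accumulation steps.

```python
# Values reported in the Hyperparameters section and the table above
per_device_batch_size = 1
grad_accum_steps = 32
runtime_s = 224
samples_per_second = 0.429
total_tokens = 54_621
global_steps = 3

# Assumes a single GPU (no data-parallel multiplier)
effective_batch = per_device_batch_size * grad_accum_steps
samples_seen = samples_per_second * runtime_s
samples_per_step = samples_seen / global_steps
tokens_per_second = total_tokens / runtime_s

print(f"effective batch size: {effective_batch}")
print(f"samples processed: ~{samples_seen:.0f}")
print(f"samples per optimizer step: ~{samples_per_step:.0f}")
print(f"tokens per second: ~{tokens_per_second:.0f}")
```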

	### Conversation Quality
	A multi-dimensional evaluation framework was used to assess conversation quality, with a composite score of 0.73 (target > 0.65).

	| Metric | Score |
	|---|---|
	| Empathy Score | 0.67 |
	| Domain Relevance | 0.81 |
	| Professional Tone | 0.74 |

	## Limitations & Ethical Considerations

	### Model Limitations
	- Domain Specificity: Optimized for Spanish workplace violence prevention; may not perform well on general tasks
	- Data Coverage: Based on 32 training examples; may not cover all workplace situation nuances
	- Cultural Context: Designed for Spanish-speaking workplace environments
	- Response Length: Optimized for conversational responses, not long-form content

	### Ethical Guidelines
	- Not Professional Services: This model provides educational information only, not legal or psychological advice
	- Crisis Situations: For immediate danger, contact emergency services (112 in Spain, 911 in the US)
	- Privacy: Users should not share sensitive personal information
	- Bias Awareness: Responses may reflect biases present in training data
	- Human Oversight: Human review is recommended for critical workplace decisions

	### Safety Considerations
	- Emergency Situations: Always prioritize professional emergency services
	- Legal Matters: Consult qualified employment lawyers for legal advice
	- Mental Health: Seek licensed mental health professionals for psychological support
	- Workplace Policies: Follow your organization's specific HR protocols

	## Citation

	If you use this model in your research or applications, please cite it as:

	```bibtex
	@misc{neurona-2025,
	author = {Juan MVS},
	title = {Neurona: Spanish Workplace Violence Prevention Chatbot},
	year = {2025},
	publisher = {Hugging Face},
	journal = {Hugging Face Hub},
	howpublished = {\url{https://huggingface.co/juanmvs/neurona}}
	}
	```

	## Acknowledgments

	- Base Model: Meta AI for Llama 3.3 70B Instruct
	- Framework: Hugging Face Transformers and PEFT libraries
	- Training Infrastructure: NVIDIA H100 GPU
	- Validation Dataset: Bertin Project for Spanish Alpaca dataset

	## Project Structure

	This is a complete fine-tuning project that includes:
	- Training Script: `finetune_llama33_70b.py` - Comprehensive QLoRA training pipeline
	- Inference Script: `inference.py` - Interactive inference and model comparison
	- Upload Script: `upload_to_hf.py` - Hugging Face model upload utility
	- Configuration: `pyproject.toml` - Complete dependency and project configuration
	- Training Data: `ft_data.json` - 48 curated Spanish workplace safety conversations

	### Key Dependencies
	- PyTorch 2.4.0 with CUDA 12.1 support
	- Transformers ≥4.45.0 for Llama 3.3 compatibility
	- PEFT ≥0.12.0 for LoRA implementation
	- TRL ≥0.11.0 for supervised fine-tuning
	- BitsAndBytes ≥0.43.0 for 4-bit quantization
	- Weights & Biases for experiment tracking

	## Contact

	For questions about this model or collaboration opportunities:
	- Model Repository: [juanmvs/neurona](https://huggingface.co/juanmvs/neurona)

	---

	⚠️ Disclaimer: This AI model is for educational and informational purposes only. For workplace violence situations requiring immediate intervention, please contact appropriate emergency services, HR departments, or professional counselors.