---
language:
- en
license: apache-2.0
library_name: transformers
pipeline_tag: text-generation
base_model: Qwen/Qwen1.5-1.8B-Chat
tags:
- emotions
- vad
- dialogue
- multi-label
- empathy
- psychology
- evaluation
datasets:
- OpenDataLab/DailyDialog
- goemotions
- empathetic_dialogues
model-index:
- name: Emoloom-2B
  results:
  - task:
      type: text-classification
      name: Multi-label Emotion + VAD (in-text JSON)
    dataset:
      name: Mixed (GoEmotions + Empathetic Dialogues)
      type: custom
    metrics:
    - type: macro_f1
      value: 0.350
    - type: macro_precision
      value: 0.500
    - type: macro_recall
      value: 0.269
    - type: vad_1_minus_rmse
      value: 0.942
    - type: parse_ok
      value: 1.000
  - task:
      type: zero-shot-eval
      name: Cross-corpus Quick Eval (DailyDialog)
    dataset:
      name: OpenDataLab/DailyDialog
      type: dialog
    metrics:
    - type: macro_f1
      value: 0.307
    - type: vad_1_minus_rmse
      value: 0.807
    - type: parse_ok
      value: 0.976
---

# Emoloom-2B

**Emoloom-2B** is a ~2B-parameter emotion understanding model that outputs **multi-label emotion categories** and **continuous VAD** (Valence, Arousal, Dominance) for dialogue utterances. It is fine-tuned from **Qwen/Qwen1.5-1.8B-Chat** with SFT on a curated mix of GoEmotions and Empathetic Dialogues, plus consistency constraints to keep JSON outputs robust and parsing-friendly.

> Output format (single-line JSON):
> `{"labels": ["sad","anxious"], "vad": {"v": 0.42, "a": 0.31, "d": 0.28}, "rationale": "short evidence"}`

---

## ✨ Highlights

- **Dual signal**: multi-label categories plus continuous VAD in [0, 1], reported to two decimals.
- **Robust JSON**: the KV cache is disabled (`use_cache=False`) during generation to keep output formatting consistent.
- **Long-tail focus**: sampling and weak-label cleanup reduce "mode collapse" onto majority classes.
- **Paper-ready figures**: bundled plotting code exports high-res bar/radar/CI-band PNGs.

---

## 📊 Results (dev & cross-corpus)

| Exp | Macro-F1 | Macro-P | Macro-R | VAD (1−RMSE) | ParseOK | n(dev) |
|---------------------------:|:--------:|:-------:|:------------:|:-------:|-------:|-------:|
| `sft_qwen_mix2080` | **0.3500** | 0.5000 | 0.2693 | **0.9417** | 1.000 | 3663 |
| `sft_qwen_mix5050` | 0.3470 | 0.5000 | 0.2657 | 0.9337 | 1.000 | 3309 |
| `sft_qwen_mix8020` | 0.3341 | 0.5000 | 0.2509 | 0.9135 | 1.000 | 2068 |
| `sft_qwen_mix2080_dd_quick` (DailyDialog, quick) | 0.3071 | 0.5000 | 0.2136 | 0.8066 | 0.976 | 6261 |

Notes:

- `ParseOK` = fraction of generations that are valid one-line JSON.
- The VAD score is reported as **1 − RMSE** (higher is better).

---

## 🧠 Model Details

- **Base**: `Qwen/Qwen1.5-1.8B-Chat`
- **Size**: ~1.8B params
- **Architecture**: causal decoder-only transformer
- **Precision**: BF16 training; evaluation falls back through BF16/FP16/FP32
- **Tokenizer**: Qwen tokenizer (pad token set to EOS if missing)

---

## 🧾 Training Data & Processing

- **Sources**: GoEmotions (multi-label) and Empathetic Dialogues (dialogue empathy).
- **Mixing**: ratios explored (20:80, 50:50, 80:20); **20:80** gave the best trade-off.
- **QC**: remove toxic or unclear samples; enforce a minimum VAD confidence; use a short rationale template.
- **Target JSON**: `{labels, vad:{v,a,d}, rationale}` with two-decimal VAD.

---

## ⚙️ Fine-tuning Setup (SFT)

- **Max length**: typically 1024–1536 tokens (adaptive truncation for stability)
- **Batch**: micro-batch 1, gradient accumulation up to 128 (OOM-safe)
- **LR**: ~1.2e-5 with cosine decay, ~3% warmup
- **Stability**: gradient checkpointing; `use_cache=False` at train/eval

---

## ✅ Evaluation

- Prompts build a short **system** + **user** pair (context + utterance).
- Greedy decoding, `max_new_tokens` ≈ 196 (quick eval uses 48).
- Metrics (a minimal computation sketch follows this list):
  - Multi-label **Macro-F1 / P / R** on the gold label space
  - VAD **1 − RMSE** over [v, a, d]
  - **ParseOK** for JSON validity
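The sketch below illustrates how the three reported metrics can be computed from gold/predicted label sets and VAD dicts. It is a minimal sketch, not the project's evaluation script, and `LABEL_SPACE` is an illustrative placeholder rather than the model's actual label inventory.

```python
import json
import math

# Illustrative placeholder; the real label space comes from the training data.
LABEL_SPACE = ["joy", "sad", "anxious", "anger", "surprise", "neutral"]

def macro_f1(golds, preds):
    """Macro-averaged F1 over LABEL_SPACE; golds/preds are lists of label sets."""
    f1s = []
    for lab in LABEL_SPACE:
        tp = sum(lab in g and lab in p for g, p in zip(golds, preds))
        fp = sum(lab not in g and lab in p for g, p in zip(golds, preds))
        fn = sum(lab in g and lab not in p for g, p in zip(golds, preds))
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f1s.append(2 * prec * rec / (prec + rec) if prec + rec else 0.0)
    return sum(f1s) / len(f1s)

def vad_score(golds, preds):
    """1 - RMSE over all (v, a, d) components; higher is better."""
    errs = [(g[k] - p[k]) ** 2 for g, p in zip(golds, preds) for k in ("v", "a", "d")]
    return 1.0 - math.sqrt(sum(errs) / len(errs))

def parse_ok(generations):
    """Fraction of generations that parse as a one-line JSON object."""
    def ok(s):
        try:
            return isinstance(json.loads(s.strip()), dict) and "\n" not in s.strip()
        except json.JSONDecodeError:
            return False
    return sum(ok(s) for s in generations) / len(generations)
```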
---

## 🚀 Usage

```python
from transformers import AutoTokenizer, AutoModelForCausalLM
import json, torch

name = "Lixeeone/Emoloom-2B"
tok = AutoTokenizer.from_pretrained(name, trust_remote_code=True, use_fast=True)
model = AutoModelForCausalLM.from_pretrained(
    name, torch_dtype=torch.bfloat16, device_map="auto", trust_remote_code=True
)
if tok.pad_token_id is None:
    tok.pad_token = tok.eos_token
model.config.use_cache = False  # keep output format stable

context = "We argued last night but made up this morning."
utterance = "I’m still a bit shaken though."

sys = ("You are an empathetic assistant. Identify emotion labels (multi-label) "
       "and estimate VAD (Valence, Arousal, Dominance in [0,1]). Respond with STRICT one-line JSON only.")
usr = (
    "Task: Read the text and provide emotion labels and VAD with two decimals, plus a brief rationale (<=30 words).\n"
    "Return JSON ONLY, single line:\n"
    '{"labels": [...], "vad": {"v": 0.00, "a": 0.00, "d": 0.00}, "rationale": "..."}\n'
    f"Context: {context}\n"
    f'Text: "{utterance}"'
)

msgs = [{"role": "system", "content": sys}, {"role": "user", "content": usr}]
prompt = tok.apply_chat_template(msgs, tokenize=False, add_generation_prompt=True)
inp = tok(prompt, return_tensors="pt").to(model.device)

with torch.no_grad():
    out = model.generate(**inp, max_new_tokens=128, do_sample=False, use_cache=False)

gen = tok.decode(out[0][inp["input_ids"].shape[1]:], skip_special_tokens=True)
pred = json.loads(gen)  # {"labels": [...], "vad": {"v": .., "a": .., "d": ..}, "rationale": "..."}
print(pred)
```
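Cross-corpus, `ParseOK` can dip below 1.0 (0.976 on DailyDialog), so it may be worth guarding the final `json.loads` above. Below is a minimal fallback parser, a sketch rather than part of the released code: it tries the raw text first, then any `{...}` span inside the generation.

```python
import json
import re

def parse_prediction(gen: str):
    """Best-effort parse of a model generation.

    Tries the raw (stripped) text first, then any {...} span inside it.
    Returns the parsed dict, or None if nothing validates; the fraction of
    non-None results over a dataset corresponds to the ParseOK metric.
    """
    candidates = [gen.strip()] + re.findall(r"\{.*\}", gen, flags=re.DOTALL)
    for cand in candidates:
        try:
            obj = json.loads(cand)
        except json.JSONDecodeError:
            continue
        if isinstance(obj, dict) and "labels" in obj and "vad" in obj:
            return obj
    return None

# e.g. with a generation that has stray text around the JSON:
demo = 'Sure! {"labels": ["sad"], "vad": {"v": 0.30, "a": 0.45, "d": 0.35}, "rationale": "shaken"}'
print(parse_prediction(demo))
```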