---
language:
- en
license: apache-2.0
library_name: transformers
pipeline_tag: text-generation
base_model: Qwen/Qwen1.5-1.8B-Chat
tags:
- emotions
- vad
- dialogue
- multi-label
- empathy
- psychology
- evaluation
datasets:
- OpenDataLab/DailyDialog
- goemotions
- empathetic_dialogues
model-index:
- name: Emoloom-2B
results:
- task:
type: text-classification
name: Multi-label Emotion + VAD (in-text JSON)
dataset:
name: Mixed (GoEmotions + Empathetic Dialogues)
type: custom
metrics:
- type: macro_f1
value: 0.350
- type: macro_precision
value: 0.500
- type: macro_recall
value: 0.269
- type: vad_1_minus_rmse
value: 0.942
- type: parse_ok
value: 1.000
- task:
type: zero-shot-eval
name: Cross-corpus Quick Eval (DailyDialog)
dataset:
name: OpenDataLab/DailyDialog
type: dialog
metrics:
- type: macro_f1
value: 0.307
- type: vad_1_minus_rmse
value: 0.807
- type: parse_ok
value: 0.976
---
# Emoloom-2B
**Emoloom-2B** is a ~2B-parameter emotion understanding model that outputs **multi-label emotion categories** and **continuous VAD** (Valence, Arousal, Dominance) for dialogue utterances. It is fine-tuned from **Qwen/Qwen1.5-1.8B-Chat** with SFT on a curated mix of GoEmotions and Empathetic Dialogues, plus consistency constraints to keep JSON outputs robust and parsing-friendly.
> Output format (single line JSON):
> `{"labels": ["sad","anxious"], "vad": {"v": 0.42, "a": 0.31, "d": 0.28}, "rationale": "short evidence"}`
---
## ✨ Highlights
- **Dual signal**: multi-label categories + continuous VAD in [0, 1], two decimals.
- **Robust JSON**: generation runs with the KV cache disabled (`use_cache=False`), which keeps the one-line JSON format consistent.
- **Long-tail focus**: sampling and weak-label cleanup reduce “mode collapse” onto majority classes.
- **Paper-ready figures**: bundled plotting code exports high-res bar/radar/CI-band PNGs.
---
## 📊 Results (dev & cross-corpus)
| Exp | Macro-F1 | Macro-P | Macro-R | VAD (1−RMSE) | ParseOK | n |
|---------------------------:|:--------:|:-------:|:-------:|:-----------:|:-------:|-------:|
| `sft_qwen_mix2080` | **0.3500** | 0.5000 | 0.2693 | **0.9417** | 1.000 | 3663 |
| `sft_qwen_mix5050` | 0.3470 | 0.5000 | 0.2657 | 0.9337 | 1.000 | 3309 |
| `sft_qwen_mix8020` | 0.3341 | 0.5000 | 0.2509 | 0.9135 | 1.000 | 2068 |
| `sft_qwen_mix2080_dd_quick` (DailyDialog, quick) | 0.3071 | 0.5000 | 0.2136 | 0.8066 | 0.976 | 6261 |
Notes:
- `ParseOK` = fraction of generations that are valid one-line JSON.
- The VAD score is reported as **1 − RMSE** (higher is better); see the sketch below.
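For reference, the VAD score can be computed as below. This is a sketch of the definition, not the repository's exact evaluation code; it assumes predictions and golds are dicts with `v`/`a`/`d` keys:

```python
import math

def vad_one_minus_rmse(preds, golds):
    """1 - RMSE over all flattened (v, a, d) components; higher is better."""
    sq = [(p[k] - g[k]) ** 2 for p, g in zip(preds, golds) for k in ("v", "a", "d")]
    return 1.0 - math.sqrt(sum(sq) / len(sq))
```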
---
## 🧠 Model Details
- **Base**: `Qwen/Qwen1.5-1.8B-Chat`
- **Size**: ~1.8B params
- **Architecture**: causal decoder-only transformer
- **Precision**: trained in BF16; evaluation falls back through BF16 → FP16 → FP32 as available
- **Tokenizer**: Qwen tokenizer (pad set to EOS if missing)
---
## 🧾 Training Data & Processing
- **Sources**: GoEmotions (multi-label), Empathetic Dialogues (dialogue empathy).
- **Mixing**: ratios explored (20:80, 50:50, 80:20); **20:80** gave the best trade-off.
- **QC**: drop toxic or unclear samples; enforce a minimum VAD confidence; apply a short rationale template.
- **Target JSON**: `{labels, vad:{v,a,d}, rationale}` with two-decimal VAD (serialization sketched below).
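A sketch of how one training target line can be serialized under this spec (the label deduplication/sorting is an illustrative assumption, not the released preprocessing):

```python
import json

def to_target_line(labels, v, a, d, rationale):
    """One-line JSON training target with two-decimal VAD."""
    obj = {
        "labels": sorted(set(labels)),  # assumed: deduplicate + sort for a canonical order
        "vad": {"v": round(v, 2), "a": round(a, 2), "d": round(d, 2)},
        "rationale": rationale.strip(),
    }
    return json.dumps(obj, ensure_ascii=False)
```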
---
## ⚙️ Fine-tuning Setup (SFT)
- **Max length**: typically 1024–1536 tokens (adaptive truncation for stability)
- **Batch**: micro-batch 1, gradient accumulation up to 128 (OOM-safe)
- **LR**: ~1.2e-5 cosine decay, warmup ~3%
- **Stability**: gradient checkpointing; `use_cache=False` at train/eval
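Expressed as 🤗 `TrainingArguments`, the setup above looks roughly like this (a sketch; `output_dir`, epochs, and logging settings are assumptions, not the released recipe):

```python
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="emoloom-sft",          # assumed path
    per_device_train_batch_size=1,     # micro-batch 1
    gradient_accumulation_steps=128,   # OOM-safe effective batch
    learning_rate=1.2e-5,
    lr_scheduler_type="cosine",
    warmup_ratio=0.03,
    bf16=True,
    gradient_checkpointing=True,
)
```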
---
## ✅ Evaluation
- Prompts build a short **system** + **user** pair (context + utterance).
- Greedy decode, max_new_tokens ~196 (quick eval uses 48).
- Metrics:
  - Multi-label **Macro-F1 / P / R** on the gold label space (sketched below)
  - VAD **1 − RMSE** on [v, a, d]
  - **ParseOK** for JSON validity
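A sketch of the multi-label macro scores over a fixed gold label space (uses scikit-learn; not the repository's exact evaluation code):

```python
import numpy as np
from sklearn.metrics import precision_score, recall_score, f1_score

def macro_prf(pred_sets, gold_sets, label_space):
    """Binarize label sets over label_space, then macro-average P/R/F1."""
    idx = {lab: i for i, lab in enumerate(label_space)}

    def binarize(sets):
        m = np.zeros((len(sets), len(label_space)), dtype=int)
        for row, labs in enumerate(sets):
            for lab in labs:
                if lab in idx:  # labels outside the gold space are ignored
                    m[row, idx[lab]] = 1
        return m

    y_true, y_pred = binarize(gold_sets), binarize(pred_sets)
    kw = dict(average="macro", zero_division=0)
    return (precision_score(y_true, y_pred, **kw),
            recall_score(y_true, y_pred, **kw),
            f1_score(y_true, y_pred, **kw))
```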
---
## 🚀 Usage
```python
from transformers import AutoTokenizer, AutoModelForCausalLM
import json, torch

name = "Lixeeone/Emoloom-2B"
tok = AutoTokenizer.from_pretrained(name, trust_remote_code=True, use_fast=True)
model = AutoModelForCausalLM.from_pretrained(
    name, torch_dtype=torch.bfloat16, device_map="auto", trust_remote_code=True
)
if tok.pad_token_id is None:
    tok.pad_token = tok.eos_token  # Qwen tokenizer may ship without a pad token
model.config.use_cache = False  # keep output format stable

context = "We argued last night but made up this morning."
utterance = "I’m still a bit shaken though."

system = ("You are an empathetic assistant. Identify emotion labels (multi-label) "
          "and estimate VAD (Valence, Arousal, Dominance in [0,1]). Respond with STRICT one-line JSON only.")
user = (
    "Task: Read the text and provide emotion labels and VAD with two decimals, plus a brief rationale (<=30 words).\n"
    "Return JSON ONLY, single line:\n"
    '{"labels": [...], "vad": {"v": 0.00, "a": 0.00, "d": 0.00}, "rationale": "..."}\n'
    f"Context: {context}\n"
    f'Text: "{utterance}"'
)

msgs = [{"role": "system", "content": system}, {"role": "user", "content": user}]
prompt = tok.apply_chat_template(msgs, tokenize=False, add_generation_prompt=True)
inp = tok(prompt, return_tensors="pt").to(model.device)
with torch.no_grad():
    out = model.generate(**inp, max_new_tokens=128, do_sample=False, use_cache=False)
gen = tok.decode(out[0][inp["input_ids"].shape[1]:], skip_special_tokens=True)
pred = json.loads(gen.strip())  # {"labels": [...], "vad": {"v": ..., "a": ..., "d": ...}, "rationale": "..."}
print(pred)
```
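If a generation ever carries stray text around the JSON (`ParseOK` is 1.000 on dev but 0.976 on the DailyDialog quick eval), a simple recovery is to extract the first `{...}` span before parsing. A sketch:

```python
import json, re

def safe_parse(text):
    """Fall back to the first {...} span if the raw line is not valid JSON."""
    try:
        return json.loads(text.strip())
    except json.JSONDecodeError:
        m = re.search(r"\{.*\}", text, flags=re.DOTALL)
        return json.loads(m.group(0)) if m else None
```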