---
license: mit
datasets:
- pfb30/multi_woz_v22
language:
- en
pipeline_tag: text-generation
library_name: transformers
tags:
- Sam-2
- text-generation
---

# 🧠 Model Card: Sam‑2.0

## 📌 Model Overview

**Sam‑2.0** is a minimal, modular, decoder‑only Transformer architecture designed for chat‑style reasoning tasks. It emphasizes reproducibility, ablation‑friendly design, and clean benchmarking across input modalities.

- **Architecture**: Decoder‑only Transformer with RMSNorm, SwiGLU feed‑forward, and causal masking
- **Training Objective**: Causal language modeling (CLM) with role‑based label masking
- **Checkpoint**: `sam2-epoch35.safetensors`
- **Final Train Loss**: 1.04
- **Validation Loss**: Not tracked in this run
- **Training Duration**: ~6272 s over 35 epochs
- **Framework**: PyTorch + Hugging Face Transformers (custom model class)

## 🧱 Model Architecture

| Component | Description |
|-------------------|--------------------------------------------------------------|
| Backbone | Decoder‑only Transformer stack |
| Normalization | RMSNorm |
| Attention | Multi‑head self‑attention (causal) |
| Feed‑Forward | SwiGLU activation with dropout |
| Positional Bias | Learned absolute positions (no RoPE in this minimal variant) |
| Head | Tied‑embedding LM head |
| Checkpoint Format | `safetensors` with metadata for reproducibility |
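
For readers who want a concrete picture of the normalization and feed‑forward components listed above, the sketch below shows typical PyTorch implementations of RMSNorm and a SwiGLU block with dropout. This is an illustrative reimplementation, not the exact code in the `Sam2` model class; the layer names, hidden size, and dropout placement are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class RMSNorm(nn.Module):
    """Root-mean-square normalization: rescale by the RMS of the features, no mean-centering."""
    def __init__(self, dim: int, eps: float = 1e-6):
        super().__init__()
        self.weight = nn.Parameter(torch.ones(dim))
        self.eps = eps

    def forward(self, x):
        inv_rms = x.pow(2).mean(dim=-1, keepdim=True).add(self.eps).rsqrt()
        return self.weight * x * inv_rms

class SwiGLU(nn.Module):
    """SwiGLU feed-forward: SiLU-gated linear unit, down projection, then dropout."""
    def __init__(self, dim: int, hidden_dim: int, dropout: float = 0.1):
        super().__init__()
        self.w_gate = nn.Linear(dim, hidden_dim, bias=False)
        self.w_up = nn.Linear(dim, hidden_dim, bias=False)
        self.w_down = nn.Linear(hidden_dim, dim, bias=False)
        self.dropout = nn.Dropout(dropout)

    def forward(self, x):
        return self.dropout(self.w_down(F.silu(self.w_gate(x)) * self.w_up(x)))
```

In a decoder block these would typically wrap the attention and feed‑forward sublayers inside residual connections.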

## 🧪 Training Details

- **Dataset**: [pfb30/multi_woz_v22](https://huggingface.co/datasets/pfb30/multi_woz_v22)
- **Batch Size**: 8
- **Optimizer**: AdamW
- **Learning Rate**: 2 × 10⁻⁴ (constant in this run)
- **Loss Function**: Cross‑entropy over assistant tokens only (see the sketch below)
- **Hardware**: Kaggle GPU runtime
- **Logging**: Step‑wise loss tracking, no validation during training
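
The "cross‑entropy over assistant tokens only" objective is the role‑based label masking mentioned in the overview: label positions belonging to user or system turns are set to the ignore index, so only assistant tokens contribute to the loss. Below is a minimal sketch of how this is commonly implemented; the function name and the boolean `assistant_mask` are illustrative, not part of the released code.

```python
import torch
import torch.nn.functional as F

IGNORE_INDEX = -100  # positions with this label are skipped by cross-entropy

def masked_clm_loss(logits, input_ids, assistant_mask):
    """Causal LM loss computed only over assistant tokens.

    logits:         (batch, seq, vocab) model outputs
    input_ids:      (batch, seq) token ids
    assistant_mask: (batch, seq) bool, True where the token belongs to an assistant turn
    """
    labels = input_ids.clone()
    labels[~assistant_mask] = IGNORE_INDEX           # mask user/system tokens
    # Standard causal shift: predict token t+1 from positions <= t.
    shift_logits = logits[:, :-1, :].contiguous()
    shift_labels = labels[:, 1:].contiguous()
    return F.cross_entropy(
        shift_logits.view(-1, shift_logits.size(-1)),
        shift_labels.view(-1),
        ignore_index=IGNORE_INDEX,
    )
```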

## 📊 Evaluation

| Metric | Value | Notes |
|------------------|-------|---------------------------------|
| Final Train Loss | 1.04 | Achieved at epoch 35/35 |
| Validation Loss | — | Not tracked in this run |
| Inference Speed | Fast | Lightweight architecture |
| Generalisation | TBD | To be compared against Sam‑2.5 |

## 🔧 Intended Use

- **Research**: Benchmarking modular architectures and ablation studies
- **Education**: Reasoning scaffolds and logic quizzes
- **Deployment**: Lightweight agents for chat and dialogue modeling

## 🚫 Limitations

- No validation tracking — generalisation must be inferred via external evaluation harnesses
- Trained on MultiWOZ v2.2 only — may not generalize to other domains without fine‑tuning
- Minimal architecture — no RoPE or multi‑query attention (MQA) in this variant

## 📁 Files

- `sam2-epoch35.safetensors` — final checkpoint
- `config.json` — architecture and training config
- `tokenizer.json` — tokenizer with special tokens
- `README.md` — training logs and setup instructions
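
Since the checkpoint is stored as `safetensors` with embedded metadata, you can inspect it without instantiating the model. A small sketch using the `safetensors` package (which metadata keys are present depends on what was written at save time):

```python
from safetensors import safe_open

# Lazily open the checkpoint: read metadata and tensor names without loading weights.
with safe_open("sam2-epoch35.safetensors", framework="pt", device="cpu") as f:
    print(f.metadata())           # reproducibility metadata, if any was stored
    print(list(f.keys())[:10])    # first few tensor names in the state dict
```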

## 🧩 How to Load

```python
import json

import torch
from transformers import AutoTokenizer
from safetensors.torch import load_file

from sam2 import Sam2, Sam2Config  # your custom model class

# Tokenizer and config
tok = AutoTokenizer.from_pretrained("Smilyai-labs/Sam-2.0")
with open("config.json") as f:
    cfg = Sam2Config(**json.load(f))

# Build the model and load the safetensors checkpoint
model = Sam2(cfg)
state = load_file("sam2-epoch35.safetensors")
model.load_state_dict(state)
model.eval()

# Greedy decoding loop
prompt = "<|user|> Hello! <|eot|>\n<|assistant|>"
ids = tok.encode(prompt, return_tensors="pt")
with torch.no_grad():
    for _ in range(50):
        logits = model(ids)
        next_id = torch.argmax(logits[:, -1, :], dim=-1, keepdim=True)
        ids = torch.cat([ids, next_id], dim=1)
        if next_id.item() == tok.eos_token_id:
            break

print(tok.decode(ids[0]))
```
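
The loop above uses plain greedy decoding (argmax at every step), which is deterministic but can be repetitive in chat settings. For more varied responses, the argmax line can be swapped for temperature sampling; a minimal variant (the temperature of 0.8 is an arbitrary choice):

```python
# Sample from the temperature-scaled distribution instead of taking the argmax.
probs = torch.softmax(logits[:, -1, :] / 0.8, dim=-1)
next_id = torch.multinomial(probs, num_samples=1)
```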