---
license: mit
datasets:
- pfb30/multi_woz_v22
language:
- en
pipeline_tag: text-generation
---

# 🧠 Model Card: Sam‑2.0

## 📌 Model Overview

**Sam‑2.0** is a modular, head‑agnostic Transformer architecture designed for chat‑style and multimodal reasoning tasks. It emphasizes reproducibility, ablation‑friendly design, and clean benchmarking across input modalities.

- **Architecture**: Transformer encoder with RoPE positional encoding, MQA attention, and modular input adapters
- **Training Objective**: Causal language modeling (CLM)
- **Checkpoint**: `sam2-epoch35.safetensors`
- **Final Train Loss**: 1.04
- **Validation Loss**: Not tracked in this run
- **Training Duration**: ~6272s over 35 epochs
- **Framework**: PyTorch + Hugging Face Transformers (custom registry)

## 🧱 Model Architecture

| Component         | Description                                                                   |
|-------------------|-------------------------------------------------------------------------------|
| Backbone          | Transformer encoder with RoPE and MQA                                         |
| Input Adapter     | Tokenizer-driven byte-level embedding layer                                   |
| Positional Bias   | Rotary embeddings (RoPE)                                                      |
| Attention         | Multi-query attention (MQA)                                                   |
| Head              | Head-agnostic registry (default: classification placeholder)                  |
| Checkpoint Format | `safetensors` with metadata for reproducibility                               |

## 🧪 Training Details

- **Dataset**: Synthetic chat-style corpus with adversarial prompt patterns
- **Steps per Epoch**: 1055
- **Optimizer**: AdamW
- **Learning Rate Schedule**: Cosine decay with warmup
- **Loss Function**: Cross-entropy over token predictions
- **Hardware**: Kaggle TPUv2 (simulated)
- **Logging**: Step-wise loss tracking; no validation during training

## 📊 Evaluation

| Metric           | Value | Notes                                  |
|------------------|-------|----------------------------------------|
| Final Train Loss | 1.04  | Achieved at Epoch 35/35                |
| Validation Loss  | —     | Not tracked in this run                |
| Inference Speed  | Fast  | Optimized for edge deployment          |
| Generalisation   | TBD   | To be compared against Sam‑2.5         |

## 🔧 Intended Use

- **Research**: Benchmarking modular architectures and ablation studies
- **Education**: Reasoning scaffolds and logic quizzes
- **Deployment**: Lightweight agents for chat and multimodal fusion (with adapters)

## 🚫 Limitations

- No validation tracking — generalisation must be inferred via external harnesses
- Trained on synthetic data — may not generalize to real-world dialogue without fine-tuning
- Head is a placeholder — downstream tasks require custom head registration

## 📁 Files

- `sam2-epoch35.safetensors` — final checkpoint
- `config.yaml` — architecture and training config
- `tokenizer.json` — byte-level tokenizer
- `README.md` — training logs and setup instructions

## 🧩 How to Load

```python
from safetensors.torch import load_file
from sam2 import build_sam2_model

model = build_sam2_model(config="config.yaml")
# safetensors checkpoints are loaded with load_file rather than torch.load
model.load_state_dict(load_file("sam2-epoch35.safetensors"))
model.eval()
```
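
For a quick smoke test, the byte-level tokenizer shipped as `tokenizer.json` can drive a greedy decode loop. The sketch below is illustrative only: it assumes the backbone's forward pass accepts a `(batch, seq)` tensor of token IDs and returns logits of shape `(batch, seq, vocab)`, which this card does not guarantee; adapt it to the actual `sam2` forward signature.

```python
import torch
from safetensors.torch import load_file
from tokenizers import Tokenizer
from sam2 import build_sam2_model

tokenizer = Tokenizer.from_file("tokenizer.json")
model = build_sam2_model(config="config.yaml")
model.load_state_dict(load_file("sam2-epoch35.safetensors"))
model.eval()

prompt = "User: What time does the restaurant open?\nAssistant:"
ids = torch.tensor([tokenizer.encode(prompt).ids])

with torch.no_grad():
    for _ in range(64):                          # generate up to 64 new tokens
        logits = model(ids)                      # assumed forward signature
        next_id = logits[:, -1, :].argmax(dim=-1, keepdim=True)  # greedy pick
        ids = torch.cat([ids, next_id], dim=-1)

print(tokenizer.decode(ids[0].tolist()))
```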
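
Because the shipped head is a classification placeholder, downstream use requires registering a task-specific head. The exact registry API is not documented here, so the snippet below uses made-up names (`HEAD_REGISTRY`, `register_head`, `LMHead`) purely to illustrate the head-agnostic registry pattern the card describes, not the real `sam2` interface.

```python
import torch
import torch.nn as nn

# Hypothetical registry; the actual mechanism lives in the sam2 package.
HEAD_REGISTRY = {}

def register_head(name):
    """Decorator that records a head class under a string key."""
    def wrapper(cls):
        HEAD_REGISTRY[name] = cls
        return cls
    return wrapper

@register_head("lm")
class LMHead(nn.Module):
    """Projects backbone hidden states to vocabulary logits."""
    def __init__(self, hidden_size: int, vocab_size: int):
        super().__init__()
        self.proj = nn.Linear(hidden_size, vocab_size, bias=False)

    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
        return self.proj(hidden_states)

# Build a head by key; swapping tasks only changes the registry lookup.
head = HEAD_REGISTRY["lm"](hidden_size=768, vocab_size=32000)
logits = head(torch.randn(1, 16, 768))   # (batch, seq, vocab)
```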
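
If you fine-tune on real-world dialogue (as the Limitations suggest), the cosine-decay-with-warmup schedule used in training can be reproduced with standard tooling. This is a minimal sketch assuming Hugging Face's `get_cosine_schedule_with_warmup`; the learning rate, warmup length, and epoch count are illustrative values, not the original training configuration.

```python
import torch
from transformers import get_cosine_schedule_with_warmup

# Stand-in parameters; swap in the Sam-2.0 model for an actual fine-tune.
params = torch.nn.Linear(8, 8).parameters()

# Only the schedule shape (cosine decay with warmup) and the 1055 steps/epoch
# figure come from this card; lr, warmup, and epochs are assumed.
steps_per_epoch, num_epochs, warmup_steps = 1055, 3, 200
total_steps = steps_per_epoch * num_epochs

optimizer = torch.optim.AdamW(params, lr=3e-4)
scheduler = get_cosine_schedule_with_warmup(
    optimizer, num_warmup_steps=warmup_steps, num_training_steps=total_steps
)

for step in range(total_steps):
    optimizer.step()      # in a real run: forward pass + loss.backward() first
    scheduler.step()
    optimizer.zero_grad()
```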