Smilyai-labs/Sam-2.0

+---
+license: mit
+datasets:
+- pfb30/multi_woz_v22
+language:
+- en
+pipeline_tag: text-generation
+---
+model_card = """
+# 🧠 Model Card: Sam‑2.0
+## 📌 Model Overview
+**Sam‑2.0** is a modular, head‑agnostic Transformer architecture designed for chat‑style and multimodal reasoning tasks. It emphasizes reproducibility, ablation‑friendly design, and clean benchmarking across input modalities.
+- **Architecture**: Transformer encoder with RoPE positional encoding, MQA attention, and modular input adapters
+- **Training Objective**: Causal language modeling (CLM)
+- **Checkpoint**: `sam2-epoch35.safetensors`
+- **Final Train Loss**: 1.04
+- **Validation Loss**: Not tracked in this run
+- **Training Duration**: ~6272s over 35 epochs
+- **Framework**: PyTorch + Hugging Face Transformers (custom registry)
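
The RoPE positional encoding listed above can be illustrated with a minimal pure-Python sketch. It assumes the standard formulation (base 10000, consecutive dimension pairs rotated together); the card does not say which variant or base Sam‑2.0 uses, and `rope_rotate` is a hypothetical helper, not part of the `sam2` package.

```python
import math


def rope_rotate(vec, position, base=10000.0):
    """Apply rotary position embedding to one even-length vector.

    Each consecutive pair (vec[i], vec[i + 1]) is rotated by the angle
    position * base ** (-i / d), where d is the vector length.
    """
    d = len(vec)
    if d % 2 != 0:
        raise ValueError("RoPE needs an even embedding dimension")
    out = []
    for i in range(0, d, 2):
        theta = position * base ** (-i / d)
        cos_t, sin_t = math.cos(theta), math.sin(theta)
        x, y = vec[i], vec[i + 1]
        out.extend([x * cos_t - y * sin_t, x * sin_t + y * cos_t])
    return out


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


# Illustrative query and key vectors (made-up values).
q = [0.5, -1.0, 2.0, 0.25]
k = [1.5, 0.3, -0.7, 1.0]

# The same relative offset (3) at different absolute positions
# should give the same attention score.
score_a = dot(rope_rotate(q, 5), rope_rotate(k, 2))
score_b = dot(rope_rotate(q, 105), rope_rotate(k, 102))
```

The two scores agree up to floating-point error, which is the property that makes rotary embeddings encode relative rather than absolute position.
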
+## 🧱 Model Architecture
+| Component         | Description                                                                 |
+|------------------|-----------------------------------------------------------------------------|
+| Backbone         | Transformer encoder with RoPE and MQA                                       |
+| Input Adapter    | Tokenizer-driven byte-level embedding layer                                 |
+| Positional Bias  | Rotary embeddings (RoPE)                                                    |
+| Attention        | Multi-query attention (MQA)                                                 |
+| Head             | Head-agnostic registry (default: classification placeholder)                |
+| Checkpoint Format| `safetensors` with metadata for reproducibility                             |
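
As a rough illustration of the multi-query attention row in the table, the sketch below shares a single key/value head across all query heads. It is plain Python on nested lists with made-up shapes and values, and assumes a causal mask (consistent with the CLM objective); it is not taken from the Sam‑2.0 source.

```python
import math


def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def multi_query_attention(queries, keys, values, causal=True):
    """queries: [num_heads][seq_len][head_dim]; keys, values: [seq_len][head_dim].

    Every query head attends over the same single key/value head.
    Returns a list shaped [num_heads][seq_len][head_dim].
    """
    head_dim = len(keys[0])
    scale = 1.0 / math.sqrt(head_dim)
    outputs = []
    for head_q in queries:
        head_out = []
        for t, q in enumerate(head_q):
            # With a causal mask, position t only sees positions 0..t.
            visible = range(t + 1) if causal else range(len(keys))
            scores = [scale * sum(a * b for a, b in zip(q, keys[s])) for s in visible]
            weights = softmax(scores)
            head_out.append([
                sum(w * values[s][j] for w, s in zip(weights, visible))
                for j in range(head_dim)
            ])
        outputs.append(head_out)
    return outputs


# Two query heads, one shared key/value head, sequence length 3.
keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
values = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
queries = [
    [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]],
    [[0.5, 0.5], [-1.0, 1.0], [0.0, 0.0]],
]
out = multi_query_attention(queries, keys, values)
```

Because keys and values are stored once rather than once per head, the key/value cache at inference time shrinks by a factor equal to the number of query heads.
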
+## 🧪 Training Details
+- **Dataset**: Synthetic chat-style corpus with adversarial prompt patterns
+- **Steps per Epoch**: 1055 (batch size not stated)
+- **Optimizer**: AdamW
+- **Learning Rate Schedule**: Cosine decay with warmup
+- **Loss Function**: Cross-entropy over token predictions
+- **Hardware**: Kaggle TPUv2 (simulated)
+- **Logging**: Step-wise loss tracking, no validation during training
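
A cosine-decay-with-warmup schedule like the one named above might look roughly like this. The step counts come from the card (1055 steps per epoch, 35 epochs); the peak learning rate, warmup length, and floor are assumptions chosen for illustration, since the card does not report them.

```python
import math

STEPS_PER_EPOCH = 1055  # from the card
EPOCHS = 35             # from the card
TOTAL_STEPS = STEPS_PER_EPOCH * EPOCHS

# Assumed values, not reported in the card.
PEAK_LR = 3e-4
WARMUP_STEPS = 1000
MIN_LR = 0.0


def lr_at(step):
    """Linear warmup to PEAK_LR, then cosine decay down to MIN_LR."""
    if step < WARMUP_STEPS:
        return PEAK_LR * (step + 1) / WARMUP_STEPS
    progress = (step - WARMUP_STEPS) / max(1, TOTAL_STEPS - WARMUP_STEPS)
    progress = min(progress, 1.0)
    return MIN_LR + 0.5 * (PEAK_LR - MIN_LR) * (1.0 + math.cos(math.pi * progress))


# Learning rate at the start, at the end of warmup, midway, and at the final step.
samples = [lr_at(s) for s in (0, WARMUP_STEPS, TOTAL_STEPS // 2, TOTAL_STEPS)]
```

In PyTorch, a function of this shape is commonly divided by the peak rate and passed to `torch.optim.lr_scheduler.LambdaLR`, which expects a multiplier on the optimizer's base learning rate.
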
+## 📊 Evaluation
+| Metric           | Value          | Notes                                            |
+|------------------|----------------|--------------------------------------------------|
+| Final Train Loss | 1.04           | Reached at epoch 35 of 35                        |
+| Validation Loss  | N/A            | Not tracked in this run                          |
+| Inference Speed  | Not quantified | Described as fast; optimized for edge deployment |
+| Generalization   | TBD            | To be compared against Sam‑2.5                   |
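
A few derived figures can be computed from the numbers reported in this card. The perplexity conversion assumes the reported loss is the mean per-token cross-entropy in nats, which the card does not state explicitly; the timing figures assume the ~6272 s covers all 35 epochs of 1055 steps each.

```python
import math

final_train_loss = 1.04   # from the card
total_seconds = 6272      # from the card (approximate)
epochs = 35
steps_per_epoch = 1055

# Perplexity is exp(loss) when the loss is mean cross-entropy in nats (assumed).
train_perplexity = math.exp(final_train_loss)

seconds_per_epoch = total_seconds / epochs
seconds_per_step = total_seconds / (epochs * steps_per_epoch)

print(f"train perplexity ~ {train_perplexity:.2f}")   # train perplexity ~ 2.83
print(f"~{seconds_per_epoch:.0f} s per epoch, "
      f"~{seconds_per_step * 1000:.0f} ms per step")  # ~179 s per epoch, ~170 ms per step
```

These are training-set figures only; without a validation split they say nothing about generalization.
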
+## 🔧 Intended Use
+- **Research**: Benchmarking modular architectures and ablation studies
+- **Education**: Reasoning scaffolds and logic quizzes
+- **Deployment**: Lightweight agents for chat and multimodal fusion (with adapters)
+## 🚫 Limitations
+- No validation tracking: generalization must be assessed with an external evaluation harness
+- Trained on synthetic data: may not generalize to real-world dialogue without fine-tuning
+- Head is a placeholder: downstream tasks require registering a custom head
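
To make the last limitation concrete, a head registry is often implemented as a name-to-factory mapping. The sketch below is a generic pattern, not the `sam2` API: `HEAD_REGISTRY`, `register_head`, `build_head`, and the `"lm"` head are all hypothetical names, and the sizes are placeholders.

```python
from typing import Callable, Dict

# Hypothetical registry; the real sam2 package may expose a different mechanism.
HEAD_REGISTRY: Dict[str, Callable[..., object]] = {}


def register_head(name):
    """Decorator that stores a head factory under the given name."""
    def decorator(factory):
        if name in HEAD_REGISTRY:
            raise KeyError(f"head {name!r} is already registered")
        HEAD_REGISTRY[name] = factory
        return factory
    return decorator


@register_head("lm")
def build_lm_head(hidden_size, vocab_size):
    # Returns a plain description here; a real factory would return a module
    # that projects hidden states to vocabulary logits.
    return {"type": "linear", "in_features": hidden_size, "out_features": vocab_size}


def build_head(name, **kwargs):
    """Look up a registered factory by name and call it."""
    try:
        factory = HEAD_REGISTRY[name]
    except KeyError:
        known = sorted(HEAD_REGISTRY)
        raise KeyError(f"unknown head {name!r}; registered: {known}") from None
    return factory(**kwargs)


# Placeholder sizes, not taken from config.yaml.
head = build_head("lm", hidden_size=512, vocab_size=256)
```

Keeping heads behind a registry like this lets the backbone stay unchanged while ablations swap the task-specific output layer by name.
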
+## 📁 Files
+- `sam2-epoch35.safetensors` — final checkpoint
+- `config.yaml` — architecture and training config
+- `tokenizer.json` — byte-level tokenizer
+- `README.md` — training logs and setup instructions
+## 🧩 How to Load
+```python
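+from safetensors.torch import load_file
+from sam2 import build_sam2_model
+
+# Build the architecture from the shipped config, then load the weights.
+model = build_sam2_model(config="config.yaml")
+# A .safetensors file is read with load_file, not torch.load.
+state_dict = load_file("sam2-epoch35.safetensors")
+model.load_state_dict(state_dict)
+model.eval()  # switch to inference mode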