---
license: mit
datasets:
- pfb30/multi_woz_v22
language:
- en
pipeline_tag: text-generation
---

# 🧠 Model Card: Sam‑2.0

## 📌 Model Overview

**Sam‑2.0** is a modular, head‑agnostic Transformer architecture designed for chat‑style and multimodal reasoning tasks. It emphasizes reproducibility, ablation‑friendly design, and clean benchmarking across input modalities.

- **Architecture**: Transformer encoder with RoPE positional encoding, MQA attention, and modular input adapters
- **Training Objective**: Causal language modeling (CLM)
- **Checkpoint**: `sam2-epoch35.safetensors`
- **Final Train Loss**: 1.04
- **Validation Loss**: Not tracked in this run
- **Training Duration**: ~6272s over 35 epochs
- **Framework**: PyTorch + Hugging Face Transformers (custom registry)

## 🧱 Model Architecture

| Component         | Description                                                                   |
|-------------------|-------------------------------------------------------------------------------|
| Backbone          | Transformer encoder with RoPE and MQA                                         |
| Input Adapter     | Tokenizer-driven byte-level embedding layer                                   |
| Positional Bias   | Rotary embeddings (RoPE)                                                      |
| Attention         | Multi-query attention (MQA)                                                   |
| Head              | Head-agnostic registry (default: classification placeholder)                  |
| Checkpoint Format | `safetensors` with metadata for reproducibility                               |

## 🧪 Training Details

- **Dataset**: Synthetic chat-style corpus with adversarial prompt patterns
- **Steps per Epoch**: 1055
- **Optimizer**: AdamW
- **Learning Rate Schedule**: Cosine decay with warmup
- **Loss Function**: Cross-entropy over token predictions
- **Hardware**: Kaggle TPUv2 (simulated)
- **Logging**: Step-wise loss tracking; no validation during training

## 📊 Evaluation

| Metric           | Value | Notes                                  |
|------------------|-------|----------------------------------------|
| Final Train Loss | 1.04  | Achieved at Epoch 35/35                |
| Validation Loss  | —     | Not tracked in this run                |
| Inference Speed  | Fast  | Optimized for edge deployment          |
| Generalisation   | TBD   | To be compared against Sam‑2.5         |

## 🔧 Intended Use

- **Research**: Benchmarking modular architectures and ablation studies
- **Education**: Reasoning scaffolds and logic quizzes
- **Deployment**: Lightweight agents for chat and multimodal fusion (with adapters)

## 🚫 Limitations

- No validation tracking — generalisation must be inferred via external harnesses
- Trained on synthetic data — may not generalize to real-world dialogue without fine-tuning
- Head is a placeholder — downstream tasks require custom head registration

## 📁 Files

- `sam2-epoch35.safetensors` — final checkpoint
- `config.yaml` — architecture and training config
- `tokenizer.json` — byte-level tokenizer
- `README.md` — training logs and setup instructions

## 🧩 How to Load

```python
from safetensors.torch import load_file
from sam2 import build_sam2_model

model = build_sam2_model(config="config.yaml")
# safetensors checkpoints are loaded with load_file rather than torch.load
model.load_state_dict(load_file("sam2-epoch35.safetensors"))
model.eval()
```
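
For a quick smoke test, the byte-level tokenizer shipped as `tokenizer.json` can drive a greedy decode loop. The sketch below is illustrative only: it assumes the backbone's forward pass accepts a `(batch, seq)` tensor of token IDs and returns logits of shape `(batch, seq, vocab)`, which this card does not guarantee; adapt it to the actual `sam2` forward signature.

```python
import torch
from safetensors.torch import load_file
from tokenizers import Tokenizer
from sam2 import build_sam2_model

tokenizer = Tokenizer.from_file("tokenizer.json")
model = build_sam2_model(config="config.yaml")
model.load_state_dict(load_file("sam2-epoch35.safetensors"))
model.eval()

prompt = "User: What time does the restaurant open?\nAssistant:"
ids = torch.tensor([tokenizer.encode(prompt).ids])

with torch.no_grad():
    for _ in range(64):                          # generate up to 64 new tokens
        logits = model(ids)                      # assumed forward signature
        next_id = logits[:, -1, :].argmax(dim=-1, keepdim=True)  # greedy pick
        ids = torch.cat([ids, next_id], dim=-1)

print(tokenizer.decode(ids[0].tolist()))
```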
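
Because the shipped head is a classification placeholder, downstream use requires registering a task-specific head. The exact registry API is not documented here, so the snippet below uses made-up names (`HEAD_REGISTRY`, `register_head`, `LMHead`) purely to illustrate the head-agnostic registry pattern the card describes, not the real `sam2` interface.

```python
import torch
import torch.nn as nn

# Hypothetical registry; the actual mechanism lives in the sam2 package.
HEAD_REGISTRY = {}

def register_head(name):
    """Decorator that records a head class under a string key."""
    def wrapper(cls):
        HEAD_REGISTRY[name] = cls
        return cls
    return wrapper

@register_head("lm")
class LMHead(nn.Module):
    """Projects backbone hidden states to vocabulary logits."""
    def __init__(self, hidden_size: int, vocab_size: int):
        super().__init__()
        self.proj = nn.Linear(hidden_size, vocab_size, bias=False)

    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
        return self.proj(hidden_states)

# Build a head by key; swapping tasks only changes the registry lookup.
head = HEAD_REGISTRY["lm"](hidden_size=768, vocab_size=32000)
logits = head(torch.randn(1, 16, 768))   # (batch, seq, vocab)
```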
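
If you fine-tune on real-world dialogue (as the Limitations suggest), the cosine-decay-with-warmup schedule used in training can be reproduced with standard tooling. This is a minimal sketch assuming Hugging Face's `get_cosine_schedule_with_warmup`; the learning rate, warmup length, and epoch count are illustrative values, not the original training configuration.

```python
import torch
from transformers import get_cosine_schedule_with_warmup

# Stand-in parameters; swap in the Sam-2.0 model for an actual fine-tune.
params = torch.nn.Linear(8, 8).parameters()

# Only the schedule shape (cosine decay with warmup) and the 1055 steps/epoch
# figure come from this card; lr, warmup, and epochs are assumed.
steps_per_epoch, num_epochs, warmup_steps = 1055, 3, 200
total_steps = steps_per_epoch * num_epochs

optimizer = torch.optim.AdamW(params, lr=3e-4)
scheduler = get_cosine_schedule_with_warmup(
    optimizer, num_warmup_steps=warmup_steps, num_training_steps=total_steps
)

for step in range(total_steps):
    optimizer.step()      # in a real run: forward pass + loss.backward() first
    scheduler.step()
    optimizer.zero_grad()
```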