
Modern-Transformer-Decoder-Tiny

A lightweight PyTorch transformer decoder for conversational AI, built with modern architecture features.

Model Details

  • Model Type: Transformer Decoder with Grouped-Query Attention
  • Parameters: 23,744
  • Architecture: Custom implementation combining LLaMA and Qwen-3 features
  • Training: Word-level tokenization on conversation data
  • Format: SafeTensors (ready for further training)

Features

  • Grouped-Query Attention (GQA) for memory efficiency (see the sketch after this list)
  • Rotary Position Embeddings (RoPE) for position encoding
  • RMSNorm pre-normalization for training stability
  • SwiGLU activation in feed-forward networks
  • KV caching for efficient inference
  • SafeTensors format for safe loading
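
The sketch below illustrates two of the features listed above: grouped-query attention, where a small set of KV heads is shared across groups of query heads, and RoPE, which rotates queries and keys by position. It is a standalone illustration, not code from this repository; all function names and shapes are illustrative.

import torch
import torch.nn.functional as F

def rope_cos_sin(seq_len, head_dim, base=10000.0):
    # Standard RoPE frequencies: each channel pair rotates at its own rate.
    inv_freq = 1.0 / (base ** (torch.arange(0, head_dim, 2).float() / head_dim))
    angles = torch.outer(torch.arange(seq_len).float(), inv_freq)  # (seq, head_dim/2)
    angles = torch.cat((angles, angles), dim=-1)                   # (seq, head_dim)
    return angles.cos(), angles.sin()

def rotate_half(x):
    x1, x2 = x.chunk(2, dim=-1)
    return torch.cat((-x2, x1), dim=-1)

def apply_rope(x, cos, sin):
    # x: (batch, heads, seq, head_dim); cos/sin broadcast over batch and heads.
    return x * cos + rotate_half(x) * sin

def grouped_query_attention(q, k, v, kv_groups):
    # q: (batch, num_heads, seq, head_dim); k, v: (batch, kv_groups, seq, head_dim)
    repeat = q.shape[1] // kv_groups
    k = k.repeat_interleave(repeat, dim=1)   # each KV head serves a group of query heads
    v = v.repeat_interleave(repeat, dim=1)
    scores = q @ k.transpose(-2, -1) / (q.shape[-1] ** 0.5)
    seq = q.shape[-2]
    causal = torch.triu(torch.ones(seq, seq, dtype=torch.bool), diagonal=1)
    scores = scores.masked_fill(causal, float("-inf"))
    return F.softmax(scores, dim=-1) @ v

# Shapes matching this model: 2 query heads, 1 KV group, head_dim = 32 / 2 = 16.
q = torch.randn(1, 2, 8, 16)
k = torch.randn(1, 1, 8, 16)
v = torch.randn(1, 1, 8, 16)
cos, sin = rope_cos_sin(seq_len=8, head_dim=16)
q, k = apply_rope(q, cos, sin), apply_rope(k, cos, sin)
out = grouped_query_attention(q, k, v, kv_groups=1)   # (1, 2, 8, 16)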

Model Architecture

  • Vocab Size: 35
  • Hidden Size: 32
  • Layers: 2
  • Attention Heads: 2
  • KV Groups: 1
  • Max Sequence Length: 32
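
Based on these values and the field names read by the loading snippet in the Usage section below, config.json is expected to look roughly like this (a sketch, not a verbatim copy of the shipped file):

{
  "vocab_size": 35,
  "hidden_size": 32,
  "num_hidden_layers": 2,
  "num_attention_heads": 2,
  "kv_groups": 1,
  "max_position_embeddings": 32
}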

Usage

Loading the Model

import json

import torch
from safetensors.torch import load_file

# Custom model classes from this repository's training code
from your_code import TransformerDecoder, TransformerConfig

# Load the SafeTensors weights
weights = load_file("model.safetensors")

# Load the architecture configuration
with open("config.json", "r") as f:
    config_dict = json.load(f)

config = TransformerConfig(
    vocab_size=config_dict["vocab_size"],
    embed_dim=config_dict["hidden_size"],
    num_layers=config_dict["num_hidden_layers"],
    num_heads=config_dict["num_attention_heads"],
    kv_groups=config_dict["kv_groups"],
    max_seq_len=config_dict["max_position_embeddings"]
)

# Create the model and load the trained weights
model = TransformerDecoder(config)
model.load_state_dict(weights)
model.eval()
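
Generation might then look like the greedy-decoding sketch below. It assumes, rather than knows, that the custom TransformerDecoder returns logits of shape (batch, seq_len, vocab_size) when called on token IDs, that TransformerConfig exposes max_seq_len, and that word_to_id / id_to_word mappings exist in your own tokenizer code; adapt the names to your implementation.

# Hypothetical greedy-decoding loop; the model's forward signature and the
# word_to_id / id_to_word mappings are assumptions, not repository code.
prompt = "hello how are you"
input_ids = torch.tensor([[word_to_id[w] for w in prompt.split()]])

with torch.no_grad():
    while input_ids.shape[1] < config.max_seq_len:       # respect the 32-token context
        logits = model(input_ids)                         # (1, seq, vocab)
        next_id = logits[:, -1, :].argmax(dim=-1)         # most likely next word
        input_ids = torch.cat([input_ids, next_id[:, None]], dim=-1)

print(" ".join(id_to_word[int(i)] for i in input_ids[0]))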

Training Further

This model is saved in SafeTensors format, making it easy to:

  • Continue training with your own data
  • Fine-tune for specific tasks
  • Integrate with Hugging Face Transformers
  • Use with other ML frameworks

Training Data

Trained on a small conversational dataset with common patterns:

  • Greetings and responses
  • Question-answer pairs
  • Basic conversational flow
  • Word-level tokenization (illustrated below)
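
As a rough illustration of what word-level tokenization means here, the sketch below builds a vocabulary by splitting text on whitespace. The actual tokenizer and its 35-word vocabulary are not included in this card, so treat the data and mappings as placeholders.

# Illustrative word-level tokenizer with a toy corpus; not the training tokenizer.
conversations = ["hello how are you", "i am fine thank you"]

vocab = sorted({word for line in conversations for word in line.split()})
word_to_id = {word: idx for idx, word in enumerate(vocab)}
id_to_word = {idx: word for word, idx in word_to_id.items()}

def encode(text):
    return [word_to_id[w] for w in text.split()]

def decode(ids):
    return " ".join(id_to_word[i] for i in ids)

print(encode("hello how are you"))   # [3, 4, 1, 7] with this toy vocabulary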

Intended Use

  • Research: Study modern transformer architectures
  • Education: Learn about GQA, RoPE, and efficient attention
  • Base Model: Fine-tune for specific conversational tasks
  • Experimentation: Test architectural improvements

Limitations

  • Small vocabulary (35 words)
  • Limited training data
  • Basic tokenization
  • Requires custom model code for loading

Further Training

To continue training:

  1. Load the SafeTensors weights
  2. Prepare your dataset
  3. Use the same architecture configuration
  4. Resume training with an appropriate learning rate (see the sketch below)
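
A minimal continuation of training might look like the sketch below. It assumes your data is already tokenized into a LongTensor of fixed-length sequences (token_ids) and that the model returns logits of shape (batch, seq_len, vocab_size); the original hyperparameters are not published, so the learning rate and batch size are placeholders.

import torch
import torch.nn.functional as F
from torch.utils.data import DataLoader, TensorDataset

# token_ids: LongTensor of shape (num_sequences, seq_len), produced with the
# same word-level vocabulary used for the original training.
loader = DataLoader(TensorDataset(token_ids), batch_size=8, shuffle=True)

optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)  # placeholder learning rate
model.train()

for epoch in range(5):
    for (batch,) in loader:
        inputs, targets = batch[:, :-1], batch[:, 1:]        # next-word prediction
        logits = model(inputs)                               # (batch, seq-1, vocab)
        loss = F.cross_entropy(logits.reshape(-1, logits.size(-1)),
                               targets.reshape(-1))
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()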

Model Source

  • Repository: [Your GitHub Repository]
  • Architecture: Modern Transformer Decoder
  • Implementation: PyTorch with custom layers

Citation

@misc{modern-transformer-decoder,
  title={Modern Transformer Decoder with GQA and RoPE},
  author={Your Name},
  year={2024},
  howpublished={\url{https://huggingface.co/Modern-Transformer-Decoder-Tiny}}
}

License

MIT License - Feel free to use for research and development.
