# Modern-Transformer-Decoder-Tiny

A lightweight PyTorch transformer decoder with modern architecture features, trained on conversational data.
## Model Details
- Model Type: Transformer Decoder with Grouped-Query Attention
- Parameters: 23,744
- Architecture: Custom implementation combining LLaMA and Qwen-3 features
- Training: Word-level tokenization on conversation data
- Format: SafeTensors (ready for further training)
## Features
- Grouped-Query Attention (GQA) for memory efficiency
- Rotary Position Embeddings (RoPE) for position encoding
- RMSNorm pre-normalization for training stability
- SwiGLU activation in feed-forward networks
- KV caching for efficient inference
- SafeTensors format for safe loading
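To make the attention layout concrete, here is a minimal, self-contained sketch of grouped-query attention under the shapes listed in Model Architecture below (2 query heads, 1 KV group, hidden size 32). It illustrates the technique only and is not this repository's implementation; RoPE and KV caching are omitted for brevity.

```python
# Minimal grouped-query attention sketch (illustrative, not this repo's code).
# Queries get num_heads heads; keys/values get only kv_groups heads and are
# repeated so each KV group serves num_heads // kv_groups query heads.
import torch
import torch.nn as nn
import torch.nn.functional as F

class GroupedQueryAttention(nn.Module):
    def __init__(self, embed_dim=32, num_heads=2, kv_groups=1):
        super().__init__()
        self.num_heads, self.kv_groups = num_heads, kv_groups
        self.head_dim = embed_dim // num_heads
        self.q_proj = nn.Linear(embed_dim, num_heads * self.head_dim, bias=False)
        self.k_proj = nn.Linear(embed_dim, kv_groups * self.head_dim, bias=False)
        self.v_proj = nn.Linear(embed_dim, kv_groups * self.head_dim, bias=False)
        self.o_proj = nn.Linear(num_heads * self.head_dim, embed_dim, bias=False)

    def forward(self, x):
        b, t, _ = x.shape
        q = self.q_proj(x).view(b, t, self.num_heads, self.head_dim).transpose(1, 2)
        k = self.k_proj(x).view(b, t, self.kv_groups, self.head_dim).transpose(1, 2)
        v = self.v_proj(x).view(b, t, self.kv_groups, self.head_dim).transpose(1, 2)
        # Share each KV head across num_heads // kv_groups query heads.
        k = k.repeat_interleave(self.num_heads // self.kv_groups, dim=1)
        v = v.repeat_interleave(self.num_heads // self.kv_groups, dim=1)
        out = F.scaled_dot_product_attention(q, k, v, is_causal=True)
        return self.o_proj(out.transpose(1, 2).reshape(b, t, -1))
```

With `kv_groups=1` this reduces to multi-query attention: both query heads read from a single shared key/value head, which is where the KV-cache memory saving comes from.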
## Model Architecture

- Vocab size: 35
- Hidden size: 32
- Layers: 2
- Attention heads: 2
- KV groups: 1
- Max sequence length: 32
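As a quick sanity check, the attention dimensions implied by this configuration work out as follows (the variable names are illustrative, not attributes of the actual model class):

```python
# Derived attention dimensions for this configuration (illustrative names).
hidden_size, num_heads, kv_groups = 32, 2, 1

head_dim = hidden_size // num_heads      # 16 dimensions per attention head
queries_per_kv = num_heads // kv_groups  # 2 query heads share each KV head
kv_dim = kv_groups * head_dim            # 16-dim K/V projections per token

print(head_dim, queries_per_kv, kv_dim)  # -> 16 2 16
```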
## Usage

### Loading the Model
```python
import json

import torch
from safetensors.torch import load_file

# Import your custom model classes
from your_code import TransformerDecoder, TransformerConfig

# Load the config
with open("config.json", "r") as f:
    config_dict = json.load(f)

config = TransformerConfig(
    vocab_size=config_dict["vocab_size"],
    embed_dim=config_dict["hidden_size"],
    num_layers=config_dict["num_hidden_layers"],
    num_heads=config_dict["num_attention_heads"],
    kv_groups=config_dict["kv_groups"],
    max_seq_len=config_dict["max_position_embeddings"],
)

# Load the SafeTensors weights and create the model
weights = load_file("model.safetensors")
model = TransformerDecoder(config)
model.load_state_dict(weights)
model.eval()
```
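Once the model is loaded, a simple greedy-decoding loop can serve as a smoke test. The sketch below assumes the forward pass takes a `(batch, seq)` tensor of token ids and returns `(batch, seq, vocab)` logits, and that `word_to_id` / `id_to_word` are dictionaries built from the training vocabulary; the calling convention and those names are assumptions, so adapt them to your actual code.

```python
import torch

def generate(model, prompt_ids, max_new_tokens=16, max_seq_len=32):
    # Greedy decoding without KV caching, truncated to the context window.
    ids = torch.tensor([prompt_ids], dtype=torch.long)
    with torch.no_grad():
        for _ in range(max_new_tokens):
            logits = model(ids[:, -max_seq_len:])      # assumed: (1, seq, vocab)
            next_id = logits[0, -1].argmax().item()    # pick the most likely token
            ids = torch.cat([ids, torch.tensor([[next_id]])], dim=1)
    return ids[0].tolist()

# word_to_id / id_to_word are hypothetical lookups for the 35-word vocabulary.
prompt_ids = [word_to_id[w] for w in "hello how are you".split()]
print(" ".join(id_to_word[i] for i in generate(model, prompt_ids)))
```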
## Training Further
This model is saved in SafeTensors format, making it easy to:
- Continue training with your own data
- Fine-tune for specific tasks
- Integrate with Hugging Face Transformers
- Use with other ML frameworks
## Training Data

Trained on a small conversational dataset, tokenized at the word level, covering common patterns:
- Greetings and responses
- Question-answer pairs
- Basic conversational flow
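The card does not ship a tokenizer implementation, so here is a minimal word-level tokenizer sketch. It assumes the 35-word vocabulary is available as a word-to-id mapping; the `vocab.json` file name and the unknown-token id are assumptions.

```python
import json

# Hypothetical vocabulary file: {"hello": 1, "how": 2, ...}
with open("vocab.json", "r") as f:
    word_to_id = json.load(f)
id_to_word = {i: w for w, i in word_to_id.items()}

def encode(text, unk_id=0):
    # Lowercase, split on whitespace, map unknown words to unk_id (assumed).
    return [word_to_id.get(w, unk_id) for w in text.lower().split()]

def decode(ids):
    return " ".join(id_to_word.get(i, "<unk>") for i in ids)
```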
## Intended Use
- Research: Study modern transformer architectures
- Education: Learn about GQA, RoPE, and efficient attention
- Base Model: Fine-tune for specific conversational tasks
- Experimentation: Test architectural improvements
## Limitations
- Small vocabulary (35 words)
- Limited training data
- Basic tokenization
- Requires custom model code for loading
## Further Training
To continue training:
- Load the SafeTensors weights
- Prepare your dataset
- Use the same architecture configuration
- Resume training with appropriate learning rate
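A minimal continuation-training sketch following these steps might look like the code below. It assumes `model(input_ids)` returns `(batch, seq, vocab)` logits and that `batches` is an iterable of `(batch, seq)` LongTensors of token ids; both are assumptions about your setup, not code from this repository.

```python
import torch
import torch.nn.functional as F

optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)  # modest LR for resuming
model.train()

for batch in batches:  # `batches` is your own dataloader of token-id tensors
    inputs, targets = batch[:, :-1], batch[:, 1:]           # next-token prediction
    logits = model(inputs)
    loss = F.cross_entropy(logits.reshape(-1, logits.size(-1)), targets.reshape(-1))
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```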
## Model Source
- Repository: [Your GitHub Repository]
- Architecture: Modern Transformer Decoder
- Implementation: PyTorch with custom layers
## Citation

```bibtex
@misc{modern-transformer-decoder,
  title={Modern Transformer Decoder with GQA and RoPE},
  author={Your Name},
  year={2024},
  howpublished={\url{https://huggingface.co/Modern-Transformer-Decoder-Tiny}}
}
```
## License
MIT License - Feel free to use for research and development.