Puchify/PuchifyT1-Ultra-4B

Puchify T1 Ultra – 4B is a 4-billion-parameter large language model in the Puchify T1 family, built on the Qwen3-4B architecture. The Ultra version targets complex, context-rich text generation and integrates the S.A.F.E (Safety Assurance For Expression) framework. Intended applications include dialogue, creative writing, summarization, coding, and educational tasks. The architecture supports extended context handling (up to 40,960 tokens) and efficient reasoning. The model may be used in research, educational, personal, and commercial settings, provided its identity and attribution are preserved in all deployments and derivative works.

S.A.F.E Framework

Commitment to Safety

All Puchify models, including T1 Ultra, are governed by the S.A.F.E (Safety Assurance For Expression) framework as a foundational principle. S.A.F.E minimizes harmful, biased, or repetitive content while promoting clarity, engagement, and helpfulness. This framework is central to both model training and deployment, reflecting a commitment to responsible AI development.

For the best safety and user experience, run Puchify T1 Ultra through the Hugging Face pipeline API, which applies the model's chat template to the messages automatically:

from transformers import pipeline

pipe = pipeline("text-generation", model="Puchify/PuchifyT1-Ultra-4B")
messages = [
    {"role": "user", "content": "Who are you?"},
]
response = pipe(messages)
print(response)
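
When the pipeline receives a list of chat messages, recent transformers releases return the whole conversation, with the model's reply appended as the final message. The sketch below pulls out just the reply text. `last_assistant_reply` is a hypothetical helper, not part of transformers, and the output structure it expects is an assumption based on that behavior; `example_response` is a stand-in for the real pipeline result.

```python
def last_assistant_reply(response):
    """Return the text of the final assistant message in a chat pipeline result.

    Assumes `response` looks like:
    [{"generated_text": [{"role": ..., "content": ...}, ...]}]
    """
    conversation = response[0]["generated_text"]
    # Walk backwards so the newest assistant turn is found first.
    for message in reversed(conversation):
        if message["role"] == "assistant":
            return message["content"]
    raise ValueError("no assistant message found in pipeline output")


# Stand-in for `pipe(messages)` so the helper can be tried without a download.
example_response = [
    {
        "generated_text": [
            {"role": "user", "content": "Who are you?"},
            {"role": "assistant", "content": "(model reply here)"},
        ]
    }
]
print(last_assistant_reply(example_response))
```

With a real pipeline, the same call would be `last_assistant_reply(pipe(messages))`.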

Key Features

Puchify T1 Ultra features a 36-layer transformer backbone with 32 attention heads per layer, supporting up to 40,960 tokens of context. The model employs grouped-query attention, rotary positional encoding, and a SentencePiece-inspired BPE tokenizer with a 151,936-token vocabulary. The S.A.F.E framework is integral to the model and is intended to keep outputs safe, concise, and engaging.

Installation

Puchify T1 Ultra can be loaded from the Hugging Face Model Hub or from local files. Ensure Python 3.9+ is installed (recent transformers releases require it), along with the following libraries:

pip install "torch>=2.1" "transformers>=4.54" accelerate

For 4-bit quantized inference, install either bitsandbytes or AutoGPTQ.
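
As a minimal sketch of the bitsandbytes route, the function below loads the model with 4-bit NF4 weights. It assumes a CUDA GPU and that `bitsandbytes` and `accelerate` are installed. The quantization settings are common defaults rather than values published for this model, and `load_4bit` is a hypothetical helper name. The loading arguments otherwise mirror the ones this model card uses.

```python
MODEL_ID = "Puchify/PuchifyT1-Ultra-4B"


def load_4bit(model_path=MODEL_ID):
    """Load the tokenizer and a 4-bit quantized model (hypothetical helper)."""
    # Imported lazily so the heavy libraries are only needed when called.
    import torch
    from transformers import (
        AutoModelForCausalLM,
        AutoTokenizer,
        BitsAndBytesConfig,
    )

    # Assumed settings: NF4 weights with BF16 compute.
    bnb_config = BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_quant_type="nf4",
        bnb_4bit_compute_dtype=torch.bfloat16,
    )
    tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)
    model = AutoModelForCausalLM.from_pretrained(
        model_path,
        quantization_config=bnb_config,
        device_map="auto",
        trust_remote_code=True,
    )
    return tokenizer, model
```

Calling `tokenizer, model = load_4bit()` downloads the weights on first use. The returned objects can be used in the same way as an unquantized model.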

Usage

To load the model manually from the Hugging Face Hub or a local path:

from transformers import AutoModelForCausalLM, AutoTokenizer, GenerationConfig
import torch

model_path = "Puchify/PuchifyT1-Ultra-4B"  # or your local path

tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_path,
    torch_dtype=torch.bfloat16,  # Use torch.float16 if your GPU does not support BF16
    device_map="auto",
    trust_remote_code=True,
)

gen_cfg = GenerationConfig.from_pretrained(model_path)

prompt = "Explain the theory of relativity in simple terms."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

with torch.no_grad():
    outputs = model.generate(**inputs, generation_config=gen_cfg)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))

A GPU with at least 16 GB VRAM is recommended for BF16/FP16 inference. With 4-bit quantization, inference is possible on GPUs with 8 GB VRAM.
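
These memory figures can be sanity-checked with simple arithmetic: weight storage is roughly the parameter count times the bytes per parameter. The sketch below assumes a round 4 billion parameters and ignores the KV cache, activations, and quantization overhead, which is why the recommended VRAM sits well above the raw weight size.

```python
PARAMS = 4.0e9  # assumed round figure for a 4B-parameter model

BYTES_PER_PARAM = {
    "bf16/fp16": 2.0,
    "4-bit": 0.5,
}


def weight_memory_gb(params, bytes_per_param):
    """Approximate weight storage in gigabytes (1 GB = 1e9 bytes)."""
    return params * bytes_per_param / 1e9


estimates = {
    name: weight_memory_gb(PARAMS, size) for name, size in BYTES_PER_PARAM.items()
}
for name, gb in estimates.items():
    print(f"{name:>9}: ~{gb:.1f} GB for weights alone")
```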

Model Architecture

Puchify T1 Ultra uses 36 transformer layers with a hidden size of 2560 and 32 attention heads of dimension 128 each. It uses grouped-query attention with 8 key/value heads, a feed-forward network size of 9728, rotary positional encoding (θ = 1,000,000), and RMSNorm (ε = 1e-6) for normalization. The tokenizer is a SentencePiece-like BPE with a 151,936-token vocabulary, including special start and end tokens.
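
To illustrate what grouped-query attention saves at long context, the following sketch estimates key/value cache size from the figures above. It assumes a 2-byte (BF16/FP16) cache, a single sequence, and no cache quantization or paging. The function and constant names are illustrative only.

```python
NUM_LAYERS = 36
NUM_QUERY_HEADS = 32
NUM_KV_HEADS = 8
HEAD_DIM = 128
MAX_CONTEXT = 40_960
BYTES_PER_VALUE = 2  # assumption: BF16/FP16 cache


def kv_cache_bytes(num_tokens, kv_heads):
    """Bytes needed to cache keys and values for one sequence."""
    # Factor 2 covers the separate key and value tensors in every layer.
    return 2 * NUM_LAYERS * kv_heads * HEAD_DIM * BYTES_PER_VALUE * num_tokens


gqa_gib = kv_cache_bytes(MAX_CONTEXT, NUM_KV_HEADS) / 2**30
mha_gib = kv_cache_bytes(MAX_CONTEXT, NUM_QUERY_HEADS) / 2**30
print(f"8 KV heads at full context:  {gqa_gib:.3f} GiB")
print(f"32 KV heads at full context: {mha_gib:.3f} GiB")
```

Under these assumptions the cache at the full 40,960-token context comes to about 5.6 GiB with 8 key/value heads, a quarter of what 32 key/value heads would need.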

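The total parameter count is approximately 4.02 billion, assuming tied input and output embeddings and ignoring the small normalization weights: L × (2 × H × A × D + 2 × H × K × D + 3 × H × I) + V × H

where L is the number of layers, H is the hidden size, A is the number of attention heads, K is the number of key/value heads, D is the head dimension, I is the FFN size, and V is the vocabulary size.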
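
As a rough cross-check on the total, the sketch below tallies the weight matrices from the hyperparameters listed in this section. It assumes a Qwen3-style layout that this card does not spell out: bias-free attention projections, a gated feed-forward block with three matrices, and output embeddings tied to the input embeddings. Normalization weights are left out because they are a negligible share of the total.

```python
LAYERS = 36
HIDDEN = 2560
Q_HEADS = 32
KV_HEADS = 8
HEAD_DIM = 128
FFN = 9728
VOCAB = 151_936


def attention_params():
    """Query and output projections plus the narrower key and value projections."""
    q_and_o = 2 * HIDDEN * Q_HEADS * HEAD_DIM
    k_and_v = 2 * HIDDEN * KV_HEADS * HEAD_DIM
    return q_and_o + k_and_v


def ffn_params():
    """Gate, up, and down projections (assumed gated FFN)."""
    return 3 * HIDDEN * FFN


embedding = VOCAB * HIDDEN  # counted once: tied embeddings assumed
total = LAYERS * (attention_params() + ffn_params()) + embedding
print(f"{total:,} parameters (~{total / 1e9:.2f}B)")
```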

Repository Contents

The repository includes sharded model weights in safetensors format, a shard index mapping, architecture and generation configuration files, tokenizer assets, system prompt metadata, and files for added and special tokens.

Training Data

The base model was pre-trained on diverse, large-scale web corpora by Qwen. Puchify T1 Ultra was then further aligned and instruction-tuned using approximately 150,000 curated examples focused on safety, reasoning, and policy alignment. No private or proprietary user data was used during fine-tuning.

Intended Use and Limitations

Puchify T1 Ultra is designed for conversational agents, creative writing, summarization, code generation and explanation, and educational support. It is not intended for generating disallowed content such as hate speech, extremist material, explicit abuse, or disinformation. It is also not intended for medical or legal advice without expert oversight, or for real-time autonomous decision-making.

While the S.A.F.E framework reduces risks, some biases may persist. Human oversight is required for all deployments.

Licensing and Responsible Use

Puchify T1 Ultra is released under the OpenRAIL v1 license with additional terms to encourage broad adoption and responsible innovation. The model and its derivatives may be used for both non-commercial and commercial purposes, including integration into products and services, as long as the name “Puchify T1 Ultra” remains prominent and unchanged in all deployments, redistributions, and derivative works. This ensures attribution, transparency, and consistency across the ecosystem.

You may create, quantize, compress, or convert the model and share optimized versions freely or as part of commercial offerings, provided that the original model identity is preserved and clearly attributed. Removal or misrepresentation of the model’s origin is not permitted.

Redistributions must include this documentation and full attribution to Puchify Inc. For further details, see LICENSE.txt.

Citation

If you use Puchify T1 Ultra in your work, please cite:

@misc{puchify2025t1ultra,
  title   = {Puchify T1 Ultra – 4B: Advanced Safe \& Reasoning-centric Hybrid Model},
  author  = {Puchify Inc.},
  year    = {2025},
  url     = {https://puchify.ai/models/t1ultra4b}
}