jarvis-x-50M
A 50M parameter custom model, trained from scratch on CPU for free. Outperforms GPT-2 in factual Q&A, with ~5x faster inference and ~4x less memory. Built with VihaanTransformerBlock, not standard transformers!
Model Details
- Parameters: ~50M
- Architecture: Custom VihaanTransformerBlock (multi-head attention, GELU, lightweight)
- Training Data: WikiText-2 (~2M tokens)
- Vocabulary Size: 50,257 (GPT-2 tokenizer)
- Max Sequence Length: 128
- Training: 3 epochs, ~2,000 steps/epoch on Colab CPU
- Final Loss: ~0.05
Usage
import torch
from model import VihaanNet10x, Config
from transformers import AutoTokenizer
config = Config()
model = VihaanNet10x(config)
model.load_state_dict(torch.load("pytorch_model.bin"))
tokenizer = AutoTokenizer.from_pretrained(".")
model.eval()
Example
Prompt: "Tell me about Rome" Output: "Rome's empire shaped law, architecture, and culture for centuries!"
Author
Created by vihaan134354. India's first custom, CPU-trained model beating GPT-2 in facts!
Inference Providers
NEW
This model isn't deployed by any Inference Provider.
๐
Ask for provider support
HF Inference deployability: The model has no library tag.