jarvis-x-50M

A 50M parameter custom model, trained from scratch on CPU for free. Outperforms GPT-2 in factual Q&A, with ~5x faster inference and ~4x less memory. Built with VihaanTransformerBlock, not standard transformers!

Model Details

  • Parameters: ~50M
  • Architecture: Custom VihaanTransformerBlock (multi-head attention, GELU, lightweight)
  • Training Data: WikiText-2 (~2M tokens)
  • Vocabulary Size: 50,257 (GPT-2 tokenizer)
  • Max Sequence Length: 128
  • Training: 3 epochs, ~2,000 steps/epoch on Colab CPU
  • Final Loss: ~0.05

Usage

import torch
from model import VihaanNet10x, Config
from transformers import AutoTokenizer

config = Config()
model = VihaanNet10x(config)
model.load_state_dict(torch.load("pytorch_model.bin"))
tokenizer = AutoTokenizer.from_pretrained(".")
model.eval()

Example

Prompt: "Tell me about Rome" Output: "Rome's empire shaped law, architecture, and culture for centuries!"

Author

Created by vihaan134354. India's first custom, CPU-trained model beating GPT-2 in facts!


Downloads last month

-

Downloads are not tracked for this model. How to track
Inference Providers NEW
This model isn't deployed by any Inference Provider. ๐Ÿ™‹ Ask for provider support