BwETAF-IID-100M

Boring's Experimental Transformer for Autoregression (Flax): a 100M-parameter autoregressive model built in Flax. Lightweight, chaotic, and surprisingly good (I mean, okay). Because who needs sanity when you've got tokens to predict?

Trained on determination, fueled by suffering, powered by free TPUs. 🔥


πŸ› οΈ Model Specs

  • Parameters: ~100M
  • Context Window: 512 tokens
  • Dataset: roughly 10M raw sentences (the first 5M repeated for a second epoch) from WICKED4950/Raw-GPT-traindata, about 7.6B tokens in total
  • Architecture: Custom Transformer
  • Tokenizer: GPT-2 (see the sketch below)
  • Trainer: Hand-coded, 'cause... why not?
  • Final validation loss: ~3.15
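
For a sense of what the 512-token context window and the GPT-2 tokenizer mean in practice, here is a minimal sketch. Loading the tokenizer through the Hugging Face transformers library is my own choice for illustration, not something the BwETAF package ships with:

# Sketch only: BwETAF uses the GPT-2 tokenizer and a 512-token context.
# Loading it via Hugging Face transformers is an assumption for this example.
from transformers import GPT2TokenizerFast

tok = GPT2TokenizerFast.from_pretrained("gpt2")
ids = tok("The meaning of life is", truncation=True, max_length=512)["input_ids"]
print(len(ids), ids[:8])  # anything longer than 512 tokens gets truncated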

Why BwETAF?

  • 🚀 Built for experimentation: Mess with the architecture guilt-free.
  • ⚡ JAX/Flax optimized: Designed for TPU efficiency (no PyTorch bloat!).
  • 🎓 Educational focus: Learn how transformers work under the hood.
  • 💻 Runs on potato hardware: 100M params = no $10k GPU needed.

🚀 TPU-Optimized Training Pipeline (Proprietary)

This model was trained using a custom JAX/Flax pipeline optimized for free Google TPUs.

  • Trains 400M-parameter models on free TPUs in bf16 (batch size ~32, ~177 hours).
  • Includes checkpointing, saving, loading, graph plotting, tokenization functions, custom dataset formats that cut TPU RAM usage, and an optimized trainer for BwETAF models.
  • Exposes ready-to-use functions so you can train without touching the core model code (a generic train-step sketch follows below).
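
The pipeline itself is not public, so the snippet below is only a generic sketch of what a bf16 training step in JAX with optax usually looks like. The names in it (apply_fn, the batch layout, the optimizer settings) are hypothetical stand-ins, not the actual BwETAF trainer:

# Generic sketch, NOT the proprietary BwETAF trainer.
# apply_fn stands in for whatever forward function the real model exposes.
from functools import partial
import jax
import jax.numpy as jnp
import optax

optimizer = optax.adamw(learning_rate=3e-4)

def loss_fn(params, batch, apply_fn):
    # Run the forward pass in bf16 (cast a copy of the params); keep the
    # master params and the loss in fp32.
    bf16_params = jax.tree_util.tree_map(lambda p: p.astype(jnp.bfloat16), params)
    inputs, labels = batch["input_ids"][:, :-1], batch["input_ids"][:, 1:]
    logits = apply_fn(bf16_params, inputs).astype(jnp.float32)
    return optax.softmax_cross_entropy_with_integer_labels(logits, labels).mean()

@partial(jax.jit, static_argnames=("apply_fn",))
def train_step(params, opt_state, batch, apply_fn):
    loss, grads = jax.value_and_grad(loss_fn)(params, batch, apply_fn)
    updates, opt_state = optimizer.update(grads, opt_state, params)
    params = optax.apply_updates(params, updates)
    return params, opt_state, loss

The real pipeline wraps a step like this with the checkpointing, plotting, and custom dataset handling listed above.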

Interested in the tech? Contact me for consulting/licensing.


⚡ Quickstart

Use pip install BwETAF to install it.

**It does not include a trainer.**

import BwETAF

# You can use this function for quick testing of the model
prompt = "The meaning of life is"
output = BwETAF.SetUpAPI(prompt, "WICKED4950/BwETAF-IID-100M")
print(output)  # Example: "The meaning of life is... (model's actual output)"

# Load from Hugging Face
model = BwETAF.load_hf("WICKED4950/BwETAF-IID-100M")

# Load from local directory
model = BwETAF.load_model("path/to/model")

# Save locally
model.save_model("path/to/save")

# To get the structure and params of the model:
params = model.trainable_variables
structure = model.model_struct
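
As a quick sanity check, you can count the parameters in that pytree with standard JAX utilities (this snippet is my own sketch, not part of the BwETAF API):

import jax

param_count = sum(p.size for p in jax.tree_util.tree_leaves(params))
print(f"Parameters: {param_count / 1e6:.1f}M")  # should land around 100M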

Open a Google Colab notebook to try it out.


🎓 Student-Friendly

As a 17-year-old solo developer, I built this to:

  • Learn how LLMs work at the code level
  • Experiment without corporate constraints
  • Prove you don't need $10M to train a model

Fork this repo and make it your own playground!

💬 Important Notes

  • This is experimental: expect weird bugs and cooler features.
  • It’s meant to be extended and hacked on. Go wild.
  • If it crashes, don't panic...

📩 Reach Out

If you want to talk about anything related to this, contact me on Instagram.


🚧 Upcoming Madness

  • 🧠 BwETAF-400M: the same soul, but a beefier body
  • 🧬 Custom layer experimentation (why not rewrite the rules?)
  • 🫠 Sanity?