โš™๏ธ BwETAF-IID-400M โ€” Model Card

Boringโ€™s Experimental Transformer for Autoregression (Flax)
A 378M-parameter autoregressive transformer built with a custom training pipeline and questionable life choices.

Trained on determination, fueled by suffering, powered by free TPUs. ๐Ÿ”ฅ


๐Ÿ“Š Model Overview

  • Name: BwETAF-IID-400M
  • Parameters: 378,769,408
  • Tokens Seen: 6,200,754,176
  • Training Time: 63,883.53 sec (≈17.7 hours)
  • Framework: Flax + JAX
  • Context Window: 512 tokens
  • Tokenizer: GPT-2 BPE (vocab size 50,257)
  • Positional Encoding: Sin/Cos
  • Activation Function: SwiGLU (sketched below)
  • Final Validation Loss: ~3.4
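
For readers who haven't met SwiGLU: here is a minimal sketch of a SwiGLU feed-forward block in Flax. This illustrates the general technique only; the module name, sizes, and bias choices are my assumptions, not BwETAF's actual implementation.

import jax
import jax.numpy as jnp
import flax.linen as nn

class SwiGLUBlock(nn.Module):
    # Hypothetical sizes; BwETAF's real dimensions are not listed in this card.
    d_model: int = 1024
    d_hidden: int = 2730  # a common SwiGLU choice: roughly (8/3) * d_model

    @nn.compact
    def __call__(self, x):
        # Two parallel projections: one passed through SiLU ("Swish") as a gate,
        # one kept linear; multiply elementwise, then project back to d_model.
        gate = nn.Dense(self.d_hidden, use_bias=False)(x)
        value = nn.Dense(self.d_hidden, use_bias=False)(x)
        return nn.Dense(self.d_model, use_bias=False)(nn.silu(gate) * value)

# Example forward pass; the sequence length 512 matches the card's context window.
x = jnp.ones((1, 512, 1024))  # (batch, seq, d_model)
block = SwiGLUBlock()
params = block.init(jax.random.PRNGKey(0), x)
y = block.apply(params, x)    # same shape as x: (1, 512, 1024)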

๐Ÿ“ˆ Training & Validation Loss

[Plot: training loss curve]

[Plot: validation loss curve]


๐Ÿค” Why BwETAF?

  • โš™๏ธ Built from scratch โ€” No hugging-face trainer shortcuts here.
  • ๐Ÿ”ฌ Flexible architecture โ€” Swap blocks, change depths, scale it how you want.
  • ๐Ÿงช Experimental core โ€” Try weird ideas without breaking a corporate repo.
  • โšก TPU-optimized โ€” Trained on free Google TPUs with custom memory-efficient formats.
  • ๐Ÿ“ฆ Lightweight-ish โ€” You can actually run this model without a data center.

โšก Quickstart

# Install from PyPI first (shell):
#   pip install BwETAF==0.4.2

import BwETAF

# Quick API testing
prompt = "The meaning of life is"
output = BwETAF.SetUpAPI(prompt, "WICKED4950/BwETAF-IID-400M")
print(output)

# Load from Hugging Face
model = BwETAF.load_hf("WICKED4950/BwETAF-IID-400M")

# Load from a local path
model = BwETAF.load_model("path/to/model")

# Save the model
model.save_model("path/to/save")

# View params & structure
params = model.trainable_variables
structure = model.model_struct
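
As a quick sanity check after loading, you can count the parameters with plain JAX. This assumes model.trainable_variables is an ordinary JAX pytree of arrays, which the snippet above suggests but the card does not state outright.

import jax

# Sum the sizes of every array in the parameter pytree; per the card,
# this should print 378,769,408.
n_params = sum(leaf.size for leaf in jax.tree_util.tree_leaves(model.trainable_variables))
print(f"{n_params:,} parameters")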

โ˜๏ธ Google collab notes not updated for now


๐Ÿšง Known Limitations

  • Did not meet the target validation loss (aimed for ≤2.7, landed at ~3.4; see the perplexity note after this list)
  • No fine-tuning or task-specific optimization
  • Training was stopped early once the loss saturated
  • Works, but wonโ€™t win any LLM trophies (yet)
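
For a feel of what that loss gap means: cross-entropy loss translates to perplexity via exp(loss), so the miss from 2.7 to 3.4 roughly doubles the model's perplexity. A quick check:

import math

# Perplexity is exp(cross-entropy loss).
print(f"target ppl ≈ {math.exp(2.7):.1f}")  # ≈ 14.9
print(f"actual ppl ≈ {math.exp(3.4):.1f}")  # ≈ 30.0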

๐Ÿ“ฉ Reach Out

Got questions, bugs, or chaos to share? Ping me on Instagram. I like weird LLM experiments and random ML convos 💬


๐Ÿ”ฎ Upcoming Experiments

  • ๐Ÿš€ BwETAF-IID-1B: Scaling this mess further
  • ๐Ÿงฌ Layer rewrite tests: Because the FFN deserves some drama
  • ๐ŸŒ€ Rotary + sparse attention tests
  • ๐Ÿงƒ Trying norm variations for training stability