Models are in models/.
Names follow the pattern model_dimension-n_layers, e.g. 768-8 means d_model=768 and n_layer=8 (that one is not fully trained, but its loss is pretty flat).
Inside models/old/ are models trained on the non-cleaned dataset, with a tokenizer trained on that same dataset. I think all of them are fully trained, but some are missing from my wandb.
tok4096.model was trained on the cleaned dataset; tok4096_old.model on the non-cleaned one.
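For reference, a minimal loading sketch, assuming the .model files are SentencePiece tokenizers (the SentencePiece format is an assumption based on the filename; it is not confirmed by the repo):

```python
# Hedged sketch: assumes tok4096.model is a SentencePiece model file.
import sentencepiece as spm

sp = spm.SentencePieceProcessor(model_file="tok4096.model")

ids = sp.encode("example input text", out_type=int)  # text -> token ids
print(sp.decode(ids))                                # token ids -> text
print(sp.vocab_size())                               # should be 4096
```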
train_snakes.py is the training script (you need to change outdir, d_model, and n_layer for each run). It initializes the Mamba model via the MambaLMHeadModel class.
model.py is where the MambaLMHeadModel class is defined.
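A hedged initialization sketch: it assumes model.py mirrors the upstream mamba_ssm layout, where MambaLMHeadModel takes a config object with d_model, n_layer, and vocab_size fields, and that checkpoints are plain state dicts. The config class name and checkpoint path below are assumptions:

```python
# Hedged sketch: config class, field names, and checkpoint path are assumptions;
# check model.py and the outdir set in train_snakes.py for the real ones.
import torch
from model import MambaLMHeadModel
from model import MambaConfig  # assumption: a MambaConfig-like class sits alongside

config = MambaConfig(d_model=768, n_layer=8, vocab_size=4096)  # the 768-8 model
model = MambaLMHeadModel(config)

state = torch.load("models/768-8.pt", map_location="cpu")  # illustrative path
model.load_state_dict(state)
model.eval()
```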
The context length is 256.
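Because of that limit, prompts should be truncated to their last 256 tokens before sampling. A minimal greedy-decoding sketch, assuming an HF-style forward that returns an object with a .logits field (as the upstream mamba_ssm MambaLMHeadModel does):

```python
import torch

@torch.no_grad()
def generate(model, ids, max_new_tokens=64, ctx_len=256):
    """Greedy decoding that never feeds the model more than ctx_len tokens."""
    ids = list(ids)
    for _ in range(max_new_tokens):
        window = torch.tensor([ids[-ctx_len:]])  # keep only the last 256 tokens
        logits = model(window).logits            # assumes output carries .logits
        ids.append(int(logits[0, -1].argmax()))  # pick the most likely next token
    return ids
```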