Contains files for a Transformer model that answers 6-digit subtraction questions (e.g. 123450-345670=-0222220) with very low loss (1e-8).
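For concreteness, here is a minimal sketch of how such questions might be generated. It assumes zero-padded 6-digit operands and a sign-prefixed, zero-padded 7-digit answer field, as in the example above; the use of "+" for non-negative answers is an assumption made to keep the string a fixed width:

```python
# Hypothetical question generator for the "dddddd-dddddd=sddddddd" format.
# The exact sign/padding convention is inferred from the example above,
# not taken from the repository itself.
import random

def make_question(rng: random.Random) -> str:
    a = rng.randrange(0, 1_000_000)   # first operand, zero-padded to 6 digits
    b = rng.randrange(0, 1_000_000)   # second operand, zero-padded to 6 digits
    ans = a - b
    sign = "-" if ans < 0 else "+"    # assumed: "+" marks non-negative answers
    return f"{a:06d}-{b:06d}={sign}{abs(ans):07d}"

rng = random.Random(0)
print(make_question(rng))
```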
This subtraction model has 3 layers, 4 attention heads, d_model = 510, and d_head = 170. It was initialised from a very-low-loss Addition model (2 layers, 3 attention heads, 9e-9 loss) before being trained for 45K epochs.
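A minimal sketch of instantiating a model with this shape, assuming a TransformerLens-style HookedTransformer. Only n_layers, n_heads, d_model, and d_head come from the description above; n_ctx, d_vocab, and act_fn are illustrative assumptions:

```python
from transformer_lens import HookedTransformer, HookedTransformerConfig

cfg = HookedTransformerConfig(
    n_layers=3,    # stated above
    n_heads=4,     # stated above
    d_model=510,   # stated above
    d_head=170,    # stated above
    n_ctx=22,      # assumed: length of "dddddd-dddddd=sddddddd"
    d_vocab=13,    # assumed: digits 0-9 plus "-", "+", "="
    act_fn="relu", # assumed activation
)
model = HookedTransformer(cfg)
```

Note that n_heads × d_head (680) exceeds d_model (510); this is permitted, since the concatenated head outputs are projected back down to d_model by the attention output matrix.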
The Colab notebook used to train the model is here: https://github.com/apartresearch/Verified_addition/blob/main/assets/Accurate_Math_Train.ipynb
The Colab notebook used to analyse the model is here: https://github.com/apartresearch/Verified_addition/blob/main/assets/Accurate_Math_Analyse.ipynb