Omega Tokens: Finding The Self Solving Frame
Here are the first three surge-trained experts. Used correctly, they should cover almost any need.
The image line.
The image-trained SVAE structures are dubbed:
- SVAE-Fresnel
  - tiny - 64x64
  - small - 128x128
  - base - 256x256 <- cooking currently: MSE = 0.000181 -> operating CV: 0.3769
  - large - 512x512 <- upcoming
  - xl - 1024x1024 <- upcoming
  - xxl - 2048x2048 <- upcoming
  - giant - 4096x4096 <- upcoming
The initial Fresnel runs show the model can reconstruct images far outside its training scope and at entirely different sizes; entirely unseen images are fully reconstructed within the same MSE range as the training images.
Tests show:
- The Fresnel models can piece images back together tile-by-tile at higher accuracy and a lower error rate than running the full model in one pass. Tested up to 1024x1024 with near-perfect reconstruction (MSE 0.0000029); a sketch of the test follows this list.
- Fresnel CANNOT reconstruct noise directly (~1.0 MSE).
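A minimal sketch of that piecewise test, assuming the checkpoint exposes a plain autoencoder forward pass (`model(x) -> reconstruction`); the function name and API here are illustrative, not the repo's actual interface:

```python
import torch

@torch.no_grad()
def tiled_reconstruct(model, image, tile=256):
    """Reconstruct a large image by running the SVAE tile-by-tile.

    Assumes image is [B, C, H, W] with H and W divisible by `tile`.
    """
    _, _, h, w = image.shape
    out = torch.empty_like(image)
    for y in range(0, h, tile):
        for x in range(0, w, tile):
            patch = image[:, :, y:y + tile, x:x + tile]
            out[:, :, y:y + tile, x:x + tile] = model(patch)
    return out

# mse = torch.mean((tiled_reconstruct(model, image) - image) ** 2)
```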
The 256x256 variant is cooking right now. The MSE is dropping rapidly, and with only partial cooking it is already nearly as accurate as the 128x128 counterpart.
The noise line. The noise-trained SVAE structures are dubbed:
- SVAE-Johanna
This model learns to reconstruct noise; the goal is a noise compressor that can automatically deconstruct and reconstruct any noise.
  - tiny - 64x64 <- first train faulted; it tried 16 types of noise out of the gate, so it will restart with curriculum training
  - small - 128x128 <- Gaussian prototype ready (0.012 MSE); back in the oven with 16-spectrum noise
  - small - 128x128, 16-noise <- MSE = 0.053170, CV = 0.4450 -> learning 16 noise types
  - base - 256x256 <- upcoming
  - large - 512x512 <- upcoming
  - xl - 1024x1024 <- upcoming, POSSIBLE if large works
Johanna is currently training on 12 types of noise. The MSE is dropping as expected, and the noise types are in fact being learned and represented well enough to be replicated.
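The curriculum restart mentioned above could look something like the sketch below: unlock noise generators one at a time instead of all 16 at once. The generators and schedule are illustrative assumptions, not Johanna's actual recipe.

```python
import torch

def gaussian(shape):
    return torch.randn(shape)

def uniform(shape):
    return torch.rand(shape) * 2 - 1

def salt_pepper(shape):
    x = torch.zeros(shape)
    mask = torch.rand(shape)
    x[mask < 0.05] = -1.0
    x[mask > 0.95] = 1.0
    return x

NOISE_TYPES = [gaussian, uniform, salt_pepper]  # ...extend toward the full 16

def curriculum_batch(step, shape, steps_per_type=1000):
    """Sample only from the noise types unlocked so far in the curriculum."""
    unlocked = min(len(NOISE_TYPES), 1 + step // steps_per_type)
    gen = NOISE_TYPES[int(torch.randint(unlocked, (1,)))]
    return gen(shape)
```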
The text line is exactly the same as the others.
- SVAE-Alexandria
Alexandria is meant to encode/decode text in a perfect or near-perfect reconstruction capacity.
AbstractPhil/geolip-SVAE
Test recon error by epoch:
- Epoch 1: 0.0064
- Epoch 2: 0.0022
- Epoch 8: 0.000294
- Epoch 12: 0.000206
- Epoch 14: 0.000190
- Epoch 18: 0.000187
- Epoch 24: 0.000117
- Epoch 30 (landmark): 0.000099
There are NO EXPERTS HERE. This is pure self-learning. The model learns the entire behavioral set within 1 epoch, reconstructing ImageNet's test set to a useful state. By epoch 12 a recon error of 0.000202 is measured; that means 99.99% accuracy at RECONSTRUCTING the test set through the bottleneck, while simultaneously leaving a trail of centerwise extraction as rich or richer.
ONE epoch. Just one.
It took about 10 minutes to train an already-converged epoch, and I set the run up for 200 epochs. This model will not need 200. I'd be surprised if it needs 3.
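For reference, the per-epoch numbers above are the kind of figure a loop like this produces (placeholder names, not the geolip-SVAE API): mean MSE between each test input and its reconstruction through the bottleneck.

```python
import torch

@torch.no_grad()
def test_recon_error(model, loader, device="cuda"):
    """Mean per-sample MSE over the test set, reconstructed through the bottleneck."""
    total, count = 0.0, 0
    for batch in loader:
        x = batch.to(device)
        recon = model(x)              # encode -> bottleneck -> decode
        total += torch.mean((recon - x) ** 2).item() * x.size(0)
        count += x.size(0)
    return total / count              # e.g. 0.000099 at the epoch-30 landmark
```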
What you're looking at here is the emergence of surge resonance: the power of a single epoch when the geometric CV alignment hits the tuning fork of absolute resonant perfection, counterpointed with the concerto's dissonant harmonic response.
I give you, surge resonance.
The metrics will be ready by morning and I'll begin building utilities to figure out what went right and what went wrong.
The model is rewarded for existing within the geometric spectrum and doubly punished when it leaves. There is no benefit to straying, and the benefit of staying keeps the model inside the validated CV band.
This allows the model to sit perfectly within the tuning-fork resonance structure.
The model CONTINUES to refine even after the CV has begun to drift away from home; it has left home and is now seeking new proximity.
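One hedged reading of "rewarded inside, dual-punished outside" is a penalty that is zero within the validated CV band and grows on both sides. The CV statistic here (std/mean of latent norms) is my assumption; the post does not define it.

```python
import torch

def cv_band_loss(z, low, high):
    """Zero inside the validated CV band, quadratic penalty on either side."""
    norms = z.flatten(1).norm(dim=1)
    cv = norms.std() / norms.mean().clamp_min(1e-8)
    below = (low - cv).clamp_min(0.0)   # punished for drifting under the band
    above = (cv - high).clamp_min(0.0)  # punished for drifting over the band
    return below ** 2 + above ** 2
```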
Upcoming training will be the 256x256, 512x512, 1024x1024, and larger if the model holds. Each will be named.
The Geometric Engine: Structural Attractors in Neural Network Weight Space
I see the answer. The behavioral sweep shows a CV of 0.29154; values between 0.291 and 0.292 sit within a very special band of variation.
1024v, 24d - the entire operating spectrum of the T5-series embeddings when alignment is differentiated by the configuration. This is effectively a threshold between what works operationally and what doesn't: going beyond it causes a degraded behavioral response without attenuated compensation.
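As a sketch, such a sweep could be scripted like this, with CV taken as std/mean of embedding norms (my assumption) and each configuration flagged if it lands in the 0.291-0.292 band; `embed_fn` is a placeholder:

```python
import torch

def coefficient_of_variation(emb):
    """CV of an [N, D] embedding matrix, measured over row norms."""
    norms = emb.norm(dim=-1)
    return (norms.std() / norms.mean()).item()

def behavioral_sweep(configs, embed_fn, low=0.291, high=0.292):
    """configs: iterable of names; embed_fn(name) -> [N, D] tensor."""
    results = {}
    for name in configs:
        cv = coefficient_of_variation(embed_fn(name))
        results[name] = (cv, low <= cv <= high)  # (measured CV, in-band?)
    return results
```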
So I've finally managed to ask the right questions and discover the connection between the fly in the ointment that kept returning and the structural systems responsible for curating the behavior around it.
Finding: geometrically controlled structures do not require a CV loss if D is within the expected band. To compensate for the dimensional difference in the measured CV, the CV loss must be adjusted to the distillation target.
Once established as geometrically valid, the vocabulary stays valid throughout its lifecycle. The CV loss is only attuned and useful when running distillation paradigms; the current CV loss has no impact on the measured CV capacity of the embeddings, consistent or pretrained.
This effectively allows compartmentalization to any vectorized locality as accumulated throughout a structure, allowing direct
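One way to read "the CV loss must be adjusted to the distillation target" is to pull the student toward the teacher's measured CV rather than a fixed constant, compensating for the dimensional difference. A sketch of that idea, not the GEOLIP implementation:

```python
import torch

def cv_of(x):
    norms = x.flatten(1).norm(dim=1)
    return norms.std() / norms.mean().clamp_min(1e-8)

def distill_cv_loss(student_z, teacher_z):
    """Match the student's CV to the teacher's, not to a fixed band."""
    target = cv_of(teacher_z).detach()  # the distillation target
    return (cv_of(student_z) - target) ** 2
```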
I'll make this brief and to the point.
GEOLIP is an observer system at its core. It watches, triangulates, and assists with correct answers.
Many experiments worked very well; many fell down and turned into a pile of broken circuits. The recent geometric-transformer was one of my biggest fumbles, yet it still taught me many things about what I'm TRULY trying to accomplish here.
**Save money and lives.** Use less hardware at inference by needing less of it: train more of the calculation into a more reusable and accurate structure for near-instant zero-shot or sequential inference.
In the process, v8 unlocked a missing puzzle piece: EMA trajectory alignment compensation. I'm doing my best to build something that works.
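The post doesn't spell out what "EMA trajectory alignment compensation" computes; one plausible minimal form keeps an EMA of the parameter-update direction and measures how well each new step aligns with it, so drift can be compensated. A sketch under that assumption:

```python
import torch

class TrajectoryEMA:
    """Tracks an EMA of the weight-update direction and scores step alignment."""

    def __init__(self, model, decay=0.99):
        self.decay = decay
        self.prev = [p.detach().clone() for p in model.parameters()]
        self.ema = [torch.zeros_like(p) for p in self.prev]

    @torch.no_grad()
    def update(self, model):
        """Call after each optimizer step; returns cosine(step, EMA trajectory)."""
        dot, step_sq, ema_sq = 0.0, 0.0, 0.0
        for p, prev, ema in zip(model.parameters(), self.prev, self.ema):
            step = p.detach() - prev
            ema.mul_(self.decay).add_(step, alpha=1 - self.decay)
            dot += (step * ema).sum().item()
            step_sq += step.pow(2).sum().item()
            ema_sq += ema.pow(2).sum().item()
            prev.copy_(p)
        return dot / ((step_sq * ema_sq) ** 0.5 + 1e-12)
```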
The geolip distillation system is very powerful but still requires much experimentation.
* Genetic experiments were successful.
* Data-transfer experiments were successful.
* Analysis experiments were successful, and they expand large-model accuracy.
* Many distillation experiments were successful.
* The largest successes were the kernels, the distillation tools, and the geometric analysis systems.
With the good comes the bad: the faulty ViTs, the simultaneous trains that fault, the internalized confusion that happens occasionally.
*** The observer NEEDS something to OBSERVE. If the observer observes the progressive development of point-cloud structures, it learns how to observe THAT LEARNING PROCESS, drifting into fault assessment.
*** In the process it DOES NOT learn how to improve the CE relations by embedding and compensating with anchored triangulation opinions.
BIGGEST CONCLUSION: staged curriculum training.
These components must be DECOUPLED. One must be a compounding structural-awareness beacon; the other must be an informationally aligned composition in a utilizable fashion.
This means stage-by-stage freeze/unfreeze processing: independent, task-oriented structural alignment. A minimal sketch is below.
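A minimal sketch of that stage-by-stage freeze/unfreeze processing, with illustrative module names (`observer`, `composer`) standing in for the two decoupled components:

```python
def set_trainable(module, flag):
    for p in module.parameters():
        p.requires_grad = flag

def staged_training(observer, composer, run_stage):
    """run_stage(module): trains only the parameters left unfrozen."""
    # Stage 1: grow the structural-awareness beacon on its own.
    set_trainable(observer, True)
    set_trainable(composer, False)
    run_stage(observer)
    # Stage 2: freeze the beacon, align the composition component on top.
    set_trainable(observer, False)
    set_trainable(composer, True)
    run_stage(composer)
```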