V2 good news
The decoder is differentiating. The features will be useful downstream.
CFG = dict(
# Architecture (inherited from Fresnel v50)
V=16, D=4, ps=4, hidden=384, depth=4, n_cross=2,
stage_hidden=128, stage_V=64,
# Training
img_size=64,
batch_size=256,
lr=3e-4,
epochs=50,
ds_size=1280000,
val_size=10000,
# CV soft hand
target_cv=0.2915,
cv_weight=0.3,
boost=0.5,
sigma=0.15,
# Checkpointing
save_every=5,
val_per_type_every=5,
)
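The "CV soft hand" knobs above suggest a penalty pulling the batch coefficient of variation toward `target_cv`. A minimal sketch, assuming a Gaussian well shape and a `boost` that amplifies the pull from below the target (both of those shapes are my guesses, not the repo's actual loss):

```python
import math

def cv_soft_hand(cv, target_cv=0.2915, cv_weight=0.3, boost=0.5, sigma=0.15):
    """Hypothetical penalty: zero at cv == target_cv, saturating at cv_weight.

    `boost` (assumption) strengthens the pull when cv falls below target.
    """
    gap = cv - target_cv
    well = 1.0 - math.exp(-gap * gap / (2.0 * sigma * sigma))
    scale = 1.0 + boost if gap < 0 else 1.0
    return cv_weight * scale * well

print(cv_soft_hand(0.2915))  # 0.0 (exactly on target)
```

With this shape, the cv=1.027 logged at epoch 1 would sit deep in the saturated region, so the term acts as a near-constant pull until cv approaches the target.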
Note: Environment variable `HF_TOKEN` is set and is the current active token independently from the token you've just configured.
======================================================================
SVAE v2 CONDUIT TRAINER → version2_v2_conduit_proto_2
======================================================================
Fresh PatchSVAEv2 from random init
Total params: 2,729,731
Dataset: 16 noise types, 1,280,000 samples/epoch
Image size: 64×64
Batch size: 256
Initial conduit profile:
S: [2.512, 2.120, 1.776, 1.402]
S_std: [0.1320, 0.1130, 0.1230, 0.1728]
log_fric: [2.651, 4.560, 3.362, 2.203] ± [1.131, 1.041, 0.715, 0.704]
fric_raw: mean=75.9 max=103510
settle: [1.24, 2.26, 2.42, 1.00] (>2: 22.9%)
char_c: [0.5088, -2.6330, 4.8154, -3.6899]
refine: mean=6.46e-04 max=1.12e-03
fric_cv: [4.4544, 1.8980, 2.1396, 3.3616]
Initial MSE (random decoder): 2.0875
======================================================================
Ep 1/50: 100%|████████████████████| 5000/5000 [06:11<00:00, 13.44it/s, mse=0.0257 cv=1.027]
ep 1 | recon=0.0848 val=0.0280 ★
BEST | er=3.84 Sd=0.0954 cv=1.027 | 372s
S: [2.609, 2.196, 1.677, 1.220]
S_std: [0.1011, 0.1239, 0.1332, 0.1699]
log_fric: [2.355, 3.985, 3.071, 2.151] ± [0.871, 0.664, 0.479, 0.662]
fric_raw: mean=40.0 max=363409
settle: [1.12, 2.29, 2.84, 1.00] (>2: 29.3%)
char_c: [0.3458, -2.1159, 4.3382, -3.5587]
refine: mean=6.49e-04 max=1.14e-03
fric_cv: [4.1030, 0.7286, 1.8343, 3.0191]
types: gaus=0.026 unif=0.012 unif=0.032 pois=0.010 pink=0.005 brow=0.006 salt=0.122 spar=0.018 bloc=0.012 grad=0.015 chec=0.010 mixe=0.013 stru=0.018 cauc=0.080 expo=0.024 lapl=0.045
💾 /content/version2_v2_conduit_proto_2_checkpoints/best.pt (29.4MB, ep1, MSE=0.028021)
Processing Files (1/1): 100% | 30.8MB / 30.8MB, 25.7MB/s
New Data Upload: 100% | 30.8MB / 30.8MB, 25.7MB/s
...oto_2_checkpoints/best.pt: 100% | 30.8MB / 30.8MB
No files have been modified since last commit. Skipping to prevent empty commit.
☁️ Pushed ep1
Ep 2/50: 100%|████████████████████| 5000/5000 [06:13<00:00, 13.40it/s, mse=0.0323 cv=1.000]
ep 2 | recon=0.0832 val=0.0327 | er=3.76 Sd=0.1165 cv=1.000 | 373s
S: [2.707, 2.329, 1.400, 1.113]
S_std: [0.0788, 0.0877, 0.1419, 0.1255]
log_fric: [2.375, 3.785, 2.958, 2.175] ± [0.883, 0.508, 0.450, 0.681]
fric_raw: mean=32.1 max=104752
settle: [1.15, 2.26, 3.01, 1.00] (>2: 30.0%)
char_c: [0.2042, -1.5304, 3.7516, -3.3900]
refine: mean=6.46e-04 max=1.12e-03
fric_cv: [4.2588, 0.8009, 1.4941, 3.3563]
types: gaus=0.032 unif=0.011 unif=0.043 pois=0.008 pink=0.002 brow=0.002 salt=0.157 spar=0.020 bloc=0.007 grad=0.011 chec=0.005 mixe=0.012 stru=0.019 cauc=0.107 expo=0.029 lapl=0.060
Ep 3/50: 20%|████                | 1019/5000 [01:16<04:56, 13.45it/s, mse=0.0328 cv=0.906]
V2 Redux - full decoder overhaul
Cascade bottlenecking didn't cut it; the decoder still bypassed the specifications.
This next variation goes a bit overboard on conduit adjudication:
every single level of the encoder gets its own SVD+conduit stage, in a full encoder/decoder overhaul.
ENCODER (bottom β up):
Level 0: 256 patches → MLP(384) → M(48×4) → SVD+conduit₀ → 256 tokens
Level 1: group 2×2 → 64 cells → attend(4) → MLP(128) → M(16×4) → SVD+conduit₁ → 64 tokens
Level 2: group 2×2 → 16 blocks → attend(4) → MLP(128) → M(16×4) → SVD+conduit₂ → 16 tokens
Level 3: group 2×2 → 4 groups → attend(4) → MLP(128) → M(16×4) → SVD+conduit₃ → 4 tokens
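The 2×2 grouping that shrinks 256 → 64 → 16 → 4 tokens can be sketched with a numpy reshape (the grid layout and function name are my assumptions; the real levels also apply attention and MLPs, omitted here):

```python
import numpy as np

def group_2x2(tokens, side):
    """(side*side, d) tokens -> ((side/2)^2, 4*d) cells of 2x2 neighbours."""
    d = tokens.shape[-1]
    grid = tokens.reshape(side, side, d)
    # carve the grid into 2x2 neighbourhoods, flatten each into one cell
    cells = grid.reshape(side // 2, 2, side // 2, 2, d).transpose(0, 2, 1, 3, 4)
    return cells.reshape((side // 2) ** 2, 4 * d)

x = np.random.randn(256, 4)          # Level 0: 256 patch tokens
l1 = group_2x2(x, 16)                # -> 64 cells
l2 = group_2x2(l1, 8)                # -> 16 blocks
l3 = group_2x2(l2, 4)                # -> 4 groups
print(l1.shape[0], l2.shape[0], l3.shape[0])  # 64 16 4
```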
Top: cross-attention over 4 final tokens
SPECTRAL TOKEN (propagates between levels):
[S(4), log_friction(4), settle(4), char_coeffs(4)] = 16 values
S carries gradients. Conduit is detached. Difficulty trickles UP.
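The 16-value spectral token above is just a concatenation; only the packing order is taken from the spec, the function name is mine. In torch the conduit parts would be `.detach()`ed so only S carries gradient; numpy has no autograd, so the comment marks where that would happen:

```python
import numpy as np

def pack_spectral_token(S, log_friction, settle, char_coeffs):
    """[S(4), log_friction(4), settle(4), char_coeffs(4)] -> 16 values."""
    # torch equivalent (assumption): torch.cat([S, log_friction.detach(),
    #                                settle.detach(), char_coeffs.detach()])
    return np.concatenate([S, log_friction, settle, char_coeffs])

tok = pack_spectral_token(
    S=np.array([2.512, 2.120, 1.776, 1.402]),            # values from the init log
    log_friction=np.array([2.651, 4.560, 3.362, 2.203]),
    settle=np.array([1.24, 2.26, 2.42, 1.00]),
    char_coeffs=np.array([0.5088, -2.6330, 4.8154, -3.6899]),
)
print(tok.shape)  # (16,)
```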
DECODER (top β down, with conduit skips):
Level 3': 4 tokens → expand ×4 → inject conduit₃ → attend → 16 tokens
Level 2': 16 tokens → expand ×4 → inject conduit₂ → attend → 64 tokens
Level 1': 64 tokens → expand ×4 → inject conduit₁ → attend → 256 tokens
Level 0': 256 tokens + stored (U₀, S₀, Vt₀, friction₀, settle₀, char_c₀) → MLP → pixels
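The top-down expansion above quadruples the token count at each decoder level. A bare sketch (a plain repeat stands in for the learned expansion; conduit injection and attention are omitted):

```python
import numpy as np

def expand_x4(tokens):
    # each parent token spawns four children at the next resolution
    return np.repeat(tokens, 4, axis=0)

t3 = np.random.randn(4, 384)   # Level 3': 4 top tokens
t2 = expand_x4(t3)             # -> 16
t1 = expand_x4(t2)             # -> 64
t0 = expand_x4(t1)             # -> 256, then + stored level-0 state -> pixels
print(t2.shape[0], t1.shape[0], t0.shape[0])  # 16 64 256
```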
CONDUIT AT EACH SCALE:
Level 0: friction from pixel-level Gram decomposition (how hard were patches?)
Level 1: friction from cell-level Gram decomposition (how hard were 2×2 interactions?)
Level 2: friction from block-level decomposition (how hard were meso-structures?)
Level 3: friction from global decomposition (how hard was the overall composition?)
It's a bit excessive, but it may be required. Everything has to have a little impurity; otherwise it will not deviate.
It's not coincidental that so many of these structures lined up.
This MAY have removed too much SVD encoding at the baseline, but we'll see.
V2 is blobby!
Time to go direct: train the whole model with SVD-related paradigms internally, rather than trying to feed the model SVD.
You can call this decoder an inverse cascade decoder.
Deblobbing the blob.
As SVAE v2's official structure dictates, the decoder must account for the newly introduced elements in order to decode correctly.
This is the first experiment, currently proving that, yes, they can in fact learn to decode.
I've dubbed this noise variation SVAE-Cadence: the name fits, because the decoder's attenuation structure needs to be aware of the difficulty cadence before it can understand the orchestra's song.
Each of the new EIGH-derived elements is specifically related to HOW WELL the model performed in the SVD calculation: how many iterations were required, how smooth the final structure was, and several other signals.
THE DECODER RECEIVES:
S[4] → magnitudes
Vt[4×4] → orientations (sign-canonicalized)
friction[4] → conditioning per mode
settle[4] → convergence per mode
char_coeffs[4] → polynomial invariants
extraction_order[4] → spectral hierarchy
refinement_residual[1] → orthogonalization quality
release_residual[1] → round-trip fidelity
THE DECODER DOES NOT RECEIVE:
M_hat = U @ diag(S) @ Vt → this is WITHHELD
THE DECODER MUST RECONSTRUCT PATCHES FROM THE
DECOMPOSED SPECTRAL REPRESENTATION + CONDUIT.
IT CANNOT SHORTCUT. EVERY ELEMENT IS LOAD-BEARING.
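A quick dimensional sanity check of the packet listed above (per-patch framing is my reading of the spec): the decoder sees 38 numbers, while the withheld M_hat would hand it 48×4 = 192.

```python
# Sizes taken directly from the list above; keys are descriptive, not code names.
packet = {
    "S": 4, "Vt": 4 * 4, "friction": 4, "settle": 4,
    "char_coeffs": 4, "extraction_order": 4,
    "refinement_residual": 1, "release_residual": 1,
}
n_seen = sum(packet.values())
n_withheld = 48 * 4  # M_hat = U @ diag(S) @ Vt, withheld
print(n_seen, n_withheld)  # 38 192
```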
With the new structured EIGH-derived components, we now have a conduit for elemental extraction based on difficulty.
ENCODER (identical to v1, can copy weights from Fresnel):
patch(48) → MLP(384) → residual blocks × 4 → M(48×4) → normalize
SVD + CONDUIT (always active):
M → G = M^T M → FLEighConduit(G) → S, U, Vt, packet
CROSS-ATTENTION (identical to v1, can copy weights):
S → SpectralCrossAttention × 2 → S_coordinated
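The SVD + conduit line can be sketched with plain numpy. FLEighConduit itself (friction, settle, char-coeff statistics) is not reproduced here; this shows only the Gram-based eigh route from M to S, U, Vt:

```python
import numpy as np

M = np.random.randn(48, 4)                   # stand-in for a normalized patch projection
G = M.T @ M                                  # 4x4 Gram matrix
evals, V = np.linalg.eigh(G)                 # eigh returns ascending eigenvalues
idx = np.argsort(evals)[::-1]                # reorder: largest mode first
S = np.sqrt(np.clip(evals[idx], 0.0, None))  # singular values of M
Vt = V[:, idx].T                             # right singular vectors as rows
U = (M @ Vt.T) / S                           # recover left singular vectors
assert np.allclose((U * S) @ Vt, M)          # round trip: "release residual" ~ 0
```

Note that eigh on G squares the condition number relative to SVD on M directly, which is presumably why the conduit tracks conditioning ("friction") per mode.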
CONDUIT DECODER (NEW β the forcing function):
For each mode k=0,1,2,3:
bundle_k = [U[:,k](48), S[k](1), Vt[k,:](4), friction[k](1),
settle[k](1), char_coeff[k](1), order[k](1)]
→ ModeProcessor(57 → 384) → mode_hidden_k
Fuse: [mode_0, mode_1, mode_2, mode_3, refine_res, release_res]
→ Linear(1538 → 384) → residual blocks × 4 → patch(48)
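The widths in the conduit decoder check out arithmetically: each mode bundle is 48+1+4+1+1+1+1 = 57 values, and fusing four 384-d mode hiddens with the two residual scalars gives the 1538 input of the fuse Linear.

```python
# Bundle: U[:,k], S[k], Vt[k,:], friction, settle, char_coeff, order
bundle = 48 + 1 + 4 + 1 + 1 + 1 + 1
# Fuse: four mode hiddens (384 each) + refine_res + release_res
fuse_in = 4 * 384 + 2
print(bundle, fuse_in)  # 57 1538
```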
This is SVAE-Cadence learning noise. Already capable.
The code to train Cadence is included as per usual.
I'll let it cook for a while.
Thoughts
This is a repo dedicated to a series of experiments specifically meant to teach the model direct complexity associations from the deconstruction of SVD and eigendecompositions.
The structure is highly complex in order to create an Omega solver whose learning can be transferred, frame-wise, to an adjacent solver. This will likely not work at first, or at second, or at fiftieth, but there is a prototype that I will be testing to a T.
The three-AI conversation helped get a starting point, but they provided less help than I expected. It's often better to just stick with one assistant, as the echo of the three tends to drown out positive or useful opinions from one or the other unless a judge intervenes on every single exchange.
The trio ended up forming a bit of an echo-frame, which may work but I will likely need to revamp the whole thing 2-3 more times before it can be extracted.

