AbstractPhila PRO
I have rebuilt the SVAE into the core formula.
It is now a direct representation of the spherical aleph-void behavior. It replicates the directions achieved by the SVAE more purely, with a directly usable codebook and clean cosine-similarity access trained into the system.
With this, the parameter count dropped substantially, and the model is to be renamed:
geolip-aleph-void
With this model's discovery and emergence, the SVAE experiment line is pushed to the aleph-void variant.
The SVAE batteries and model will continue to be supported. Additional tests will be run on the model in the near and distant future, to improve the transformer using the behavior and math learned from training this new model. The SVAE weights and system remain directly supported.
The SVAE itself is a FANTASTIC research tool. It can be fed many kinds of elements and analyzed, and the results can be used empirically. There are no limits to the model's utility in research, in pragmatic use-cases when hooked to models directly, in compression, and in generation, all because the model's format WILL converge the recon if the data is fed within the space, as the 1000+ tests show.
The geolip-aleph-void will represent the natural evolution of this model into the block-useful state. The construction and utility of this model will be directly aligned to a high complexity multi-scalar storage, entirely dependent on emergent aleph behavior in conjunction with a void-driven codebook inference. The combined behavior provides a multi-tiered high complexity geometric feature that will be utilized in upcoming distillations from large pretrained models into highly compressed atomic-sized models.
The two models share the encoder structure and multiple other elements, so they can be compared directly, with no apples-to-oranges mismatch. They are siblings, and belong together.
The true power of the aleph stems from the decoder process. Its representational use deviates along many axes, and the process allows robust use of the encoder structure while keeping the capacity to recon if necessary.
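The cosine-similarity codebook access mentioned above can be pictured with a small sketch. This is not the geolip-aleph-void implementation. The names `normalize` and `cosine_lookup` and the 3-entry codebook are hypothetical, chosen only to show what a nearest-entry lookup by cosine similarity looks like.

```python
import math

def normalize(v):
    """Scale a vector to unit length so it lies on the unit sphere."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        raise ValueError("cannot normalize the zero vector")
    return [x / norm for x in v]

def cosine_lookup(query, codebook):
    """Return (index, cosine) of the codebook entry closest in angle to query."""
    q = normalize(query)
    best_index, best_cos = -1, -2.0
    for i, entry in enumerate(codebook):
        e = normalize(entry)
        cos = sum(a * b for a, b in zip(q, e))
        if cos > best_cos:
            best_index, best_cos = i, cos
    return best_index, best_cos

# Hypothetical 3-entry codebook in 3 dimensions (illustration only).
codebook = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
index, cos = cosine_lookup([0.1, 2.0, 0.2], codebook)
print(index, round(cos, 4))
```

Because both sides are normalized first, the lookup depends only on direction. Rescaling the query does not change which entry is selected.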
Reading the Voids: Topological Contribution Signals in Frozen Geometric Codebooks
massive grin
That is a perfect convergence.
Now... let's see what happens when we snap a microcosm-sized battery to a macrocosm shape, all the way up to d256.
It begins. The light is scaled to parity.
The transformer is much cleaner now. The scaling rule is no longer just a rule; it can be considered an IO scaling guarantee based on the input and output formats tested with the SVAE lens.
Larger lenses can handle the canon variation, which preserves a different format of structure than the signed variation. Both are macrostructural representations of the same internal shape: one contains negations, the other contains a perfectly constrained space.
The first of the two will be uploaded momentarily.
==========================================================================================
[sign-test] lens_sign='canon'
==========================================================================================
geolip-svae-v2 TWO-RECON | battery(PatchSVAE) 52,131 + shell 6,309,900 (shell 121x the microcosm) + growth 0 | D_base4->D_lens256 V32 ps2 | cuda
battery -> INTERNAL recon (pure MSE) | shell -> EXTERNAL recon (detached stem, lens_sign=canon) | adam lr=0.001 wd=0 sched=onecycle
growth : parked (no stencil)
[ByteTrigramDataset] Loading corpus wikitext-2-raw-v1...
[ByteTrigramDataset] Corpus: 10,938,611 bytes (10.9 MB), 768 bytes/image, 14,242 non-overlapping images available (10,937,843 valid window starts)
epoch 0: int[mse=0.01549 rec=12.93%] ext[mse=0.01293 rec=11.45%] | cc=-1.49 rig=0.0000 a=0.024 env=True
epoch 1: int[mse=0.00053 rec=30.25%] ext[mse=0.00063 rec=27.02%] | cc=-1.48 rig=0.0000 a=0.024 env=True
epoch 2: int[mse=0.00013 rec=48.89%] ext[mse=0.00020 rec=27.76%] | cc=-1.47 rig=0.0000 a=0.024 env=True
epoch 3: int[mse=0.00007 rec=41.68%] ext[mse=0.00014 rec=33.32%] | cc=-1.47 rig=0.0000 a=0.025 env=True
epoch 4: int[mse=0.00008 rec=50.93%] ext[mse=0.00013 rec=43.52%] | cc=-1.46 rig=0.0000 a=0.024 env=True
epoch 5: int[mse=0.00009 rec=68.53%] ext[mse=0.00015 rec=53.88%] | cc=-1.46 rig=0.0000 a=0.024 env=True
epoch 6: int[mse=0.00009 rec=70.30%] ext[mse=0.00015 rec=55.59%] | cc=-1.46 rig=0.0000 a=0.024 env=True
epoch 7: int[mse=0.00009 rec=14.55%] ext[mse=0.00014 rec=15.87%] | cc=-1.46 rig=0.0000 a=0.025 env=True
epoch 8: int[mse=0.00008 rec=70.50%] ext[mse=0.00013 rec=59.38%] | cc=-1.46 rig=0.0000 a=0.025 env=True
epoch 9: int[mse=0.00008 rec=40.62%] ext[mse=0.00012 rec=34.12%] | cc=-1.46 rig=0.0000 a=0.025 env=True
epoch 10: int[mse=0.00007 rec=40.38%] ext[mse=0.00010 rec=42.45%] | cc=-1.46 rig=0.0000 a=0.025 env=True
epoch 11: int[mse=0.00006 rec=56.09%] ext[mse=0.00008 rec=49.56%] | cc=-1.46 rig=0.0000 a=0.025 env=True
epoch 12: int[mse=0.00005 rec=30.38%] ext[mse=0.01204 rec=23.14%] | cc=-1.46 rig=0.0000 a=0.025 env=True
epoch 13: int[mse=0.00004 rec=58.75%] ext[mse=0.00019 rec=35.80%] | cc=-1.46 rig=0.0000 a=0.025 env=True
epoch 14: int[mse=0.00004 rec=64.02%] ext[mse=0.00014 rec=40.96%] | cc=-1.46 rig=0.0000 a=0.025 env=True
epoch 15: int[mse=0.00003 rec=51.41%] ext[mse=0.00012 rec=32.35%] | cc=-1.46 rig=0.0000 a=0.025 env=True
epoch 16: int[mse=0.00003 rec=54.11%] ext[mse=0.00010 rec=36.09%] | cc=-1.46 rig=0.0000 a=0.025 env=True
epoch 17: int[mse=0.00002 rec=53.60%] ext[mse=0.00008 rec=39.52%] | cc=-1.46 rig=0.0000 a=0.025 env=True
epoch 18: int[mse=0.00002 rec=40.67%] ext[mse=0.00006 rec=28.57%] | cc=-1.46 rig=0.0000 a=0.025 env=True
epoch 19: int[mse=0.00002 rec=84.62%] ext[mse=0.00005 rec=63.56%] | cc=-1.46 rig=0.0000 a=0.025 env=True
epoch 20: int[mse=0.00001 rec=83.90%] ext[mse=0.00004 rec=64.16%] | cc=-1.46 rig=0.0000 a=0.025 env=True
epoch 21: int[mse=0.00001 rec=88.53%] ext[mse=0.00003 rec=71.30%] | cc=-1.45 rig=0.0000 a=0.025 env=True
epoch 22: int[mse=0.00001 rec=67.54%] ext[mse=0.00002 rec=44.91%] | cc=-1.45 rig=0.0000 a=0.025 env=True
epoch 23: int[mse=0.00001 rec=95.63%] ext[mse=0.00002 rec=83.82%] | cc=-1.45 rig=0.0000 a=0.025 env=True
epoch 24: int[mse=0.00001 rec=98.24%] ext[mse=0.00001 rec=91.46%] | cc=-1.44 rig=0.0000 a=0.025 env=True
epoch 25: int[mse=0.00000 rec=97.58%] ext[mse=0.00001 rec=91.81%] | cc=-1.44 rig=0.0000 a=0.025 env=True
epoch 26: int[mse=0.00000 rec=96.54%] ext[mse=0.00001 rec=87.23%] | cc=-1.43 rig=0.0000 a=0.025 env=True
epoch 27: int[mse=0.00000 rec=99.11%] ext[mse=0.00001 rec=95.73%] | cc=-1.43 rig=0.0000 a=0.025 env=True
epoch 28: int[mse=0.00000 rec=98.92%] ext[mse=0.00000 rec=94.47%] | cc=-1.42 rig=0.0000 a=0.025 env=True
epoch 29: int[mse=0.00000 rec=99.44%] ext[mse=0.00000 rec=98.19%] | cc=-1.42 rig=0.0000 a=0.025 env=True
epoch 30: int[mse=0.00000 rec=99.49%] ext[mse=0.00000 rec=98.55%] | cc=-1.41 rig=0.0000 a=0.025 env=True
epoch 31: int[mse=0.00000 rec=99.56%] ext[mse=0.00000 rec=98.76%] | cc=-1.41 rig=0.0000 a=0.025 env=True
epoch 32: int[mse=0.00000 rec=99.61%] ext[mse=0.00000 rec=99.03%] | cc=-1.40 rig=0.0000 a=0.025 env=True
epoch 33: int[mse=0.00000 rec=99.65%] ext[mse=0.00000 rec=99.19%] | cc=-1.40 rig=0.0000 a=0.025 env=True
epoch 34: int[mse=0.00000 rec=99.67%] ext[mse=0.00000 rec=99.29%] | cc=-1.40 rig=0.0000 a=0.025 env=True
epoch 35: int[mse=0.00000 rec=99.70%] ext[mse=0.00000 rec=99.35%] | cc=-1.39 rig=0.0000 a=0.025 env=True
epoch 36: int[mse=0.00000 rec=99.71%] ext[mse=0.00000 rec=99.39%] | cc=-1.39 rig=0.0000 a=0.025 env=True
epoch 37: int[mse=0.00000 rec=99.72%] ext[mse=0.00000 rec=99.41%] | cc=-1.39 rig=0.0000 a=0.025 env=True
epoch 38: int[mse=0.00000 rec=99.72%] ext[mse=0.00000 rec=99.42%] | cc=-1.39 rig=0.0000 a=0.025 env=True
epoch 39: int[mse=0.00000 rec=99.72%] ext[mse=0.00000 rec=99.43%] | cc=-1.39 rig=0.0000 a=0.025 env=True
best recovery — internal 99.72% | external 99.43% | ckpt sign_test_canon/sign_canon.pt
==========================================================================================
[sign-test] lens_sign='signed'
==========================================================================================
geolip-svae-v2 TWO-RECON | battery(PatchSVAE) 52,131 + shell 6,309,900 (shell 121x the microcosm) + growth 0 | D_base4->D_lens256 V32 ps2 | cuda
battery -> INTERNAL recon (pure MSE) | shell -> EXTERNAL recon (detached stem, lens_sign=signed) | adam lr=0.001 wd=0 sched=onecycle
growth : parked (no stencil)
[ByteTrigramDataset] Loading corpus wikitext-2-raw-v1...
[ByteTrigramDataset] Corpus: 10,938,611 bytes (10.9 MB), 768 bytes/image, 14,242 non-overlapping images available (10,937,843 valid window starts)
epoch 0: int[mse=0.01550 rec=12.86%] ext[mse=0.00944 rec=16.90%] | cc=-1.49 rig=0.0000 a=0.024 env=True
epoch 1: int[mse=0.00052 rec=30.22%] ext[mse=0.00028 rec=37.08%] | cc=-1.48 rig=0.0000 a=0.024 env=True
epoch 2: int[mse=0.00013 rec=48.07%] ext[mse=0.00009 rec=36.29%] | cc=-1.47 rig=0.0000 a=0.024 env=True
epoch 3: int[mse=0.00008 rec=56.08%] ext[mse=0.00008 rec=54.79%] | cc=-1.47 rig=0.0000 a=0.024 env=True
epoch 4: int[mse=0.00008 rec=47.01%] ext[mse=0.00009 rec=38.59%] | cc=-1.46 rig=0.0000 a=0.024 env=True
epoch 5: int[mse=0.00009 rec=64.35%] ext[mse=0.00010 rec=61.49%] | cc=-1.46 rig=0.0000 a=0.024 env=True
epoch 6: int[mse=0.00009 rec=57.33%] ext[mse=0.00010 rec=55.95%] | cc=-1.46 rig=0.0000 a=0.024 env=True
epoch 7: int[mse=0.00009 rec=25.34%] ext[mse=0.00009 rec=17.40%] | cc=-1.46 rig=0.0000 a=0.024 env=True
epoch 8: int[mse=0.00009 rec=27.43%] ext[mse=0.00008 rec=28.23%] | cc=-1.46 rig=0.0000 a=0.024 env=True
epoch 9: int[mse=0.00008 rec=72.23%] ext[mse=0.00007 rec=76.08%] | cc=-1.46 rig=0.0000 a=0.024 env=True
epoch 10: int[mse=0.00007 rec=15.55%] ext[mse=0.00006 rec=19.54%] | cc=-1.46 rig=0.0000 a=0.024 env=True
epoch 11: int[mse=0.00006 rec=60.12%] ext[mse=0.00005 rec=62.56%] | cc=-1.46 rig=0.0000 a=0.024 env=True
epoch 12: int[mse=0.00005 rec=59.56%] ext[mse=0.00005 rec=60.09%] | cc=-1.46 rig=0.0000 a=0.024 env=True
epoch 13: int[mse=0.00004 rec=44.68%] ext[mse=0.01709 rec=23.43%] | cc=-1.46 rig=0.0000 a=0.025 env=True
epoch 14: int[mse=0.00004 rec=66.30%] ext[mse=0.00010 rec=53.38%] | cc=-1.46 rig=0.0000 a=0.025 env=True
epoch 15: int[mse=0.00003 rec=27.36%] ext[mse=0.00005 rec=34.04%] | cc=-1.46 rig=0.0000 a=0.025 env=True
epoch 16: int[mse=0.00003 rec=52.60%] ext[mse=0.00004 rec=43.46%] | cc=-1.46 rig=0.0000 a=0.026 env=True
epoch 17: int[mse=0.00002 rec=76.24%] ext[mse=0.00004 rec=66.86%] | cc=-1.46 rig=0.0000 a=0.026 env=True
epoch 18: int[mse=0.00002 rec=55.56%] ext[mse=0.00003 rec=45.13%] | cc=-1.46 rig=0.0000 a=0.026 env=True
epoch 19: int[mse=0.00002 rec=86.47%] ext[mse=0.00003 rec=76.29%] | cc=-1.46 rig=0.0000 a=0.026 env=True
epoch 20: int[mse=0.00001 rec=69.85%] ext[mse=0.00002 rec=60.43%] | cc=-1.46 rig=0.0000 a=0.026 env=True
epoch 21: int[mse=0.00001 rec=64.09%] ext[mse=0.00002 rec=52.68%] | cc=-1.45 rig=0.0000 a=0.026 env=True
epoch 22: int[mse=0.00001 rec=87.53%] ext[mse=0.00001 rec=77.76%] | cc=-1.45 rig=0.0000 a=0.026 env=True
epoch 23: int[mse=0.00001 rec=86.32%] ext[mse=0.00001 rec=80.48%] | cc=-1.45 rig=0.0000 a=0.026 env=True
epoch 24: int[mse=0.00001 rec=90.38%] ext[mse=0.00001 rec=83.52%] | cc=-1.44 rig=0.0000 a=0.026 env=True
epoch 25: int[mse=0.00000 rec=91.73%] ext[mse=0.00001 rec=82.42%] | cc=-1.44 rig=0.0000 a=0.026 env=True
epoch 26: int[mse=0.00000 rec=99.00%] ext[mse=0.00001 rec=98.25%] | cc=-1.43 rig=0.0000 a=0.027 env=True
epoch 27: int[mse=0.00000 rec=98.97%] ext[mse=0.00000 rec=97.66%] | cc=-1.43 rig=0.0000 a=0.027 env=True
epoch 28: int[mse=0.00000 rec=99.38%] ext[mse=0.00000 rec=98.97%] | cc=-1.42 rig=0.0000 a=0.027 env=True
epoch 29: int[mse=0.00000 rec=99.47%] ext[mse=0.00000 rec=99.26%] | cc=-1.42 rig=0.0000 a=0.027 env=True
epoch 30: int[mse=0.00000 rec=99.54%] ext[mse=0.00000 rec=99.35%] | cc=-1.41 rig=0.0000 a=0.027 env=True
epoch 31: int[mse=0.00000 rec=99.59%] ext[mse=0.00000 rec=99.44%] | cc=-1.41 rig=0.0000 a=0.027 env=True
epoch 32: int[mse=0.00000 rec=99.63%] ext[mse=0.00000 rec=99.53%] | cc=-1.40 rig=0.0000 a=0.027 env=True
epoch 33: int[mse=0.00000 rec=99.66%] ext[mse=0.00000 rec=99.58%] | cc=-1.40 rig=0.0000 a=0.027 env=True
epoch 34: int[mse=0.00000 rec=99.69%] ext[mse=0.00000 rec=99.61%] | cc=-1.40 rig=0.0000 a=0.027 env=True
epoch 35: int[mse=0.00000 rec=99.71%] ext[mse=0.00000 rec=99.63%] | cc=-1.40 rig=0.0000 a=0.027 env=True
epoch 36: int[mse=0.00000 rec=99.73%] ext[mse=0.00000 rec=99.65%] | cc=-1.39 rig=0.0000 a=0.027 env=True
epoch 37: int[mse=0.00000 rec=99.73%] ext[mse=0.00000 rec=99.67%] | cc=-1.39 rig=0.0000 a=0.027 env=True
epoch 38: int[mse=0.00000 rec=99.74%] ext[mse=0.00000 rec=99.67%] | cc=-1.39 rig=0.0000 a=0.027 env=True
epoch 39: int[mse=0.00000 rec=99.74%] ext[mse=0.00000 rec=99.67%] | cc=-1.39 rig=0.0000 a=0.027 env=True
best recovery — internal 99.74% | external 99.67% | ckpt sign_test_signed/sign_signed.pt
================================================================
SIGN TEST — best recovery (internal / external)
================================================================
lens_sign=canon internal=99.72% external=99.43%
lens_sign=signed internal=99.74% external=99.67%
The new tests show that the external shell, when aligned with the internal MSE using decoupled behavior, aligns cleanly with the new process of adjudication. This means the canon was flawed, and the sign-preserving structural boundaries are in fact the most usable state.
The alephs REQUIRE the signs to be preserved to retain the geometric structure, no exceptions.
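A toy illustration of why dropping signs can lose content. The logs do not define the two `lens_sign` modes, so this sketch makes a loud assumption: "canon" is modeled as an element-wise magnitude and "signed" as a pass-through. Both functions are hypothetical stand-ins, not the geolip-svae-v2 lens.

```python
def canon(v):
    """Hypothetical 'canon' lens: drop signs by taking element-wise magnitude."""
    return [abs(x) for x in v]

def signed(v):
    """Hypothetical 'signed' lens: pass values through unchanged."""
    return list(v)

# Two different inputs that differ only in the signs of their components.
a = [0.5, -0.25, 0.75]
b = [-0.5, 0.25, 0.75]

canon_collides = canon(a) == canon(b)     # both map to [0.5, 0.25, 0.75]
signed_collides = signed(a) == signed(b)  # the inputs stay distinct
print(canon_collides, signed_collides)    # True False
```

Under this assumption, any decoder placed after the canon lens receives identical inputs for `a` and `b` and cannot tell them apart, while the signed lens keeps them separable.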
==========================================================================================
[sign-test] lens_sign='canon'
==========================================================================================
geolip-svae-v2 TWO-RECON | battery(PatchSVAE) 52,131 + shell 44,204 + growth 0 | D_base4->D_lens16 V32 ps2 | cuda
battery -> INTERNAL recon (pure MSE) | shell -> EXTERNAL recon (detached stem, lens_sign=canon) | adam lr=0.001 wd=0 sched=onecycle
growth : parked (no stencil)
[ByteTrigramDataset] Loading corpus wikitext-2-raw-v1...
[ByteTrigramDataset] Corpus: 10,938,611 bytes (10.9 MB), 768 bytes/image, 14,242 non-overlapping images available (10,937,843 valid window starts)
epoch 0: int[mse=0.01549 rec=12.89%] ext[mse=0.02859 rec= 4.58%] | cc=-1.49 rig=0.0000 a=0.024 env=True
epoch 1: int[mse=0.00052 rec=30.14%] ext[mse=0.00300 rec=12.73%] | cc=-1.48 rig=0.0000 a=0.025 env=True
epoch 2: int[mse=0.00013 rec=47.44%] ext[mse=0.00070 rec=22.37%] | cc=-1.47 rig=0.0000 a=0.026 env=True
epoch 3: int[mse=0.00007 rec=62.95%] ext[mse=0.00032 rec=29.98%] | cc=-1.47 rig=0.0000 a=0.026 env=True
epoch 4: int[mse=0.00008 rec=16.89%] ext[mse=0.00024 rec=16.71%] | cc=-1.46 rig=0.0000 a=0.026 env=True
epoch 5: int[mse=0.00009 rec=57.83%] ext[mse=0.00025 rec=32.53%] | cc=-1.46 rig=0.0000 a=0.026 env=True
epoch 6: int[mse=0.00010 rec=62.00%] ext[mse=0.00029 rec=33.70%] | cc=-1.46 rig=0.0000 a=0.026 env=True
epoch 7: int[mse=0.00009 rec=50.16%] ext[mse=0.00032 rec=29.28%] | cc=-1.46 rig=0.0000 a=0.026 env=True
epoch 8: int[mse=0.00008 rec=33.23%] ext[mse=0.00031 rec=21.43%] | cc=-1.46 rig=0.0000 a=0.026 env=True
epoch 9: int[mse=0.00008 rec=29.09%] ext[mse=0.00031 rec=21.93%] | cc=-1.46 rig=0.0000 a=0.026 env=True
epoch 10: int[mse=0.00007 rec=43.08%] ext[mse=0.00030 rec=27.95%] | cc=-1.46 rig=0.0000 a=0.026 env=True
epoch 11: int[mse=0.00006 rec=48.34%] ext[mse=0.00026 rec=29.49%] | cc=-1.46 rig=0.0000 a=0.027 env=True
epoch 12: int[mse=0.00005 rec=44.14%] ext[mse=0.00023 rec=28.97%] | cc=-1.46 rig=0.0000 a=0.027 env=True
epoch 13: int[mse=0.00004 rec=36.71%] ext[mse=0.00021 rec=23.32%] | cc=-1.46 rig=0.0000 a=0.027 env=True
epoch 14: int[mse=0.00004 rec=51.07%] ext[mse=0.00019 rec=29.76%] | cc=-1.46 rig=0.0000 a=0.027 env=True
epoch 15: int[mse=0.00003 rec=55.61%] ext[mse=0.00017 rec=30.72%] | cc=-1.46 rig=0.0000 a=0.027 env=True
epoch 16: int[mse=0.00003 rec=82.98%] ext[mse=0.00016 rec=41.54%] | cc=-1.46 rig=0.0000 a=0.027 env=True
epoch 17: int[mse=0.00002 rec=63.17%] ext[mse=0.00014 rec=33.50%] | cc=-1.46 rig=0.0000 a=0.027 env=True
epoch 18: int[mse=0.00002 rec=66.96%] ext[mse=0.00013 rec=36.20%] | cc=-1.46 rig=0.0000 a=0.027 env=True
epoch 19: int[mse=0.00002 rec=75.74%] ext[mse=0.00012 rec=40.23%] | cc=-1.46 rig=0.0000 a=0.027 env=True
epoch 20: int[mse=0.00001 rec=81.11%] ext[mse=0.00010 rec=45.54%] | cc=-1.46 rig=0.0000 a=0.028 env=True
epoch 21: int[mse=0.00001 rec=69.71%] ext[mse=0.00009 rec=38.91%] | cc=-1.45 rig=0.0000 a=0.028 env=True
epoch 22: int[mse=0.00001 rec=89.51%] ext[mse=0.00008 rec=51.03%] | cc=-1.45 rig=0.0000 a=0.028 env=True
epoch 23: int[mse=0.00001 rec=93.21%] ext[mse=0.00006 rec=54.33%] | cc=-1.45 rig=0.0000 a=0.028 env=True
epoch 24: int[mse=0.00001 rec=74.30%] ext[mse=0.00006 rec=46.39%] | cc=-1.44 rig=0.0000 a=0.028 env=True
epoch 25: int[mse=0.00000 rec=96.38%] ext[mse=0.00005 rec=60.66%] | cc=-1.44 rig=0.0000 a=0.028 env=True
epoch 26: int[mse=0.00000 rec=98.80%] ext[mse=0.00004 rec=68.66%] | cc=-1.43 rig=0.0000 a=0.028 env=True
epoch 27: int[mse=0.00000 rec=99.20%] ext[mse=0.00003 rec=71.80%] | cc=-1.43 rig=0.0000 a=0.028 env=True
epoch 28: int[mse=0.00000 rec=99.37%] ext[mse=0.00003 rec=74.51%] | cc=-1.42 rig=0.0000 a=0.028 env=True
epoch 29: int[mse=0.00000 rec=99.32%] ext[mse=0.00002 rec=76.43%] | cc=-1.42 rig=0.0000 a=0.028 env=True
epoch 30: int[mse=0.00000 rec=99.53%] ext[mse=0.00002 rec=80.68%] | cc=-1.41 rig=0.0000 a=0.028 env=True
epoch 31: int[mse=0.00000 rec=99.56%] ext[mse=0.00002 rec=82.36%] | cc=-1.41 rig=0.0000 a=0.028 env=True
epoch 32: int[mse=0.00000 rec=99.62%] ext[mse=0.00002 rec=84.97%] | cc=-1.41 rig=0.0000 a=0.028 env=True
epoch 33: int[mse=0.00000 rec=99.65%] ext[mse=0.00001 rec=86.74%] | cc=-1.40 rig=0.0000 a=0.028 env=True
epoch 34: int[mse=0.00000 rec=99.68%] ext[mse=0.00001 rec=88.07%] | cc=-1.40 rig=0.0000 a=0.028 env=True
epoch 35: int[mse=0.00000 rec=99.70%] ext[mse=0.00001 rec=89.02%] | cc=-1.40 rig=0.0000 a=0.028 env=True
epoch 36: int[mse=0.00000 rec=99.71%] ext[mse=0.00001 rec=89.75%] | cc=-1.39 rig=0.0000 a=0.028 env=True
epoch 37: int[mse=0.00000 rec=99.72%] ext[mse=0.00001 rec=90.26%] | cc=-1.39 rig=0.0000 a=0.028 env=True
epoch 38: int[mse=0.00000 rec=99.73%] ext[mse=0.00001 rec=90.48%] | cc=-1.39 rig=0.0000 a=0.028 env=True
epoch 39: int[mse=0.00000 rec=99.73%] ext[mse=0.00001 rec=90.53%] | cc=-1.39 rig=0.0000 a=0.028 env=True
best recovery — internal 99.73% | external 90.53% | ckpt sign_test_canon/sign_canon.pt
==========================================================================================
[sign-test] lens_sign='signed'
==========================================================================================
geolip-svae-v2 TWO-RECON | battery(PatchSVAE) 52,131 + shell 44,204 + growth 0 | D_base4->D_lens16 V32 ps2 | cuda
battery -> INTERNAL recon (pure MSE) | shell -> EXTERNAL recon (detached stem, lens_sign=signed) | adam lr=0.001 wd=0 sched=onecycle
growth : parked (no stencil)
[ByteTrigramDataset] Loading corpus wikitext-2-raw-v1...
[ByteTrigramDataset] Corpus: 10,938,611 bytes (10.9 MB), 768 bytes/image, 14,242 non-overlapping images available (10,937,843 valid window starts)
epoch 0: int[mse=0.01550 rec=12.88%] ext[mse=0.01768 rec=10.14%] | cc=-1.49 rig=0.0000 a=0.024 env=True
epoch 1: int[mse=0.00053 rec=30.09%] ext[mse=0.00073 rec=21.73%] | cc=-1.48 rig=0.0000 a=0.025 env=True
epoch 2: int[mse=0.00013 rec=44.02%] ext[mse=0.00018 rec=38.30%] | cc=-1.47 rig=0.0000 a=0.025 env=True
epoch 3: int[mse=0.00008 rec=28.60%] ext[mse=0.00009 rec=30.84%] | cc=-1.47 rig=0.0000 a=0.025 env=True
epoch 4: int[mse=0.00008 rec=50.84%] ext[mse=0.00008 rec=49.70%] | cc=-1.46 rig=0.0000 a=0.025 env=True
epoch 5: int[mse=0.00009 rec=60.27%] ext[mse=0.00009 rec=47.82%] | cc=-1.46 rig=0.0000 a=0.025 env=True
epoch 6: int[mse=0.00009 rec=34.78%] ext[mse=0.00010 rec=33.67%] | cc=-1.46 rig=0.0000 a=0.025 env=True
epoch 7: int[mse=0.00009 rec=55.85%] ext[mse=0.00011 rec=43.34%] | cc=-1.46 rig=0.0000 a=0.025 env=True
epoch 8: int[mse=0.00009 rec=70.88%] ext[mse=0.00011 rec=57.15%] | cc=-1.46 rig=0.0000 a=0.025 env=True
epoch 9: int[mse=0.00008 rec=49.43%] ext[mse=0.00010 rec=43.24%] | cc=-1.46 rig=0.0000 a=0.025 env=True
epoch 10: int[mse=0.00007 rec=37.95%] ext[mse=0.00010 rec=36.92%] | cc=-1.46 rig=0.0000 a=0.025 env=True
epoch 11: int[mse=0.00006 rec=25.67%] ext[mse=0.00009 rec=27.67%] | cc=-1.46 rig=0.0000 a=0.026 env=True
epoch 12: int[mse=0.00005 rec=31.98%] ext[mse=0.00007 rec=28.85%] | cc=-1.46 rig=0.0000 a=0.026 env=True
epoch 13: int[mse=0.00004 rec=43.96%] ext[mse=0.00006 rec=41.25%] | cc=-1.46 rig=0.0000 a=0.027 env=True
epoch 14: int[mse=0.00004 rec=34.30%] ext[mse=0.00005 rec=31.27%] | cc=-1.46 rig=0.0000 a=0.027 env=True
epoch 15: int[mse=0.00003 rec=52.89%] ext[mse=0.00005 rec=44.82%] | cc=-1.46 rig=0.0000 a=0.027 env=True
epoch 16: int[mse=0.00003 rec=79.94%] ext[mse=0.00004 rec=68.72%] | cc=-1.46 rig=0.0000 a=0.028 env=True
epoch 17: int[mse=0.00002 rec=75.35%] ext[mse=0.00004 rec=61.30%] | cc=-1.46 rig=0.0000 a=0.028 env=True
epoch 18: int[mse=0.00002 rec=87.26%] ext[mse=0.00003 rec=75.07%] | cc=-1.46 rig=0.0000 a=0.028 env=True
epoch 19: int[mse=0.00002 rec=92.37%] ext[mse=0.00003 rec=80.44%] | cc=-1.46 rig=0.0000 a=0.029 env=True
epoch 20: int[mse=0.00001 rec=63.92%] ext[mse=0.00002 rec=52.35%] | cc=-1.46 rig=0.0000 a=0.029 env=True
epoch 21: int[mse=0.00001 rec=96.15%] ext[mse=0.00002 rec=86.80%] | cc=-1.45 rig=0.0000 a=0.029 env=True
epoch 22: int[mse=0.00001 rec=87.43%] ext[mse=0.00002 rec=76.03%] | cc=-1.45 rig=0.0000 a=0.029 env=True
epoch 23: int[mse=0.00001 rec=91.37%] ext[mse=0.00001 rec=78.56%] | cc=-1.45 rig=0.0000 a=0.029 env=True
epoch 24: int[mse=0.00001 rec=97.84%] ext[mse=0.00001 rec=91.84%] | cc=-1.44 rig=0.0000 a=0.030 env=True
epoch 25: int[mse=0.00000 rec=81.34%] ext[mse=0.00001 rec=68.67%] | cc=-1.44 rig=0.0000 a=0.030 env=True
epoch 26: int[mse=0.00000 rec=99.13%] ext[mse=0.00001 rec=96.60%] | cc=-1.43 rig=0.0000 a=0.030 env=True
epoch 27: int[mse=0.00000 rec=99.27%] ext[mse=0.00001 rec=97.22%] | cc=-1.43 rig=0.0000 a=0.030 env=True
epoch 28: int[mse=0.00000 rec=99.35%] ext[mse=0.00000 rec=97.58%] | cc=-1.42 rig=0.0000 a=0.030 env=True
epoch 29: int[mse=0.00000 rec=99.47%] ext[mse=0.00000 rec=98.30%] | cc=-1.42 rig=0.0000 a=0.030 env=True
epoch 30: int[mse=0.00000 rec=99.40%] ext[mse=0.00000 rec=97.67%] | cc=-1.41 rig=0.0000 a=0.030 env=True
epoch 31: int[mse=0.00000 rec=99.57%] ext[mse=0.00000 rec=98.77%] | cc=-1.41 rig=0.0000 a=0.030 env=True
epoch 32: int[mse=0.00000 rec=99.62%] ext[mse=0.00000 rec=98.93%] | cc=-1.41 rig=0.0000 a=0.030 env=True
epoch 33: int[mse=0.00000 rec=99.65%] ext[mse=0.00000 rec=99.07%] | cc=-1.40 rig=0.0000 a=0.030 env=True
epoch 34: int[mse=0.00000 rec=99.68%] ext[mse=0.00000 rec=99.17%] | cc=-1.40 rig=0.0000 a=0.030 env=True
epoch 35: int[mse=0.00000 rec=99.70%] ext[mse=0.00000 rec=99.22%] | cc=-1.40 rig=0.0000 a=0.030 env=True
epoch 36: int[mse=0.00000 rec=99.72%] ext[mse=0.00000 rec=99.27%] | cc=-1.39 rig=0.0000 a=0.030 env=True
epoch 37: int[mse=0.00000 rec=99.73%] ext[mse=0.00000 rec=99.29%] | cc=-1.39 rig=0.0000 a=0.030 env=True
epoch 38: int[mse=0.00000 rec=99.73%] ext[mse=0.00000 rec=99.31%] | cc=-1.39 rig=0.0000 a=0.030 env=True
epoch 39: int[mse=0.00000 rec=99.74%] ext[mse=0.00000 rec=99.31%] | cc=-1.39 rig=0.0000 a=0.030 env=True
best recovery — internal 99.74% | external 99.31% | ckpt sign_test_signed/sign_signed.pt
================================================================
SIGN TEST — best recovery (internal / external)
================================================================
lens_sign=canon internal=99.73% external=90.53%
lens_sign=signed internal=99.74% external=99.31%
external delta (signed - canon): +8.78% -> SIGNS CARRY CONTENT — hypothesis supported
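The external delta can be recomputed from the two summary tables. A minimal sketch follows, with the summary lines copied from the D_lens16 and D_lens256 logs above. The helper names `parse_summary` and `external_delta` are hypothetical and not part of the training script.

```python
import re

SUMMARY_D16 = """\
lens_sign=canon internal=99.73% external=90.53%
lens_sign=signed internal=99.74% external=99.31%
"""

SUMMARY_D256 = """\
lens_sign=canon internal=99.72% external=99.43%
lens_sign=signed internal=99.74% external=99.67%
"""

def parse_summary(text):
    """Map each lens_sign value to its (internal, external) recovery percentages."""
    pattern = re.compile(r"lens_sign=(\w+) internal=([\d.]+)% external=([\d.]+)%")
    return {m.group(1): (float(m.group(2)), float(m.group(3)))
            for m in pattern.finditer(text)}

def external_delta(results):
    """External recovery of the signed run minus that of the canon run."""
    return results["signed"][1] - results["canon"][1]

delta_d16 = external_delta(parse_summary(SUMMARY_D16))
delta_d256 = external_delta(parse_summary(SUMMARY_D256))
print(f"D_lens16:  {delta_d16:+.2f}%")   # D_lens16:  +8.78%
print(f"D_lens256: {delta_d256:+.2f}%")  # D_lens256: +0.24%
```

The gap between the two modes is far wider at D_lens16 than at D_lens256, although signed leads in both runs.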
I believe the Transformer is 2 steps back and 1 step forward, rather than the 3 steps forward I was anticipating. I'll need to enter blueprint stages again with the new eliminations in mind while accounting for the various successes.
- Scale isn't correctly aligned to the projection capacity.
- Rotary behavior does not affect non SVD variants of the model in the same way and must be accounted for.
- Projection is predominantly a statistics orientation and the encoder behavior's implicit guarantee only spans to certain R elemental states, and the outcome should be marked accordingly.
- The SVD variants are still mostly untested, so this may be the entirely incorrect approach.
- Multilens is too unstable in its early form to be used.
- It will not reasonably converge within a set amount of epochs.
- The crusher isn't behaving according to the expected format; the accumulations are too similar.
- The aleph fractals did not emerge usefully and thus the system requires a different formula for implicit deconstruction.
One approach I believe is valuable is the projection capacity and the rigid encapsulant folding. This is most definitely a direction that needs to be explored.
Even so, there were successes in the mix.
- Aleph convergence is a learnable explicit and it can be controlled.
- Some alephs form invalid connections and this process needs debugging.
- Some alephs form considerably longer chains of knots, making the downstream models more likely to overfit to valid alephs.
- Void and Aleph are hand-in-hand components, disabling one kills the convergence of the other. They are dependent pairings.
- Rotary is HELPFUL when applied FROM higher dimensional space for convergence and works with sequences.
- The lensed rigidity can be expanded safely.
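For readers unfamiliar with the rotary behavior referenced in both lists, here is a minimal sketch of the standard rotary position embedding. It is the textbook form, not necessarily the variant used in the geolip-svae-transformer. The function names, the `base` value, and the example vectors are all assumptions for illustration.

```python
import math

def rotary(vec, position, base=10000.0):
    """Rotate consecutive coordinate pairs of vec by position-dependent angles."""
    if len(vec) % 2 != 0:
        raise ValueError("rotary embedding needs an even dimension")
    dim = len(vec)
    out = []
    for i in range(0, dim, 2):
        # Each pair gets its own frequency; later pairs rotate more slowly.
        theta = position * base ** (-i / dim)
        c, s = math.cos(theta), math.sin(theta)
        x, y = vec[i], vec[i + 1]
        out.extend([x * c - y * s, x * s + y * c])
    return out

def dot(a, b):
    """Plain dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

q = [0.3, -0.7, 0.2, 0.5]
k = [0.1, 0.4, -0.6, 0.9]

# Same relative offset (3) at two different absolute positions.
score_near = dot(rotary(q, 5), rotary(k, 2))
score_far = dot(rotary(q, 105), rotary(k, 102))
```

Up to floating-point error, the two scores agree, and the rotation leaves vector norms unchanged. These are the properties that make rotary encodings suitable for sequences.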
The geolip-svae-transformer
The article wasn't ready yet. The experiments for this round each need more work before I can attest to certain rules, and the wording needs considerable work.
The article will be out when the experiments are ready.
The BERT trainer has now converged as well, with the BERT systems also fed as trigrams.
geolip-svae-bert | lens=single ladder [4, 8, 16, 32] D_dec=32 | void=on spectral=on(2L) | V32 ps2 | params 143,671 | cuda
trainer: adamw lr=0.001 wd=0.0001 clip=1.0 | rigid_hinge=3.0 (margin 0.25·crit) diff=0.2 (ramp 30%) recon=cosine_sim | sched=onecycle | workers=4
[BERTVectorDataset] Loading corpus wikitext-2-raw-v1...
[BERTVectorDataset] 23,767 non-empty lines; encoding via BERT...
[BERT] loading bert-base-uncased on cuda...
Loading weights: 100%
 199/199 [00:00<00:00, 4028.39it/s, Materializing param=pooler.dense.weight]
BertModel LOAD REPORT from: bert-base-uncased
Key | Status | |
-------------------------------------------+------------+--+-
cls.seq_relationship.bias | UNEXPECTED | |
cls.predictions.bias | UNEXPECTED | |
cls.seq_relationship.weight | UNEXPECTED | |
cls.predictions.transform.dense.bias | UNEXPECTED | |
cls.predictions.transform.LayerNorm.bias | UNEXPECTED | |
cls.predictions.transform.LayerNorm.weight | UNEXPECTED | |
cls.predictions.transform.dense.weight | UNEXPECTED | |
Notes:
- UNEXPECTED :can be ignored when loading from different task/architecture; not ok if you expect identical arch.
[BERT] 100,000 unit-normalized vectors in 1.7s
[BERTVectorDataset] 100,000 BERT vectors held in CPU memory (307.2MB)
[eval_corpus] loading Trelis/tiny-shakespeare (held-out, unrelated to training)...
[eval_corpus] Trelis/tiny-shakespeare: 472 lines, encoding via BERT...
[BERT] loading bert-base-uncased on cuda:0...
Loading weights: 100%
 199/199 [00:00<00:00, 4059.70it/s, Materializing param=pooler.dense.weight]
BertModel LOAD REPORT from: bert-base-uncased
Key | Status | |
-------------------------------------------+------------+--+-
cls.seq_relationship.bias | UNEXPECTED | |
cls.predictions.bias | UNEXPECTED | |
cls.seq_relationship.weight | UNEXPECTED | |
cls.predictions.transform.dense.bias | UNEXPECTED | |
cls.predictions.transform.LayerNorm.bias | UNEXPECTED | |
cls.predictions.transform.LayerNorm.weight | UNEXPECTED | |
cls.predictions.transform.dense.weight | UNEXPECTED | |
Notes:
- UNEXPECTED :can be ignored when loading from different task/architecture; not ok if you expect identical arch.
[BERT] 8,192 unit-normalized vectors in 0.1s
epoch 0: mse=0.00059 | rand=99.05% nat=99.06% | alpha=0.0239 | dev=0.0047 in_env=True
epoch 1: mse=0.00001 | rand=99.88% nat=99.88% | alpha=0.0239 | dev=0.0037 in_env=True
epoch 2: mse=0.00001 | rand=99.90% nat=99.87% | alpha=0.0239 | dev=0.0041 in_env=True
epoch 3: mse=0.00001 | rand=98.74% nat=98.71% | alpha=0.0240 | dev=0.0032 in_env=True
epoch 4: mse=0.00001 | rand=99.54% nat=99.53% | alpha=0.0244 | dev=0.0031 in_env=True
epoch 5: mse=0.00001 | rand=99.53% nat=99.52% | alpha=0.0248 | dev=0.0032 in_env=True
epoch 6: mse=0.00001 | rand=99.81% nat=99.81% | alpha=0.0251 | dev=0.0025 in_env=True
epoch 7: mse=0.00000 | rand=99.93% nat=99.93% | alpha=0.0254 | dev=0.0028 in_env=True
epoch 8: mse=0.00000 | rand=99.82% nat=99.82% | alpha=0.0256 | dev=0.0030 in_env=True
epoch 9: mse=0.00000 | rand=99.97% nat=99.97% | alpha=0.0257 | dev=0.0034 in_env=True
epoch 10: mse=0.00000 | rand=99.97% nat=99.97% | alpha=0.0259 | dev=0.0033 in_env=True
epoch 11: mse=0.00000 | rand=99.98% nat=99.98% | alpha=0.0259 | dev=0.0033 in_env=True
epoch 12: mse=0.00000 | rand=99.99% nat=99.99% | alpha=0.0259 | dev=0.0031 in_env=True
epoch 13: mse=0.00000 | rand=99.99% nat=99.99% | alpha=0.0259 | dev=0.0028 in_env=True
epoch 14: mse=0.00000 | rand=100.00% nat=100.00% | alpha=0.0259 | dev=0.0029 in_env=True
epoch 15: mse=0.00000 | rand=100.00% nat=100.00% | alpha=0.0259 | dev=0.0031 in_env=True
epoch 16: mse=0.00000 | rand=100.00% nat=100.00% | alpha=0.0259 | dev=0.0033 in_env=True
epoch 17: mse=0.00000 | rand=100.00% nat=100.00% | alpha=0.0259 | dev=0.0034 in_env=True
epoch 18: mse=0.00000 | rand=100.00% nat=100.00% | alpha=0.0259 | dev=0.0034 in_env=True
epoch 19: mse=0.00000 | rand=100.00% nat=100.00% | alpha=0.0259 | dev=0.0034 in_env=True
best cos recovery: 100.00% | checkpoint: geolip_svae_bert_results/geolip_svae_bert.pt
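The memory figure in the log above can be sanity-checked with simple arithmetic. This sketch assumes the vectors are stored as float32 (4 bytes per value) and that MB means 10^6 bytes. The hidden size of 768 is that of bert-base-uncased. The helper name `embedding_megabytes` is hypothetical.

```python
def embedding_megabytes(num_vectors, hidden_size=768, bytes_per_value=4):
    """Size in MB (10**6 bytes) of a dense num_vectors x hidden_size array."""
    return num_vectors * hidden_size * bytes_per_value / 1e6

train_mb = embedding_megabytes(100_000)  # training vectors reported in the log
eval_mb = embedding_megabytes(8_192)     # held-out vectors reported in the log
print(f"{train_mb:.1f}MB")  # 307.2MB
```

Under these assumptions the result matches the 307.2MB reported for the 100,000 training vectors, which is consistent with float32 storage.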
https://huggingface.co/AbstractPhil/geolip-svae-transformer/blob/main/transformer_v3.py
Lens setting: "crusher"
The crusher is live: a simple manifestation that packs a large punch in statistics capacity. It essentially pulverizes the information into a more compact shape and learns from it, and it converges like the standard SVAE. It requires a bit more work, and a couple of math faults still exist in the model that I'm working through. The current crusher needs optimizations and formula tweaks to speed it up.
The problem with these particular faults is larger than a simple solution; the theorems around them require too much math, so the approximations require a compact solution.
The crusher isn't quite there yet. I'll post the results for a converging crusher with better optimization than the current one ASAP.
For single, the current formula snaps to 100%, so that's good.
geolip-svae-transformer | lens=single ladder [4, 8, 16] D_dec=16 | void=on spectral=on(2L) | V32 ps2 | params 96,849 | cuda
trainer: adamw lr=0.001 wd=0.0001 clip=1.0 | rigid_hinge=3.0 (margin 0.25·crit) diff=0.2 (ramp 30%) recon=pure_MSE | sched=onecycle | workers=8
torch.compile ON (mode=reduce-overhead) — first step pays compile cost
[ByteTrigramDataset] Loading corpus wikitext-2-raw-v1...
[ByteTrigramDataset] Corpus: 10,938,611 bytes (10.9 MB), 768 bytes/image, 14,242 non-overlapping images available (10,937,843 valid window starts)
epoch 0: mse=0.03261 | rand= 4.36% nat= 3.00% | alpha=0.0244 | dev=0.0085 in_env=True
epoch 1: mse=0.00302 | rand=25.08% nat=23.79% | alpha=0.0253 | dev=0.0066 in_env=True
epoch 2: mse=0.00027 | rand=32.96% nat=35.06% | alpha=0.0253 | dev=0.0058 in_env=True
epoch 3: mse=0.00027 | rand=42.94% nat=44.58% | alpha=0.0252 | dev=0.0042 in_env=True
epoch 4: mse=0.00027 | rand=19.85% nat=21.32% | alpha=0.0250 | dev=0.0045 in_env=True
epoch 5: mse=0.00020 | rand= 9.62% nat=11.55% | alpha=0.0249 | dev=0.0048 in_env=True
epoch 6: mse=0.00014 | rand=34.38% nat=39.38% | alpha=0.0250 | dev=0.0040 in_env=True
epoch 7: mse=0.00011 | rand=20.90% nat=22.83% | alpha=0.0251 | dev=0.0038 in_env=True
epoch 8: mse=0.00009 | rand=49.56% nat=53.20% | alpha=0.0253 | dev=0.0039 in_env=True
epoch 9: mse=0.00008 | rand=24.78% nat=28.64% | alpha=0.0256 | dev=0.0039 in_env=True
epoch 10: mse=0.00007 | rand=43.96% nat=46.96% | alpha=0.0258 | dev=0.0041 in_env=True
epoch 11: mse=0.00006 | rand=53.64% nat=54.76% | alpha=0.0260 | dev=0.0041 in_env=True
epoch 12: mse=0.00005 | rand=47.99% nat=48.25% | alpha=0.0263 | dev=0.0039 in_env=True
epoch 13: mse=0.00004 | rand=36.85% nat=40.04% | alpha=0.0265 | dev=0.0036 in_env=True
epoch 14: mse=0.00004 | rand=65.54% nat=68.46% | alpha=0.0267 | dev=0.0035 in_env=True
epoch 15: mse=0.00003 | rand=89.39% nat=92.22% | alpha=0.0269 | dev=0.0034 in_env=True
epoch 16: mse=0.00003 | rand=42.52% nat=44.34% | alpha=0.0270 | dev=0.0033 in_env=True
epoch 17: mse=0.00002 | rand=79.55% nat=83.27% | alpha=0.0272 | dev=0.0033 in_env=True
epoch 18: mse=0.00002 | rand=66.53% nat=69.91% | alpha=0.0273 | dev=0.0032 in_env=True
epoch 19: mse=0.00002 | rand=42.65% nat=42.95% | alpha=0.0274 | dev=0.0031 in_env=True
epoch 20: mse=0.00001 | rand=94.95% nat=96.85% | alpha=0.0275 | dev=0.0030 in_env=True
epoch 21: mse=0.00001 | rand=62.79% nat=65.49% | alpha=0.0276 | dev=0.0031 in_env=True
epoch 22: mse=0.00001 | rand=97.06% nat=98.77% | alpha=0.0276 | dev=0.0031 in_env=True
epoch 23: mse=0.00001 | rand=93.82% nat=95.22% | alpha=0.0276 | dev=0.0030 in_env=True
epoch 24: mse=0.00001 | rand=97.94% nat=99.36% | alpha=0.0277 | dev=0.0029 in_env=True
epoch 25: mse=0.00000 | rand=95.35% nat=96.22% | alpha=0.0277 | dev=0.0029 in_env=True
epoch 26: mse=0.00000 | rand=98.93% nat=99.65% | alpha=0.0277 | dev=0.0029 in_env=True
epoch 27: mse=0.00000 | rand=98.65% nat=99.59% | alpha=0.0277 | dev=0.0029 in_env=True
epoch 28: mse=0.00000 | rand=99.18% nat=99.65% | alpha=0.0277 | dev=0.0028 in_env=True
epoch 29: mse=0.00000 | rand=99.27% nat=99.85% | alpha=0.0277 | dev=0.0028 in_env=True
epoch 30: mse=0.00000 | rand=99.31% nat=99.69% | alpha=0.0277 | dev=0.0028 in_env=True
epoch 31: mse=0.00000 | rand=99.43% nat=100.00% | alpha=0.0277 | dev=0.0028 in_env=True
epoch 32: mse=0.00000 | rand=99.47% nat=100.00% | alpha=0.0277 | dev=0.0028 in_env=True
epoch 33: mse=0.00000 | rand=99.51% nat=100.00% | alpha=0.0277 | dev=0.0028 in_env=True
epoch 34: mse=0.00000 | rand=99.54% nat=100.00% | alpha=0.0277 | dev=0.0028 in_env=True
epoch 35: mse=0.00000 | rand=99.56% nat=100.00% | alpha=0.0277 | dev=0.0027 in_env=True
epoch 36: mse=0.00000 | rand=99.58% nat=100.00% | alpha=0.0277 | dev=0.0027 in_env=True
epoch 37: mse=0.00000 | rand=99.59% nat=100.00% | alpha=0.0277 | dev=0.0027 in_env=True
epoch 38: mse=0.00000 | rand=99.60% nat=100.00% | alpha=0.0277 | dev=0.0027 in_env=True
epoch 39: mse=0.00000 | rand=99.60% nat=100.00% | alpha=0.0277 | dev=0.0027 in_env=True
best byte recovery: 99.60% | checkpoint: geolip_svae_transformer_results/geolip_svae_transformer.pt
adjudication (text -> model -> text):
'the cat sat on the mat' -> 'the cat sat on the mat'
'machine learning' -> 'machine learning'
'hello world' -> 'hello world'
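The counts on the [ByteTrigramDataset] line of the log above can be checked with simple arithmetic. This is a minimal sketch, not the dataset class itself: the function names are hypothetical, and the window-start count is assumed to be corpus length minus image length because that is the figure the log prints (an inclusive count would be one higher).

```python
# Reproduce the counts printed on the [ByteTrigramDataset] log line.
CORPUS_BYTES = 10_938_611  # corpus size, taken from the log
IMAGE_BYTES = 768          # bytes per image, taken from the log


def non_overlapping_images(n_bytes: int, image_bytes: int) -> int:
    """Number of whole images when the corpus is cut into disjoint chunks."""
    return n_bytes // image_bytes


def window_starts(n_bytes: int, image_bytes: int) -> int:
    """Sliding-window start count.

    Assumption: uses N - W to match the number in the log; the actual
    dataset code may define a valid start differently.
    """
    return n_bytes - image_bytes


images = non_overlapping_images(CORPUS_BYTES, IMAGE_BYTES)
starts = window_starts(CORPUS_BYTES, IMAGE_BYTES)
leftover = CORPUS_BYTES - images * IMAGE_BYTES  # tail bytes that fill no image

print(f"non-overlapping images: {images:,}")
print(f"valid window starts:    {starts:,}")
print(f"leftover tail bytes:    {leftover}")
```

The gap between 14,242 disjoint images and roughly 10.9 million window starts is why random window sampling sees far more distinct crops than a fixed non-overlapping split.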
I'll need to attach unrelated data for recon testing to make the nat evaluation harder. It's too easy right now.
Alright I hooked up Trelis/tiny-shakespeare for independent eval.
With fresh eyes today I can see a couple of fairly crucial mistakes. I'll get them worked out.
It should work. It's still a little unstable, but it gets there.
The explicit rigidity is converging into implicit behavior, and with that the model can reconstruct both random noise and the actual behavior of the byte math.
Multiscale lensed bytewise transformation through a rigid geometric lens.
geolip-svae-transformer | lens=single ladder [4, 8, 16] D_dec=16 | void=on spectral=on(2L) | V32 ps2 | params 96,849 | cuda
trainer: AdamW lr=0.0001 clip=1.0 | rigid_hinge=3.0 (margin 0.25·crit) diff=0.2 (ramp 30%) recon=pure_MSE | OneCycle | workers=4
[ByteTrigramDataset] Loading corpus wikitext-2-raw-v1...
[ByteTrigramDataset] Corpus: 10,938,611 bytes (10.9 MB), 768 bytes/image, 14,242 non-overlapping images available (10,937,843 valid window starts)
epoch 0: mse=0.10583 | rand= 1.39% nat= 0.79% | alpha=0.0239 | dev=0.0143 in_env=True
epoch 1: mse=0.03172 | rand= 2.63% nat= 1.21% | alpha=0.0240 | dev=0.0139 in_env=True
epoch 2: mse=0.01195 | rand= 4.19% nat= 4.27% | alpha=0.0244 | dev=0.0132 in_env=True
epoch 3: mse=0.00322 | rand=16.88% nat=15.48% | alpha=0.0249 | dev=0.0071 in_env=True
epoch 4: mse=0.00045 | rand=27.95% nat=29.86% | alpha=0.0251 | dev=0.0059 in_env=True
epoch 5: mse=0.00021 | rand=37.33% nat=37.96% | alpha=0.0252 | dev=0.0058 in_env=True
epoch 6: mse=0.00012 | rand=38.58% nat=38.66% | alpha=0.0253 | dev=0.0055 in_env=True
epoch 7: mse=0.00009 | rand=41.14% nat=40.70% | alpha=0.0253 | dev=0.0054 in_env=True
epoch 8: mse=0.00006 | rand=54.71% nat=55.22% | alpha=0.0254 | dev=0.0052 in_env=True
epoch 9: mse=0.00005 | rand=66.73% nat=67.98% | alpha=0.0254 | dev=0.0050 in_env=True
epoch 10: mse=0.00004 | rand=74.02% nat=74.04% | alpha=0.0254 | dev=0.0050 in_env=True
epoch 11: mse=0.00003 | rand=72.36% nat=72.56% | alpha=0.0254 | dev=0.0049 in_env=True
epoch 12: mse=0.00003 | rand=68.90% nat=67.53% | alpha=0.0254 | dev=0.0048 in_env=True
epoch 13: mse=0.00003 | rand=76.69% nat=77.38% | alpha=0.0254 | dev=0.0046 in_env=True
epoch 14: mse=0.00002 | rand=73.02% nat=72.70% | alpha=0.0254 | dev=0.0045 in_env=True
epoch 15: mse=0.00002 | rand=70.31% nat=69.88% | alpha=0.0254 | dev=0.0044 in_env=True
epoch 16: mse=0.00002 | rand=79.93% nat=77.46% | alpha=0.0254 | dev=0.0043 in_env=True
epoch 17: mse=0.00002 | rand=76.01% nat=73.98% | alpha=0.0255 | dev=0.0048 in_env=True
epoch 18: mse=0.00002 | rand=83.62% nat=82.27% | alpha=0.0255 | dev=0.0055 in_env=True
epoch 19: mse=0.00001 | rand=87.61% nat=88.60% | alpha=0.0255 | dev=0.0051 in_env=True
epoch 20: mse=0.00001 | rand=87.44% nat=88.66% | alpha=0.0255 | dev=0.0049 in_env=True
epoch 21: mse=0.00001 | rand=89.98% nat=87.82% | alpha=0.0255 | dev=0.0047 in_env=True
epoch 22: mse=0.00001 | rand=91.60% nat=89.77% | alpha=0.0255 | dev=0.0046 in_env=True
epoch 23: mse=0.00001 | rand=92.70% nat=91.50% | alpha=0.0255 | dev=0.0045 in_env=True
epoch 24: mse=0.00001 | rand=92.76% nat=91.87% | alpha=0.0255 | dev=0.0045 in_env=True
epoch 25: mse=0.00001 | rand=93.11% nat=91.68% | alpha=0.0255 | dev=0.0044 in_env=True
epoch 26: mse=0.00001 | rand=94.42% nat=94.23% | alpha=0.0255 | dev=0.0044 in_env=True
epoch 27: mse=0.00001 | rand=94.46% nat=93.55% | alpha=0.0255 | dev=0.0043 in_env=True
epoch 28: mse=0.00001 | rand=95.02% nat=94.58% | alpha=0.0255 | dev=0.0043 in_env=True
epoch 29: mse=0.00001 | rand=93.72% nat=94.69% | alpha=0.0255 | dev=0.0043 in_env=True
epoch 30: mse=0.00001 | rand=90.87% nat=88.89% | alpha=0.0256 | dev=0.0035 in_env=True
epoch 31: mse=0.00001 | rand=90.68% nat=89.57% | alpha=0.0256 | dev=0.0032 in_env=True
epoch 32: mse=0.00001 | rand=94.78% nat=94.39% | alpha=0.0256 | dev=0.0030 in_env=True
epoch 33: mse=0.00001 | rand=95.82% nat=95.86% | alpha=0.0256 | dev=0.0030 in_env=True
epoch 34: mse=0.00000 | rand=96.34% nat=96.15% | alpha=0.0256 | dev=0.0031 in_env=True
epoch 35: mse=0.00000 | rand=96.59% nat=96.65% | alpha=0.0256 | dev=0.0031 in_env=True
epoch 36: mse=0.00000 | rand=96.74% nat=96.80% | alpha=0.0256 | dev=0.0031 in_env=True
epoch 37: mse=0.00000 | rand=96.83% nat=96.86% | alpha=0.0256 | dev=0.0031 in_env=True
epoch 38: mse=0.00000 | rand=96.87% nat=96.86% | alpha=0.0256 | dev=0.0031 in_env=True
epoch 39: mse=0.00000 | rand=96.87% nat=96.65% | alpha=0.0256 | dev=0.0031 in_env=True
best byte recovery: 96.87% | checkpoint: geolip_svae_transformer_results/geolip_svae_transformer.pt
adjudication (text -> model -> text):
'the cat sat on the mat' -> 'the cat sat on tie mat'
'machine learning' -> 'machine learning'
'hello world' -> 'hello world'
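The adjudication pairs above can be scored byte by byte. This is a minimal sketch under the assumption that "byte recovery" means the fraction of positions where the output byte equals the input byte; the log does not define the metric, and `byte_recovery` and `byte_diffs` are hypothetical helper names, not functions from the repository.

```python
def byte_recovery(original: str, reconstructed: str) -> float:
    """Fraction of byte positions that match exactly.

    Assumption: positions missing from the shorter string count as errors.
    """
    a = original.encode("utf-8")
    b = reconstructed.encode("utf-8")
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    matches = sum(x == y for x, y in zip(a, b))
    return matches / longest


def byte_diffs(original: str, reconstructed: str):
    """List (position, original_byte, reconstructed_byte) for each mismatch."""
    a = original.encode("utf-8")
    b = reconstructed.encode("utf-8")
    return [(i, x, y) for i, (x, y) in enumerate(zip(a, b)) if x != y]


# Pairs copied from the adjudication output of the second run.
pairs = [
    ("the cat sat on the mat", "the cat sat on tie mat"),
    ("machine learning", "machine learning"),
    ("hello world", "hello world"),
]

for src, out in pairs:
    print(f"{src!r}: recovery={byte_recovery(src, out):.2%} diffs={byte_diffs(src, out)}")
```

The single miss is 'h' (byte 104) coming back as 'i' (byte 105), an off-by-one in byte value rather than an unrelated character.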
Perfect? Not yet. Faster than SVAE battery convergence? Most definitely.
The transformer operates with the "single" setting.
AbstractPhil/geolip-svae-transformer
I've implanted a rigid formula that allows this direct behavior from the H2 battery to superimpose onto adjacent structural boundaries, and with that built aleph and void into the system as well. These are guarantees.
As for the centrifuge concept: the optimization on the centrifuge was quite lackluster, and the hardware doesn't support such behavior. You can access the current operating version of the centrifuge by using the "stacked" configuration. Four lenses were too many to handle such complex interactions reasonably when running a quaternion bank, so I will need to work something out in the future to get a full centrifuge system working.
Crusher is ready, transformer_v3.
You might be curious WHY these converge at such low raw MSE in the later stages. The reasoning is hard to explain, so I'll try to keep it simple. In the later stages of training with AdamW the updates become very subtle, so each step makes a much more accurate shift toward the goal. That lets the model converge rapidly after the earlier, heavier training. You can't simply train it low the whole way; that takes too long. The heavier early training gets everything KIND OF NEAR where it's supposed to be, and from there the really small twitches of MSE provide massive corrections without needing hard logits or harder-to-finetune features.
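One way to see how tiny MSE changes can move byte recovery so much is a rounding-threshold picture. This is an illustrative sketch, not the explanation given above and not fitted to the logs. It assumes bytes are scaled to [0, 1] by dividing by 255, decoded by rounding to the nearest byte, and hit by zero-mean Gaussian reconstruction error; none of these assumptions is confirmed by the source, and edge bytes 0 and 255 are ignored.

```python
import math

BYTE_STEP = 1.0 / 255.0      # assumed spacing between adjacent byte values
THRESHOLD = BYTE_STEP / 2.0  # error must stay below this to round correctly


def expected_recovery(mse: float) -> float:
    """Expected fraction of bytes that round back to the right value.

    Assumption: per-byte error is zero-mean Gaussian with variance == mse.
    """
    if mse <= 0.0:
        return 1.0
    sigma = math.sqrt(mse)
    # P(|error| < THRESHOLD) for a Gaussian with standard deviation sigma.
    return math.erf(THRESHOLD / (sigma * math.sqrt(2.0)))


for mse in (1e-4, 1e-5, 1e-6, 1e-7):
    print(f"mse={mse:.0e} -> expected byte recovery {expected_recovery(mse):.2%}")
```

Under these assumptions the recovery curve is steep near the threshold: a change in MSE that is invisible at five printed decimals can separate a mostly wrong reconstruction from a mostly exact one.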
The optimization on the centrifuge was quite lackluster. The hardware doesn't support such behavior, so I've continued to the crusher.
The transformer is operational. It takes the behavior of the H2 battery and projects it directly onto a multiscale structure. This takes the model from roughly 57k params to around 90k params, and with this behavior it converges SEMI-CLOSE to the current SVAE spectrum in considerably fewer epochs. So stay tuned on that one; the transformer did converge.
The crusher, however, is essentially a guarantee. There's a definite valuation set that can be obtained easily by pulverizing the data and analyzing the remains.