AbstractPhil posted an update 2 days ago
The geolip-svd-transformer is almost ready.

I've spent multiple days preparing the substructure, scaling, testing, and expanding the system. The conduit is meant to reorganize data: like the SVAE prototypes, it sorts and organizes rather than compresses and compacts.

The organization is nearly complete. The resulting structure will produce projection-capable, geometrically aligned memory, compacted and transformed into a usable token set. The remaining structural components are SVD-related utilities, each of which tracks how difficult, how dispersed, and so on each component is as it is learned over time.

The SVAE components were perfect for testing this playground. They appear large when analyzed; however, their representations are meant to stand in for huge vocabularies. A 16x16 patch expanded up to 768 dimensions is meant to encapsulate the behavior of near-pi upscaling, condensed into a considerably simpler, smaller form.
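For context on the 16x16 -> 768 relationship: in a standard ViT-style patch embedding, a 16x16 RGB patch contains exactly 16*16*3 = 768 raw values, so a 768-dim token can carry one patch without loss. A minimal sketch of that arithmetic (the names here are illustrative, not from the geolip codebase):

```python
import torch
import torch.nn as nn

patch = 16
dim = patch * patch * 3  # 16*16*3 = 768 raw values per RGB patch

# Non-overlapping patch embedding: one conv stride = one token per patch.
patchify = nn.Conv2d(3, dim, kernel_size=patch, stride=patch)

x = torch.randn(2, 3, 224, 224)             # batch of images
tokens = patchify(x)                        # [2, 768, 14, 14]
tokens = tokens.flatten(2).transpose(1, 2)  # [2, 196, 768] token set
print(tokens.shape)
```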

This model is behaving perfectly. It does not encode in the traditional sense; it analyzes and produces geometric opinions throughout its structure. Each component proved, one after another, that the model could not only learn but perfectly reconstruct, and with that produce utility-driven expansion capacity directly.

Fresnel -> effective image analysis battery.
Johanna -> effective noise analysis battery.
Grandmaster -> Johanna fine-tuned with sigma restoration using Fresnel's opinions.
Freckles -> massive noise analysis array (4096 to 16k tokens).

Geometric batteries.

Cayley rotation is meant to encapsulate that potential and expand it, allowing further differentiation down the chain of model structural behavioral events.
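If "Cayley rotation" here refers to the standard Cayley transform (my assumption; the geolip internals may differ), it parameterizes rotations from skew-symmetric matrices, which keeps learned rotations exactly orthogonal and fully differentiable. A minimal sketch:

```python
import torch

def cayley(A: torch.Tensor) -> torch.Tensor:
    """Cayley transform: map a skew-symmetric matrix to a rotation.

    For skew-symmetric A, (I - A)(I + A)^{-1} is orthogonal; since I - A and
    (I + A)^{-1} commute, solving (I + A) Q = (I - A) gives the same matrix.
    """
    I = torch.eye(A.shape[-1], dtype=A.dtype)
    return torch.linalg.solve(I + A, I - A)

# Build a skew-symmetric matrix from an arbitrary parameter matrix.
torch.manual_seed(0)
W = torch.randn(4, 4, dtype=torch.float64)
A = W - W.T            # A^T = -A, so I + A is always invertible
Q = cayley(A)

# Q is orthogonal: Q^T Q = I (up to floating-point error).
print(torch.allclose(Q.T @ Q, torch.eye(4, dtype=torch.float64), atol=1e-10))
```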

Suffice it to say, this is the geometric transformer's evolved state. These will exist as conduits throughout the models, the expanded behavioral attenuation units meant to provide geometric analysis internally within models for data-oriented CV alignment.

By default, transfer learning from these batteries is not going to be as effective as, say, raw pixel transfer.

However, you can achieve nearly 72% accuracy on CIFAR-100 from a pure-noise model using just Freckles-256 (256 patches), trained purely on noise with cross-entropy, a conv stem, and direct bottleneck ingestion - and that was before the conduit-svd was introduced.
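In outline, that transfer setup is a frozen noise-trained backbone feeding a cross-entropy head on the bottleneck. A hedged sketch - the backbone below is a tiny stand-in module, not the actual Freckles-256:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Stand-in for the noise-trained battery: conv stem + pooled bottleneck.
backbone = nn.Sequential(
    nn.Conv2d(3, 64, 3, padding=1), nn.GELU(),
    nn.AdaptiveAvgPool2d(4), nn.Flatten(),   # -> [b, 64*4*4] bottleneck
)
for p in backbone.parameters():              # freeze: transfer, don't retrain
    p.requires_grad = False

head = nn.Linear(64 * 4 * 4, 100)            # CIFAR-100 classifier head
opt = torch.optim.AdamW(head.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

x = torch.randn(8, 3, 32, 32)                # dummy CIFAR-sized batch
y = torch.randint(0, 100, (8,))
logits = head(backbone(x))                   # direct bottleneck ingestion
loss = loss_fn(logits, y)
loss.backward()                              # only the head receives gradients
opt.step()
print(logits.shape)
```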

With conduit-svd, the transfer potential of the transformer will expand this behavior exponentially with QKV, treating the QKV as a uniquely differentiable format - specifically aligned to the geometric battery state itself.

This is only possible due to the increased accuracy of the geolip.linalg.eigh structure and the speed of geolip.linalg.svd.

Without them, degenerate eigh and SVD cannot form, and the full structural awareness will never coalesce internally. Without enough degenerate eigh and SVD results, the structural basin for the miniature patchwork accuracy will never coalesce into opinions.

Odd, I know, but it's required. Degenerate SVDs create a difficult-to-measure void response that I at first tried to patch out, until direct analysis showed the CM is definitely preserving the structure - just in an unexpected series of ways. Near-degenerate and degenerate states are a predominant part of the structural learning, so when a huge influx of these structural boundaries forms into a usable shape, the resulting structure behaves in a uniformly geometric format that can be analyzed.

I didn't expect it either.

By clamping the CM above the near-degenerate threshold to guarantee non-degenerate volumes, the structure shows that the volumes aren't in fact there most of the time. It's predominantly directions; almost all magnitude is devoid.

======================================================================
  COMPLETE
======================================================================
  Best val acc: 93.8%
  Time: 979s (8.2s/epoch)
  Conv: 4,251,200  Cells: 366,176  Head: 167,946  Total: 4,785,322

  Comparison:
    SpectralCell standalone (D=16 V=16 h=256 +conv +aug): 79.1%  926K       1.2s/ep
    ConduitBattery backbone (GPT trainer, ep55/120):      88.7%  ~2M        ?s/ep
    Conv + SpectralCell inline:                           93.8%  4,785,322  8.2s/ep

First note: there is no degeneracy in this cell now. Per hundreds of bulk tests with many readouts, the degeneracy is swept up in the SVD kernel, the fl_gram eigh svd, the FLEigh structure, or one of the subsequent catches that PyTorch handles.
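For readers unfamiliar with the Gram-matrix route the "fl_gram eigh svd" name suggests: an SVD can be recovered from an eigh of the symmetric Gram matrix X^T X, trading some accuracy for speed. A generic sketch of that standard technique (not geolip's actual kernel):

```python
import torch

def gram_svd(X: torch.Tensor, eps: float = 1e-12):
    """SVD via the eigendecomposition of the Gram matrix X^T X.

    eigh on the symmetric Gram matrix yields V and the squared singular
    values; U is then recovered as X V / S.
    """
    evals, V = torch.linalg.eigh(X.T @ X)   # eigh returns ascending order
    evals, V = evals.flip(0), V.flip(1)     # reorder to descending
    S = evals.clamp_min(0).sqrt()
    U = (X @ V) / S.clamp_min(eps)          # guard near-zero (degenerate) modes
    return U, S, V.T

torch.manual_seed(0)
X = torch.randn(8, 5, dtype=torch.float64)
U, S, Vt = gram_svd(X)
print(torch.allclose(U @ torch.diag(S) @ Vt, X, atol=1e-8))
```

Note the accuracy caveat: squaring X squares its condition number, which is exactly where near-degenerate spectra get hard to measure.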

The degeneracy problem is solved, and solving it introduced a massive number of new problems - problems I have built prototypes to address. Each core problem has been narrowed down to three core components as solutions for information movement.

S^N sequential
Scattered S^N * D for orthogonal clustering
S * D + D * D for structural cohesive memory annealing

This comes down to three important utilities that many core structures depend on.

Sequence, distance, cosine similarity, QKV support, rotary support, and more.

  1. Sequential structural cohesion; LLM, tokens, next token prediction, spearman, and so on.
  2. Behavioral attenuated implicit; ViT, ResNets, diffusers, etc.
  3. Geometric alignment structure; Distillation, transfer learning, teacher/student, genetic inheritance, generational learning, SVAE, geolip prototypes, and constellations.

The third is least useful and out of scope here; the first two are very useful, so they are my predominant focus.

I have 14 potential prototypes and I will be forming a notebook for each, testing the robustness, the positives, the negatives, the storage and recall capacity, the magnitude standardization vs normalization accuracy, the flow matched directional EMA vs non-EMA, the structural supported ensemble approach vs the residual approach, and a few other elemental substructures.

The biggest tradeoffs will be between normalization clipping and standardization unit structured tokens. These are inherently entirely different expectations and produce entirely different opinions.
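One plausible reading of the two treatments, sketched in plain torch (the exact definitions used in the prototypes may differ): normalization clipping keeps directions and clips magnitudes into the unit ball, while standardization reshapes each token to zero-mean, unit-variance statistics.

```python
import torch

def normalize_clip(tokens: torch.Tensor) -> torch.Tensor:
    """Clip token magnitudes to the unit ball; directions are preserved."""
    norms = tokens.norm(dim=-1, keepdim=True)
    return tokens / norms.clamp_min(1.0)   # only shrinks tokens with norm > 1

def standardize(tokens: torch.Tensor, eps: float = 1e-6) -> torch.Tensor:
    """Zero-mean, unit-variance per token: unit-structured statistics."""
    mean = tokens.mean(dim=-1, keepdim=True)
    std = tokens.std(dim=-1, keepdim=True)
    return (tokens - mean) / (std + eps)

torch.manual_seed(0)
t = torch.randn(2, 4, 768) * 3.0           # [batch, seq, dim] tokens
a, b = normalize_clip(t), standardize(t)
print(a.norm(dim=-1).max().item() <= 1.0 + 1e-5,
      b.mean(dim=-1).abs().max().item() < 1e-4)
```

The two really do encode different expectations: the first bounds magnitude but keeps its ordering; the second discards absolute magnitude entirely.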

Each of these experiments will be fully documented, the subsequent models included in the notebook sections, and the notebooks represented in the cell repo.

The Cell is a fickle beast, but I believe I have tamed the monster. The battery will be substantially stronger with the new cell upgrades, as it includes multiple constellation elements such as FiLM solidification, normalization at curative rather than destructive points, and a few other elements to assist with producing tokenizations, such as direct Conv support and Hugging Face transformer capacity for the MoE substructures.

As it stands, the transformer tokens here are represented simply as [b, S, D, V] or [b, S, U, Vt], and they have direct embedding tokenization potential on many structures, but not all. Multiple deviant structures suffer from certain rules that require additional solutions before they work.
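One way to read those shapes - each of the S tokens carrying a D x V matrix, with a batched SVD producing the U / Vt factor pair. The shape semantics here are my guess at the packaging, not the repo's actual layout:

```python
import torch

torch.manual_seed(0)
b, S, D, V = 2, 16, 8, 8
tokens = torch.randn(b, S, D, V)           # [b, S, D, V] matrix-valued tokens

# torch.linalg.svd batches over all leading dims, so one call factors
# every token: U [b,S,D,k], sig [b,S,k], Vt [b,S,k,V] with k = min(D, V).
U, sig, Vt = torch.linalg.svd(tokens, full_matrices=False)

# Fold the singular values into U so the (U*sig, Vt) pair reconstructs
# each token exactly.
recon = (U * sig.unsqueeze(-2)) @ Vt
print(U.shape, Vt.shape, torch.allclose(recon, tokens, atol=1e-4))
```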

The prototypes may not exactly reflect this shape, and the shape may change for packaging and reuse purposes, so bear with it for now. I'm only one person, and I'm heavily relying on Claude to handle many of the logistics. I can code all of this; it just takes a lot longer for me to do manually, so I'm basically on NO GELU HERE - NO NORMS HERE - NO PROJECTION HERE duty, babysitting Claude so the code is correct and making sure the tests come out as they are supposed to.
