Unified representation learning for the intelligence substrate.
Alloy researches unified cross-modal representation learning for arbitrary-dimensional signals. It is the intelligence substrate research track for Stratum — the question is whether a single representation can encode 1D time series, 2D images, and 3D spatial data without modality-specific preprocessing.
Phase 2 is active: coordinate-value tokenization treats every signal as a function over coordinates. The tokenizer does not know whether a patch came from a 1D or 2D signal. It knows the coordinate where the patch was sampled and the values observed there. Gradient-separated training prevents alignment losses from distorting the reconstruction backbone.
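A toy sketch of what gradient separation could look like, assuming it means the alignment loss sees the backbone output as a constant (a stop-gradient). The scalar model, the function name `train_step`, and the hand-written gradients are all illustrative assumptions, not Alloy's training code.

```python
def train_step(w, a, x, target, lr=0.1):
    """One SGD step on a scalar toy model with separated gradients.

    Backbone:        z = w * x, reconstruction loss (z - x)^2
    Alignment head:  (a * z - target)^2, with z treated as a constant,
                     so this loss updates only `a`, never `w`.
    """
    z = w * x

    # Backbone gradient comes from the reconstruction loss alone.
    grad_w = 2.0 * (z - x) * x

    # Stop-gradient by construction: z is used as a plain number here,
    # and no term of the alignment loss is added to grad_w.
    z_detached = z
    grad_a = 2.0 * (a * z_detached - target) * z_detached

    return w - lr * grad_w, a - lr * grad_a


# w = 1.0 already reconstructs x perfectly; the alignment head is far off.
w_new, a_new = train_step(w=1.0, a=0.0, x=2.0, target=5.0)
```

In this example the alignment error is large, yet only `a` changes; `w` stays where the reconstruction loss left it. In an autodiff framework the same effect is usually obtained with a detach or stop-gradient operation on the backbone output.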
Multi-Signal Input
nD + 1 · n is free per signal
S independent signals. Each signal S_s is a set of T tokens; each token is a pair (coord ∈ ℝ^n, value ∈ ℝ^{d_v}), a sample of a function over an n-dimensional coordinate space. n is free per signal: time series take n=1, images n=2, volumes and point clouds n=3.
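As a minimal sketch of the (coord, value) view, the same function below tokenizes a 1D series and a 2D image without branching on modality. It assumes regular grids of scalar samples (so d_v = 1), one token per sample rather than per patch, and coordinates normalised to [0, 1] per axis; the names `tokenize` and `shape_of` are hypothetical.

```python
from itertools import product
from typing import List, Tuple

# A token is (coord, value): coord has n entries, value has d_v entries.
Token = Tuple[Tuple[float, ...], Tuple[float, ...]]


def shape_of(grid) -> Tuple[int, ...]:
    """Infer the shape of a regular nested list of scalars."""
    shape = []
    while isinstance(grid, (list, tuple)):
        shape.append(len(grid))
        grid = grid[0]
    return tuple(shape)


def tokenize(grid) -> List[Token]:
    """Turn a regular n-D grid of scalars into (coord, value) tokens."""
    shape = shape_of(grid)
    tokens = []
    for index in product(*(range(s) for s in shape)):
        cell = grid
        for i in index:
            cell = cell[i]
        # Assumption: each axis is rescaled to [0, 1].
        coord = tuple(i / (s - 1) if s > 1 else 0.0
                      for i, s in zip(index, shape))
        tokens.append((coord, (float(cell),)))
    return tokens


series_tokens = tokenize([0.0, 0.5, 1.0])      # n = 1
image_tokens = tokenize([[1, 2], [3, 4]])      # n = 2
```

The only thing that differs between the two outputs is the length of each coord tuple, which is what lets n vary per signal.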
Disentangled Coordinate Tokenizer
(coord, value) → ℝ^{d_model}
Each raw token, a pair (coord, value), is embedded by two independent branches whose outputs are concatenated. The coord branch carries where, the value branch carries what, and an orthogonality loss ℒ_ortho keeps the two subspaces from collapsing into each other inside the joint d_model-dimensional embedding.
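The two-branch embedding can be sketched as follows. Several choices here are assumptions, since the text does not specify them: coordinates are zero-padded to a fixed `N_MAX`, each branch is a single linear map, and the orthogonality penalty is the squared Frobenius norm of the batch cross-correlation between the two branch outputs. All names are hypothetical.

```python
import random

N_MAX = 3          # assumed maximum coordinate dimension
D_VALUE = 1        # d_v in this toy example
D_COORD_EMB = 4    # width of the coord branch
D_VALUE_EMB = 4    # width of the value branch; d_model = 4 + 4


def init_weights(rows, cols, rng):
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


def linear(weights, x):
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]


def embed_token(coord, value, w_coord, w_value):
    """Embed one token; returns (joint, coord_part, value_part)."""
    padded = list(coord) + [0.0] * (N_MAX - len(coord))
    coord_emb = linear(w_coord, padded)        # "where"
    value_emb = linear(w_value, list(value))   # "what"
    return coord_emb + value_emb, coord_emb, value_emb  # concatenation


def ortho_loss(coord_embs, value_embs):
    """Squared Frobenius norm of (C^T V) / B over a batch of B tokens."""
    batch = len(coord_embs)
    total = 0.0
    for i in range(len(coord_embs[0])):
        for j in range(len(value_embs[0])):
            dot = sum(c[i] * v[j] for c, v in zip(coord_embs, value_embs))
            total += (dot / batch) ** 2
    return total


rng = random.Random(0)
w_coord = init_weights(D_COORD_EMB, N_MAX, rng)
w_value = init_weights(D_VALUE_EMB, D_VALUE, rng)

# Tokens from a 1D, a 2D, and a 3D signal share one embedding path.
tokens = [((0.0,), (0.2,)), ((0.5, 0.5), (0.7,)), ((1.0, 0.0, 0.5), (0.1,))]
embedded = [embed_token(c, v, w_coord, w_value) for c, v in tokens]
joint = [e[0] for e in embedded]
loss = ortho_loss([e[1] for e in embedded], [e[2] for e in embedded])
```

The penalty is zero when every coord feature is uncorrelated with every value feature across the batch, and grows as the two branches start to encode the same information.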
Status: Active research. Best Phase 2 result — mixing ratio 0.140, unification ratio 1.184. Training pipeline runs locally, on GPU, or on RunPod. Evaluation suite covers 10 metrics including semantic alignment, probing accuracy, and latent slot specialization.