Alloy · Research Track · Active

Unified representation learning for the intelligence substrate.

Alloy researches unified cross-modal representation learning for arbitrary-dimensional signals. It is the intelligence-substrate research track for Stratum. The central question is whether a single representation can encode 1D time series, 2D images, and 3D spatial data without modality-specific preprocessing.

Phase 2 is active: coordinate-value tokenization treats every signal as a function over coordinates. The tokenizer does not know whether a patch came from a 1D or 2D signal. It knows the coordinate where the patch was sampled and the values observed there. Gradient-separated training prevents alignment losses from distorting the reconstruction backbone.

§01

Multi-Signal Input

nD + 1 · n is free per signal

S independent signals. Each signal S_s is a set of T tokens; each token is a pair (coord ∈ ℝ^n, value ∈ ℝ^{d_v}), that is, one sample of a function over an n-dimensional coordinate space. n is free per signal: time series take n=1, images n=2, volumes and point clouds n=3.

S signals · s ∈ {0 … S−1} · f_s: ℝ^{n_s} → ℝ^{d_v} · T samples
n=1 · ℝ · time series · coords (T, 1) · values (T, d_v)
n=2 · ℝ² · image / lattice · coords (T, 2) · values (T, d_v)
n=3 · ℝ³ · point cloud / volume · coords (T, 3) · values (T, d_v)
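
The input layout above can be sketched in a few lines of NumPy. This is an illustration, not Alloy's data loader: the sizes `T` and `D_V`, the helper names `make_signal` and `pad_coords`, and the random data are all hypothetical. Zero-padding coordinates to three columns is one assumed way to stack signals of different n into a single array; the source does not say how batching is done.

```python
import numpy as np

rng = np.random.default_rng(0)
T, D_V = 64, 4  # hypothetical: samples per signal, value channels


def make_signal(n, t=T, d_v=D_V):
    """Return one synthetic signal as (coords, values).

    coords has shape (t, n), values has shape (t, d_v).
    """
    coords = rng.uniform(-1.0, 1.0, size=(t, n))
    values = rng.normal(size=(t, d_v))
    return coords, values


# One signal per modality; only the coordinate width n differs.
signals = {
    "time_series": make_signal(n=1),
    "image": make_signal(n=2),
    "point_cloud": make_signal(n=3),
}


def pad_coords(coords, n_max=3):
    """Zero-pad coordinate columns to n_max (an assumption made for batching)."""
    t, n = coords.shape
    out = np.zeros((t, n_max), dtype=coords.dtype)
    out[:, :n] = coords
    return out


batch_coords = np.stack([pad_coords(c) for c, _ in signals.values()])  # (S, T, 3)
batch_values = np.stack([v for _, v in signals.values()])              # (S, T, D_V)
```

Nothing in the arrays records which modality a token came from; the only per-signal difference is how many coordinate columns are non-zero.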
§02

Disentangled Coordinate Tokenizer

(coord, value) → ℝ^{d_model}

Each raw token, a pair (coord, value), is embedded by two independent branches whose outputs are concatenated. The coord branch carries where, the value branch carries what, and an orthogonality loss L_ortho keeps the two subspaces from collapsing into each other inside the joint d_model-dimensional embedding.
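
The source does not give the form of the orthogonality loss. One common choice, shown here purely as an assumption, penalizes the cross-correlation between the two halves of the embedding: each dimension is centered and normalized over all tokens, and the loss is the mean squared entry of the resulting correlation matrix. The function name `ortho_loss` and its `eps` parameter are hypothetical.

```python
import numpy as np


def ortho_loss(coord_emb, value_emb, eps=1e-8):
    """Mean squared cross-correlation between coord and value embeddings.

    Assumed formulation, not necessarily the one Alloy uses.
    Inputs have shape (..., d_half); leading axes are flattened into tokens.
    """
    c = coord_emb.reshape(-1, coord_emb.shape[-1])
    v = value_emb.reshape(-1, value_emb.shape[-1])
    # Center each dimension over tokens, then scale columns to unit norm,
    # so that c.T @ v holds Pearson correlations in [-1, 1].
    c = c - c.mean(axis=0)
    v = v - v.mean(axis=0)
    c = c / (np.linalg.norm(c, axis=0) + eps)
    v = v / (np.linalg.norm(v, axis=0) + eps)
    cross = c.T @ v  # (d_half, d_half)
    return float(np.mean(cross ** 2))


rng = np.random.default_rng(0)
a = rng.normal(size=(4096, 8))
b = rng.normal(size=(4096, 8))
loss_independent = ortho_loss(a, b)  # near zero: the halves share no information
loss_collapsed = ortho_loss(a, a)    # much larger: the halves are identical
```

Under this formulation the loss approaches zero when no coordinate dimension is linearly predictable from any value dimension, and grows as the two subspaces begin to encode the same thing.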

Two-branch tokenizer · one token at a time · applied independently across (B, S, T)

Coordinate branch · in: ℝ^n
n → Fourier encoding · F frequencies · sin/cos · no learned params → 2F
2F → Linear · LayerNorm · normalize per token → d_model/2
out: coord embedding · d_model/2

Value branch · in: ℝ^{d_v}
d_v → Linear projection · learned weights → d_model/2
d_model/2 → LayerNorm · GELU · normalize, then smooth gated nonlinearity → d_model/2
out: value embedding · d_model/2

concat [coord_emb ‖ value_emb]
Token embedding · d_model · no positional encoding added
coord subspace · dims 0 … d_model/2 − 1
value subspace · dims d_model/2 … d_model − 1
Position is implicit in coord · (B, S, T, d_model) → transformer
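
The two branches can be sketched as a NumPy forward pass. Several details are assumptions, not taken from the source: the sizes `N_MAX`, `F`, `D_V`, `D_MODEL`; random matrices standing in for learned weights; LayerNorm without scale or shift; omitted biases; and a fixed random frequency matrix for the Fourier encoding. The source states only that the encoding has F frequencies, no learned parameters, and output width 2F. Projecting an n-dimensional coordinate through the first n rows of a fixed (N_MAX, F) matrix is one way to get a width that does not depend on n.

```python
import numpy as np

N_MAX, F, D_V, D_MODEL = 3, 16, 4, 64  # hypothetical sizes
rng = np.random.default_rng(0)

# Fixed (never trained) frequencies; the scale is an arbitrary choice.
FREQS = rng.normal(scale=4.0, size=(N_MAX, F))

# Random stand-ins for the learned linear weights (biases omitted).
W_coord = rng.normal(scale=0.1, size=(2 * F, D_MODEL // 2))
W_value = rng.normal(scale=0.1, size=(D_V, D_MODEL // 2))


def layer_norm(x, eps=1e-5):
    """Normalize each token over its feature axis (no learned scale/shift)."""
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)


def gelu(x):
    """Tanh approximation of GELU."""
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x ** 3)))


def fourier_encode(coords):
    """(..., n) -> (..., 2F) for any n <= N_MAX, using the first n frequency rows."""
    n = coords.shape[-1]
    phase = coords @ FREQS[:n]  # (..., F)
    return np.concatenate([np.sin(phase), np.cos(phase)], axis=-1)


def tokenize(coords, values):
    """(..., n), (..., D_V) -> (..., D_MODEL) as [coord_emb ‖ value_emb]."""
    coord_emb = layer_norm(fourier_encode(coords) @ W_coord)  # where
    value_emb = gelu(layer_norm(values @ W_value))            # what
    return np.concatenate([coord_emb, value_emb], axis=-1)


B, S, T = 2, 3, 5
values = rng.normal(size=(B, S, T, D_V))
tokens_by_n = {
    n: tokenize(rng.uniform(-1, 1, size=(B, S, T, n)), values) for n in (1, 2, 3)
}
```

In this sketch the first `D_MODEL // 2` dimensions of a token depend only on its coordinate and the rest only on its value, which matches the subspace split listed above. Whether the real tokenizer keeps the halves this strictly separate before the transformer is not stated in the source.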

Status: Active research. Best Phase 2 result — mixing ratio 0.140, unification ratio 1.184. Training pipeline runs locally, on GPU, or on RunPod. Evaluation suite covers 10 metrics including semantic alignment, probing accuracy, and latent slot specialization.