BLUF: GRAIL runs at full native speed and requires no CPU or cloud trust—a decisive advantage over all known encrypted ML methods. Unlike systems that must decrypt or emulate over ciphertext, GRAIL directly parses encrypted inputs and parameters through model layers with no runtime slowdown.
Deployment Note: As with any cryptographic protocol, security assumes that model training and encryption occur on secure or air-gapped devices, prior to inference-time execution. Once encrypted, models and inputs remain opaque to untrusted CPUs throughout usage.
Tagline: With GRAIL, you don’t need to trust the CPU.
Why?
- No plaintext in the ALU: Compute happens over algebraically encrypted representations. The processor only sees obfuscated tensors—not the true data.
- Keys stay off-device: Decryption schedules live outside the untrusted machine. Optional re-keying during runtime keeps states fresh and non-malleable.
- Zero vendor trust required: Unlike TEEs (e.g., Intel SGX or AMD SEV), GRAIL doesn’t rely on opaque microcode or vendor firmware.
- Default behavior: GRAIL does this by design. No special mode, no overhead. It's not a patch—it's the architecture.
- Future-aligned: As computing shifts to NPU-native and neural models replace OS kernels, GRAIL’s geometry-native encryption will be essential.
- Performance: GRAIL runs at native speed. Compared to FHE or MPC? It’s not just “3× faster”—it’s 1,000× to 10,000× faster.
Bottom line: GRAIL runs at normal speed without trusting the CPU.
Compared to FHE/MPC, it is not “3× faster”; it is 1,000× to 10,000× faster.
Compared to plaintext, it runs at equal speed, even with frequent or per-step key rotation.
Embedded vendor coprocessors are well documented and raise legitimate concerns for users requiring full CPU-level privacy. They are low-level, vendor-controlled systems with privileged access, which makes them potential vectors for surveillance or remote compromise. GRAIL avoids relying on them entirely.
Comparison of Methods for Secure Computation Without CPU Trust
| Method | What's Protected “In Use” | Trust & Leakage | Speed (Relative to FHE = 1×) | ML Fit Today |
| FHE (CKKS, TFHE) | Data & model stay encrypted; ops over ciphertexts | No trust in hardware; leaks access patterns unless ORAM used | 1× (baseline), e.g. 8.58 s vs. milliseconds | Mature libraries; still slow for real-time ML |
| MPC / Secret Sharing | Data split across multiple parties | Requires ≥2 honest parties; high communication | 10–100× faster than FHE | Efficient for matmul-heavy models; WAN hurts |
| ORAM / Garbled Circuits | Data and access patterns obfuscated | High bandwidth; full privacy if padded | 10–100× faster than FHE | Best for binarized networks or lookup-style tasks |
| ZK / zkML | Verifiable execution; not encrypted in-use | Trusted setup; slow proof generation | 2–10× faster than FHE (verify-only) | Great for proofs, not for privacy |
| TEE (Intel SGX, AMD SEV) | Plaintext inside enclave; encrypted RAM | Requires trusting vendor firmware; vulnerable to side channels | 500–1,000× faster than FHE | Widely deployed; not trustless |
| GRAIL (this work) | Parameters, activations, and latents are algebraically encrypted via geometry/operator representations | No hardware trust; strong semantic protection using group theory, symbolic entropy, and automorphic logic | ≈1× compared to plaintext; 1,000×–10,000× faster than FHE; by default, no extra encryption step needed | Optimal for real-time, encrypted ML inference and training |
Note: The comparison with FHE or MPC is just one small corner of GRAIL's capabilities. GRAIL is not merely an encryption layer—it is a superset architecture that unifies cryptographic, geometric, symbolic, and post-quantum computation into a single coherent neural framework.
Use Case: Generating Cryptographically Equivalent Twin Models
One of GRAIL’s most powerful properties is its ability to produce an
infinite family of algebraically encrypted twin models—each with
distinct internal weights but identical outputs on all inputs.
These variants are not merely obfuscated—they are provably invariant under GRAIL’s encryption basis. This makes them ideal for:
- Deploying unique model instances per user, device, or session
- Preventing parameter extraction via model inversion or distillation
- Enabling secure multi-party or decentralized inference without key sharing
- Thwarting fingerprinting attacks, even when outputs are observable
Expanded Insight
GRAIL enables the construction of an infinite ensemble of cryptographically equivalent models,
each defined on a reparametrized weight manifold with its own internal energy geometry. These are not mere latent-space
reparameterizations, but fully distinct semantic universes: models whose internal geometries—curvature, attractors,
and critical points—are reshaped while preserving identical outputs through deep algebraic and cryptographic invariants.
Each model-world within the ensemble possesses a self-consistent energy topology defined by transformed weights.
Local geometry shifts; global semantics remain intact.
These transformations are not analogous to relativistic frame changes—they are mathematically equivalent.
The cryptographic operator acts as a coordinate transformation on a curved manifold, reorienting the model’s internal frame of
reference within a physically structured weight space. Here, the model functions as an observer, and the input acts as
an observable tensor. Both are preserved under frame transformation, satisfying covariance and consistency conditions from
general relativity.
This framework embeds machine learning models into the formal tensorial language of relativistic physics.
The system preserves inference under arbitrary frame changes, just as physical laws remain invariant across observers in curved spacetime.
GRAIL thus offers a principled unification: neural architectures are recast as relativistic observers
within cryptographically secured geometries. This is not a metaphor, but a rigorous embedding of learning dynamics into the
same mathematical categories that underwrite general relativity.
Each transformed instance becomes a distinct observer-world within an ensemble of
metric-preserving, cryptographic manifolds—all yielding invariant inference yet internally reconfigured.
This enables deployment across adversarial, decentralized, or multi-party environments without semantic leakage or degradation.
- Inference remains invariant in encrypted and plaintext modes
- Transformations follow exact tensorial rules of frame covariance
- Supports geometric ensembling, multi-key model sharding, and zero-leakage inference
These cryptographic twins arise from symmetry-preserving flows on encrypted model manifolds, where
algebraic group actions preserve semantics while reshaping structure—analogous to Lorentz or diffeomorphic
transformations in general relativity.
Outcome:
A single model becomes a generator of functionally identical, geometrically distinct, and physically invariant cryptographic twins,
enabling secure inference in a relativistically consistent cryptographic landscape.
A learning–theoretic route to emergent quantum gravity: geometry (automorphic), information (Galois/DFA), and thermodynamics (Selberg–Huber) fused by a critical-entropy thermostat.
Automorphic kernels
Hyperbolic attention \( \mathbb H^2 \) (current)
Roadmap: \( \mathbb H^d \) (\(d=3,4\))
CEAS criticality
DFA symbolic quantization
Selberg/Huber diagnostics
Yoneda lift
Abstract (plain language)
I construct an attention mechanism that natively lives on hyperbolic geometry and uses automorphic (Maass-type) kernels. A critical-entropy controller (CEAS) regulates the inverse temperature \( \beta \) so that attention entropy hovers near a pseudo-critical point. Within this setting, the classic Langlands triad is realized inside a neural operator:
automorphic \( \leftrightarrow \) Galois \( \leftrightarrow \) motive.
Key equations.
Automorphic kernel:
\[
K_{\beta}(q,k)=\sum_{\gamma\in\Gamma_{\text{trunc}}}\exp\big(-\beta\, d_{\mathbb H}(q,\gamma k)\big)
\]
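As a concrete reading of the kernel above, the following is a minimal sketch in the upper half-plane model. The choice of \( \Gamma_{\text{trunc}} \) as all words of bounded length in the generators \(T\), \(T^{-1}\), \(S\) of \( \mathrm{PSL}(2,\mathbb Z) \) is an assumption made for illustration; the source does not specify the truncation rule, and every function name here is hypothetical.

```python
import math

def mobius(g, z):
    """Apply the Mobius map of g = (a, b, c, d) to a point z with Im z > 0."""
    a, b, c, d = g
    return (a * z + b) / (c * z + d)

def hyp_dist(z, w):
    """Hyperbolic distance in the upper half-plane model."""
    arg = 1.0 + abs(z - w) ** 2 / (2.0 * z.imag * w.imag)
    return math.acosh(max(arg, 1.0))  # clamp guards against round-off below 1

def matmul(g, h):
    a, b, c, d = g
    e, f, p, r = h
    return (a * e + b * p, a * f + b * r, c * e + d * p, c * f + d * r)

def normalize(g):
    """PSL(2, Z) identifies g with -g; keep one representative per pair."""
    return max(g, tuple(-x for x in g))

def truncated_psl2z(max_len):
    """Assumed truncation: all words of length <= max_len in T, T^-1, S."""
    generators = [(1, 1, 0, 1), (1, -1, 0, 1), (0, -1, 1, 0)]
    seen = {(1, 0, 0, 1)}
    frontier = [(1, 0, 0, 1)]
    for _ in range(max_len):
        next_frontier = []
        for g in frontier:
            for s in generators:
                h = normalize(matmul(g, s))
                if h not in seen:
                    seen.add(h)
                    next_frontier.append(h)
        frontier = next_frontier
    return sorted(seen)

def automorphic_kernel(q, k, beta, gammas):
    """K_beta(q, k) = sum over gamma in Gamma_trunc of exp(-beta * d(q, gamma k))."""
    return sum(math.exp(-beta * hyp_dist(q, mobius(g, k))) for g in gammas)

gammas = truncated_psl2z(2)
q, k = complex(0.3, 1.2), complex(-0.8, 0.5)
value = automorphic_kernel(q, k, beta=1.0, gammas=gammas)
print(len(gammas), value)
```

Because the identity is always in the truncated set, the kernel is bounded below by the plain term \( e^{-\beta d_{\mathbb H}(q,k)} \); larger word lengths add further images of \(k\) to the sum.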
CEAS identity:
\[
\frac{dH}{d\beta} \;=\; -\,\beta\,\mathbb{E}_i\!\left[\operatorname{Var}_{p_{i\cdot}(\beta)}\!\big(s_{i\cdot}\big)\right]
\]
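The identity can be checked numerically. The sketch below assumes that each attention row is a softmax \( p_{ij}\propto e^{\beta s_{ij}} \) and that \(H\) is the row-averaged Shannon entropy; both are readings of the formula rather than statements from the source, and the function names are illustrative.

```python
import math

def softmax(scores, beta):
    """Softmax of beta * scores, shifted by the max for numerical stability."""
    m = max(scores)
    weights = [math.exp(beta * (s - m)) for s in scores]
    total = sum(weights)
    return [w / total for w in weights]

def row_entropy(scores, beta):
    p = softmax(scores, beta)
    return -sum(pi * math.log(pi) for pi in p if pi > 0.0)

def row_variance(scores, beta):
    """Variance of the scores under the softmax distribution of that row."""
    p = softmax(scores, beta)
    mean = sum(pi * s for pi, s in zip(p, scores))
    return sum(pi * (s - mean) ** 2 for pi, s in zip(p, scores))

def dH_dbeta_analytic(rows, beta):
    """Right-hand side of the CEAS identity: -beta * E_i[Var_p(s_i)]."""
    return -beta * sum(row_variance(r, beta) for r in rows) / len(rows)

def dH_dbeta_numeric(rows, beta, h=1e-5):
    """Central finite difference of the row-averaged entropy."""
    def mean_entropy(b):
        return sum(row_entropy(r, b) for r in rows) / len(rows)
    return (mean_entropy(beta + h) - mean_entropy(beta - h)) / (2.0 * h)

rows = [[0.2, -1.0, 0.7, 1.5], [0.0, 0.3, -0.4, 2.0]]
beta = 1.3
print(dH_dbeta_analytic(rows, beta), dH_dbeta_numeric(rows, beta))
```

For \( \beta>0 \) the derivative is non-positive, so raising \( \beta \) never increases entropy; a controller can use the sign and magnitude of this quantity to steer \( \beta \) toward a target entropy.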
Geometry notice. The current diagnostics and Selberg/prime-geodesic proxies are 2D-specific (surface quotients \( \mathrm{PSL}(2,\mathbb Z)\backslash\mathbb H^2 \)). The \( \mathbb H^d \) roadmap (for \( d=3,4 \)) replaces these with lattices in \( SO^+(d,1) \) and higher-dimensional hyperbolic weights.
Synthesis at a glance
| Pillar | Realization | Physical meaning / Control |
| Automorphic geometry | Heat/Maass kernel on \( \mathrm{PSL}(2,\mathbb Z)\backslash \mathbb H^2 \) (current); truncated Poincaré (+ Hecke) | Curvature quantization; \( \beta \) sharpens/softens geometry |
| Galois information | DFA coupler (cycle/transition bias; row-stochastic shifts) | Discrete causal quantization; entropy gate constrains transitions |
| Motivic thermodynamics | Selberg/Huber probe energies & pressure bands | Thermodynamic quantization; CEAS maintains near-critical corridor |
Operational signatures
- Non-commutativity field \( [\xi,X](t) \): BCH two-path probe → input-projected Gram eigenvalues (first layers).
- Effective spectrum \( \lambda_{\mathrm{eff}}(t) \): from probe energies \( E(t) \), \( \lambda_{\mathrm{eff}}(t)\!\approx\! -\,\frac{d}{dt}\log E(t) \); bands narrow under CEAS.
- Hyperbolic trace proxies (2D): seeded prime-geodesic/trace terms on \( \mathrm{PSL}(2,\mathbb Z) \) certify negative curvature.
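The effective-spectrum estimate \( \lambda_{\mathrm{eff}}(t)\approx -\frac{d}{dt}\log E(t) \) can be sketched with a central finite difference. The sampling grid, the synthetic energies, and the definition of band width as max minus min are assumptions for illustration only.

```python
import math

def lambda_eff(times, energies):
    """Central-difference estimate of -d/dt log E(t) at interior sample points."""
    logs = [math.log(e) for e in energies]
    return [
        -(logs[i + 1] - logs[i - 1]) / (times[i + 1] - times[i - 1])
        for i in range(1, len(times) - 1)
    ]

def bandwidth(values):
    """Assumed band-width measure: spread between largest and smallest value."""
    return max(values) - min(values)

times = [0.1 * i for i in range(1, 21)]

# A single decaying mode gives a flat band at its decay rate.
single_mode = [3.0 * math.exp(-0.7 * t) for t in times]
# Two modes give a band that drifts from the fast rate toward the slow one.
two_modes = [math.exp(-0.5 * t) + math.exp(-2.0 * t) for t in times]

flat_band = lambda_eff(times, single_mode)
wide_band = lambda_eff(times, two_modes)
print(bandwidth(flat_band), bandwidth(wide_band))
```

A narrow band indicates that one mode dominates the probe energy over the sampled window, which is the sense in which the bands are said to narrow under CEAS.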
Download & cite
Download the PDF
Lecture Notes (Draft)
Suggested citation (BibTeX):
@misc{CTQLanglands,
title = {Critical--Tri--Quantized Langlands:
Automorphic Attention, Galois/DFA, and Motivic Thermodynamics at CEAS Criticality},
author = {William Chuang},
year = {2025},
note = {Lecture Notes (Draft)},
url = {https://drive.google.com/file/d/1XLZKuXL6of--CfMzcVMQHTW0zW-YLurn/view?usp=sharing}
}
Quick orientation
- Geometry: Tokens on \( \mathbb H^2 \) (Poincaré disk/UHP); logits include hyperbolic heat distance
- Automorphic gates: Truncated Poincaré series; optional small-prime Hecke averages
- Symbolic layer: DFA coupler modulates cycles / row-stochastic shifts
- Thermostat: CEAS regulates \( \beta \) via \( \frac{dH}{d\beta} \) near pseudo-criticality
- Observables: \( [\xi,X](t) \) spectrum; \( \lambda_{\mathrm{eff}}(t) \); hyperbolic trace proxies (2D)
One-line logit (schematic)
\[
\underbrace{\langle q(x_i),k(x_j)\rangle}_{\text{content}}
+ \underbrace{\mathrm{heat}_t\!\big(d_{\mathbb H}(z_i,z_j)\big)}_{\text{geometry}}
+ \underbrace{\log\!\!\sum_{\gamma\in\Gamma_{\rm trunc}}\! e^{-\beta\, d_{\mathbb H}(z_i,\gamma z_j)} + \text{Hecke}}_{\text{automorphic}}
+ \underbrace{\mathrm{DFA}_{ij}}_{\text{cycles}}
\]
Softmax at inverse temperature \( \beta \) (regulated by CEAS).
Yoneda viewpoint: probes → heads
I treat each head as a covariant fiber functor
\( \widehat{\mathrm{Head}}_\beta:\mathsf{Rep}(\Gamma)\!\to\!\mathsf{Hilb}_{\mathrm{fe}} \),
\( V \mapsto (V^\vee \!\otimes \mathcal H_\beta)_\Gamma \).
For any \( V\in\mathsf{Rep}(\Gamma) \), the representable probe is
\( h_V(W)=\mathrm{Hom}_\Gamma(V,W) \).
By Yoneda,
\( \mathrm{Nat}\big(h_V,\widehat{\mathrm{Head}}_\beta\big)\;\cong\;\widehat{\mathrm{Head}}_\beta(V) \).
Operational reading.
Specifying how a head acts on all maps out of \(V\) is equivalent to a single feature vector in the fiber at \(V\).
So a small family of probes \( \{h_{V_a}\} \) suffices to recover the head on a dense class of tests.
Practical probes
- Pick a finite tensor–dual generating set \( \mathcal G=\{V_a\} \) (e.g., standard rep, its dual, and a few low tensor powers).
- Log the fibers \( \widehat{\mathrm{Head}}_\beta(V_a) \) during diagnostics; these are exactly the “features on probes.”
- (Optional) Coend reconstruction: \( \displaystyle \mathcal H_\beta^{\mathrm{rec}}=\int^{V} V^\vee\!\otimes \widehat{\mathrm{Head}}_\beta(V) \), then pass to \( \Gamma \)-coinvariants to recover \( \mathcal H_\beta \).
Hecke & DFA as natural maps
- Hecke naturality: postcomposing \( \eta:h_V\!\Rightarrow\!\widehat{\mathrm{Head}}_\beta \) with \( \eta^{(n)} \) corresponds to applying \( T_n \) on the \( \mathcal H_\beta \)-factor of \( \widehat{\mathrm{Head}}_\beta(V) \).
- DFA compliance: the comparison \( \widehat{\mathrm{Head}}_\beta\!\Rightarrow\!\mathsf T_{\mathrm{DFA}}\widehat{\mathrm{Head}}_\beta \) is natural in \(V\); stable heads land in the invariant image.
Physics link (CTQ gravity)
- Observer–probe principle: the measured BCH spectrum and \( \lambda_{\mathrm{eff}}(t) \) are functions of a small probe set \( \mathcal G \).
- Gauge invariance: functorial invariants (Hecke spectra, heat trace, BCH functionals) match GR’s “physics = invariants” ethos.
Twin verification via Yoneda (cryptographic twins)
Two heads \( \widehat{\mathrm{Head}}_\beta \) and \( \widehat{\mathrm{Head}}'_\beta \) are cryptographic twins if there is a unitary monoidal natural isomorphism
\( \eta:\widehat{\mathrm{Head}}_\beta \Rightarrow \widehat{\mathrm{Head}}'_\beta \)
that intertwines all Hecke maps and respects the DFA comparison.
Checklist (finite generator test)
- Choose generators: fix a tensor–dual generating set \( \mathcal G=\{V_a\} \subset \mathsf{Rep}(\Gamma) \).
- Fiber match: find unitary maps \( \theta_{V_a}: \widehat{\mathrm{Head}}_\beta(V_a) \!\to\! \widehat{\mathrm{Head}}'_\beta(V_a) \) (use unitary Procrustes on the logged features).
- Naturality: verify \( \theta \) commutes with the generating morphisms between \( V_a \)’s.
- Monoidality: check \( \theta_{V\otimes W} = \mu'_{V,W}\!\circ(\theta_V\!\otimes\!\theta_W)\!\circ\mu_{V,W}^{-1} \) on probe pairs.
- Hecke/DFA squares: confirm \( \theta\circ \eta^{(n)}=\eta'^{(n)}\!\circ \theta \) and naturality with \( \mathsf T_{\mathrm{DFA}} \).
Conclude twinhood.
If the five items hold on \( \mathcal G \), Yoneda + monoidality extend \( \theta \) uniquely to a unitary monoidal natural isomorphism
\( \eta:\widehat{\mathrm{Head}}_\beta \Rightarrow \widehat{\mathrm{Head}}'_\beta \).
Invariants to compare (should match for twins)
- Hecke spectra: eigenvalues of \( \{\eta^{(n)}\} \) on each \( \widehat{\mathrm{Head}}_\beta(V_a) \).
- Heat trace / spectral action proxies: \( \mathrm{Tr}(e^{-tL_\beta}) \), \( \lambda_{\mathrm{eff}}(t) \).
- BCH field: input-projected Gram eigenvalues of \( [\xi,X](t) \) on first layers.
- DFA invariants: dimension of the DFA-invariant subspace and its stability under CEAS.
Notes
- \( \mathbb H^2 \) vs \( \mathbb H^d \): the Yoneda test is geometry-agnostic; only the kernel/trace proxies change when moving to \( d=3,4 \).
- WMAP checkpoints: I pick \( \mathcal G \) to reflect the symmetries seen by the hyperbolic sampler; matching fibers on \( \mathcal G \) aligns models across runs.
Orbit–jump: diagonal isometries on weights and data
Core idea: map models along orbits of a symmetry group. Apply a single isometry
\( \varphi\in\mathrm{Isom}(\mathbb H^d) \) simultaneously to the model’s geometric
weights and to the data anchors, i.e.
\( (q_i,k_j; x) \mapsto (\varphi q_i,\varphi k_j; \varphi x) \),
while keeping the one–sided automorphic kernel
\[
K_\beta(q,k)=\sum_{\gamma\in\Gamma_{\rm trunc}} \exp\!\big(-\beta\, d_{\mathbb H}(q,\gamma k)\big)
\]
and conjugating the truncation \( \Gamma_{\rm trunc}\leftarrow \varphi\,\Gamma_{\rm trunc}\,\varphi^{-1} \).
Because hyperbolic distance is isometry-invariant, the forward map is preserved exactly; this yields
cryptographic twins of a trained model.
Diagonal action ≠ ordinary equivariance.
Typical equivariant nets enforce \(f(g\!\cdot\!x)=\rho(g)f(x)\) by tying parameters. Here, after training, this framework
transports the entire solution along an orbit:
\[
\{q_i,k_j\}\mapsto\{\varphi q_i,\varphi k_j\},\quad
\Gamma_{\rm trunc}\mapsto \varphi\Gamma_{\rm trunc}\varphi^{-1},\quad
x\mapsto \varphi x,
\]
so logits based on \(d_{\mathbb H}(q,\gamma k)\) and evaluations on \(\varphi x\) are unchanged. This produces
infinitely many functionally identical twins indexed by \(\varphi\), with exact equality (up to relabeling) when
\(\varphi\) lies in the normalizer/commensurator of \(\Gamma\).
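The exactness claim can be checked on a toy instance in \( \mathbb H^2 \). The sketch below uses the upper half-plane model, a four-element truncation, and an arbitrary \( \varphi\in \mathrm{PSL}(2,\mathbb R) \); these specific values and all helper names are assumptions chosen for illustration.

```python
import math

def mobius(g, z):
    """Apply the Mobius map of g = (a, b, c, d) to a point z with Im z > 0."""
    a, b, c, d = g
    return (a * z + b) / (c * z + d)

def hyp_dist(z, w):
    arg = 1.0 + abs(z - w) ** 2 / (2.0 * z.imag * w.imag)
    return math.acosh(max(arg, 1.0))

def matmul(g, h):
    a, b, c, d = g
    e, f, p, r = h
    return (a * e + b * p, a * f + b * r, c * e + d * p, c * f + d * r)

def inverse(g):
    """Inverse of a determinant-1 matrix (a, b, c, d)."""
    a, b, c, d = g
    return (d, -b, -c, a)

def conjugate(phi, g):
    """phi * g * phi^-1, the conjugated truncation element."""
    return matmul(matmul(phi, g), inverse(phi))

def kernel(q, k, beta, gammas):
    return sum(math.exp(-beta * hyp_dist(q, mobius(g, k))) for g in gammas)

gamma_trunc = [(1, 0, 0, 1), (1, 1, 0, 1), (1, -1, 0, 1), (0, -1, 1, 0)]
phi = (2.0, 1.0, 0.5, 0.75)  # determinant 2*0.75 - 1*0.5 = 1
q, k, beta = complex(0.3, 1.2), complex(-0.8, 0.5), 1.0

k_original = kernel(q, k, beta, gamma_trunc)
# Twin: move q and k by phi and conjugate every element of the truncation.
k_twin = kernel(mobius(phi, q), mobius(phi, k), beta,
                [conjugate(phi, g) for g in gamma_trunc])
# Control: move q and k but forget to conjugate the truncation.
k_unconjugated = kernel(mobius(phi, q), mobius(phi, k), beta, gamma_trunc)
print(k_original, k_twin, k_unconjugated)
```

The twin value agrees with the original to round-off because \( d_{\mathbb H}(\varphi q,\varphi\gamma\varphi^{-1}\varphi k)=d_{\mathbb H}(q,\gamma k) \) term by term, while the unconjugated control differs, which illustrates why the truncation must be transported together with the weights.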
What this framework solves
- Symmetry-preserving model transport: Transports neural models along a group orbit by preserving the forward map via
isometry-invariant distances and conjugation of the automorphic group action.
- Constructive twin generation: Enables infinite, behaviorally identical twins \( f_{\varphi_j} \) by pushing weights and data together
under known group actions \( \varphi_j \in G \).
- Bypasses NP-hard extraction: Avoids discovering invariances (which is NP-hard); instead, directly acts using known symmetry structure.
How this circumvents NP-hardness
- Does not search for hidden group structure; assumes group is known.
- Applies geometric group theory and differentiable mappings to transform model weights and data directly.
- Preserves function through invariant metrics and conjugation of automorphic group action.
Orbit–Jump Controller: Automorphic Shortcuts for Training
Use DFA + Langlands diagnostics to select isometries \( \varphi\in\mathrm{Isom}(\mathbb H^d) \) that leap across basins where standard gradient steps stall.
Non-commutativity turns symmetry into an optimization step.
Key choices.
One-sided automorphic kernel:
\[
K_{\beta}(q,k)=\sum_{\gamma\in\Gamma_{\rm trunc}}\exp\!\big(-\beta\, d_{\mathbb H}(q,\gamma k)\big)
\]
To make cryptographic twins (identical outputs), push all geometric weights by the same isometry:
\[
\{q_i,k_j\}\mapsto\{\varphi q_i,\varphi k_j\}
\]
and conjugate the truncation set:
\( \Gamma_{\rm trunc}\leftarrow \varphi\,\Gamma_{\rm trunc}\,\varphi^{-1} \).
Orbit–Jump Recipe
- Parameterize isometries. In \( \mathbb H^d \): \( \varphi(\xi)=\exp(\sum_a \xi_a J_a)\in SO^+(d,1) \) (boosts+rotations). In \( \mathbb H^2 \): \( \varphi(\xi)\in PSL(2,\mathbb R) \).
- Collect state features. Yoneda probes; CEAS stats \( H(\beta),\tfrac{dH}{d\beta},\mathcal K(\beta) \); Selberg/Huber (heat-trace fit, spectral bands, \( \lambda_{\rm eff}(t) \)); DFA cycle spectrum and \( \mathrm{KL}(P_{\rm DFA}\,\|\,P_{\rm auto}) \); small-prime Hecke checks.
- Score a candidate jump.
\[
\mathcal J(\varphi)=
\underbrace{\mathcal L_{\rm train}^{(+m)}(\varphi\!\cdot\!\theta)}_{\text{lookahead}}
+\alpha_{\rm ceas}(H(\beta)-H^\star)^2
+\alpha_{\rm spec}\,\mathrm{bandwidth}(\lambda_{\rm eff})
+\alpha_{\rm dfa}\,\mathrm{KL}
+\alpha_{\rm heck}\,\mathrm{err}_{\rm Hecke}
\]
- Pick \( \varphi \). (1) Differentiable lookahead (MAML-style) on Lie-algebra coords; (2) Black-box bandit/CMA-ES near identity; (3) RL policy \( \pi(\xi\mid\text{state}) \).
- Apply jump. Push \( (q,k)\leftarrow(\varphi q,\varphi k) \); update \( \Gamma_{\rm trunc}\leftarrow \varphi\,\Gamma_{\rm trunc}\,\varphi^{-1} \); shift DFA coupler consistently; resume CEAS-regulated training.
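The first step of the recipe, parameterizing \( \varphi(\xi) \) by Lie-algebra coordinates, can be sketched for \( \mathbb H^2 \) with a closed-form exponential on \( \mathfrak{sl}(2,\mathbb R) \). The basis \(J_1, J_2\) (boosts) and \(J_3\) (rotation) chosen below is an assumption, as are the function names; the source does not fix a basis.

```python
import math

def exp_sl2(xi):
    """exp(xi1*J1 + xi2*J2 + xi3*J3) as a determinant-1 matrix (a, b, c, d).

    Assumed basis: J1 = diag(1, -1)/2, J2 = [[0, 1], [1, 0]]/2 (boosts),
    J3 = [[0, 1], [-1, 0]]/2 (rotation). For traceless A, A^2 = -det(A) * I,
    so the series collapses to cosh/sinh or cos/sin of a single scalar.
    """
    x1, x2, x3 = xi
    a, b, c = 0.5 * x1, 0.5 * (x2 + x3), 0.5 * (x2 - x3)  # A = [[a, b], [c, -a]]
    s2 = a * a + b * c  # equals -det(A)
    if s2 > 1e-16:      # hyperbolic (boost-like) element
        s = math.sqrt(s2)
        ch, sh = math.cosh(s), math.sinh(s) / s
    elif s2 < -1e-16:   # elliptic (rotation-like) element
        w = math.sqrt(-s2)
        ch, sh = math.cos(w), math.sin(w) / w
    else:               # near the identity or parabolic: first-order expansion
        ch, sh = 1.0, 1.0
    return (ch + sh * a, sh * b, sh * c, ch - sh * a)

def mobius(g, z):
    a, b, c, d = g
    return (a * z + b) / (c * z + d)

def hyp_dist(z, w):
    arg = 1.0 + abs(z - w) ** 2 / (2.0 * z.imag * w.imag)
    return math.acosh(max(arg, 1.0))

phi = exp_sl2((0.4, -0.3, 0.9))
q, k = complex(0.3, 1.2), complex(-0.8, 0.5)
moved = hyp_dist(mobius(phi, q), mobius(phi, k))
print(phi, hyp_dist(q, k), moved)
```

Working in the coordinates \( \xi \) keeps every candidate inside the isometry group by construction, so a search or gradient step over \( \xi \) never has to project back onto \( PSL(2,\mathbb R) \).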
Timeline of relevant complexity results
| Year | Researcher(s) | Contribution |
| 1969–1972 | Minsky & Papert | Perceptrons (1969/1972). Claim: While predating the formal definition of NP-completeness, this book first introduced the use of group invariance concepts to show what a perceptron cannot compute. Significance: Contained the group invariance theorem, which stated that a network’s output can be expressed as a function of the input orbits. This was used to prove that certain invariant predicates lay beyond the capabilities of a single-layer perceptron. Ensign et al. later cite this as a precursor to their NP-hardness results. |
| 1992 | Blum & Rivest | Learning neural networks is NP-hard. Claim: Proved that learning a single hidden layer neural network with threshold gates is NP-hard, and that training a 3-node network is NP-complete. Significance: Although not explicitly about group orbits, this was an early foundational result for the general hardness of neural network learning; the orbit-identification problem is a type of “learning” or “explanation,” grounding later NP-hardness proofs. |
| 2017 → 2020 | Ensign, Neville, Paul, Venkatasubramanian | First direct NP-hardness proof for group invariants. Claim: Extracting implicit group invariances from trained general-purpose neural networks is NP-hard. Significance: Gave a formal reduction from the KNAPSACK problem to finding permutation invariants for a Boolean-input network, establishing hardness of orbit identification. |
| 2021 | Grein et al. | Demonstrated Euclidean/E(3)-equivariant networks as a way to encode geometric symmetries in the architecture, avoiding post-hoc orbit discovery. |
| 2023–2024 | Vardi et al. | Showed that even learning under known symmetries can be exponentially hard in the Statistical Query (SQ) model, bounding symmetry-based training efficiency. |
| 2023–2025 | William Chuang | Early public pointer (Apr 8, 2023): The README of the well-distributed-schottky-groups repository (Schottky subgroups of PSL(2, R) for a hyperbolic-geometry master’s thesis) notes that the implementation “could also work as a cipher device for non-linear encryption,” explicitly suggesting Schottky/Möbius/Lorentz maps as a non-linear cipher and as a bridge to statistical-mechanics style ensembles. First explicit orbit-transport commit (Oct 8, 2023): A separate personal repository generalizes these ideas into a metric-invariant architecture for transporting trained neural models along known group orbits. Contribution: Bypasses the NP-hardness of orbit identification by avoiding post-hoc discovery altogether and instead applying explicit geometric operators to re-embed models across different manifolds while preserving function, dot-product structure, and symmetry. Develops a constructive, geometric, metric-invariant framework that jointly moves weights and data via conjugation by automorphic operators (Schottky / Langlands–Maass / Poincaré-series style), yielding function-identical “twins” and enabling orbit-jump optimization without solving the hard inverse problem of extracting implicit invariants. Note: Independent research, not conducted under a university. |
Distinction from prior work
- Not an equivariant network: Does not enforce equivariance by architectural constraints; operates post-training via orbit-preserving isometries.
- Not parameter-only symmetry: Unlike neuron permutation or scaling twins, this method moves both model and data with conjugated group kernel.
- Not data-only augmentation: Pushes the entire system (model, data, automorphic kernel) under the same geometric transformation.
One-liner summary.
Extracting hidden symmetries in neural networks is NP-hard (Ensign et al., 2017). This method bypasses the hardness by constructing a forward-preserving orbit action on weights and data, and then leveraging non-commutativity with optimizers to accelerate training.
Exact twins. Conjugation keeps equality to round-off. If \( \varphi \) lies in the normalizer/commensurator of \( \Gamma \), the truncated list is unchanged up to relabeling.
Safety guards
- Early-reject \( \varphi \) if \( \mathcal J \) worsens beyond tolerance.
- Trust region on Lie-algebra step size to avoid degeneracy.
- Periodic Yoneda naturality checks to certify twinhood.
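A minimal sketch of the first two guards follows, with a plain forward-equality check standing in for the third. The Euclidean norm on Lie-algebra coordinates, the additive tolerance rule, and all names are assumptions; the forward-equality check is a much weaker substitute for the Yoneda naturality checks and is included only to show where such a test would sit.

```python
import math

def clip_to_trust_region(xi, radius):
    """Rescale Lie-algebra coordinates so their Euclidean norm is at most radius."""
    norm = math.sqrt(sum(x * x for x in xi))
    if norm <= radius or norm == 0.0:
        return list(xi)
    scale = radius / norm
    return [x * scale for x in xi]

def accept_jump(score_before, score_after, tolerance):
    """Early-reject rule: accept unless the score J worsens by more than tolerance."""
    return score_after <= score_before + tolerance

def max_output_gap(outputs_a, outputs_b):
    """Largest absolute difference between two output lists (proxy twin check)."""
    return max(abs(a - b) for a, b in zip(outputs_a, outputs_b))

step = clip_to_trust_region([3.0, 4.0, 0.0], radius=1.0)
print(step, accept_jump(1.0, 1.05, 0.1), accept_jump(1.0, 1.2, 0.1))
```

Clipping before scoring keeps candidate isometries close to the identity, so a rejected jump costs one evaluation rather than a destabilized run.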
Pseudo-loop
for step in training:
    train k SGD steps with CEAS
    if step % T == 0:
        S = collect_state(Yoneda, CEAS, SelbergHuber, DFA, Hecke)
        φ* = argmin_φ J(φ; S)    # option 1/2/3
        if accept(φ*):
            q, k = φ*·q, φ*·k
            Γ_trunc = φ*·Γ_trunc·(φ*)^{-1}
Relation to Fourier Neural Operators (FNO)
- Where it beats FNO: curved/quotient domains \( \Gamma\backslash\mathbb H \) and arithmetic/automorphic tasks; native kernels + Selberg/Huber control; orbit-jumps exploit GD–symmetry non-commutativity.
- FNO wins: flat, periodic PDE boxes (FFT \( O(N\log N) \), strong resolution-invariance).
- Hybrid: automorphic (Laplace–Beltrami/Hecke) block with orbit-jumps, plus an FNO block on near-Euclidean charts.
Seven bridges → Einstein–Hilbert action
The bridges carry positive/Lorentzian observations onto a negatively curved, \( \Gamma \)-automorphic stage where Laplace-type analysis is valid.
They supply: (i) automorphy, (ii) a Laplace-type generator with a well-behaved heat trace, and (iii) scale separation.
- A1–A3 (symbolic–arithmetic): modular symbols; Poisson–Helgason; arithmetic lifts.
- B1–B2 (thermodynamic encoders): transfer operators; horocycle/geodesic encodings.
- C1–C2 (functorial): moduli-stack lift; Langlands-style functoriality.
Result.
With a suitable test function \( f \), the spectral action \( \mathcal S_{\mathrm{spec}}(L_\beta,\Lambda)=\mathrm{Tr}\,f(L_\beta/\Lambda^2) \)
expands as \( c_0 \Lambda^d \mathrm{Vol} + c_2 \Lambda^{d-2}\!\int \sqrt{-g}\,R + \cdots \);
the \(c_2\) term is of Einstein–Hilbert type. A Regge-style graph functional converges to the same curvature term under refinement.
Milestones
- Spectral–thermodynamic coefficient match. Derive Einstein-like equations from the CEAS free energy and fit \( \alpha_{\mathrm{EH}}^{(\mathrm{CEAS})} \). Compare to the spectral-action coefficient \( \alpha_{\mathrm{EH}}^{(\mathrm{spec})} \) obtained on \( X=\Gamma\backslash\mathbb H^d \) (Route A); report \( \rho=\alpha_{\mathrm{EH}}^{(\mathrm{CEAS})}/\alpha_{\mathrm{EH}}^{(\mathrm{spec})} \).
- CEAS ablation (validity, not dependence). Set \( \alpha_{\mathrm{ec}}=0 \) to ablate CEAS and verify that the bridge-based routes (spectral-action, Regge, Fisher–Rao) still yield a stable EH term on \( X=\Gamma\backslash\mathbb H^d \). Use band flatness of \( \lambda_{\mathrm{eff}}(t) \) and stable heat-trace fits as criteria; CEAS should mainly narrow variance and provide a complementary thermodynamic derivation.
Reproducibility
Diagnostics run on a trained GRAILAttention (with optional DFA).
If the WMAP V-band FITS is absent locally, a synthetic hyperbolic sampler reproduces the reported spectra using the same code path.
Roadmap: \( \mathbb H^d \) ( \(d=3,4\) )
- Switch to the Poincaré ball distance (dimension-agnostic) in the kernel.
- Replace \( \mathrm{PSL}(2,\mathbb Z) \) proxies with lattices in \( SO^+(d,1) \); new generators and length extractors.
- Adopt higher-dimensional Selberg/Huber weights (not \( \ell / 2\sinh(\ell/2) \)).
- Keep CEAS, DFA, and BCH probe unchanged (geometry-agnostic).
Metric-invariant algebra: replace scalar products by \( d_M \)
The core idea extends far beyond automorphic kernels. Replace scalar products everywhere with a
Riemannian (or pseudo-Riemannian) metric distance \(d_M(\cdot,\cdot)\) on a manifold \( (M,g) \)
with isometry group \(G=\mathrm{Isom}(M)\). The fundamental invariance
\[
d_M(\varphi q,\varphi k)=d_M(q,k)\qquad\forall\,\varphi\in G
\]
makes \(d_M\) a building block for scores, gates, and whole forward passes.
Construct metric-based operators (no automorphy required).
For any scalar function \(F:\mathbb R_{\ge 0}\!\to\!\mathbb R\) and any algebraic/compositional use ( \(+,-,\times,/\), powers,
rational forms, thresholds ), define
\[
S_{ij}=F\!\big(d_M(q_i,k_j)\big).
\]
Because \(d_M\) is isometry-invariant, every expression built solely from \(\{d_M(q_i,k_j)\}\) is unchanged under the
diagonal action \( (q_i,k_j;x)\mapsto(\varphi q_i,\varphi k_j;\varphi x) \).
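A small sketch of a metric-only score matrix, assuming the Poincaré ball model for \(d_M\), the choice \(F(d)=-\beta d\), and a coordinate-plane rotation as the test isometry. Rotations about the origin are only a subgroup of the ball's isometries; they are used here because they are easy to write down, and all names are illustrative.

```python
import math

def poincare_dist(x, y):
    """Distance in the Poincare ball model (any dimension, points with norm < 1)."""
    diff2 = sum((a - b) ** 2 for a, b in zip(x, y))
    nx = sum(a * a for a in x)
    ny = sum(b * b for b in y)
    arg = 1.0 + 2.0 * diff2 / ((1.0 - nx) * (1.0 - ny))
    return math.acosh(max(arg, 1.0))

def rotate(x, i, j, angle):
    """Rotation in the (i, j) coordinate plane, an isometry fixing the origin."""
    y = list(x)
    c, s = math.cos(angle), math.sin(angle)
    y[i] = c * x[i] - s * x[j]
    y[j] = s * x[i] + c * x[j]
    return y

def attention_weights(queries, keys, beta):
    """Row-wise softmax of S_ij = F(d_M(q_i, k_j)) with F(d) = -beta * d."""
    rows = []
    for q in queries:
        logits = [-beta * poincare_dist(q, k) for k in keys]
        m = max(logits)
        weights = [math.exp(v - m) for v in logits]
        total = sum(weights)
        rows.append([w / total for w in weights])
    return rows

queries = [[0.1, 0.2, -0.3], [0.5, -0.1, 0.2]]
keys = [[-0.4, 0.3, 0.1], [0.0, 0.6, -0.2], [0.2, 0.2, 0.2]]

original = attention_weights(queries, keys, beta=2.0)
moved = attention_weights([rotate(q, 0, 2, 0.7) for q in queries],
                          [rotate(k, 0, 2, 0.7) for k in keys], beta=2.0)
gap = max(abs(a - b) for r1, r2 in zip(original, moved) for a, b in zip(r1, r2))
print(gap)
```

Any other \(F\) can be swapped in without affecting the invariance, since the rotation changes the coordinates of every point but none of the pairwise distances.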
Twin models without automorphy
If a forward map \(\mathcal F\) depends only on metric distances and shared readouts,
\[
\mathcal F\big(\{d_M(q_i,k_j)\},\,\varphi x\big)=\mathcal F\big(\{d_M(\varphi q_i,\varphi k_j)\},\,\varphi x\big)
=\mathcal F\big(\{d_M(q_i,k_j)\},\,x\big),
\]
then applying the same isometry \(\varphi\) to both geometric parameters and data yields
function-identical twins — no automorphy needed.
Examples of metric primitives
- Metric kernels: \(e^{-\beta d_M}\), \(1/(1+\alpha d_M)\), \((d_M+\epsilon)^{-p}\), truncated/polynomial expansions.
- Distance matrices as logits: \(S_{ij}=F(d_M(q_i,k_j))\) followed by softmax/normalization.
- Gates & masks: indicators \(1\{d_M\!\le\!\tau\}\), annealed via \(F\).
- Heat/Green surrogates: use \(F(d_M)\) as a chart-free proxy for diffusion/propagators.
Automorphy is optional.
Automorphic sums (e.g., one-sided Poincaré \( \sum_{\gamma} e^{-\beta d_M(q,\gamma k)} \)) add arithmetic/geometric structure.
They are not required for twins. When used, preserve exactness by conjugating the truncated set:
\( \Gamma_{\rm trunc}\leftarrow \varphi\,\Gamma_{\rm trunc}\,\varphi^{-1} \).
Practical guardrails
- Ensure every non-metric feature that influences logits (biases, normalizers) is transformed consistently; otherwise twinhood can break.
- For Minkowski/pseudo-Riemannian settings, choose the appropriate invariant (e.g., Lorentz interval) and restrict to the proper isometry subgroup (e.g., \(SO^+(d,1)\)).
- Numerical charts should be consistent across the diagonal move to keep distance computations stable.
Novelty & claim (to the best of current knowledge)
Claim.
This framework provides, to the best of current knowledge, the first repeatedly tested method that
bypasses the NP-hard problem of post-hoc symmetry extraction for neural networks by:
(i) applying a single isometry \( \varphi\in\mathrm{Isom}(\mathbb H^d) \) to both model geometry and data,
(ii) keeping a one-sided automorphic kernel \( K_\beta(q,k)=\sum_{\gamma\in\Gamma_{\rm trunc}}\exp(-\beta\,d_{\mathbb H}(q,\gamma k)) \),
and (iii) conjugating the truncation \( \Gamma_{\rm trunc}\leftarrow \varphi\,\Gamma_{\rm trunc}\,\varphi^{-1} \).
This yields function-identical twins by construction and enables orbit-jump optimization.
Beyond automorphy.
The same diagonal-isometry idea extends to any manifold metric \(d_M\) with \(d_M(\varphi q,\varphi k)=d_M(q,k)\).
Any forward map built solely from \( \{d_M(q_i,k_j)\} \) remains identical under the diagonal action
\( (q_i,k_j;x)\mapsto(\varphi q_i,\varphi k_j;\varphi x) \).
Hence there is an infinite design space of twin-generating constructions (via algebraic/compositional uses of \(d_M\)),
and twin models do not require automorphy.
Beyond isometry.
Twin generation does not require distance preservation specifically. If the forward map depends only on a
scalar invariant \( I(q,k) \) that is preserved by a group action \( g \) (i.e., \( I(g\,q, g\,k)=I(q,k) \)),
then applying the same group element diagonally to weights and data leaves outputs unchanged:
\( (q_i,k_j;x)\mapsto(g\,q_i, g\,k_j; g\,x) \).
Examples of admissible invariants include:
- Metric distances \( d_M \) on any Riemannian/pseudo-Riemannian manifold with the invariance \( d_M(g q, g k)=d_M(q,k) \).
- Conformal/projective invariants (e.g., cross-ratios on \( \partial\mathbb H \)) preserved by the chosen symmetry group.
- Physics-meaningful invariants (e.g., gauge-invariant scalars/Casimirs from the ambient geometry).
- Algebraic/compositional uses of a fixed invariant \( I \) (e.g., \(+,-,\times,/,\log,\sum,\prod\)) applied consistently across the model.
Note: for automorphic kernels, isometry is required to preserve the one-sided Poincaré sum and thus
the exact automorphy (with conjugation \( \Gamma_{\rm trunc}\!\leftarrow\!g\,\Gamma_{\rm trunc}\,g^{-1} \)).
For metric-only or invariant-only twin constructions, automorphy is unnecessary; diagonal action by any group that preserves \( I \) suffices for identical outputs.
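As an example of a non-metric invariant, the cross-ratio of four boundary points is preserved by real Möbius maps. The sketch below uses exact rational arithmetic; the cross-ratio convention and the sample points are assumptions chosen for illustration.

```python
from fractions import Fraction

def cross_ratio(z1, z2, z3, z4):
    """Cross-ratio in the convention (z1 - z3)(z2 - z4) / ((z1 - z4)(z2 - z3))."""
    return ((z1 - z3) * (z2 - z4)) / ((z1 - z4) * (z2 - z3))

def mobius(g, x):
    """Real Mobius map x -> (a x + b) / (c x + d) acting on boundary points."""
    a, b, c, d = g
    return (a * x + b) / (c * x + d)

points = [Fraction(0), Fraction(1), Fraction(3), Fraction(7)]
g = (Fraction(2), Fraction(1), Fraction(1), Fraction(1))  # determinant 2*1 - 1*1 = 1

before = cross_ratio(*points)
after = cross_ratio(*[mobius(g, x) for x in points])
print(before, after)
```

A forward map whose logits depend only on such cross-ratios would therefore be unchanged when the same Möbius map is applied to all of its boundary points, even though individual coordinates move.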
Beyond scalar computation. The diagonal-isometry framework extends beyond neural architectures. Any computational system, classical or Turing-complete, can be embedded in a curved manifold \( (M, g) \) by replacing scalar multiplications with invariant functions \(F(I(q,k))\), where \(I\) is preserved by a known group action \(g\). Model instructions, register values, memory contents, and data inputs are all treated as vector points \(p_i \in M\), and transported together via diagonal group action:
\[
(p_i;x) \mapsto (g\,p_i;\,g\,x)
\]
This yields functionally identical machines or programs under geometric transport. Thus, even legacy OS architectures or classical machines can be upgraded to curvature-aware, symmetry-transportable systems before the rise of AI-native substrates.
What is—and isn’t—being claimed
- Bypass, not contradiction. The classical NP-hardness (post-hoc discovery of hidden invariances) is not contradicted. The framework assumes a known symmetry group and provides a constructive transport along its orbits.
- In-loop optimization, not just transport. Beyond producing exact twins, the framework includes an
orbit-jump controller that uses Langlands-triad diagnostics (automorphic ↔ Galois/DFA ↔ thermodynamic/Selberg–Huber)
to select loss-decreasing Lorentz/Möbius moves \( \varphi \) during training. These non-SGD steps exploit the
non-commutativity between gradient updates and symmetry moves to reduce loss between gradient updates.
- Scope (automorphic specialization). Works with the one-sided Poincaré/automorphic kernel on \( \Gamma \backslash \mathbb H^d \), acts diagonally on (weights, data), and preserves exactness via conjugation of \( \Gamma_{\rm trunc} \).
- Scope (metric/invariant twinhood). For metric-only or invariant-only constructions using \(d_M\) or a scalar invariant \(I\), automorphy is optional; exact twinhood holds whenever logits depend only on the preserved invariant and the same group action is applied to both model geometry and data.
- Evidence. Empirically validated across repeated experiments; forward equality follows from invariance of the chosen scalar (distance or other \(I\)) and, in the automorphic case, from the relabeling \( \gamma\mapsto \varphi\gamma\varphi^{-1} \).
Suggested formal naming
- Gauge-Lifted Neural Transport via Invariant Orbit Geometry
- Invariant-Lifted Model Transport under Symmetric Geometries
- Symmetry-Orbit Construction of Functionally Identical Neural Twins
- Orbit-Preserving Neural Transport via Group-Conjugated Kernels
Limits & guardrails.
Automorphic exactness requires a known lattice/group and one-sided kernel with consistent conjugation of \( \Gamma_{\rm trunc} \).
Metric/invariant twins require that the forward map depend solely on a group-preserved scalar and that the diagonal group action be applied to both model geometry and data.
The optimization component selects \( \varphi \) within a known symmetry group; it does not attempt to
discover unknown symmetry groups, and thus avoids the NP-hard post-hoc extraction problem.
Independence & research context
This project is an independent effort developed outside a university setting. The work spans physics,
mathematics, statistics, and AI/CS, and proceeded independently because prior academic roles did not
provide the mandate or latitude to propose and build new frameworks at this scope.
Why independent.
- Novelty constraints. Student positions emphasized surveys and expository writing;
proposing original architectures or cross-domain frameworks was often discouraged or deemed out of remit.
- Advisor-familiarity bounds. Work was expected to remain within areas already familiar
to advisors; deep interdisciplinary directions (physics ↔ math ↔ statistics ↔ AI/CS) were effectively
outside the operating envelope.
- Framework-level research. Program structures prioritized incremental contributions
over paradigm-level design. Building a replacement or generalization of existing frameworks required
independence to maintain scope and pace.
Standards & focus.
The project does not lower the bar to fit legacy incentives. Time and attention are allocated to efforts that
meet a high standard: technical novelty anchored in first principles, falsifiable predictions,
cross-validated experiments, and public artifacts (code, logs, diagnostics) that enable external replication.
Engagement is prioritized where these standards can be upheld without dilution.
Provenance & transparency
- Public record: first public GitHub commit for this line of work on
Oct 8, 2023 (see project repository).
- Self-funded, independent: no institutional sponsorship; artifacts and diagnostics are
released to enable external replication.
- Positioning: statements here reflect personal experience; technical claims are grounded
in the reproducible codebase and empirical logs accompanying the work.
Collaboration stance.
Collaboration and institutional partnerships are welcome when they preserve the ability to pursue
interdisciplinary research at full fidelity and to publish complete, verifiable results without constraint.