v14 — Full Backbone Diffusion Model


Input: Protein Backbone Data (NEW: full backbone)
[Diagram] One residue = four atoms [N, CA, C, O], 3 coords each; bond lengths N-CA 1.458 Å, CA-C 1.523 Å, peptide bond C-N 1.329 Å. Tensor shape: (B, L, 4, 3).
v14 predicts all 4 backbone atoms per residue, not just Cα. Each residue is represented as [N, CA, C, O] — the minimal backbone that defines the protein chain geometry.
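As a minimal sketch of the data layout (NumPy, with made-up batch size and chain length), the full-backbone tensor and the Cα-only view look like this:

```python
import numpy as np

B, L = 2, 64                       # batch size and chain length (assumed values)
ATOMS = ["N", "CA", "C", "O"]      # atom order within each residue

backbone = np.zeros((B, L, 4, 3))  # (batch, residue, atom, xyz)
ca_only = backbone[:, :, 1, :]     # (B, L, 3): the Cα-only view v13 worked with
```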
1 Frozen Protein Encoder
[Diagram] Amino acid sequence (M E T H I K ...) → 🔒 frozen encoder with contact classifier → single representation (B, L, 128) and contact map (B, L, L).
Pre-trained encoder converts amino acid sequence into per-residue features (128-dim) and a contact probability map (which residues are physically close). All weights frozen — no gradients flow back.
2 Pair Stack — Triangle Updates ×8
[Diagram] Triangle multiplicative update, i.e. how edge (i,j) learns about the structure: to understand how residue i relates to residue j, look at all intermediate residues k that connect them and aggregate edge(i,j) += Σₖ edge(i,k) × edge(k,j). Repeated 8× to propagate long-range structural information.
The pair stack builds a pairwise relationship matrix (B, L, L, 128). Each entry describes how two residues relate structurally. The triangle update is inspired by AlphaFold2: if residue i is near k, and k is near j, then i is likely near j. Eight rounds of this propagate information across the whole chain. Also includes contact map conditioning (gated injection from encoder) and OuterProductMean (d=32).
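The core aggregation can be sketched in a few lines. This is a simplified "outgoing edges" triangle multiplicative update (real implementations add layer norm, sigmoid gating, and an output projection, all omitted here; `Wa`/`Wb` are hypothetical projection matrices):

```python
import numpy as np

def triangle_update_outgoing(pair, Wa, Wb):
    """Simplified triangle multiplicative update (a sketch).
    pair: (L, L, c) pairwise features; Wa, Wb: (c, c) projections."""
    a = pair @ Wa  # (L, L, c): features of edge (i, k)
    b = pair @ Wb  # (L, L, c): features of edge (k, j)
    # edge(i, j) <- sum over intermediate residues k of a[i, k] * b[k, j]
    return np.einsum('ikc,kjc->ijc', a, b)

L, c = 8, 4
rng = np.random.default_rng(0)
update = triangle_update_outgoing(rng.standard_normal((L, L, c)),
                                  rng.standard_normal((c, c)),
                                  rng.standard_normal((c, c)))
```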
3 Diffusion — Add Noise, Learn to Denoise
[Diagram] Clean structure x₀ + noise ε → noisy structure xₜ = √α̅ₜ·x₀ + √(1−α̅ₜ)·ε; the model denoises xₜ → predicted structure x̂₀.
Training: Take a known protein structure, add random Gaussian noise at a random timestep t (out of 1000), then train the model to predict the original clean structure from the noisy version. Generation: Start from pure noise, iteratively denoise using DDIM (50 steps) to generate new protein structures.
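The forward (noising) step is a one-liner once the noise schedule is fixed. A minimal sketch, assuming a linear β schedule (the schedule's actual range is not stated in the text; `1e-4` to `0.02` is a common default):

```python
import numpy as np

T = 1000                              # diffusion timesteps, as in the text
betas = np.linspace(1e-4, 0.02, T)    # assumed linear schedule
alpha_bar = np.cumprod(1.0 - betas)   # cumulative product: alpha-bar_t

def q_sample(x0, t, rng):
    """Forward process: x_t = sqrt(abar_t) * x0 + sqrt(1 - abar_t) * eps."""
    eps = rng.standard_normal(x0.shape)
    xt = np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1.0 - alpha_bar[t]) * eps
    return xt, eps

rng = np.random.default_rng(0)
x0 = rng.standard_normal((16, 4, 3))  # toy backbone coordinates
xt, eps = q_sample(x0, t=500, rng=rng)
```

During training, the model receives `xt` and `t` and is supervised to recover `x0`; generation runs the reverse chain from pure noise with DDIM's 50-step subsampling.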
4 Frame Initialization — Peptide Plane NEW
[Diagram] Local frame from the peptide plane: origin at CA; x = unit(CA→C); z = normal of the N-CA-C plane; y = z × x. Why the peptide plane? v13 (old) built frames from 3 consecutive Cα atoms, which become nearly collinear at high noise, so Gram-Schmidt fails → NaN. v14 (new) uses the intra-residue N-CA-C plane, which is always valid.
Each residue gets a local coordinate frame (3 axes, origin at CA), built from the peptide plane defined by N, CA, C within the same residue. This is more stable than v13's approach of using 3 consecutive Cα atoms: at high noise levels those triplets become nearly collinear, and the Gram-Schmidt orthogonalization crashes. Also includes SNR-gated confidence (SLERP toward the identity rotation at low signal) and self-conditioning (50% of the time, a previous prediction is fed back as extra input).
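The frame construction can be sketched directly from the diagram's recipe (toy coordinates; the `eps` guard against zero-length vectors is an assumption, not from the text):

```python
import numpy as np

def frame_from_peptide_plane(n, ca, c, eps=1e-8):
    """Local frame from one residue's N, CA, C atoms (a sketch).
    Returns (R, t): R's columns are the x, y, z axes; t is the CA origin."""
    x = c - ca
    x = x / (np.linalg.norm(x) + eps)   # x axis: CA -> C
    v = n - ca
    z = np.cross(x, v)                  # z axis: normal of the N-CA-C plane
    z = z / (np.linalg.norm(z) + eps)
    y = np.cross(z, x)                  # y = z × x completes a right-handed frame
    return np.stack([x, y, z], axis=-1), ca

# toy residue: CA at the origin, C along x, N in the xy-plane
R, t = frame_from_peptide_plane(np.array([1.0, 1.0, 0.0]),
                                np.zeros(3),
                                np.array([1.5, 0.0, 0.0]))
```

Because all three atoms live in the same residue, this construction stays well-conditioned even when the chain is heavily noised.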
5 IPA Denoiser — 8 Layers
[Diagram] One IPA (Invariant Point Attention) layer: node features h (B, L, 512) feed two attention branches. Scalar attention: standard Q, K, V, 8 heads, 512-dim, with pair bias added to the logits. Point attention: 3D query points expressed in each residue's local frame. Branch outputs are concatenated (1216) → Linear → 512, then FFN 512 → 1024 → 512, then a frame update (MLP → ΔR, Δt, composed in SE(3)). Repeated ×8.
The core denoiser. Invariant Point Attention (IPA) combines standard sequence attention with 3D geometric attention: queries and keys are actual 3D points positioned in each residue's local coordinate frame. This makes the attention SE(3)-invariant: the output features don't change if you rotate or translate the whole structure. After each layer, the frame update refines each residue's rotation and translation. 8 layers of refinement progressively sharpen the structure prediction.
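The invariance claim can be checked numerically. A minimal sketch with a hypothetical `to_local` helper: expressing a point in a residue's local frame gives the same answer before and after a global rotation, because frames and points transform together:

```python
import numpy as np

def to_local(R, t, x):
    """Express global point x in frame (R, t): x_local = R^T (x - t)."""
    return R.T @ (x - t)

rng = np.random.default_rng(0)
# random rotation via QR decomposition, with a sign fix so det(Q) = +1
Q, _ = np.linalg.qr(rng.standard_normal((3, 3)))
if np.linalg.det(Q) < 0:
    Q[:, 0] *= -1

R, t = np.eye(3), rng.standard_normal(3)   # a residue's frame
x = rng.standard_normal(3)                 # a key point in global coordinates

local_before = to_local(R, t, x)
# rotate the whole structure: the frame (R, t) and the point x move together
local_after = to_local(Q @ R, Q @ t, Q @ x)
```

Since `(QR)^T (Qx - Qt) = R^T (x - t)`, the local coordinates that IPA attends over are unchanged, which is exactly why the attention output is invariant.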
6 BackboneAtomHead — Place N, C, O NEW
[Diagram] Placing backbone atoms in the local frame. How it works: (1) an MLP predicts small offsets, h(512) → 256 → 256 → 9 (3 atoms × 3 xyz = 9 deltas); (2) add to ideal bond geometry: offset = ideal + Δ; (3) rotate into global coords: pos = CA + R × offset.
The key innovation of v14. After the IPA denoiser predicts frames (R, t), the BackboneAtomHead places N, C, O atoms relative to CA using learned offsets from ideal bond geometry (N-CA = 1.458Å, CA-C = 1.523Å, C=O = 1.231Å). The MLP is zero-initialized, so it starts by producing ideal geometry and learns residue-specific corrections (proline kinks, glycine flexibility). CA is always exactly at the frame origin.
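The placement step itself is simple geometry. A sketch with illustrative ideal-frame coordinates (only the bond lengths come from the text; the exact atom positions within the frame are assumptions):

```python
import numpy as np

# Ideal positions of N, C, O in the residue's local frame (CA at the origin).
# Coordinates are illustrative, chosen to match the stated bond lengths:
IDEAL = np.array([
    [-0.525, 1.360, 0.0],  # N: |N - CA| ≈ 1.458 Å
    [1.523, 0.0, 0.0],     # C: |CA - C| = 1.523 Å, placed on the x axis
    [2.150, 1.060, 0.0],   # O: |C = O| ≈ 1.231 Å
])

def place_atoms(R, ca, delta):
    """pos = CA + R @ (ideal + delta) for each of N, C, O (a sketch).
    delta: (3, 3) learned offsets; a zero-initialized MLP yields delta = 0,
    so the head starts out producing exactly ideal geometry."""
    return ca + (IDEAL + delta) @ R.T

# with an identity frame and zero deltas, atoms land on the ideal positions
atoms = place_atoms(np.eye(3), np.zeros(3), np.zeros((3, 3)))
```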
7 Loss Functions — 13 Terms

CA-level losses (inherited from v13b) — supervise Cα positions

• FAPE (w = 1.0): frame-aligned point error, i.e. how well predicted points match true points in each residue's local frame
• Frame Rotation (w = 0.5): angular distance between predicted and true coordinate frames, 1 − cos(θ)
• Distance MSE (w = 1.0): MSE on all pairwise Cα distances
• Bond Geometry (w = 3.0, annealed 1→3): consecutive Cα distance vs the 3.8 Å target
• Chirality (w = 0.1): signed volume of Cα quartets, enforcing correct handedness
• Angle (w = 0.5): MSE on Cα-Cα-Cα bond angles
• Clash (w = 0.1): penalizes atom pairs closer than 3.8 Å
• Aux Distance (w = 0.03): ordinal BCE on 32-bin distance predictions
• Rg Loss (w = 0.5): MSE on log(radius of gyration), enforcing correct overall compactness

Backbone losses NEW — supervise N, C, O atom positions (scaled by bb_ramp: 0→1 over first 5 epochs)

• BB FAPE (w = 1.0): FAPE over all 4 backbone atoms in local frames
• BB Bond (w = 2.0): MSE on N-CA, CA-C, C=O, C-N bond lengths vs ideal values
• BB Angle (w = 0.5): MSE on N-CA-C (ideal ≈ 111.2°), CA-C-N, and C-N-CA angles
• Omega (w = 0.5): peptide bond planarity, 1 + cos(ω); trans (ω = 180°) gives 0
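The omega term is the easiest of the backbone losses to write out. A sketch for a single peptide bond, assuming ω is the standard CA(i)-C(i)-N(i+1)-CA(i+1) torsion:

```python
import numpy as np

def dihedral(p0, p1, p2, p3):
    """Torsion angle (radians) defined by four points, standard convention."""
    b0, b1, b2 = p0 - p1, p2 - p1, p3 - p2
    b1 = b1 / np.linalg.norm(b1)
    v = b0 - np.dot(b0, b1) * b1   # component of b0 perpendicular to the axis
    w = b2 - np.dot(b2, b1) * b1   # component of b2 perpendicular to the axis
    return np.arctan2(np.dot(np.cross(b1, v), w), np.dot(v, w))

def omega_loss(ca_i, c_i, n_j, ca_j):
    """Planarity penalty 1 + cos(omega) for one peptide bond (a sketch).
    Trans geometry (omega = 180°) gives 0; cis (omega = 0°) gives 2."""
    return 1.0 + np.cos(dihedral(ca_i, c_i, n_j, ca_j))
```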
Tracked Metric (Early Stopping)
structural = total − aux×0.03 − bb_ramp×(1.0×bb_fape + 2.0×bb_bond + 0.5×bb_angle + 0.5×omega)
CA-only structural loss, excluding aux and backbone terms. Patience: 15 epochs.
Config: Training Configuration
• Optimizer: AdamW
• Denoiser LR: 2e-5
• Pair Stack LR: 6e-5 (3×)
• BB Head LR: 1e-4 (5×) NEW
• EMA Decay: 0.999
• CFG Uncond: 10%
LR Schedule
  • Warmup: 5 epochs (linear ramp 1%→100%)
  • Constant: epochs 6–20 at full LR
  • Cosine decay: epochs 21–60 down to 1e-6
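The three-phase schedule above can be sketched as a per-epoch function (denoiser base LR 2e-5 from the config; 1-indexed epochs and the exact warmup interpolation are assumptions):

```python
import math

BASE_LR, MIN_LR = 2e-5, 1e-6  # denoiser base LR and cosine floor

def lr_at(epoch):
    """Denoiser learning rate at a given 1-indexed epoch (a sketch)."""
    if epoch <= 5:
        # linear warmup from 1% to 100% of the base LR over epochs 1-5
        frac = 0.01 + (1.0 - 0.01) * (epoch - 1) / 4
        return BASE_LR * frac
    if epoch <= 20:
        return BASE_LR                      # constant phase, epochs 6-20
    # cosine decay over epochs 21-60 down to MIN_LR
    progress = (epoch - 20) / 40
    return MIN_LR + 0.5 * (BASE_LR - MIN_LR) * (1.0 + math.cos(math.pi * progress))
```

The pair-stack and BB-head parameter groups would use the same shape scaled by their 3× and 5× multipliers.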