1. Input Tensors

Each protein is represented as a chain of residues. v15 inherits the same input format as v14, tracking 4 backbone atoms per residue:

  • ids — (B, L) token sequence
  • true_coords — (B, L, 3) Cα coordinates
  • true_backbone — (B, L, 4, 3) [N, CA, C, O] atoms
  • coord_mask — (B, L) per-residue mask
  • backbone_mask — (B, L, 4) per-atom mask
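As a concrete shape check, here is a minimal sketch of one input batch; the sizes B and L are illustrative, and numpy stands in for whatever tensor library the model actually uses:

```python
import numpy as np

B, L = 2, 64  # illustrative batch size and sequence length

batch = {
    "ids":           np.zeros((B, L), dtype=np.int64),          # token sequence
    "true_coords":   np.zeros((B, L, 3), dtype=np.float32),     # Cα coordinates
    "true_backbone": np.zeros((B, L, 4, 3), dtype=np.float32),  # [N, CA, C, O]
    "coord_mask":    np.ones((B, L), dtype=bool),               # per-residue mask
    "backbone_mask": np.ones((B, L, 4), dtype=bool),            # per-atom mask
}

# The CA channel of the full backbone (atom index 1) matches true_coords in shape
assert batch["true_backbone"][:, :, 1].shape == batch["true_coords"].shape
```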

2. Frozen Encoder

ContactClassifier → single embeddings + contact probabilities

A pre-trained encoder converts amino acid sequences into two representations. Its weights are not updated during training.

  • Pre-trained ContactClassifier (weights frozen)
  • Output: single (B, L, 128) — per-residue embeddings
  • Output: contact_probs (B, L, L) — pairwise contact map
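A minimal stand-in illustrating the encoder's interface: fixed (never-updated) weights mapping ids to the two outputs. The vocabulary size of 21 and the dot-product contact score are assumptions for the sketch, not the real ContactClassifier:

```python
import numpy as np

rng = np.random.default_rng(0)
D = 128
# Fixed embedding table (hypothetical size: 20 amino acids + 1 extra token)
W_EMB = rng.standard_normal((21, D)) * 0.02

def frozen_encoder(ids):
    """Stand-in for the pre-trained ContactClassifier; weights stay frozen."""
    single = W_EMB[ids]                                 # (B, L, 128) embeddings
    logits = np.einsum("bid,bjd->bij", single, single)  # pairwise scores
    contact_probs = 1.0 / (1.0 + np.exp(-logits))       # (B, L, L), in (0, 1)
    return single, contact_probs

ids = rng.integers(0, 21, size=(1, 16))
single, contacts = frozen_encoder(ids)
```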

3. Pair Stack

SingleToPairV13 + RPE + OPM + Contact Conditioning → 8× EnhancedPairBlock

Builds a rich pairwise representation by combining several signals, then refines it with triangle multiplicative updates.

  • SingleToPairV13 projects single → pair representation
  • LogScaledRPE — 128 bins, max distance 512 residues
  • OuterProductMean — d_opm=32
  • Contact Map Conditioning — gated, d=128
  • EnhancedPairBlock: TriMul (out + in) + AxialBlock
  • Output: pair (B, L, L, 128) — also feeds pAE head
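The heart of EnhancedPairBlock is the triangle multiplicative update. Below is a stripped-down sketch of the "outgoing" variant, in which edge (i, j) is updated from edges (i, k) and (j, k); layer norms and the separate per-branch gates of the full block are omitted, and all dimensions are illustrative:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def tri_mul_outgoing(z, Wa, Wb, Wg, Wo):
    # z: (L, L, d) pair representation.
    a = sigmoid(z @ Wg) * (z @ Wa)               # gated projection of (i, k) edges
    b = sigmoid(z @ Wg) * (z @ Wb)               # gated projection of (j, k) edges
    combined = np.einsum("ikc,jkc->ijc", a, b)   # combine over the third node k
    return combined @ Wo                         # project back to pair dimension

rng = np.random.default_rng(0)
L, d, c = 8, 16, 8
z = rng.standard_normal((L, L, d))
Wa, Wb, Wg = (rng.standard_normal((d, c)) * 0.1 for _ in range(3))
Wo = rng.standard_normal((c, d)) * 0.1
out = tri_mul_outgoing(z, Wa, Wb, Wg, Wo)  # same shape as z; added residually in practice
```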

4. Diffusion Process

Normalize → sample t → add noise to Cα coords
  • Normalize Cα coordinates (center + scale)
  • Sample diffusion timestep t ~ U(0, T)
  • Add Gaussian noise to Cα positions: x_t = α_t x_0 + σ_t ε
  • Note: Noise applied only to Cα — full backbone is derived from frames
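The noising step in code, in the variance-preserving form matching x_t = α_t x_0 + σ_t ε above; the schedule value α_t = 0.9 is illustrative:

```python
import numpy as np

def add_noise(x0, alpha_t, sigma_t, rng):
    """Forward diffusion on normalized Cα coordinates."""
    eps = rng.standard_normal(x0.shape)
    return alpha_t * x0 + sigma_t * eps, eps

rng = np.random.default_rng(0)
x0 = rng.standard_normal((64, 3))
x0 -= x0.mean(axis=0)                  # center (part of normalization)
alpha_t = 0.9
sigma_t = np.sqrt(1.0 - alpha_t ** 2)  # variance-preserving: α² + σ² = 1
x_t, eps = add_noise(x0, alpha_t, sigma_t, rng)
```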

5. Frame Init — Peptide Plane

Peptide Plane Frames from N-CA-C (more stable than CA-triplet Gram-Schmidt)

Each residue gets a local coordinate frame (rotation + translation) built from N, CA, C atoms.

  • e1 = normalize(C − CA)   (the CA→C direction)
  • e3 = normalize((N − CA) × e1)
  • e2 = e3 × e1
  • R = [e1 | e2 | e3], t = CA position
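The steps above in code. By construction R is a proper rotation (orthonormal columns, det = 1); the atom positions used in the demo are rough peptide-geometry values for illustration:

```python
import numpy as np

def peptide_plane_frame(N, CA, C):
    e1 = C - CA
    e1 = e1 / np.linalg.norm(e1)            # e1 = normalize(C − CA)
    e3 = np.cross(N - CA, e1)
    e3 = e3 / np.linalg.norm(e3)            # e3 = normalize((N − CA) × e1)
    e2 = np.cross(e3, e1)                   # e2 = e3 × e1
    R = np.stack([e1, e2, e3], axis=-1)     # columns [e1 | e2 | e3]
    return R, CA.copy()                     # rotation + translation

# Illustrative atom positions (Å), not real PDB coordinates
N  = np.array([-0.53, 1.36, 0.0])
CA = np.array([0.0, 0.0, 0.0])
C  = np.array([1.52, 0.0, 0.0])
R, t = peptide_plane_frame(N, CA, C)
```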

Self-Conditioning (50%): With 50% probability, run denoiser once to get x̂₀, then feed as additional input for the real forward pass.

6. IPA Denoiser + BackboneAtomHead

8× IPABlock → x̂₀, R̂ → BackboneAtomHead → full [N, CA, C, O]

IPA is a special attention mechanism that operates in 3D space, making it SE(3)-equivariant. Three attention channels are merged: scalar QKV, point-based geometric attention, and pair bias.

BackboneAtomHead — predicting all 4 atoms:

After IPA predicts CA positions and frames, the BackboneAtomHead places N, C, O atoms relative to each CA using ideal geometry + learned corrections.

  • MLP: Linear(512→256) → ReLU → Linear(256→256) → ReLU → Linear(256→9)
  • Output 9 values = 3 atoms × 3 coords (dN, dC, dO)
  • pos_atom = t + R @ (ideal_offset + delta)
  • Zero-init: starts at ideal geometry

Output: backbone_pred (B, L, 4, 3) and single representation h (B, L, 512) — feeds pLDDT head
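A sketch of the placement rule pos_atom = t + R @ (ideal_offset + delta). The ideal offsets below are rough peptide-geometry numbers chosen for illustration, not the model's actual constants:

```python
import numpy as np

# Rough ideal backbone offsets in the residue's local frame (Å) — illustrative values
IDEAL_OFFSETS = {
    "N": np.array([-0.57, 1.34, 0.00]),
    "C": np.array([ 1.52, 0.00, 0.00]),
    "O": np.array([ 2.15, 1.07, 0.00]),
}

def place_backbone(R, t, deltas):
    """deltas: per-atom corrections predicted by the zero-initialized MLP head."""
    return {name: t + R @ (off + deltas[name]) for name, off in IDEAL_OFFSETS.items()}

R = np.eye(3)                        # identity frame for the demo
t = np.array([5.0, 0.0, 0.0])
zero = {k: np.zeros(3) for k in IDEAL_OFFSETS}
atoms = place_backbone(R, t, zero)   # zero deltas → pure ideal geometry, as at init
```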

7. pLDDT Head v15 NEW

Predicted Local Distance Difference Test — per-residue confidence

pLDDT (predicted Local Distance Difference Test) measures per-residue accuracy by checking whether local CA-CA distances are preserved within distance thresholds (0.5, 1.0, 2.0, 4.0 Å). The head learns to predict its own accuracy — forcing the trunk to represent uncertainty.

Architecture:

single repr h (B, L, 512) → MLP: Linear(512→256) → ReLU → Linear(256→50) → 50-bin distribution over LDDT ∈ [0, 1], shape (B, L, 50), used for per-residue confidence coloring

LDDT definition:

For each residue, LDDT checks all CA-CA pairs within a 15Å radius and computes the fraction of distances preserved within thresholds:

LDDT_i = (1/4) ∑_{τ ∈ {0.5, 1.0, 2.0, 4.0} Å} (1/N_contacts) ∑_{j : d_true,ij < 15 Å} 𝟙[ |d_pred,ij − d_true,ij| < τ ]
  • Ranges from 0 (completely wrong) to 1 (perfect)
  • The 50-bin distribution is trained with cross-entropy on binned ground-truth LDDT
  • Loss weight: w = 0.2
  • At inference: pLDDT = ∑ bin_center × softmax(logits) gives a scalar per residue
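The inference-time reduction to a scalar, as described above; uniform bin centers over [0, 1] are an assumption of this sketch:

```python
import numpy as np

def expected_plddt(logits):
    # pLDDT = Σ bin_center × softmax(logits), per residue
    n_bins = logits.shape[-1]
    centers = (np.arange(n_bins) + 0.5) / n_bins           # midpoints of [0, 1] bins
    e = np.exp(logits - logits.max(axis=-1, keepdims=True))
    probs = e / e.sum(axis=-1, keepdims=True)
    return probs @ centers

logits = np.zeros((4, 50))      # uniform distribution over the 50 bins
plddt = expected_plddt(logits)  # uniform → expectation is the mean bin center, 0.5
```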

Why it matters: The head forces the trunk's single representation to encode uncertainty. Loops and disordered regions get low pLDDT, well-packed cores get high pLDDT — the model learns what it doesn't know.

8. pAE Head v15 NEW

Predicted Aligned Error — pairwise domain-level confidence
The head maps the pair representation to an L × L aligned-error matrix, output shape (B, L, L, 64): low-error diagonal blocks mark confident domains; high off-diagonal error marks uncertain inter-domain placement.

For each pair (i, j), pAE predicts the error of residue j's position when the structure is aligned using residue i's local frame. This tells us which parts of the structure are well-determined relative to each other — critical for multi-domain proteins where domains may be individually correct but their relative orientation is uncertain.

Architecture:

pair repr z (B, L, L, 128) from the pair stack → MLP: Linear(128→64) → ReLU → Linear(64→64) → 64-bin distribution over [0, 32] Å, shape (B, L, L, 64)

The L×L aligned error matrix:

In the pAE matrix, row i is the aligned residue and column j the scored residue; values run from 0 Å (confident) to 32 Å (uncertain). For a two-domain protein, Domain A and Domain B appear as low-error diagonal blocks, with high-error off-diagonal blocks between them when their relative orientation is uncertain.
  • Input: pair representation (B, L, L, 128) from the pair stack
  • MLP: Linear(128→64) → ReLU → Linear(64→64)
  • 64 bins over [0, 32]Å aligned error range
  • Loss: cross-entropy on binned aligned error, w = 0.2
  • At inference: pAE_ij = ∑ bin_center × softmax(logits)

Key insight: pAE is not symmetric — aligning on residue i and scoring residue j is different from the reverse. Block-diagonal structure in the pAE matrix reveals domain boundaries: within-domain pairs have low error, cross-domain pairs have high error when the relative orientation is uncertain.
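A sketch of the per-pair ground-truth aligned error: express residue j's Cα in residue i's frame, for both prediction and truth, and take the distance between the two local coordinates. The frame convention (R columns = local axes, so R^T maps to local coordinates) follows the earlier frame-construction section:

```python
import numpy as np

def aligned_error(R_pred, t_pred, x_pred, R_true, t_true, x_true):
    # R_*: (L, 3, 3) frames, t_*: (L, 3) translations, x_*: (L, 3) Cα positions
    d_pred = x_pred[None, :, :] - t_pred[:, None, :]        # [i, j] = x_j − t_i
    d_true = x_true[None, :, :] - t_true[:, None, :]
    local_pred = np.einsum("iab,ija->ijb", R_pred, d_pred)  # R_i^T (x_j − t_i)
    local_true = np.einsum("iab,ija->ijb", R_true, d_true)
    return np.linalg.norm(local_pred - local_true, axis=-1)  # (L, L), not symmetric

rng = np.random.default_rng(0)
L = 6
R = np.stack([np.eye(3)] * L)
t = rng.standard_normal((L, 3))
x = t.copy()                             # place each Cα at its frame origin
err = aligned_error(R, t, x, R, t, x)    # perfect prediction → zero error everywhere
```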

9. Auxiliary Heads

AuxPairStack + OrdinalDistanceHead + RgPredictor
  • AuxPairStack — lightweight pair refinement
  • OrdinalDistanceHead — ordinal distance bins
  • RgPredictor — radius of gyration prediction

10. Losses

Cα Losses — 9 terms

Inherited from v13b. Each loss enforces a different geometric property:

  • FAPE — weight 1.0, clamp max 10.0
  • Frame Rotation — weight 0.5, clamp max 2.0
  • Distance MSE — weight 1.0, clamp max 10.0
  • Bond Geometry — weight 3.0 (annealed 1→3), clamp max 10.0
  • Chirality — weight 0.1
  • Angle — weight 0.5
  • Clash — weight 0.1, clamp max 25.0
  • Aux Distance — weight 0.03
  • Rg Loss — weight 0.5
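One plausible reading of the Weight/Clamp columns, sketched below: each raw term is clamped at its max before weighting (an assumption; the exact clamping order is not specified here). Terms without a clamp pass through unchanged:

```python
def combine_losses(raw, weights, clamps):
    """Weighted sum of per-term losses, clamping each term at its max first."""
    total = 0.0
    for name, value in raw.items():
        clamped = min(value, clamps.get(name, float("inf")))  # no clamp → unchanged
        total += weights.get(name, 0.0) * clamped
    return total

raw     = {"fape": 25.0, "chirality": 0.4}   # hypothetical raw loss values
weights = {"fape": 1.0,  "chirality": 0.1}
clamps  = {"fape": 10.0}                     # FAPE clamped at max 10.0
total = combine_losses(raw, weights, clamps)  # 1.0·10.0 + 0.1·0.4
```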

BB Losses — 4 terms

Scaled by the bb_ramp schedule:

  • BB FAPE — weight 1.0, clamp max 10.0
  • BB Bond — weight 2.0, clamp max 10.0
  • BB Angle — weight 0.5, clamp max 5.0
  • Omega — weight 0.5, clamp max 5.0

bb_ramp schedule:

bb_ramp = min(epoch / 5, 1.0)

Confidence Losses v15 NEW — 2 terms

pLDDT cross-entropy + pAE cross-entropy (w=0.2 each)

Two new loss terms for the confidence prediction heads. These are not included in the structural metric.

  • pLDDT CE — weight 0.2; cross-entropy on the 50-bin LDDT distribution
  • pAE CE — weight 0.2; cross-entropy on the 64-bin aligned-error distribution

Ground truth LDDT is computed from predicted vs. true CA-CA distances, then binned into 50 uniform bins over [0, 1]. Ground truth aligned error is computed per-pair by aligning on frame i and measuring distance of residue j, then binned into 64 bins over [0, 32]Å.

Total losses in v15: 9 (CA) + 4 (BB) + 2 (confidence) = 15 terms

11. Tracked Metric: val_structural

total − aux × 0.03 − bb_contribution − confidence_losses

The validation metric used for early stopping and checkpointing:

  val_structural = total − aux_loss × 0.03 − bb_contribution − plddt_loss × 0.2 − pae_loss × 0.2
  bb_contribution = bb_ramp × (w_fape·l_fape + w_bond·l_bond + w_angle·l_angle + w_omega·l_omega)

The structural metric excludes the confidence losses: the confidence heads are ancillary and should not affect structural checkpointing.
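The metric in code; the loss values in the demo are placeholders, and the BB weights come from the BB loss table above:

```python
def val_structural(total, aux_loss, bb_losses, bb_ramp, plddt_loss, pae_loss):
    # BB weights from the table: FAPE 1.0, Bond 2.0, Angle 0.5, Omega 0.5
    w = {"fape": 1.0, "bond": 2.0, "angle": 0.5, "omega": 0.5}
    bb_contribution = bb_ramp * sum(w[k] * bb_losses[k] for k in w)
    return (total - aux_loss * 0.03 - bb_contribution
            - plddt_loss * 0.2 - pae_loss * 0.2)

bb = {"fape": 1.0, "bond": 0.5, "angle": 0.2, "omega": 0.1}  # placeholder values
m = val_structural(total=5.0, aux_loss=1.0, bb_losses=bb, bb_ramp=1.0,
                   plddt_loss=0.5, pae_loss=0.5)
```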

12. Training Configuration

AdamW, per-module LR, CFG 10%, EMA 0.999 — init from v14 run3 best

Initialization: loaded from v14 run3 best weights v15 NEW

All existing modules (encoder, pair stack, IPA denoiser, backbone atom head, auxiliary heads) are initialized from the best v14 run3 checkpoint. The two new confidence heads (pLDDT, pAE) are randomly initialized.

Per-module Learning Rates (AdamW):

  • Denoiser (IPA) — 2e-5 (base)
  • Pair Stack — 6e-5
  • BackboneAtomHead — 1e-4
  • pLDDT Head — 1e-4 v15 NEW
  • pAE Head — 1e-4 v15 NEW
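A name-prefix routing sketch for the per-module learning rates; in PyTorch this table would become one AdamW parameter group per module. The attribute names used as prefixes are assumptions about the model's layout:

```python
MODULE_LRS = {
    "denoiser":           2e-5,  # IPA denoiser (base)
    "pair_stack":         6e-5,
    "backbone_atom_head": 1e-4,
    "plddt_head":         1e-4,  # v15 NEW
    "pae_head":           1e-4,  # v15 NEW
}

def lr_for(param_name, base=2e-5):
    """Pick the learning rate for a parameter by its module-name prefix."""
    for module, lr in MODULE_LRS.items():
        if param_name.startswith(module + "."):
            return lr
    return base  # anything unrecognized falls back to the base LR
```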

LR Schedule:

Warmup over epochs 0–5, then constant LR until epoch 20, then cosine decay through epoch 40.

CFG: conditioning dropped for 10% of training examples (classifier-free guidance)

EMA: 0.999 exponential moving average
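EMA in its simplest form: shadow weights updated after each optimizer step. A pure-Python sketch over a dict of scalars; a real implementation walks the model's parameter tensors:

```python
def ema_update(shadow, params, decay=0.999):
    # shadow ← decay · shadow + (1 − decay) · θ, per parameter
    for k in shadow:
        shadow[k] = decay * shadow[k] + (1.0 - decay) * params[k]
    return shadow

shadow = {"w": 1.0}   # EMA copy of a single hypothetical weight
params = {"w": 0.0}   # current model weight after an optimizer step
ema_update(shadow, params)
```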