pLDDT + pAE confidence heads on top of v14 backbone diffusion
Each protein is represented as a chain of residues. v15 inherits the same input format as v14, tracking 4 backbone atoms per residue:
| Tensor | Shape | Contents |
|---|---|---|
| ids | (B, L) | token sequence |
| true_coords | (B, L, 3) | Cα coordinates |
| true_backbone | (B, L, 4, 3) | [N, CA, C, O] atom coordinates |
| coord_mask | (B, L) | per-residue mask |
| backbone_mask | (B, L, 4) | per-atom mask |

A pre-trained encoder converts amino acid sequences into two representations. Its weights are frozen (not updated) during training.
Outputs: single (B, L, 128), per-residue embeddings; contact_probs (B, L, L), a pairwise contact map.
The pair stack builds a rich pairwise representation by combining several signals, then refines it with triangle multiplicative updates.
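The document does not spell out the update, so the following is only a simplified sketch of the "outgoing edges" triangle multiplicative update as described for AlphaFold 2: each edge (i, j) aggregates a_ik ⊙ b_jk over the third node k of the triangle (i, j, k) and is then gated. It assumes a and b are already-projected pair features and omits the LayerNorms and linear projections of the full module; the function name triangle_mult_outgoing is hypothetical.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def triangle_mult_outgoing(a, b, gate_logits):
    """Simplified outgoing triangle multiplicative update (hypothetical helper).

    a, b, gate_logits: L x L x C nested lists.
    Returns update[i][j][c] = sigmoid(gate[i][j][c]) * sum_k a[i][k][c] * b[j][k][c].
    LayerNorms and input/output projections of the full module are omitted.
    """
    L = len(a)
    C = len(a[0][0])
    update = [[[0.0] * C for _ in range(L)] for _ in range(L)]
    for i in range(L):
        for j in range(L):
            for c in range(C):
                # Edge (i, j) collects evidence from edges (i, k) and (j, k) for every k.
                s = sum(a[i][k][c] * b[j][k][c] for k in range(L))
                update[i][j][c] = sigmoid(gate_logits[i][j][c]) * s
    return update
```

The "incoming" variant sums a_ki ⊙ b_kj instead, i.e. over edges that end at i and j.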
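Output: pair (B, L, L, 128), which also feeds the pAE head.
Forward diffusion: sample t ~ U(0, T), then noise the clean coordinates as x_t = α_t x_0 + σ_t ε, where ε is Gaussian noise.
Each residue gets a local coordinate frame (rotation + translation) built from its N, CA, and C atoms.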
e1 = norm(C − CA), the unit vector from CA to C
e3 = norm((N − CA) × e1)
e2 = e3 × e1
Self-conditioning (50%): with 50% probability, the denoiser is run once to obtain an estimate x̂₀, which is then fed as an additional input to the real forward pass.
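A minimal sketch of one training forward pass that combines the noising step (t ~ U(0, T), x_t = α_t x_0 + σ_t ε) with 50% self-conditioning. The cosine schedule for α_t and σ_t, the denoiser signature denoiser(x_t, t, x0_hat), and all function names are hypothetical; the document does not specify them. Coordinates are plain lists of 3D points.

```python
import math
import random


def alpha_sigma(t, T=1.0):
    """Hypothetical cosine schedule: alpha_t = cos(pi t / 2T), sigma_t = sin(pi t / 2T)."""
    angle = 0.5 * math.pi * t / T
    return math.cos(angle), math.sin(angle)


def noise_coords(x0, t, T=1.0, rng=random):
    """Forward diffusion x_t = alpha_t * x_0 + sigma_t * eps, eps ~ N(0, I) per coordinate."""
    a, s = alpha_sigma(t, T)
    eps = [[rng.gauss(0.0, 1.0) for _ in range(3)] for _ in x0]
    xt = [[a * p[d] + s * e[d] for d in range(3)] for p, e in zip(x0, eps)]
    return xt, eps


def training_forward(denoiser, x0, T=1.0, p_self_cond=0.5, rng=random):
    """One training forward pass with self-conditioning applied with probability p_self_cond.

    denoiser(x_t, t, x0_hat) returns a predicted x_0; x0_hat is None when not self-conditioning.
    """
    t = rng.uniform(0.0, T)  # t ~ U(0, T)
    xt, _ = noise_coords(x0, t, T, rng)
    x0_hat = None
    if rng.random() < p_self_cond:
        # First pass: estimate x_0. In a real framework this pass is usually run
        # without gradients (an assumption; the document does not say).
        x0_hat = denoiser(xt, t, None)
    # Real forward pass, optionally conditioned on the earlier estimate.
    return denoiser(xt, t, x0_hat)
```

Because the same denoiser is called in both passes, the model learns to work both with and without x̂₀, so sampling can reuse the previous step's prediction as x̂₀.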
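Invariant Point Attention (IPA) is an attention mechanism that operates on 3D points expressed in each residue's local frame. Its geometric term depends only on distances between points mapped into global coordinates, so the attention weights are invariant to global rotations and translations, and the resulting frame updates transform equivariantly under SE(3). Three attention channels are merged: scalar QKV attention, point-based geometric attention, and a pair-representation bias.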
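A minimal single-head sketch of the IPA attention logits, simplified from the AlphaFold 2 formulation (no per-head point weights and no extra scaling constants), to show why the point term is invariant: every query and key point is mapped to global coordinates through its residue's frame, and a global rigid motion moves all of them by the same transform, leaving squared distances unchanged. Function names and the gamma weight are hypothetical.

```python
import math


def matvec(R, v):
    """3x3 matrix times 3-vector."""
    return [sum(R[r][c] * v[c] for c in range(3)) for r in range(3)]


def apply_frame(R, t, v):
    """Map a point v from a local frame to global coordinates: R @ v + t."""
    return [x + y for x, y in zip(matvec(R, v), t)]


def rotation_z(angle):
    c, s = math.cos(angle), math.sin(angle)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]


def compose(Rg, tg, R, t):
    """Rigid-transform composition T_g o T: rotation Rg @ R, translation Rg @ t + tg."""
    RgR = [[sum(Rg[r][k] * R[k][c] for k in range(3)) for c in range(3)] for r in range(3)]
    return RgR, apply_frame(Rg, tg, t)


def ipa_logits(frames, q_scalar, k_scalar, q_points, k_points, pair_bias, gamma=1.0):
    """Simplified single-head IPA logits.

    frames: list of (R, t) per residue; q_points / k_points: per-residue lists of local 3D points.
    logit_ij = q_i . k_j / sqrt(d) + pair_bias[i][j] - gamma / 2 * sum_p |T_i q_ip - T_j k_jp|^2
    """
    L = len(frames)
    d = len(q_scalar[0])
    # Express all query/key points in global coordinates once.
    gq = [[apply_frame(*frames[i], p) for p in q_points[i]] for i in range(L)]
    gk = [[apply_frame(*frames[j], p) for p in k_points[j]] for j in range(L)]
    logits = [[0.0] * L for _ in range(L)]
    for i in range(L):
        for j in range(L):
            scalar = sum(a * b for a, b in zip(q_scalar[i], k_scalar[j])) / math.sqrt(d)
            point = sum(math.dist(pq, pk) ** 2 for pq, pk in zip(gq[i], gk[j]))
            logits[i][j] = scalar + pair_bias[i][j] - 0.5 * gamma * point
    return logits
```

A softmax over j turns each row into attention weights; the scalar and pair-bias terms do not involve coordinates at all, so they are trivially invariant.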
BackboneAtomHead — predicting all 4 atoms:
After IPA predicts CA positions and frames, the BackboneAtomHead places N, C, O atoms relative to each CA using ideal geometry + learned corrections.
pos_atom = t + R @ (ideal_offset + delta), where (R, t) is the residue's frame
Output: backbone_pred (B, L, 4, 3) and the single representation h (B, L, 512), which feeds the pLDDT head.
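The frame formulas and the BackboneAtomHead placement rule can be checked with a small sketch. backbone_frame follows e1, e2, e3 exactly as written above and stacks them as the columns of R with t = CA; place_atom applies pos_atom = t + R @ (ideal_offset + delta). The ideal offsets used in v15 are not given, so the usage recovers an offset from a known atom instead of assuming values. All helper names are hypothetical.

```python
import math


def sub(a, b):
    return [x - y for x, y in zip(a, b)]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def cross(a, b):
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]


def normalize(v):
    """norm(...) in the text: scale v to unit length."""
    n = math.sqrt(dot(v, v))
    return [x / n for x in v]


def backbone_frame(n, ca, c):
    """Per-residue frame from N, CA, C. Returns (R, t) with columns e1, e2, e3 and t = CA."""
    e1 = normalize(sub(c, ca))             # unit vector CA -> C
    e3 = normalize(cross(sub(n, ca), e1))  # normal to the N-CA-C plane
    e2 = cross(e3, e1)                     # completes a right-handed orthonormal basis
    R = [[e1[r], e2[r], e3[r]] for r in range(3)]
    return R, list(ca)


def place_atom(R, t, ideal_offset, delta):
    """pos_atom = t + R @ (ideal_offset + delta); offsets live in the local frame."""
    local = [o + d for o, d in zip(ideal_offset, delta)]
    return [t[r] + sum(R[r][k] * local[k] for k in range(3)) for r in range(3)]


def to_local(R, t, x):
    """Inverse map R^T @ (x - t): global point into local-frame coordinates."""
    v = sub(x, t)
    return [sum(R[k][r] * v[k] for k in range(3)) for r in range(3)]
```

Usage: with N = (0.5, 3.4, 3.0), CA = (1.0, 2.0, 3.0), C = (2.5, 2.0, 3.0), the C atom sits at local (1.5, 0, 0), and placing an atom with offset to_local(R, t, O) and zero delta returns O. Because delta is added in the local frame, a learned correction rotates with the residue.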
pLDDT (predicted Local Distance Difference Test) is the model's estimate of per-residue LDDT, which measures local accuracy by checking whether CA-CA distances are preserved within the thresholds 0.5, 1.0, 2.0, and 4.0 Å. By learning to predict its own accuracy, the head forces the trunk to represent uncertainty.
Architecture:
LDDT definition:
For each residue, LDDT checks all CA-CA pairs within a 15Å radius and computes the fraction of distances preserved within thresholds:
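Written out, with the common conventions assumed here (neighbors selected by true distance and the four thresholds averaged with equal weight), the per-residue score is:

$$
\mathrm{LDDT}_i = \frac{1}{4}\sum_{\tau \in \{0.5,\,1,\,2,\,4\}} \frac{1}{|\mathcal{N}_i|}\sum_{j \in \mathcal{N}_i} \mathbb{1}\!\left[\,\bigl|d^{\text{pred}}_{ij} - d^{\text{true}}_{ij}\bigr| < \tau\,\right],
\qquad
\mathcal{N}_i = \{\, j \neq i : d^{\text{true}}_{ij} < 15\,\text{\AA} \,\}
$$

where d_ij is the CA-CA distance between residues i and j and τ is in Å.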
Loss weight: w = 0.2
pLDDT = ∑ bin_center × softmax(logits), which gives one scalar per residue.
Why it matters: the head forces the trunk's single representation to encode uncertainty. Loops and disordered regions tend to get low pLDDT and well-packed cores high pLDDT, so the model learns what it doesn't know.
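A sketch of the CA-only LDDT target and the expected-value readout of pLDDT. It assumes neighbors are selected by true distance (< 15 Å, j ≠ i), the four thresholds are weighted equally, a residue with no neighbors scores 0, and masks are ignored. expected_from_bins implements ∑ bin_center × softmax(logits) for uniform bins, here over [0, 1] to match the 50-bin target described in the loss section. Function names are hypothetical.

```python
import math

THRESHOLDS = (0.5, 1.0, 2.0, 4.0)  # Å, from the text
CUTOFF = 15.0                       # Å inclusion radius


def lddt_per_residue(pred_ca, true_ca):
    """Per-residue CA LDDT in [0, 1]; neighbors chosen by true distance (assumption)."""
    L = len(true_ca)
    scores = []
    for i in range(L):
        hits, total = 0, 0
        for j in range(L):
            if j == i:
                continue
            d_true = math.dist(true_ca[i], true_ca[j])
            if d_true >= CUTOFF:
                continue
            err = abs(math.dist(pred_ca[i], pred_ca[j]) - d_true)
            hits += sum(err < tau for tau in THRESHOLDS)  # count thresholds satisfied
            total += len(THRESHOLDS)
        scores.append(hits / total if total else 0.0)  # no neighbors -> 0 (assumption)
    return scores


def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]


def expected_from_bins(logits, lo, hi):
    """Sum of bin_center * softmax(logits) for len(logits) uniform bins over [lo, hi]."""
    n = len(logits)
    width = (hi - lo) / n
    centers = [lo + (k + 0.5) * width for k in range(n)]
    return sum(c * p for c, p in zip(centers, softmax(logits)))
```

Usage: for CA atoms at x = 0, 3.8, 7.6 Å with the last one predicted at 9.1 Å, the per-residue LDDT is [0.75, 0.75, 0.5]; uniform logits over 50 bins give an expected pLDDT of 0.5.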
For each pair (i, j), pAE predicts the positional error of residue j when the predicted and true structures are aligned on residue i's local frame. This shows which parts of the structure are well determined relative to each other, which is critical for multi-domain proteins whose domains may each be correct while their relative orientation is uncertain.
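A sketch of the ground-truth aligned-error matrix: residue j's position is expressed in residue i's frame in both the predicted and the true structure, and the error is the distance between the two local coordinates. Frames are (R, t) pairs whose R columns are the frame axes; the function names are hypothetical and masks are ignored.

```python
import math


def to_local(R, t, x):
    """Coordinates of x in frame (R, t): R^T @ (x - t)."""
    v = [a - b for a, b in zip(x, t)]
    return [sum(R[k][r] * v[k] for k in range(3)) for r in range(3)]


def aligned_error(pred_frames, pred_x, true_frames, true_x):
    """e[i][j] = |T_i^-1(x_j) - T~_i^-1(x~_j)|: error of residue j after aligning on frame i."""
    L = len(pred_x)
    err = [[0.0] * L for _ in range(L)]
    for i in range(L):
        Rp, tp = pred_frames[i]
        Rt, tt = true_frames[i]
        for j in range(L):
            local_pred = to_local(Rp, tp, pred_x[j])
            local_true = to_local(Rt, tt, true_x[j])
            err[i][j] = math.dist(local_pred, local_true)
    return err
```

A global rigid motion of the prediction leaves every entry at zero. Rotating only residue 0's predicted frame makes row 0 nonzero while e[1][0] and e[2][0] stay zero, which is exactly the asymmetry described below.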
Architecture:
Input: pair (B, L, L, 128) from the pair stack → Linear(128→64) → ReLU → Linear(64→64), giving 64 bin logits for every pair (i, j). Loss weight: w = 0.2
The L×L aligned error matrix: pAE_ij = ∑ bin_center × softmax(logits_ij)
Key insight: pAE is not symmetric, because aligning on residue i and scoring residue j differs from the reverse. Block-diagonal structure in the pAE matrix reveals domain boundaries: within-domain pairs have low error, while cross-domain pairs have high error when the relative orientation is uncertain.
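A pure-Python sketch of the pAE head's shape contract, Linear(128→64) → ReLU → Linear(64→64), followed by pAE_ij = ∑ bin_center × softmax(logits_ij) with 64 uniform bins over [0, 32] Å, so the bin centers run from 0.25 Å to 31.75 Å in 0.5 Å steps. The weight initialization and helper names are hypothetical; a real implementation would use batched tensor operations over (B, L, L, 128).

```python
import math
import random


def linear(x, W, b):
    """y = W @ x + b for a single feature vector."""
    return [sum(w * v for w, v in zip(row, x)) + bi for row, bi in zip(W, b)]


def init_linear(n_in, n_out, rng):
    """Hypothetical uniform init in [-1/sqrt(n_in), 1/sqrt(n_in)], zero bias."""
    bound = 1.0 / math.sqrt(n_in)
    W = [[rng.uniform(-bound, bound) for _ in range(n_in)] for _ in range(n_out)]
    return W, [0.0] * n_out


def pae_head(pair, params, n_bins=64, max_err=32.0):
    """pair: L x L x 128 nested list. Returns (logits, pae) with pae[i][j] in Å."""
    (W1, b1), (W2, b2) = params
    width = max_err / n_bins
    centers = [(k + 0.5) * width for k in range(n_bins)]
    L = len(pair)
    logits = [[None] * L for _ in range(L)]
    pae = [[0.0] * L for _ in range(L)]
    for i in range(L):
        for j in range(L):
            h = [max(0.0, v) for v in linear(pair[i][j], W1, b1)]  # Linear(128->64) + ReLU
            z = linear(h, W2, b2)                                  # Linear(64->64): bin logits
            m = max(z)
            p = [math.exp(v - m) for v in z]
            s = sum(p)
            logits[i][j] = z
            pae[i][j] = sum(c * pk / s for c, pk in zip(centers, p))  # expected error
    return logits, pae
```

Each (i, j) is processed independently from pair[i][j], so nothing forces pae[i][j] to equal pae[j][i], which matches the asymmetric target.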
Each loss enforces a different geometric property. CA-level losses (9 terms):
| Loss | Weight | Clamp |
|---|---|---|
| FAPE | 1.0 | max 10.0 |
| Frame Rotation | 0.5 | max 2.0 |
| Distance MSE | 1.0 | max 10.0 |
| Bond Geometry | 3.0 (annealed 1→3) | max 10.0 |
| Chirality | 0.1 | — |
| Angle | 0.5 | — |
| Clash | 0.1 | max 25.0 |
| Aux Distance | 0.03 | — |
| Rg Loss | 0.5 | — |

Backbone-atom (BB) losses (4 terms):
| Loss | Weight | Clamp |
|---|---|---|
| BB FAPE | 1.0 | max 10.0 |
| BB Bond | 2.0 | max 10.0 |
| BB Angle | 0.5 | max 5.0 |
| Omega | 0.5 | max 5.0 |
bb_ramp schedule:
Two new loss terms for the confidence prediction heads. These are not included in the structural metric.
| Loss | Weight | Details |
|---|---|---|
| pLDDT CE | 0.2 | Cross-entropy on 50-bin LDDT distribution |
| pAE CE | 0.2 | Cross-entropy on 64-bin aligned error distribution |
Ground-truth LDDT is computed from predicted vs. true CA-CA distances, then binned into 50 uniform bins over [0, 1]. Ground-truth aligned error is computed per pair by aligning the predicted and true structures on frame i and measuring the distance between the predicted and true positions of residue j; it is then binned into 64 bins over [0, 32] Å.
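A sketch of how the two confidence targets could be binned and scored with cross-entropy. Assumptions not stated in the document: values outside a bin range are clipped into the first or last bin, losses are averaged over residues (pLDDT) and over all (i, j) pairs (pAE) without masking, and the 0.2 weights from the table are applied to those means. Function names are hypothetical.

```python
import math


def bin_index(value, lo, hi, n_bins):
    """Uniform bin index over [lo, hi]; out-of-range values are clipped (assumption)."""
    k = math.floor((value - lo) * n_bins / (hi - lo))
    return min(max(k, 0), n_bins - 1)


def cross_entropy(logits, target):
    """-log softmax(logits)[target], computed stably via log-sum-exp."""
    m = max(logits)
    lse = m + math.log(sum(math.exp(v - m) for v in logits))
    return lse - logits[target]


def confidence_loss(plddt_logits, lddt_true, pae_logits, err_true, w_plddt=0.2, w_pae=0.2):
    """Weighted mean pLDDT CE (50 bins on [0, 1]) plus mean pAE CE (64 bins on [0, 32] Å)."""
    ce_plddt = [cross_entropy(lg, bin_index(v, 0.0, 1.0, 50))
                for lg, v in zip(plddt_logits, lddt_true)]
    L = len(err_true)
    ce_pae = [cross_entropy(pae_logits[i][j], bin_index(err_true[i][j], 0.0, 32.0, 64))
              for i in range(L) for j in range(L)]
    return w_plddt * sum(ce_plddt) / len(ce_plddt) + w_pae * sum(ce_pae) / len(ce_pae)
```

With untrained (all-zero) logits the loss equals 0.2·ln 50 + 0.2·ln 64, a useful sanity check at initialization for the two randomly initialized heads.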
Total losses in v15: 9 (CA) + 4 (BB) + 2 (confidence) = 15 terms
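The 15 terms can be combined as a weighted sum using the weights from the tables. How the "Clamp" column is applied is not stated; this sketch assumes it caps each raw term before weighting (for FAPE, AlphaFold 2 instead clamps per-pair distances at 10 Å). The bond-geometry weight is exposed as an optional override because the 1→3 annealing schedule is not given. All names are hypothetical.

```python
# (weight, clamp) per term, from the tables above; None means no clamp.
CA_LOSSES = {
    "fape": (1.0, 10.0), "frame_rotation": (0.5, 2.0), "distance_mse": (1.0, 10.0),
    "bond_geometry": (3.0, 10.0), "chirality": (0.1, None), "angle": (0.5, None),
    "clash": (0.1, 25.0), "aux_distance": (0.03, None), "rg": (0.5, None),
}
BB_LOSSES = {
    "bb_fape": (1.0, 10.0), "bb_bond": (2.0, 10.0),
    "bb_angle": (0.5, 5.0), "omega": (0.5, 5.0),
}
CONF_LOSSES = {"plddt_ce": (0.2, None), "pae_ce": (0.2, None)}


def total_loss(terms, bond_weight=None):
    """Weighted sum of all 15 terms; each clamp caps the raw term value (assumption)."""
    table = {**CA_LOSSES, **BB_LOSSES, **CONF_LOSSES}
    assert set(terms) == set(table), "expected exactly the 15 v15 loss terms"
    total = 0.0
    for name, value in terms.items():
        weight, clamp = table[name]
        if name == "bond_geometry" and bond_weight is not None:
            weight = bond_weight  # annealed 1 -> 3; the schedule itself is unspecified
        if clamp is not None:
            value = min(value, clamp)
        total += weight * value
    return total
```

With every term equal to 1.0, the total is the sum of the weights: 6.73 (CA) + 4.0 (BB) + 0.4 (confidence) = 11.13.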
Initialization: loaded from v14 run3 best weights (new in v15)
All existing modules (encoder, pair stack, IPA denoiser, backbone atom head, auxiliary heads) are initialized from the best v14 run3 checkpoint. The two new confidence heads (pLDDT, pAE) are randomly initialized.
Per-module Learning Rates (AdamW):
| Module | LR | Note |
|---|---|---|
| Denoiser (IPA) | 2e-5 | base |
| Pair Stack | 6e-5 | 3× |
| BackboneAtomHead | 1e-4 | 5× |
| pLDDT Head | 1e-4 | 5× (new in v15) |
| pAE Head | 1e-4 | 5× (new in v15) |
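The multipliers in the table translate directly into optimizer parameter groups. This sketch builds a list of {"params": ..., "lr": ...} dicts, the per-parameter-group format that optimizers such as torch.optim.AdamW accept; the module names are hypothetical and the frozen encoder is excluded. Weight decay, betas, and the LR schedule are not specified and are omitted.

```python
BASE_LR = 2e-5  # denoiser (IPA) base learning rate

# Multipliers relative to BASE_LR, from the table above.
LR_MULTIPLIERS = {
    "denoiser": 1.0,
    "pair_stack": 3.0,
    "backbone_atom_head": 5.0,
    "plddt_head": 5.0,
    "pae_head": 5.0,
}


def build_param_groups(module_params, base_lr=BASE_LR):
    """Map hypothetical module names to parameter lists and return per-group LRs.

    The frozen encoder is not listed, so its weights never reach the optimizer.
    """
    return [{"params": module_params[name], "lr": base_lr * mult}
            for name, mult in LR_MULTIPLIERS.items()]
```

Deriving every LR from one base value keeps the 3× and 5× ratios intact if the base LR is retuned.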
LR Schedule:
CFG: 10% conditioning dropout during training (enables classifier-free guidance)
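A sketch of classifier-free guidance: during training the conditioning is replaced by a null value 10% of the time, so one network learns both conditional and unconditional predictions; at sampling time the two are combined. Which inputs are dropped, the null value, and the guidance scale are not given in the document and are hypothetical here.

```python
import random


def drop_condition(cond, p_drop=0.1, null_cond=None, rng=random):
    """Training: replace the conditioning with null_cond with probability p_drop."""
    return null_cond if rng.random() < p_drop else cond


def guided_prediction(pred_cond, pred_uncond, scale):
    """Sampling: standard CFG combination uncond + scale * (cond - uncond), elementwise."""
    return [u + scale * (c - u) for c, u in zip(pred_cond, pred_uncond)]
```

A scale of 1 recovers the conditional prediction, 0 the unconditional one, and values above 1 push samples further toward the conditioning.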
EMA: exponential moving average of the model weights with decay 0.999
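A minimal EMA sketch with decay 0.999 over a dict of scalar parameters (real parameters would be tensors; the names are hypothetical). With this decay the average has an effective horizon of roughly 1 / (1 − 0.999) = 1000 optimizer steps.

```python
def ema_update(ema_params, params, decay=0.999):
    """In-place EMA: ema <- decay * ema + (1 - decay) * param, for each parameter name."""
    for name, value in params.items():
        ema_params[name] = decay * ema_params[name] + (1.0 - decay) * value
    return ema_params
```

The update is called once per optimizer step, after the weights change; the averaged copy is typically the one used for evaluation.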