1. Input Tensors

Each protein is represented as a chain of residues. v15 inherits the same input format as v14, tracking 4 backbone atoms per residue:

  • ids — (B, L) token sequence
  • true_coords — (B, L, 3) Cα coordinates
  • true_backbone — (B, L, 4, 3) [N, CA, C, O] atoms
  • coord_mask — (B, L) per-residue mask
  • backbone_mask — (B, L, 4) per-atom mask
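As a concrete shape check, here is a minimal sketch of one input batch; the sizes B and L are illustrative, and numpy stands in for whatever tensor library the model actually uses:

```python
import numpy as np

B, L = 2, 64  # illustrative batch size and sequence length

batch = {
    "ids":           np.zeros((B, L), dtype=np.int64),          # token sequence
    "true_coords":   np.zeros((B, L, 3), dtype=np.float32),     # Cα coordinates
    "true_backbone": np.zeros((B, L, 4, 3), dtype=np.float32),  # [N, CA, C, O]
    "coord_mask":    np.ones((B, L), dtype=bool),               # per-residue mask
    "backbone_mask": np.ones((B, L, 4), dtype=bool),            # per-atom mask
}

# The CA channel of the full backbone (atom index 1) matches true_coords in shape
assert batch["true_backbone"][:, :, 1].shape == batch["true_coords"].shape
```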

2. Frozen Encoder

ContactClassifier → single embeddings + contact probabilities

A pre-trained encoder converts amino acid sequences into two representations. Its weights are not updated during training.

  • Pre-trained ContactClassifier (weights frozen)
  • Output: single (B, L, 128) — per-residue embeddings
  • Output: contact_probs (B, L, L) — pairwise contact map
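A minimal stand-in illustrating the encoder's interface: fixed (never-updated) weights mapping ids to the two outputs. The vocabulary size of 21 and the dot-product contact score are assumptions for the sketch, not the real ContactClassifier:

```python
import numpy as np

rng = np.random.default_rng(0)
D = 128
# Fixed embedding table (hypothetical size: 20 amino acids + 1 extra token)
W_EMB = rng.standard_normal((21, D)) * 0.02

def frozen_encoder(ids):
    """Stand-in for the pre-trained ContactClassifier; weights stay frozen."""
    single = W_EMB[ids]                                 # (B, L, 128) embeddings
    logits = np.einsum("bid,bjd->bij", single, single)  # pairwise scores
    contact_probs = 1.0 / (1.0 + np.exp(-logits))       # (B, L, L), in (0, 1)
    return single, contact_probs

ids = rng.integers(0, 21, size=(1, 16))
single, contacts = frozen_encoder(ids)
```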

3. Pair Stack

SingleToPairV13 + RPE + OPM + Contact Conditioning → 8× EnhancedPairBlock

Builds a rich pairwise representation by combining several signals, then refines it with triangle multiplicative updates.

  • SingleToPairV13 projects single → pair representation
  • LogScaledRPE — 128 bins, max distance 512 residues
  • OuterProductMean — d_opm=32
  • Contact Map Conditioning — gated, d=128
  • EnhancedPairBlock: TriMul (out + in) + AxialBlock
  • Output: pair (B, L, L, 128) — also feeds pAE head
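The heart of EnhancedPairBlock is the triangle multiplicative update. Below is a stripped-down sketch of the "outgoing" variant, in which edge (i, j) is updated from edges (i, k) and (j, k); layer norms and the separate per-branch gates of the full block are omitted, and all dimensions are illustrative:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def tri_mul_outgoing(z, Wa, Wb, Wg, Wo):
    # z: (L, L, d) pair representation.
    a = sigmoid(z @ Wg) * (z @ Wa)               # gated projection of (i, k) edges
    b = sigmoid(z @ Wg) * (z @ Wb)               # gated projection of (j, k) edges
    combined = np.einsum("ikc,jkc->ijc", a, b)   # combine over the third node k
    return combined @ Wo                         # project back to pair dimension

rng = np.random.default_rng(0)
L, d, c = 8, 16, 8
z = rng.standard_normal((L, L, d))
Wa, Wb, Wg = (rng.standard_normal((d, c)) * 0.1 for _ in range(3))
Wo = rng.standard_normal((c, d)) * 0.1
out = tri_mul_outgoing(z, Wa, Wb, Wg, Wo)  # same shape as z; added residually in practice
```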

4. Diffusion Process

Normalize → sample t → add noise to Cα coords
  • Normalize Cα coordinates (center + scale)
  • Sample diffusion timestep t ~ U(0, T)
  • Add Gaussian noise to Cα positions: x_t = α_t x_0 + σ_t ε
  • Note: Noise applied only to Cα — full backbone is derived from frames
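The noising step in code, in the variance-preserving form matching x_t = α_t x_0 + σ_t ε above; the schedule value α_t = 0.9 is illustrative:

```python
import numpy as np

def add_noise(x0, alpha_t, sigma_t, rng):
    """Forward diffusion on normalized Cα coordinates."""
    eps = rng.standard_normal(x0.shape)
    return alpha_t * x0 + sigma_t * eps, eps

rng = np.random.default_rng(0)
x0 = rng.standard_normal((64, 3))
x0 -= x0.mean(axis=0)                  # center (part of normalization)
alpha_t = 0.9
sigma_t = np.sqrt(1.0 - alpha_t ** 2)  # variance-preserving: α² + σ² = 1
x_t, eps = add_noise(x0, alpha_t, sigma_t, rng)
```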

5. Frame Init — Peptide Plane

Peptide Plane Frames from N-CA-C (more stable than CA-triplet Gram-Schmidt)

Each residue gets a local coordinate frame (rotation + translation) built from N, CA, C atoms.

  • e1 = normalize(C − CA)   (the CA→C direction)
  • e3 = normalize((N − CA) × e1)
  • e2 = e3 × e1
  • R = [e1 | e2 | e3], t = CA position
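The steps above in code. By construction R is a proper rotation (orthonormal columns, det = 1); the atom positions used in the demo are rough peptide-geometry values for illustration:

```python
import numpy as np

def peptide_plane_frame(N, CA, C):
    e1 = C - CA
    e1 = e1 / np.linalg.norm(e1)            # e1 = normalize(C − CA)
    e3 = np.cross(N - CA, e1)
    e3 = e3 / np.linalg.norm(e3)            # e3 = normalize((N − CA) × e1)
    e2 = np.cross(e3, e1)                   # e2 = e3 × e1
    R = np.stack([e1, e2, e3], axis=-1)     # columns [e1 | e2 | e3]
    return R, CA.copy()                     # rotation + translation

# Illustrative atom positions (Å), not real PDB coordinates
N  = np.array([-0.53, 1.36, 0.0])
CA = np.array([0.0, 0.0, 0.0])
C  = np.array([1.52, 0.0, 0.0])
R, t = peptide_plane_frame(N, CA, C)
```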

Self-Conditioning (50%): With 50% probability, run denoiser once to get x̂₀, then feed as additional input for the real forward pass.

6. IPA Denoiser + BackboneAtomHead

8× IPABlock → x̂₀, R̂ → BackboneAtomHead → full [N, CA, C, O]

IPA is a special attention mechanism that operates in 3D space, making it SE(3)-equivariant. Three attention channels are merged: scalar QKV, point-based geometric attention, and pair bias.

BackboneAtomHead — predicting all 4 atoms:

After IPA predicts CA positions and frames, the BackboneAtomHead places N, C, O atoms relative to each CA using ideal geometry + learned corrections.

  • MLP: Linear(512→256) → ReLU → Linear(256→256) → ReLU → Linear(256→9)
  • Output 9 values = 3 atoms × 3 coords (dN, dC, dO)
  • pos_atom = t + R @ (ideal_offset + delta)
  • Zero-init: starts at ideal geometry

Output: backbone_pred (B, L, 4, 3) and single representation h (B, L, 512) — feeds pLDDT head
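A sketch of the placement rule pos_atom = t + R @ (ideal_offset + delta). The ideal offsets below are rough peptide-geometry numbers chosen for illustration, not the model's actual constants:

```python
import numpy as np

# Rough ideal backbone offsets in the residue's local frame (Å) — illustrative values
IDEAL_OFFSETS = {
    "N": np.array([-0.57, 1.34, 0.00]),
    "C": np.array([ 1.52, 0.00, 0.00]),
    "O": np.array([ 2.15, 1.07, 0.00]),
}

def place_backbone(R, t, deltas):
    """deltas: per-atom corrections predicted by the zero-initialized MLP head."""
    return {name: t + R @ (off + deltas[name]) for name, off in IDEAL_OFFSETS.items()}

R = np.eye(3)                        # identity frame for the demo
t = np.array([5.0, 0.0, 0.0])
zero = {k: np.zeros(3) for k in IDEAL_OFFSETS}
atoms = place_backbone(R, t, zero)   # zero deltas → pure ideal geometry, as at init
```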

7. pLDDT Head v15 NEW

Predicted Local Distance Difference Test — per-residue confidence

pLDDT (predicted Local Distance Difference Test) measures per-residue accuracy by checking whether local CA-CA distances are preserved within distance thresholds (0.5, 1.0, 2.0, 4.0 Å). The head learns to predict its own accuracy — forcing the trunk to represent uncertainty.

Architecture:

single repr h (B, L, 512) → MLP: Linear(512→256) → ReLU → Linear(256→50) → 50-bin distribution over LDDT ∈ [0, 1], shape (B, L, 50), used for per-residue confidence coloring

LDDT definition:

For each residue, LDDT checks all CA-CA pairs within a 15Å radius and computes the fraction of distances preserved within thresholds:

LDDT_i = (1/4) ∑_{τ ∈ {0.5, 1.0, 2.0, 4.0} Å} (1/N_contacts) ∑_{j : d_true,ij < 15 Å} 𝟙[ |d_pred,ij − d_true,ij| < τ ]
  • Ranges from 0 (completely wrong) to 1 (perfect)
  • The 50-bin distribution is trained with cross-entropy on binned ground-truth LDDT
  • Loss weight: w = 0.2
  • At inference: pLDDT = ∑ bin_center × softmax(logits) gives a scalar per residue
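The inference-time reduction to a scalar, as described above; uniform bin centers over [0, 1] are an assumption of this sketch:

```python
import numpy as np

def expected_plddt(logits):
    # pLDDT = Σ bin_center × softmax(logits), per residue
    n_bins = logits.shape[-1]
    centers = (np.arange(n_bins) + 0.5) / n_bins           # midpoints of [0, 1] bins
    e = np.exp(logits - logits.max(axis=-1, keepdims=True))
    probs = e / e.sum(axis=-1, keepdims=True)
    return probs @ centers

logits = np.zeros((4, 50))      # uniform distribution over the 50 bins
plddt = expected_plddt(logits)  # uniform → expectation is the mean bin center, 0.5
```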

Why it matters: The head forces the trunk's single representation to encode uncertainty. Loops and disordered regions get low pLDDT, well-packed cores get high pLDDT — the model learns what it doesn't know.

8. pAE Head v15 NEW

Predicted Aligned Error — pairwise domain-level confidence
The head maps the pair representation to an L × L aligned-error matrix, output shape (B, L, L, 64): low-error diagonal blocks mark confident domains; high off-diagonal error marks uncertain inter-domain placement.

For each pair (i, j), pAE predicts the error of residue j's position when the structure is aligned using residue i's local frame. This tells us which parts of the structure are well-determined relative to each other — critical for multi-domain proteins where domains may be individually correct but their relative orientation is uncertain.

Architecture:

pair repr z (B, L, L, 128) from the pair stack → MLP: Linear(128→64) → ReLU → Linear(64→64) → 64-bin distribution over [0, 32] Å, shape (B, L, L, 64)

The L×L aligned error matrix:

In the pAE matrix, row i is the aligned residue and column j the scored residue; values run from 0 Å (confident) to 32 Å (uncertain). For a two-domain protein, Domain A and Domain B appear as low-error diagonal blocks, with high-error off-diagonal blocks between them when their relative orientation is uncertain.
  • Input: pair representation (B, L, L, 128) from the pair stack
  • MLP: Linear(128→64) → ReLU → Linear(64→64)
  • 64 bins over [0, 32]Å aligned error range
  • Loss: cross-entropy on binned aligned error, w = 0.2
  • At inference: pAE_ij = ∑ bin_center × softmax(logits)

Key insight: pAE is not symmetric — aligning on residue i and scoring residue j is different from the reverse. Block-diagonal structure in the pAE matrix reveals domain boundaries: within-domain pairs have low error, cross-domain pairs have high error when the relative orientation is uncertain.
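A sketch of the per-pair ground-truth aligned error: express residue j's Cα in residue i's frame, for both prediction and truth, and take the distance between the two local coordinates. The frame convention (R columns = local axes, so R^T maps to local coordinates) follows the earlier frame-construction section:

```python
import numpy as np

def aligned_error(R_pred, t_pred, x_pred, R_true, t_true, x_true):
    # R_*: (L, 3, 3) frames, t_*: (L, 3) translations, x_*: (L, 3) Cα positions
    d_pred = x_pred[None, :, :] - t_pred[:, None, :]        # [i, j] = x_j − t_i
    d_true = x_true[None, :, :] - t_true[:, None, :]
    local_pred = np.einsum("iab,ija->ijb", R_pred, d_pred)  # R_i^T (x_j − t_i)
    local_true = np.einsum("iab,ija->ijb", R_true, d_true)
    return np.linalg.norm(local_pred - local_true, axis=-1)  # (L, L), not symmetric

rng = np.random.default_rng(0)
L = 6
R = np.stack([np.eye(3)] * L)
t = rng.standard_normal((L, 3))
x = t.copy()                             # place each Cα at its frame origin
err = aligned_error(R, t, x, R, t, x)    # perfect prediction → zero error everywhere
```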

9. Auxiliary Heads

AuxPairStack + OrdinalDistanceHead + RgPredictor
  • AuxPairStack — lightweight pair refinement
  • OrdinalDistanceHead — ordinal distance bins
  • RgPredictor — radius of gyration prediction

10. Losses

Cα Losses — 9 terms

Inherited from v13b. Each loss enforces a different geometric property:

  • FAPE — weight 1.0, clamp max 10.0
  • Frame Rotation — weight 0.5, clamp max 2.0
  • Distance MSE — weight 1.0, clamp max 10.0
  • Bond Geometry — weight 3.0 (annealed 1→3), clamp max 10.0
  • Chirality — weight 0.1
  • Angle — weight 0.5
  • Clash — weight 0.1, clamp max 25.0
  • Aux Distance — weight 0.03
  • Rg Loss — weight 0.5
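One plausible reading of the Weight/Clamp columns, sketched below: each raw term is clamped at its max before weighting (an assumption; the exact clamping order is not specified here). Terms without a clamp pass through unchanged:

```python
def combine_losses(raw, weights, clamps):
    """Weighted sum of per-term losses, clamping each term at its max first."""
    total = 0.0
    for name, value in raw.items():
        clamped = min(value, clamps.get(name, float("inf")))  # no clamp → unchanged
        total += weights.get(name, 0.0) * clamped
    return total

raw     = {"fape": 25.0, "chirality": 0.4}   # hypothetical raw loss values
weights = {"fape": 1.0,  "chirality": 0.1}
clamps  = {"fape": 10.0}                     # FAPE clamped at max 10.0
total = combine_losses(raw, weights, clamps)  # 1.0·10.0 + 0.1·0.4
```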

BB Losses — 4 terms

Scaled by the bb_ramp schedule:

  • BB FAPE — weight 1.0, clamp max 10.0
  • BB Bond — weight 2.0, clamp max 10.0
  • BB Angle — weight 0.5, clamp max 5.0
  • Omega — weight 0.5, clamp max 5.0

bb_ramp schedule:

bb_ramp = min(epoch / 5, 1.0)

Confidence Losses v15 NEW — 2 terms

pLDDT cross-entropy + pAE cross-entropy (w=0.2 each)

Two new loss terms for the confidence prediction heads. These are not included in the structural metric.

  • pLDDT CE — weight 0.2; cross-entropy on the 50-bin LDDT distribution
  • pAE CE — weight 0.2; cross-entropy on the 64-bin aligned-error distribution

Ground truth LDDT is computed from predicted vs. true CA-CA distances, then binned into 50 uniform bins over [0, 1]. Ground truth aligned error is computed per-pair by aligning on frame i and measuring distance of residue j, then binned into 64 bins over [0, 32]Å.

Total losses in v15: 9 (CA) + 4 (BB) + 2 (confidence) = 15 terms

11. Tracked Metric: val_structural

total − aux × 0.03 − bb_contribution − confidence_losses

The validation metric used for early stopping and checkpointing:

  val_structural = total − aux_loss × 0.03 − bb_contribution − plddt_loss × 0.2 − pae_loss × 0.2
  bb_contribution = bb_ramp × (w_fape·l_fape + w_bond·l_bond + w_angle·l_angle + w_omega·l_omega)

The structural metric excludes the confidence losses: the confidence heads are ancillary and should not affect structural checkpointing.
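The metric in code; the loss values in the demo are placeholders, and the BB weights come from the BB loss table above:

```python
def val_structural(total, aux_loss, bb_losses, bb_ramp, plddt_loss, pae_loss):
    # BB weights from the table: FAPE 1.0, Bond 2.0, Angle 0.5, Omega 0.5
    w = {"fape": 1.0, "bond": 2.0, "angle": 0.5, "omega": 0.5}
    bb_contribution = bb_ramp * sum(w[k] * bb_losses[k] for k in w)
    return (total - aux_loss * 0.03 - bb_contribution
            - plddt_loss * 0.2 - pae_loss * 0.2)

bb = {"fape": 1.0, "bond": 0.5, "angle": 0.2, "omega": 0.1}  # placeholder values
m = val_structural(total=5.0, aux_loss=1.0, bb_losses=bb, bb_ramp=1.0,
                   plddt_loss=0.5, pae_loss=0.5)
```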

12. Training Configuration

AdamW, per-module LR, CFG 10%, EMA 0.999 — init from v14 run3 best

Initialization: loaded from v14 run3 best weights v15 NEW

All existing modules (encoder, pair stack, IPA denoiser, backbone atom head, auxiliary heads) are initialized from the best v14 run3 checkpoint. The two new confidence heads (pLDDT, pAE) are randomly initialized.

Per-module Learning Rates (AdamW):

  • Denoiser (IPA) — 2e-5 (base)
  • Pair Stack — 6e-5
  • BackboneAtomHead — 1e-4
  • pLDDT Head — 1e-4 v15 NEW
  • pAE Head — 1e-4 v15 NEW
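A name-prefix routing sketch for the per-module learning rates; in PyTorch this table would become one AdamW parameter group per module. The attribute names used as prefixes are assumptions about the model's layout:

```python
MODULE_LRS = {
    "denoiser":           2e-5,  # IPA denoiser (base)
    "pair_stack":         6e-5,
    "backbone_atom_head": 1e-4,
    "plddt_head":         1e-4,  # v15 NEW
    "pae_head":           1e-4,  # v15 NEW
}

def lr_for(param_name, base=2e-5):
    """Pick the learning rate for a parameter by its module-name prefix."""
    for module, lr in MODULE_LRS.items():
        if param_name.startswith(module + "."):
            return lr
    return base  # anything unrecognized falls back to the base LR
```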

LR Schedule:

Warmup over epochs 0–5, then constant LR until epoch 20, then cosine decay through epoch 40.

CFG: conditioning dropped for 10% of training examples (classifier-free guidance)

EMA: 0.999 exponential moving average
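EMA in its simplest form: shadow weights updated after each optimizer step. A pure-Python sketch over a dict of scalars; a real implementation walks the model's parameter tensors:

```python
def ema_update(shadow, params, decay=0.999):
    # shadow ← decay · shadow + (1 − decay) · θ, per parameter
    for k in shadow:
        shadow[k] = decay * shadow[k] + (1.0 - decay) * params[k]
    return shadow

shadow = {"w": 1.0}   # EMA copy of a single hypothetical weight
params = {"w": 0.0}   # current model weight after an optimizer step
ema_update(shadow, params)
```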