RBX1 Binder Design
De novo protein binder design for RBX1 (RING Box Protein 1, 108 AA) — GEM Workshop / ICLR 2026
Last updated: 2026-03-23 (+6,516 designs from boltz2_wcm wave) — 20,724 unique designs
20,724
Total Designs
20,724
Scored
0.966
Best ipTM
0.892
Best pLDDT
0.877
Best ipSAE
Top 10 Designs

Ranked by ipSAE (real values from Boltz-2 ipSAE ranking) across all campaigns. Higher ipSAE = better predicted interface contacts.

Rank Name Campaign Length ipTM ipSAE pLDDT pTM

Top 10 Designs Overview

Top 10 designs bar chart

ipTM vs ipSAE

ipTM vs ipSAE scatter plot
Target: RBX1
Sequence (108 AA)
MAAAMDVDTPSGTNSGAGKKRFEVKKWNAVALWAWDIVVDNCAICRNHIMDLCIECQANQASATSEECTVAWGVCNHAFHFHCISRWLKTRQVCPLDNREWEFQKYGH
Key Properties
PDB: 2LGV (NMR), 4P5O (complex)
Zinc sites: 3 Zn2+ (11 ligands: 8 Cys, 3 His)
E2 interface: 35 res, mean conservation 0.839
Cullin interface: 31 res, mean conservation 0.938
Competition: GEM / ICLR 2026, deadline Apr 26
Database Growth
Design database growth over 24 hours
Scored designs accumulated over 24 hours of continuous generation, scoring, and validation. Wave 1 (676 designs) completed in ~2 hours. Wave 2 gen10k campaign added ~9,000 designs over 6 hours. Wave 3 contributed another ~3,500. Growth plateaued at 14,229 as generation jobs finished.
All Designed Structures
Rank Design Campaign Length Boltz ipTM ipSAE Boltz pLDDT Boltz pTM AF3 ipTM AF3 pTM OF3 ipTM OF3 pTM Pb ipSAE Pb ipTM Pipeline Download
All Structure Cards
Campaign Summary
Campaign | Pipeline | Hotspots | Size
RFdiff E2 small | RFdiff+MPNN | A44,A45,A46,A51,A54,A56,A57,A79,A83,A84,A87,A95,A96 | 40-65 AA
RFdiff E2 med | RFdiff+MPNN | A44,A45,A46,A51,A54,A56,A57,A79,A83,A84,A87,A95,A96 | 90-120 AA
RFdiff E2 enhanced | RFdiff+MPNN | Above + A42,A53 (ESM-2 DMS additions) | 40-65 AA
RFdiff Beta E2 | RFdiff+MPNN | A44,A45,A46,A51,A56,A57,A79,A83,A87,A95,A96 | 40-65 AA
RFdiff Cullin small | RFdiff+MPNN | A27,A29,A30,A31,A33,A35,A36,A73,A75,A101 | 40-65 AA
RFdiff Cullin med | RFdiff+MPNN | A27,A29,A30,A31,A33,A35,A36,A73,A75,A101 | 90-120 AA
RFdiff Beta Cullin | RFdiff+MPNN | A27,A29,A30,A31,A33,A35,A36,A73,A75,A101 | 40-65 AA
BoltzGen small | BoltzGen | None (untargeted) | 40-65 AA
BoltzGen medium | BoltzGen | None (untargeted) | 90-120 AA
Scoring Results by Campaign
Campaign | Scored | ipSAE mean | ipSAE max | >0.7 | >0.5 | Best len
RFdiff E2 small | 11,792 | 0.226 | 0.877 | 355 | 1,790 | 51
RFdiff E2 med | 196 | 0.252 | 0.801 | 8 | 31 | 98
RFdiff Beta E2 | 104 | 0.313 | 0.821 | 11 | 24 | 55
RFdiff Cullin small | 128 | 0.240 | 0.805 | 8 | 24 | 60
RFdiff Cullin med | 128 | 0.278 | 0.793 | 7 | 25 | 111
BoltzGen (untargeted) | 1,840 | 0.378 | 0.773 | 3 | 418 | 62
Other/pilot | 40 | 0.174 | 0.685 | 0 | 2 | 50
Totals
Total RFdiffusion backbones: 9,964
E2 face: 5,649 — Cullin face: 1,709 — Beta model: 110
Total BoltzGen designs: 1,820 complete + 1,400 resuming
Total scored in master: 14,228
Total with OF3 validation: 248
Total with AF3 validation: 32
Hotspot Details
E2 Face (from PDB 4P5O UBC12 interface + ESM-2 DMS):
Core: A44(I), A45(C), A46(R), A51(D), A56(C), A57(Q)
Extended: A54(I*), A79(F), A83(C), A84(I*), A87(W), A95(P), A96(L)
* I54, I84 added from ESM-2 DMS (mutation-sensitive, missed by conservation)
Cullin Face (from PDB 4P5O Cullin-1 interface):
Core: A27(W), A29(A), A30(V), A31(A), A33(W), A35(W)
Extended: A36(D), A73(G), A75(C), A101(W)
BoltzGen: No hotspot conditioning — binder targets determined by model
RFdiffusion Checkpoints
Complex_base_ckpt.pt — standard binder design (mostly helical)
Complex_beta_ckpt.pt — diverse topology binder design (mixed sheet/helix)
Top 100 Candidates — Download

Quality-filtered ranking: (1) AF3 ipTM >= 0.5 (18 designs), then (2) OF3 ipTM >= 0.5 (82 designs). Only designs that pass one of these orthogonal validation thresholds are included.

Download FASTA (100 sequences) Download CSV (all metrics) Download Full Report (PDF)
CSV columns: rank, id, rank_source (AF3/OF3), campaign, pipeline, length, sequence, boltz_iptm, boltz_ptm, boltz_plddt, ipsae, af3_iptm, af3_ptm, af3_flag, of3_iptm, of3_ptm
FASTA headers include all metrics for quick reference.
Design Naming Convention
- beta_e2_100-026: beta campaign, E2 face, 100-design batch, design #26
- sat_e2-0675: satellite E2 campaign, design #675
- g10k_e2_sm_j11_binder_125: gen10k E2 small, job 11, binder #125
- cul_small_10-017: cullin face, small binder, 10-design batch, #17
- e2_med_10-027: E2 face, medium binder, 10-design batch, #27
- BG Small 3: BoltzGen small pilot, design #3
Campaigns:
- beta_e2: Early E2 face designs from beta RFdiffusion checkpoint
- sat_e2: Satellite E2 campaigns (diverse hotspot sampling)
- gen10k / g10k: 10,000-scale generation campaign
- cul_small / cul_med: Cullin face, small (40-65 AA) / medium (90-120 AA) binders
- e2_small / e2_med: E2 face, small / medium binders
- BoltzGen: Designs from BoltzGen generative model (not RFdiffusion)
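The naming convention can be read mechanically. The sketch below is a hypothetical helper (parse_design_name and the CAMPAIGNS table are illustrative names, not part of the project code) that assumes the campaign is the longest known prefix and the design number is the final run of digits in the ID.

```python
import re

# Prefix -> campaign description, copied from the naming convention.
# This table and the function below are illustrative assumptions.
CAMPAIGNS = {
    "beta_e2": "Early E2 face designs (beta RFdiffusion checkpoint)",
    "sat_e2": "Satellite E2 campaign",
    "g10k": "gen10k generation campaign",
    "cul_small": "Cullin face, small binder",
    "cul_med": "Cullin face, medium binder",
    "e2_small": "E2 face, small binder",
    "e2_med": "E2 face, medium binder",
}


def parse_design_name(name):
    """Return (campaign_prefix, design_number) for a design ID."""
    # Pick the longest known prefix that the name starts with.
    matches = [prefix for prefix in CAMPAIGNS if name.startswith(prefix)]
    prefix = max(matches, key=len) if matches else None
    # The design number is taken to be the final run of digits.
    tail = re.search(r"(\d+)$", name)
    number = int(tail.group(1)) if tail else None
    return prefix, number


for example in ["beta_e2_100-026", "sat_e2-0675",
                "g10k_e2_sm_j11_binder_125", "cul_small_10-017"]:
    print(example, parse_design_name(example))
```

IDs that match no known prefix, such as "BG Small 3", come back with a prefix of None and would need their own rule.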
Cross-Validation: Scoring Bias

The primary competition metric (ipSAE) and Boltz-2 ipTM show no correlation with AF3 ground-truth validation. AF3 does correlate with OF3, suggesting both capture real structural features that Boltz-2 misses.

Figure 0. ipSAE (Boltz-2 competition metric) vs AF3 ipTM for 60 validated designs. Pearson r = -0.044 (p = 0.741), Spearman rho = -0.233. ipSAE has no predictive value for AF3 validation. Marginal boxplots show per-face distributions. Edge color: black = top hit, green = promote, orange = review, gray = discard. This is the central finding: the competition ranking metric does not predict real binding as assessed by AF3.
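For reference, the two correlation statistics quoted in these captions can be computed as in the minimal sketch below. It uses only the standard library, the scores are toy values rather than data from the design database, and it is not the script used to produce the figures.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def average_ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman rho: Pearson correlation of the rank-transformed data."""
    return pearson(average_ranks(x), average_ranks(y))


# Toy scores only (hypothetical), to show the call pattern.
ipsae = [0.10, 0.35, 0.50, 0.72, 0.80]
af3_iptm = [0.60, 0.20, 0.55, 0.15, 0.30]
print(round(pearson(ipsae, af3_iptm), 3), round(spearman(ipsae, af3_iptm), 3))
```

Pearson measures linear association while Spearman only measures monotonic ordering, which is why the two can differ as they do in Figure 0.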

Three independent structure prediction methods -- Boltz-2 (scoring), OpenFold3, and AlphaFold3 -- show dramatically different agreement patterns. AF3 is the ground-truth orthogonal validation.

Boltz-2 vs AF3 ipTM scatter
Figure 1. Boltz-2 ipTM vs AlphaFold3 ipTM for 60 validated designs. Points colored by target face: E2 (blue, n=45), Cullin (red, n=6), BoltzGen (green, n=9). Marginal boxplots show per-face distributions. Orange line: Ridge regression (r=0.08, rho=-0.12). Dashed diagonal: perfect agreement. Boltz-2 scores show no correlation with AF3 validation (r=0.08), so high Boltz-2 scores do not predict binding as assessed by AF3.
AF3 vs OF3 ipTM scatter
Figure 2. AF3 ipTM vs OF3 ipTM for 49 designs with both scores. Points colored by target face: E2 (blue, n=40), Cullin (red, n=5), BoltzGen (green, n=4). Ridge regression (r=0.57, rho=0.49) shows moderate positive correlation between the two independent validation methods, suggesting they capture overlapping but distinct structural signals.
Boltz-2 vs OF3 ipTM scatter
Figure 3. Boltz-2 ipTM vs OF3 ipTM for 1,225 designs. Points colored by target face: E2 (blue, n=845), Cullin (red, n=335), BoltzGen (green, n=45). Ridge regression (r=-0.01, rho=-0.04) shows essentially zero correlation between Boltz-2 scoring and OF3 validation at scale, consistent with the AF3 result. Boltz-2 ipTM is uninformative for predicting orthogonal validation scores.
Performance by Target Face

Designs target three RBX1 surfaces: E2 interface (12,112 designs), Cullin interface (6,791), and BoltzGen full-surface (1,820). ipSAE and structural confidence metrics vary by target.

ipSAE by target face
Figure 4. ipSAE distribution by target face with individual points (downsampled to 300 per group). BoltzGen designs achieve higher mean ipSAE (0.446) than E2 (0.227) or Cullin (0.205). Significance bars show two-sample t-test p-values between groups.
Metrics by target face
Figure 5. Four key metrics (ipSAE, Boltz ipTM, Boltz pLDDT, Boltz pTM) compared across target faces. BoltzGen shows higher ipSAE but generally comparable Boltz confidence metrics, suggesting that the ipSAE advantage is interface-specific rather than driven by global structural quality.
Performance by Binder Size

Binder lengths range from under 55 to over 100 amino acids. Size buckets reveal whether longer binders achieve better interface quality or structural confidence.

ipSAE by binder size
Figure 6. ipSAE by binder size bucket (<55, 55-65, 66-99, 100+ AA). Mean and count annotations per group. No strong size-dependent trend in interface quality.
Boltz ipTM vs binder length
Figure 7. Boltz-2 ipTM vs binder length colored by target face. Small binders (40-65 AA) dominate the E2 dataset with wide ipTM spread. Cullin and BoltzGen designs span broader length ranges.
Performance by Model Checkpoint

Designs were generated with three model variants: Base (standard RFdiffusion checkpoint, Complex_base_ckpt.pt), Beta (RFdiffusion beta checkpoint, Complex_beta_ckpt.pt), and BoltzGen (generative model). This section compares ipSAE across checkpoints.

ipSAE by model checkpoint
Figure 8. ipSAE by model checkpoint (Base, Beta, BoltzGen). Mean and count annotations per group. BoltzGen designs achieve higher ipSAE on average, while Base and Beta show similar distributions.
Hit Rate Analysis

Hit rate curves show the fraction of designs exceeding a given quality threshold across the full ipSAE and OF3 ipTM ranges, stratified by target face.

Hit rate by ipSAE threshold
Figure 9. Hit rate curves by target face at ipSAE thresholds. BoltzGen maintains the highest hit rate across most thresholds. E2 and Cullin face designs show similar hit rate profiles.
Hit rate by OF3 ipTM threshold
Figure 10. Hit rate curves by target face at OF3 ipTM thresholds (for the 1,225 designs with OF3 validation). Reveals which target face produces the most designs above stringent OF3 cutoffs.
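A hit rate curve of this kind reduces to counting. The sketch below assumes a hit is a score at or above the threshold (matching the ">= 0.5" filters used elsewhere on this page) and uses toy scores; the function names are illustrative.

```python
def hit_rate(scores, threshold):
    """Fraction of scores at or above the threshold (0.0 for an empty list)."""
    if not scores:
        return 0.0
    return sum(s >= threshold for s in scores) / len(scores)


def hit_rate_curve(scores_by_face, thresholds):
    """Map each target face to its list of hit rates, one per threshold."""
    return {
        face: [hit_rate(scores, t) for t in thresholds]
        for face, scores in scores_by_face.items()
    }


# Toy scores only (hypothetical), grouped by target face.
toy = {
    "E2": [0.1, 0.2, 0.6, 0.8],
    "Cullin": [0.3, 0.4],
    "BoltzGen": [0.45, 0.55, 0.35],
}
for face, rates in hit_rate_curve(toy, [0.3, 0.5, 0.7]).items():
    print(face, [round(r, 2) for r in rates])
```

Because each face has a different number of designs, the curves compare fractions, not counts; a face with few designs can show a high hit rate from a handful of hits.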
Three-Method Comparison

Side-by-side comparison of Boltz-2, OpenFold3, and AlphaFold3 ipTM scores for top designs ranked by OF3 validation. Highlights consistent Boltz-2 score inflation relative to both orthogonal methods.

Three-method comparison bars
Figure 11. Top 20 designs by OF3 ipTM: Boltz-2 (blue, faded), OpenFold3 (red), and AlphaFold3 (green) scores side-by-side. Faint green bars indicate designs without AF3 data. Boltz-2 consistently overestimates compared to both orthogonal validation methods, while OF3 and AF3 show better mutual agreement.
Hotspot Set Analysis

Each RFdiffusion campaign was conditioned on a specific set of RBX1 hotspot residues. Four distinct hotspot sets were used:

E2 Enhanced (+DMS): A44, A45, A46, A51, A54, A56, A57, A79, A83, A84, A87, A95, A96, A42, A53 (11,336 designs)
E2 Standard: A44, A45, A46, A51, A56, A57, A79, A83, A87, A95, A96 (400 designs)
E2 Core: A44, A45, A46, A51, A54, A56, A57, A79, A83, A84, A87, A95, A96 (256 designs)
Cullin: A27, A29, A30, A31, A33, A35, A36, A73, A75, A101 (276 designs)
BoltzGen: No hotspot conditioning (1,240 designs)

ipSAE vs ipTM by hotspot set

ipSAE vs ipTM scatter with marginal distributions
Joint distribution of ipSAE and ipTM colored by hotspot set. Marginal histograms show per-set density. Stars mark the best design in each set. The E2 Enhanced set (with ESM-2 DMS-derived residues I54, I84, C42, C53) produces the highest ipSAE values, suggesting that adding mutation-sensitive residues to the hotspot specification improves interface quality.

ipSAE vs binder length by hotspot set

ipSAE vs binder length with marginals
Binder length vs ipSAE by hotspot set. Small binders (40-65 AA) dominate the E2 Enhanced set. BoltzGen designs span a wider length range but achieve lower ipSAE. No strong correlation between length and interface quality within any set, though binders of 50-60 AA appear slightly enriched among top performers.

Hit rate by hotspot set

Hit rate by hotspot set
Fraction of designs exceeding ipSAE thresholds for each hotspot set. The E2 Enhanced set maintains the highest hit rate at the stricter thresholds. The Cullin set shows competitive hit rates despite targeting a different face, while BoltzGen has the highest fraction above ipSAE=0.3 but drops off sharply above 0.5.

Efficiency frontier: ipSAE vs pLDDT

Pareto front ipSAE vs pLDDT
Pareto front (dashed line) of designs optimizing both ipSAE and pLDDT. Designs on the frontier achieve the best trade-off between interface quality and structural confidence. Most frontier designs are from the E2 Enhanced set, but a few Cullin and E2 Standard designs appear at high pLDDT.
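One simple way to extract such a frontier is a sort-and-sweep, sketched below under the assumption that higher is better for both ipSAE and pLDDT. The design names and scores are toy values, and this is not necessarily how the plotted front was computed.

```python
def pareto_front(designs):
    """Return designs not dominated in (ipsae, plddt); higher is better for both.

    `designs` is a list of (name, ipsae, plddt) tuples.
    """
    # Sort by ipSAE descending (pLDDT descending breaks ties), then sweep:
    # a design joins the front only if its pLDDT beats every design seen so far.
    ordered = sorted(designs, key=lambda d: (-d[1], -d[2]))
    front = []
    best_plddt = float("-inf")
    for name, ipsae, plddt in ordered:
        if plddt > best_plddt:
            front.append((name, ipsae, plddt))
            best_plddt = plddt
    return front


# Toy designs only (hypothetical names and scores).
toy = [("a", 0.8, 0.5), ("b", 0.6, 0.9), ("c", 0.5, 0.4), ("d", 0.7, 0.7)]
print(pareto_front(toy))
```

The sweep runs in O(n log n), which matters little for a few hundred designs but keeps the approach usable on the full database.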

Amino acid composition by ipSAE quality tier

Amino acid composition by tier
Amino acid frequency comparison between top-tier (ipSAE > 0.7), mid-tier (0.3-0.7), and failed (ipSAE < 0.1) designs. Blue = charged (R,K,D,E), red = hydrophobic (F,L,I,M,V,W), green = polar (S,T,C,N,Q,Y). Top-tier designs show higher glutamate (E) and leucine (L) frequency, while failed designs are enriched in alanine (A) and glycine (G), suggesting that oversimplified sequences with low complexity correlate with poor interface formation.
Evolutionary Analysis of RBX1

Conservation analysis from 165-sequence MSA (MAFFT) of RBX1 homologs across eukaryotes. 834 raw homologs collected via HMMER, filtered to 272 non-redundant sequences, aligned with 1155 total rows including outgroups.

165
MSA Sequences
834
Raw Homologs
272
Filtered Seqs
108
Target Residues
3
Zn2+ Sites
66
Interface Residues
Per-Residue Conservation

Shannon entropy-based conservation score (0 = variable, 1 = perfectly conserved). Colored by functional role: E2 interface, Cullin interface, Zinc-binding, other. Hover for details.
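A conservation score on this 0 to 1 scale can be derived from per-column Shannon entropy. The sketch below assumes the normalization 1 - H / log2(20) and ignores gap characters; the exact normalization and gap handling used for the plots are not stated here, so treat both as assumptions.

```python
import math
from collections import Counter


def column_conservation(column, alphabet_size=20):
    """Conservation of one alignment column: 1 - H / log2(alphabet_size).

    Gap characters ('-' and '.') are ignored; an all-gap column scores 0.0.
    """
    residues = [c for c in column if c not in "-."]
    if not residues:
        return 0.0
    total = len(residues)
    counts = Counter(residues)
    entropy = -sum((n / total) * math.log2(n / total) for n in counts.values())
    return 1.0 - entropy / math.log2(alphabet_size)


def msa_conservation(aligned_seqs):
    """Per-column conservation for a list of equal-length aligned sequences."""
    return [column_conservation(col) for col in zip(*aligned_seqs)]


# Toy alignment (hypothetical), three sequences by three columns.
print([round(c, 3) for c in msa_conservation(["AC-", "AC-", "AD-"])])
```

With this definition, a column with one residue type scores 1.0 and a column with all 20 amino acids at equal frequency scores 0.0.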

Entropy Distribution

Histogram of Shannon entropy values across all 108 residues. Lower entropy = higher conservation.

Interface Conservation Comparison

Mean conservation by functional region.

Gap Fraction by Region

Mean alignment gap fraction. Lower = better coverage in the MSA.

Highly Conserved Residues (Conservation ≥ 0.93)
Residue # Amino Acid Conservation Entropy Gap Fraction Functional Role
Zinc Coordination Sites
Zn2+ Site 1 (ZN_109)
Ligands: CYS42, CYS45, HIS80, CYS83
Avg Conservation: 0.943
Role: Stabilizes RING-H2 domain fold, part of E2 recruitment surface
Zn2+ Site 2 (ZN_110)
Ligands: CYS75, HIS77, CYS94
Avg Conservation: 0.956
Role: Structural zinc, anchors the beta-sheet core between the two interfaces
Zn2+ Site 3 (ZN_111)
Ligands: CYS53, CYS56, CYS68, HIS82
Avg Conservation: 0.922
Role: Bridges the E2-binding loop to the core, critical for domain integrity
Target Interface Definitions
E2 Recruitment Face PRIMARY (60%)
35 residues at UBC12/E2 interface (PDB 4P5O, 8Å cutoff)
Residues: 41,42,43,44,45,46,47,48,51,52,53,54,55,56,57,58,59,79,81,82,83,84,85,86,87,88,90,91,93,94,95,96,97,98,99
Mean conservation: 0.839
Hotspots (RFdiffusion): B44-B48, B52, B55-B57, B84-B88, B95-B98
Key residues: R46 (1.0), C94 (0.988), I44 (0.948), C56 (0.966), P95 (0.947)
Cullin Interface SECONDARY (40%)
31 residues at Cullin C-terminal domain interface
Residues: 23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,72,73,74,75,76,89,90,91,92,93,99,100,101,102,103,104
Mean conservation: 0.938
Hotspots (RFdiffusion): B24-B34, B72-B75, B100-B104
Key residues: W33 (1.0), W35 (1.0), G73 (1.0), W101 (0.988), V30 (0.974)
Interpretation

RBX1 is deeply conserved across eukaryotes. The RING-H2 domain (residues ~27–104) shows uniformly high conservation (>0.7 for nearly all positions), with zinc-coordinating cysteines and histidines reaching near-perfect conservation (0.91–1.0). The N-terminal tail (residues 1–20) is more variable, consistent with it being unstructured in the NMR ensemble (2LGV).

The Cullin face is significantly more conserved than the E2 face (mean 0.938 vs 0.839). This makes sense: the Cullin interaction is constitutive (RBX1 is always bound to a Cullin scaffold in vivo), whereas the E2 interface cycles through multiple E2 partners. The higher conservation at the Cullin face means binders targeting it are more likely to disrupt a functionally critical interaction, but the surface may also be harder to compete with due to the tight, conserved binding.

Four perfectly conserved residues stand out: W33, W35, and G73 (all Cullin face), and R46 (E2 face). These are invariant across all 165 sequences in the MSA. W33 and W35 form a tryptophan pair that likely stacks against the Cullin surface, a classic hot-spot motif. R46 is the catalytic arginine critical for E2 activation.

Gap fraction is low (<2%) for the core domain (residues 36–105), meaning the MSA is well-aligned in the structured region. The N/C-terminal tails show higher gaps (10–27%), reflecting length variation among homologs. This gives confidence that the conservation scores in the core are reliable.

Design implications: The 60/40 E2/Cullin split is well-justified. The Cullin face offers a tighter, more conserved target with multiple tryptophan hot-spots, favoring high-affinity designs. The E2 face is more diverse, potentially offering more epitope options but requiring designs that can outcompete the native E2 partners. The zinc sites should be included in the target structure but not directly targeted—they are buried and structurally critical, not surface-accessible.

RBX1 Sequence — Conservation Colored

Each residue colored by conservation score. Hover for details.

ESM-2 Deep Mutational Scanning

Masked marginal log-likelihood ratios (dLLR) computed with ESM-2 (650M params) for all single-point mutations across the 108-residue RBX1 sequence. More negative dLLR = more deleterious mutation. Sensitivity = mean |dLLR| across all 20 amino acids at each position.
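The arithmetic behind dLLR and sensitivity is shown in the sketch below. It assumes the per-position log-probabilities have already been obtained from the model with that position masked; the values here are toy numbers over a two-letter alphabet, and no ESM-2 call is made.

```python
def dllr_table(log_probs, wild_type):
    """dLLR[i][aa] = log p(aa at i) - log p(wild-type residue at i).

    `log_probs[i]` maps each amino acid to its log-probability at position i.
    """
    table = []
    for position_probs, wt in zip(log_probs, wild_type):
        wt_logp = position_probs[wt]
        table.append({aa: lp - wt_logp for aa, lp in position_probs.items()})
    return table


def sensitivity(dllr_row):
    """Mean |dLLR| over all amino acids at one position (wild type adds 0)."""
    return sum(abs(v) for v in dllr_row.values()) / len(dllr_row)


# Toy log-probabilities (hypothetical) for a 2-residue sequence "AC".
toy_log_probs = [
    {"A": -0.1, "C": -2.1},   # position 1, wild type A
    {"A": -3.5, "C": -0.5},   # position 2, wild type C
]
table = dllr_table(toy_log_probs, "AC")
print([round(sensitivity(row), 2) for row in table])
```

Because the wild-type entry is included with dLLR = 0, averaging over all 20 amino acids at 108 positions gives the 2,160 scored entries reported here.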

2,160
Mutations Scored
0.581
Spearman r (vs Evo)
0.714
Pearson r (vs Evo)
-18.25
Most Deleterious dLLR
+2.54
Most Beneficial dLLR
31
Predicted Contacts
Mutation Effect Heatmap

Log-likelihood ratio for each of 20 amino acids at each position. Blue = tolerated/beneficial, red = deleterious. Wild-type residue marked with black dot. Hover for values.

Per-Position Mutation Sensitivity

Mean |dLLR| across all 20 amino acids. Higher = less tolerant of mutations. Colored by functional role.

ESM Sensitivity vs Conservation

Spearman r = 0.581, Pearson r = 0.714. Strong agreement between model-predicted and evolutionary constraint.

Predicted Contact Map

ESM-2 attention-derived contact predictions (probability > 0.5). 31 high-confidence contacts.

Top 20 Most Deleterious Single Mutations
Rank Position WT → Mut dLLR Functional Role WT Conservation
Top 30 Mutation-Sensitive Residues
Residue AA ESM Sensitivity Evo Conservation WT Log-Prob Functional Role
Interpretation

ESM-2 and evolution strongly agree on which residues are critical. The Spearman correlation of 0.581 (p < 1e-10) and Pearson of 0.714 (p < 1e-17) between ESM-2 sensitivity and Shannon entropy conservation confirm that the protein language model has learned genuine structural and functional constraints from sequence alone.

Zinc-coordinating cysteines dominate the sensitivity landscape. C42, C56, C83, C75, C53, C68, and C94 are all among the top 15 most sensitive positions. Any mutation at these sites is catastrophic (dLLR < -10), consistent with their role as structural zinc ligands that maintain the RING-H2 fold. The most deleterious single mutation in the entire protein is I54W (dLLR = -18.25), which substitutes a bulky tryptophan into the hydrophobic core adjacent to C53 and C56.

The Cullin face has the single most sensitive residue (D36, sensitivity = 13.0), which is also highly conserved (0.945). This aspartate likely forms critical salt bridges in the Cullin interaction. For binder design, targeting residues around D36 could be highly effective at disrupting the complex.

ESM-2 identifies some positions as sensitive that conservation misses. I54 (ESM sensitivity rank 5, conservation only 0.786) and I84 (rank 11, conservation 0.802) are moderately conserved but ESM predicts they are among the most intolerant of mutations — likely because they play critical roles in hydrophobic packing that the MSA alone doesn't fully capture.

Design implications: Binders should maximize contacts with high-sensitivity residues (especially D36, C42, R46, F79, W87) since these positions cannot easily mutate to escape binding. The DMS data also suggests that the N-terminal tail (residues 1–20) has low mutation sensitivity, confirming it is a poor target for binder design.

AF3 Validation Insights

Comprehensive analysis of what predicts AlphaFold3 validation success. 60 designs validated with AF3, analyzed using Lasso regression on sequence and structural features. AF3 is our ground-truth orthogonal validation — Boltz-2 scores are heavily inflated for E2-face RFdiffusion designs.

60
AF3 Validated
7
Top Hits (>=0.7)
5
Promote (0.5-0.7)
4
Review (0.4-0.5)
35
Discard (<0.4)
Lasso Regression: What Predicts AF3 Success?

LassoCV regression on 20 sequence and design features to predict AF3 ipTM. Features with non-zero coefficients are the strongest predictors after regularization. Positive = helps AF3 validation, negative = hurts.
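To illustrate why Lasso leaves only some coefficients non-zero, here is a minimal coordinate-descent sketch for a fixed penalty alpha. It is not the LassoCV routine used for the analysis (which also selects alpha by cross-validation); it assumes centred features and target with no intercept, and the data are toy values.

```python
def soft_threshold(value, alpha):
    """Shrink value toward zero by alpha; anything in [-alpha, alpha] becomes 0."""
    if value > alpha:
        return value - alpha
    if value < -alpha:
        return value + alpha
    return 0.0


def lasso_coordinate_descent(X, y, alpha, n_iter=100):
    """Minimise (1/2n)*||y - Xw||^2 + alpha*||w||_1 by cyclic coordinate descent.

    X is a list of rows; features and target are assumed centred (no intercept).
    """
    n, p = len(X), len(X[0])
    w = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            # Correlation of feature j with the residual that excludes feature j.
            rho = 0.0
            for i in range(n):
                pred_without_j = sum(X[i][k] * w[k] for k in range(p) if k != j)
                rho += X[i][j] * (y[i] - pred_without_j)
            rho /= n
            z = sum(X[i][j] ** 2 for i in range(n)) / n
            w[j] = soft_threshold(rho, alpha) / z if z > 0 else 0.0
    return w


# Toy data (hypothetical): feature 0 carries signal, feature 1 barely does.
X = [[1, 0], [-1, 0], [0, 1], [0, -1]]
y = [2.0, -2.0, 0.1, -0.1]
print(lasso_coordinate_descent(X, y, alpha=0.1))
```

The soft-threshold step is what sets weak features exactly to zero, which is why the surviving coefficients are read as the strongest predictors after regularization.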

Lasso coefficients
Feature correlations with AF3 ipTM. The strongest predictor of AF3 success is being a BoltzGen design (untargeted, diverse topology), followed by sequence features like aromatic content and lower alanine fraction. High Boltz-2 ipTM is actually a weak or negative predictor — designs with inflated Boltz scores tend to fail AF3 validation.
Top Feature Correlations
Feature importance scatter
Top 8 features correlated with AF3 ipTM. Each subplot shows one feature vs AF3 score with Ridge regression line and Pearson r. Points colored by pipeline (blue=RFdiff E2, red=RFdiff Cullin, green=BoltzGen).
What Distinguishes Success from Failure?
Success vs failure comparison
Comparison of successful designs (AF3 ipTM >= 0.5) vs failed designs (AF3 ipTM < 0.3) across key features. Successful designs tend to have higher sequence entropy (more diverse amino acid usage), more aromatic residues, and lower alanine content.
Amino Acid Composition
AA composition
Amino acid frequency comparison between AF3-validated (ipTM >= 0.5) and AF3-failed (ipTM < 0.2) designs. Successful designs are enriched in structurally important residues (F, W, Y aromatics; charged residues) and depleted in simple residues (A, G).
BoltzGen vs RFdiffusion in AF3
Pipeline comparison
BoltzGen designs show dramatically better AF3 validation than RFdiffusion+MPNN designs. BoltzGen mean AF3 ipTM is 2-3x higher, despite lower Boltz-2 scores. This suggests RFdiffusion+MPNN designs may be optimizing for Boltz-2 scoring artifacts rather than genuine binding.
Key Findings

1. BoltzGen designs validate better in AF3 despite having lower Boltz-2 scores; pipeline is the single strongest predictor of AF3 success. Among sequence features, aspartate (D) content is the strongest positive correlate (r=+0.35) and glutamate (E) the strongest negative (r=-0.33).

2. Sequence diversity matters: designs with higher Shannon entropy (more diverse AA usage) and more unique amino acids tend to validate better. Low-complexity sequences rich in alanine and glycine consistently fail.

3. Aromatic residues (F, W, Y) are enriched in successful designs — these contribute to specific hydrophobic contacts at the interface that AF3 can validate.

4. High Boltz-2 ipTM is NOT predictive of AF3 success; if anything, it is slightly anti-correlated. Designs with Boltz ipTM > 0.95 mostly fail AF3 validation (delta, i.e. Boltz ipTM minus AF3 ipTM, > 0.7).

5. Cullin-face designs show better AF3 agreement than E2-face designs, though the sample size is small. For example, Cullin design design_0357 has AF3 ipTM = 0.78 with low delta.

6. For future campaigns: prioritize BoltzGen-style generation, increase aromatic content in ProteinMPNN sampling, and reduce alanine/glycine bias. Consider Cullin-face targeting which appears more AF3-compatible.

Post-Submission: Deep Analysis with Feature Engineering and Regression

After submission, the ADAPTYV competition independently re-ran Boltz-2 on all 83 submitted designs and scored them with 12 structural metrics. We performed comprehensive feature engineering (64 features from sequence composition, structural propensities, Boltz-2 confidence maps, and our scoring metrics) and regression analysis to understand what predicts competition performance.

83
Designs Submitted
33
Survived (ipSAE ≥ 0.6)
31
Failed (ipSAE < 0.3)
30
Our ipSAE < 0.5 Submitted
1. Submission Overview

Of 83 submitted designs, 33 survived with competition ipSAE ≥ 0.6 and 31 failed with competition ipSAE below 0.3. The submission included 30 designs with our ipSAE < 0.5, and some performed surprisingly well in competition scoring. The correlation between our ipSAE and the competition's is only r = 0.312, so internal scoring was a poor predictor of competition performance. Seven designs we scored below 0.5 survived (≥ 0.6) in the competition, while 9 designs we scored above 0.7 failed (< 0.3).

Submission overview scatter
Figure 1. Our ipSAE vs competition ipSAE. Point size = binder length, color = target face. Annotated designs are surprising reversals: low ours but high theirs (upper left) and high ours but low theirs (lower right). The green/red horizontal lines mark survival (0.6) and failure (0.3) thresholds.
2. Feature Engineering (64 Features)

We extracted 64 features per design spanning four categories: (a) amino acid composition (20 individual AA fractions plus grouped fractions for charged, hydrophobic, polar, aromatic residues), (b) sequence properties (Shannon entropy, linguistic complexity, net charge, charge density, hydrophobicity, isoelectric point, molecular weight, max homopolymer repeat), (c) structural propensities (helix, sheet, disorder) and Boltz-2 PAE/pLDDT features from our prediction NPZ files (mean interface PAE, confident contacts, PAE asymmetry, binder pLDDT), and (d) design metadata (wave, face, binder length) plus our scoring metrics (ipSAE, ipTM, pTM, pLDDT, AF3 ipTM, OF3 ipTM). Structural features were extracted for 80 of 81 matched designs.
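A few of the sequence-level features can be computed directly from the binder sequence, as in the sketch below. The net-charge rule (+1 for K/R, -1 for D/E, histidine and termini ignored) is a simplifying assumption and may differ from the definition used in the actual feature table; the function names are illustrative.

```python
import math
from collections import Counter


def shannon_entropy(seq):
    """Shannon entropy (bits) of the amino acid composition."""
    n = len(seq)
    counts = Counter(seq)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def net_charge(seq):
    """Crude net charge near neutral pH: +1 for K/R, -1 for D/E."""
    return sum(seq.count(a) for a in "KR") - sum(seq.count(a) for a in "DE")


def max_homopolymer(seq):
    """Length of the longest run of a single repeated residue."""
    best = run = 0
    prev = None
    for aa in seq:
        run = run + 1 if aa == prev else 1
        best = max(best, run)
        prev = aa
    return best


def sequence_features(seq):
    """Feature dict for one non-empty sequence."""
    charge = net_charge(seq)
    return {
        "length": len(seq),
        "entropy": shannon_entropy(seq),
        "net_charge": charge,
        "charge_density": charge / len(seq),
        "max_homopolymer": max_homopolymer(seq),
    }


# Toy sequence (hypothetical), not one of the submitted binders.
print(sequence_features("AAKKDE"))
```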

Full correlation heatmap
Figure 2. Full correlation matrix of our features (top-left block) vs their 12 competition metrics (bottom-right block). Black lines separate the two groups. Their metrics cluster tightly (ipSAE, ipTM, LIS all r > 0.9). Our scoring metrics show only weak correlation with theirs (r ~ 0.3). Sequence charge features and structural PAE metrics show moderate associations with competition outcomes.
3. Regression Analysis

We trained three models to predict competition ipSAE from our 64 features. All models show negative cross-validated R², indicating that with only 81 samples and 64 features, none generalizes reliably. Ridge regression (CV R² = -2.22) heavily overfits. ElasticNet CV selects 22 non-zero features (CV R² = -0.69, l1_ratio = 0.10). Random Forest is least negative (CV R² = -0.33) but reaches train R² = 0.78, a train/CV gap that points to overfitting rather than to nonlinear structure recoverable at this sample size. A negative CV R² means a model does worse than predicting the mean, so individual competition scores are not predictable from our available features with this dataset.
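The reading of a negative R² follows from its definition, R² = 1 - SS_res / SS_tot. The sketch below is a plain implementation on toy numbers (not competition data) showing that predicting the mean gives exactly 0 and anything worse goes negative.

```python
def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Toy targets (hypothetical).
y = [1.0, 2.0, 3.0]
print(r2_score(y, [1.0, 2.0, 3.0]))  # perfect prediction
print(r2_score(y, [2.0, 2.0, 2.0]))  # always predicting the mean
print(r2_score(y, [3.0, 2.0, 1.0]))  # worse than the mean
```

In cross-validation the score is computed on held-out folds, so a model that fits training noise can land well below zero even when its training R² is high.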

Regression coefficients
Figure 3. Top 15 features by absolute coefficient magnitude in Ridge (left) and ElasticNet (right) regression. Green = positive coefficient (higher value predicts higher competition ipSAE), red = negative. Contacts_pae_lt5 (number of confident interface contacts) is a strong positive predictor. Molecular weight is negative, suggesting shorter binders perform better.
RF feature importance
Figure 4. Random Forest feature importance (MDI) for predicting competition ipSAE. Net charge is the top predictor, followed by serine content (aa_S), isoelectric point, and charge density. Our ipSAE ranks 7th. Structural features (mean_binder_pae, struct_binder_plddt) also contribute.
4. Survivors vs Failures

We compared 33 survivors (competition ipSAE ≥ 0.6) against 31 failures (< 0.3) using Welch's t-test on all 64 features to find the most discriminating ones. The top 6 features are: net charge (p = 3.5e-5), isoelectric point (p = 3.4e-4), our ipTM (p = 0.003), our pTM (p = 0.003), charge density (p = 0.003), and our ipSAE (p = 0.004). All six are significant at p < 0.005 without multiple-testing correction; only net charge and isoelectric point remain below a Bonferroni threshold of 0.05/64 ≈ 7.8e-4. Effect sizes are modest.
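For reference, Welch's statistic and its approximate degrees of freedom can be computed as in the sketch below. It stops short of the p-value, which needs the t-distribution CDF (for example from a statistics library), and the two groups are toy values rather than survivor and failure data.

```python
import math


def welch_t(a, b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom.

    Unlike Student's t-test, the two groups may have unequal variances.
    """
    na, nb = len(a), len(b)
    ma, mb = sum(a) / na, sum(b) / nb
    # Unbiased sample variances.
    va = sum((x - ma) ** 2 for x in a) / (na - 1)
    vb = sum((x - mb) ** 2 for x in b) / (nb - 1)
    se2 = va / na + vb / nb
    t = (ma - mb) / math.sqrt(se2)
    df = se2 ** 2 / ((va / na) ** 2 / (na - 1) + (vb / nb) ** 2 / (nb - 1))
    return t, df


# Toy groups (hypothetical).
t_stat, dof = welch_t([1, 2, 3, 4, 5], [2, 4, 6, 8, 10])
print(round(t_stat, 4), round(dof, 4))
```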

Survivors vs Failures
Figure 5. Box + swarm plots of the 6 most discriminating features between survivors (green, their ipSAE ≥ 0.6, n=33) and failures (red, < 0.3, n=31). Net charge is the strongest discriminator: survivors tend to have more negative net charge (mean -8.4 vs -4.0). Our ipTM and ipSAE are significantly higher in survivors but overlap substantially.
5. Cross-Validator Analysis: OpenFold3

We tested whether our OpenFold3 ipTM scores predict competition ipSAE across the 81 designs with OF3 data. OF3 serves as an orthogonal validation method independent of Boltz-2.

OF3 vs competition ipSAE
Figure 6. Our OpenFold3 ipTM vs competition ipSAE. Color = target face. The dashed line shows the linear trend. OF3 provides some signal for competition performance but the relationship is noisy and face-dependent.
5b. Cross-Validator Analysis: AlphaFold3

We tested whether our AlphaFold3 ipTM scores predict competition ipSAE across the designs with both AF3 and Pb data. AF3 serves as another orthogonal validation method independent of Boltz-2.

AF3 vs competition ipSAE
Figure 6b. Our AlphaFold3 ipTM vs competition ipSAE with boxplot marginals. Color = target face. Ridge regression line (orange) with Pearson r and Spearman rho shown.
6. Surprising Reversals

The most surprising finding is the extent of score reversals between our predictions and the competition's.

Design | Our ipSAE | Their ipSAE | Surprise
wcm_design_0709 | 0.42 | 0.79 | We undervalued (+0.37)
wbg_design_1458 | 0.41 | 0.74 | We undervalued (+0.33)
w2_design_3061 | 0.19 | 0.73 | We undervalued (+0.54)
w2_design_7713 | 0.79 | 0.02 | We overvalued (-0.77)
w2_design_7921 | 0.79 | 0.09 | We overvalued (-0.70)
wbg_design_1817 | 0.77 | 0.13 | We overvalued (-0.64)
w3_design_2356 | 0.75 | 0.05 | We overvalued (-0.70)
7. Key Findings

1. Competition ipSAE is not predictable from our features. All regression models yield negative cross-validated R² (Ridge: -2.22, ElasticNet: -0.69, RF: -0.33), meaning they do worse than predicting the mean. With 81 samples and 64 features, overfitting is severe.

2. Net charge is the strongest single discriminator (p = 3.5e-5). Survivors have significantly more negative net charge than failures (mean -8.4 vs -4.0). Both groups are net negative on average, so this suggests the competition scoring favors more acidic binders rather than simply penalizing positively charged surfaces.

3. Our ipSAE and ipTM are statistically significant discriminators (p < 0.005) but with weak effect sizes (r ~ 0.3). High internal scores are weakly associated with competition survival but provide no guarantee.

4. Structural features from our Boltz-2 predictions (interface PAE, confident contacts) appear in the top Ridge coefficients, suggesting that PAE-derived interface quality has some predictive power even when overall scores do not correlate.

5. Score reversals are dramatic: w2_design_3061 scored 0.19 in our scoring but 0.73 in theirs (+0.54 reversal), while w2_design_7713 scored 0.79 in ours but only 0.02 in theirs (-0.77 reversal). This indicates that Boltz-2 scores can differ substantially between independent runs.

6. The submission included 30 designs with our ipSAE < 0.5, yet 7 of them (23%) survived in competition scoring. This supports the diversity-over-optimization strategy: submitting a broad range of designs captures winners that would be missed by strict internal score cutoffs.

UBC12 Natural Binder Optimization

Trimming, alanine scanning, and mutational optimization of UBC12 (PDB 4P5O chain I) — the natural E2 binding partner of RBX1. All scoring via OpenFold3. Auto-updated: 2026-04-09 09:05

25/25
Trimming Done
27/27
Ala Scan Done
366/369
Mutants Scored
0.860
WT ipTM
0.869
Best Mutant
48
Beneficial (>WT)
Phase 1: Progressive Trimming (Complete)

Best fragment: res 25-135 (111 AA) with OF3 ipTM = 0.860 (vs full-length 0.745). Trimming both termini raises predicted ipTM by 0.115; trimming only the disordered N-terminus (res 25-183) accounts for +0.083 of that gain.

RkVariantRangeLenIface%ipTMpTMpLDDT
1both_25_13525-135111100%0.86040.747081.8
2ntrim_2525-183159100%0.82840.792584.4
3both_29_13529-135107100%0.81860.723979.9
4both_25_13025-130106100%0.80470.711178.3
5ntrim_2020-183164100%0.79450.779982.8
6ntrim_1515-183169100%0.78510.773282.9
7ntrim_2929-183155100%0.77690.771682.6
8ntrim_1010-183174100%0.76830.759381.9
9both_29_13029-130102100%0.75750.675477.0
10full_length2-183182100%0.74540.740181.3
11ctrim_1702-170169100%0.74510.727480.5
12ctrim_1602-160159100%0.73690.712378.5
13ctrim_1402-140139100%0.73680.697379.3
14ctrim_1352-135134100%0.72340.679677.5
15ctrim_1502-150149100%0.70800.679177.0
16both_29_12729-12799100%0.67500.633073.1
17ctrim_1302-130129100%0.65270.629673.8
18core_35_12035-1208655%0.58340.595868.3
19ctrim_1272-127126100%0.56420.573969.1
20minimal_patch2378-1355859%0.55770.535569.1
21minimal_29_10029-1007262%0.39880.493570.3
22core_29_12029-1209276%0.24180.401359.6
23core_35_12735-1279379%0.21090.396157.7
24minimal_patch125-502641%0.18810.451259.8
25minimal_80_12780-1274859%0.05170.377655.8
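The fragment lengths and ipTM gains in the trimming table follow from 1-based, inclusive residue ranges. A minimal sketch, assuming a hypothetical `trim` helper and a placeholder string standing in for the real UBC12 sequence (which is not reproduced here); the ipTM values are copied from four rows of the table:

```python
def trim(sequence: str, start: int, end: int) -> str:
    """Return residues start..end (1-based, inclusive) of a full sequence."""
    if not 1 <= start <= end <= len(sequence):
        raise ValueError(f"range {start}-{end} outside 1-{len(sequence)}")
    return sequence[start - 1:end]

# Placeholder covering residue numbers 1-183; substitute the real sequence.
placeholder = "X" * 183

# (start, end, OF3 ipTM) for a few rows of the trimming table.
variants = {
    "both_25_135": (25, 135, 0.8604),
    "ntrim_25": (25, 183, 0.8284),
    "full_length": (2, 183, 0.7454),
    "ctrim_135": (2, 135, 0.7234),
}

baseline = variants["full_length"][2]
lengths = {}
deltas = {}
for name, (start, end, iptm) in variants.items():
    lengths[name] = len(trim(placeholder, start, end))
    deltas[name] = round(iptm - baseline, 4)
    print(f"{name:12s} len={lengths[name]:3d} delta_ipTM={deltas[name]:+.4f}")
```

On these rows, N-terminal trimming alone gains 0.083 and C-terminal trimming alone loses 0.022 relative to full length, while the combined 25-135 fragment gains 0.115.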
Phase 2: Alanine Scanning

All 27 interface positions were scanned. The predicted interface is robust to single alanine substitutions: no mutation lowers ipTM by more than 0.02 (largest drop: R33A, -0.0199). Most tolerant: K120, W119, E117. Most sensitive: R33, D37.

Alanine scan results (sorted by ipTM, lowest first)
Mutation      ipTM    ΔipTM    pTM     Class
ala_R33A      0.8404  -0.0199  0.7403  neutral
ala_D37A      0.8419  -0.0184  0.7397  neutral
ala_I38A      0.8471  -0.0132  0.7340  neutral
ala_L32A      0.8479  -0.0123  0.7386  neutral
ala_P87A      0.8513  -0.0090  0.7445  neutral
ala_I34A      0.8519  -0.0083  0.7420  neutral
ala_S127A     0.8524  -0.0079  0.7436  neutral
ala_I125A     0.8525  -0.0078  0.7420  neutral
ala_H88A      0.8526  -0.0076  0.7429  neutral
ala_L123A     0.8533  -0.0070  0.7433  neutral
ala_Y86A      0.8540  -0.0062  0.7440  neutral
ala_Q31A      0.8547  -0.0055  0.7447  neutral
ala_Q84A      0.8547  -0.0055  0.7478  neutral
ala_N126A     0.8552  -0.0050  0.7472  neutral
ala_K36A      0.8553  -0.0049  0.7472  neutral
ala_G85A      0.8559  -0.0044  0.7480  neutral
ala_Q35A      0.8562  -0.0041  0.7490  neutral
ala_E40A      0.8562  -0.0041  0.7470  neutral
ala_N39A      0.8562  -0.0040  0.7444  neutral
ala_P121A     0.8569  -0.0034  0.7432  neutral
ala_T124A     0.8569  -0.0034  0.7430  neutral
ala_D118A     0.8569  -0.0033  0.7450  neutral
ala_V122A     0.8576  -0.0027  0.7475  neutral
ala_D89A      0.8586  -0.0017  0.7454  neutral
ala_E117A     0.8594  -0.0008  0.7478  neutral
wt_res25-135  0.8603  +0.0000  0.7470  neutral
ala_W119A     0.8606  +0.0003  0.7454  neutral
ala_K120A     0.8610  +0.0007  0.7507  neutral
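The ΔipTM column is each mutant's ipTM minus the wild-type value, and every row falls in the "neutral" class. A minimal sketch of that calculation, assuming a symmetric cutoff of 0.02 for the class label; the cutoff and the "destabilizing" and "beneficial" labels are assumptions, since the table only ever shows "neutral":

```python
WT_IPTM = 0.8603  # wt_res25-135 row of the alanine scan table

# ipTM values copied from a few rows of the table.
ala_iptm = {
    "R33A": 0.8404,
    "D37A": 0.8419,
    "I38A": 0.8471,
    "W119A": 0.8606,
    "K120A": 0.8610,
}

def classify(delta: float, threshold: float = 0.02) -> str:
    """Label a mutation by its ipTM change (threshold is an assumed cutoff)."""
    if delta <= -threshold:
        return "destabilizing"
    if delta >= threshold:
        return "beneficial"
    return "neutral"

results = {
    mutation: (round(iptm - WT_IPTM, 4), classify(iptm - WT_IPTM))
    for mutation, iptm in ala_iptm.items()
}
# Print most sensitive positions first.
for mutation, (delta, label) in sorted(results.items(), key=lambda kv: kv[1][0]):
    print(f"{mutation}: {delta:+.4f} {label}")
```

Deltas recomputed from the rounded ipTM values can differ from the tabulated ones in the last digit, because the table was presumably derived from unrounded scores.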
Phase 3: Substitution & Double Mutant Optimization

369 variants: 114 full-scan at 6 tolerant positions + 90 targeted at 6 moderate positions + 165 double mutants. 366 scored so far, 48 show improved ipTM over WT (0.8603).
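The variant counts can be checked by enumeration: a full scan tries all 19 non-wild-type residues per position, so 6 positions give 114 singles. A minimal sketch, assuming the six positions with the smallest alanine-scan drop as an illustrative choice (the page does not list which six were fully scanned); the helper names are hypothetical, and the `dbl_` naming with the higher residue number first is inferred from the mutant names in the results:

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Illustrative position -> wild-type residue map (an assumption, see lead-in).
wild_type = {89: "D", 117: "E", 118: "D", 119: "W", 120: "K", 122: "V"}

def full_scan(wt: dict) -> list:
    """All 19 non-wild-type substitutions per position, e.g. 'K120R'."""
    return [
        f"{res}{pos}{aa}"
        for pos, res in sorted(wt.items())
        for aa in AMINO_ACIDS
        if aa != res
    ]

def position(mutation: str) -> int:
    """Residue number of a mutation string such as 'W119Y'."""
    return int(mutation[1:-1])

def double_name(a: str, b: str) -> str:
    """Name a double mutant with the higher residue number first."""
    first, second = sorted((a, b), key=position, reverse=True)
    return f"dbl_{first}_{second}"

singles = full_scan(wild_type)
print(len(singles))                    # 6 positions x 19 substitutions
print(double_name("W119Y", "K120R"))   # dbl_K120R_W119Y
print(len(singles) + 90 + 165)         # total variants in Phase 3
```

How the 90 targeted substitutions and 165 double mutants were selected is not stated, so the sketch only reproduces the totals.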

Top 50 Mutants
Rank  Mutation         Type    ipTM    ΔipTM    pTM     pLDDT
1     dbl_K120R_W119Y  Double  0.8695  +0.0092  0.7510  81.4
2     sub_N126W        Single  0.8679  +0.0076  0.7516  81.9
3     dbl_W119H_D89N   Double  0.8676  +0.0073  0.7499  81.8
4     dbl_W119H_E117K  Double  0.8674  +0.0072  0.7500  81.9
5     sub_E40K         Single  0.8660  +0.0057  0.7510  81.4
6     sub_D118G        Single  0.8647  +0.0044  0.7497  81.7
7     dbl_K120Q_D89N   Double  0.8647  +0.0044  0.7505  81.3
8     dbl_K120Q_W119H  Double  0.8646  +0.0043  0.7491  81.9
9     sub_W119H        Single  0.8643  +0.0040  0.7475  81.5
10    dbl_E117K_D89K   Double  0.8634  +0.0031  0.7486  80.9
11    dbl_E117Q_D89K   Double  0.8630  +0.0028  0.7487  81.1
12    sub_E40R         Single  0.8630  +0.0027  0.7515  81.5
13    sub_Q35R         Single  0.8628  +0.0025  0.7459  81.4
14    dbl_K120R_E117D  Double  0.8626  +0.0023  0.7476  81.0
15    sub_V122D        Single  0.8626  +0.0023  0.7490  81.0
16    dbl_W119H_E117Q  Double  0.8625  +0.0022  0.7478  81.6
17    sub_T124V        Single  0.8624  +0.0021  0.7482  81.4
18    sub_D89Q         Single  0.8623  +0.0020  0.7503  81.3
19    sub_D89F         Single  0.8623  +0.0020  0.7484  81.8
20    sub_N126D        Single  0.8621  +0.0019  0.7447  81.2
21    dbl_W119H_D89Q   Double  0.8620  +0.0017  0.7476  81.5
22    sub_Q35S         Single  0.8620  +0.0017  0.7480  81.5
23    sub_D118W        Single  0.8618  +0.0015  0.7480  81.6
24    dbl_W119Y_D89Q   Double  0.8617  +0.0015  0.7462  81.2
25    dbl_K120R_D89K   Double  0.8615  +0.0012  0.7480  81.4
26    sub_E40G         Single  0.8615  +0.0012  0.7480  81.6
27    sub_D89L         Single  0.8614  +0.0011  0.7477  81.7
28    dbl_W119Y_E117N  Double  0.8613  +0.0011  0.7438  81.6
29    dbl_W119H_E117S  Double  0.8612  +0.0009  0.7466  81.6
30    sub_Q35Y         Single  0.8612  +0.0009  0.7456  81.8
31    sub_E40W         Single  0.8611  +0.0008  0.7472  81.7
32    sub_K120A        Single  0.8610  +0.0007  0.7509  81.7
33    dbl_K120Q_E117S  Double  0.8610  +0.0007  0.7495  81.5
34    sub_N126Q        Single  0.8610  +0.0007  0.7511  81.5
35    dbl_K120N_W119F  Double  0.8609  +0.0006  0.7480  81.7
36    dbl_K120R_W119H  Double  0.8609  +0.0006  0.7438  81.3
37    dbl_K120D_D89S   Double  0.8609  +0.0006  0.7508  82.0
38    dbl_K120N_D89E   Double  0.8608  +0.0005  0.7463  81.5
39    dbl_K120H_D89K   Double  0.8607  +0.0005  0.7455  81.4
40    dbl_K120Q_D89K   Double  0.8607  +0.0004  0.7467  81.3
41    sub_T124K        Single  0.8607  +0.0004  0.7464  81.8
42    dbl_K120R_D89Q   Double  0.8606  +0.0004  0.7498  81.5
43    sub_T124E        Single  0.8606  +0.0003  0.7460  80.9
44    sub_D89I         Single  0.8606  +0.0003  0.7484  81.5
45    dbl_W119H_D89E   Double  0.8606  +0.0003  0.7447  81.3
46    dbl_E117D_D89N   Double  0.8605  +0.0002  0.7465  81.1
47    sub_K120M        Single  0.8604  +0.0001  0.7460  81.4
48    sub_W119A        Single  0.8604  +0.0001  0.7451  81.1
49    dbl_W119Y_E117D  Double  0.8601  -0.0002  0.7444  81.2
50    dbl_W119Y_D89K   Double  0.8600  -0.0002  0.7470  81.6
Top 10 Mutant Structures
Rank  Mutation         Type    ΔipTM    OF3 ipTM  OF3 pTM  pLDDT  Ranking
1     dbl_K120R_W119Y  Double  +0.0092  0.8695    0.7510   81.4   0.9120
2     sub_N126W        Single  +0.0076  0.8679    0.7516   81.9   0.9245
3     dbl_W119H_D89N   Double  +0.0073  0.8676    0.7499   81.8   0.9285
4     dbl_W119H_E117K  Double  +0.0072  0.8674    0.7500   81.9   0.9216
5     sub_E40K         Single  +0.0057  0.8660    0.7510   81.4   0.9137
6     sub_D118G        Single  +0.0044  0.8647    0.7497   81.7   0.9193
7     dbl_K120Q_D89N   Double  +0.0044  0.8647    0.7505   81.3   0.9149
8     dbl_K120Q_W119H  Double  +0.0043  0.8646    0.7491   81.9   0.9237
9     sub_W119H        Single  +0.0040  0.8643    0.7475   81.5   0.9208
10    dbl_E117K_D89K   Double  +0.0031  0.8634    0.7486   80.9   0.9135