RBX1 Protein Design Dashboard

RBX1 Binder Design

De novo protein binder design for RBX1 (RING Box Protein 1, 108 AA) — GEM Workshop / ICLR 2026

Last updated: 2026-03-23 (+6,516 designs from boltz2_wcm wave) — 20,724 unique designs

20,724

Total Designs

20,724

Scored

0.966

Best ipTM

0.892

Best pLDDT

0.877

Best ipSAE

Top 10 Designs

Ranked by ipSAE (real values from Boltz-2 ipSAE ranking) across all campaigns. Higher ipSAE = better predicted interface contacts.

Rank	Name	Campaign	Length	ipTM	ipSAE	pLDDT	pTM

Top 10 Designs Overview

ipTM vs ipSAE

Target: RBX1

Sequence (108 AA)

MAAAMDVDTPSGTNSGAGKKRFEVKKWNAVALWAWDIVVDNCAICRNHIMDLCIECQANQASATSEECTVAWGVCNHAFHFHCISRWLKTRQVCPLDNREWEFQKYGH

Key Properties

PDB: 2LGV (NMR), 4P5O (complex)

Zinc sites: 3 Zn²⁺ (11 Cys, 3 His)

E2 interface: 35 res, mean conservation 0.839

Cullin interface: 31 res, mean conservation 0.938

Competition: GEM / ICLR 2026, deadline Apr 26

Database Growth

Scored designs accumulated over 24 hours of continuous generation, scoring, and validation. Wave 1 (676 designs) completed in ~2 hours. Wave 2 gen10k campaign added ~9,000 designs over 6 hours. Wave 3 contributed another ~3,500. Growth plateaued at 14,229 as generation jobs finished.

All Designed Structures

Sort by: Campaign:

Rank	Design	Campaign	Length	Boltz ipTM	ipSAE	Boltz pLDDT	Boltz pTM	AF3 ipTM	AF3 pTM	OF3 ipTM	OF3 pTM	Pb ipSAE	Pb ipTM	Pipeline	Download

All Structure Cards

Sort cards by:

Campaign Summary

Campaign	Pipeline	Hotspots	Size
RFdiff E2 small	RFdiff+MPNN	A44,A45,A46,A51,A54,A56,A57,A79,A83,A84,A87,A95,A96	40-65 AA
RFdiff E2 med	RFdiff+MPNN	A44,A45,A46,A51,A54,A56,A57,A79,A83,A84,A87,A95,A96	90-120 AA
RFdiff E2 enhanced	RFdiff+MPNN	Above + A42,A53 (ESM-2 DMS additions)	40-65 AA
RFdiff Beta E2	RFdiff+MPNN	A44,A45,A46,A51,A56,A57,A79,A83,A87,A95,A96	40-65 AA
RFdiff Cullin small	RFdiff+MPNN	A27,A29,A30,A31,A33,A35,A36,A73,A75,A101	40-65 AA
RFdiff Cullin med	RFdiff+MPNN	A27,A29,A30,A31,A33,A35,A36,A73,A75,A101	90-120 AA
RFdiff Beta Cullin	RFdiff+MPNN	A27,A29,A30,A31,A33,A35,A36,A73,A75,A101	40-65 AA
BoltzGen small	BoltzGen	None (untargeted)	40-65 AA
BoltzGen medium	BoltzGen	None (untargeted)	90-120 AA

Scoring Results by Campaign

Campaign	Scored	ipSAE mean	ipSAE max	>0.7	>0.5	Best len
RFdiff E2 small	11,792	0.226	0.877	355	1,790	51
RFdiff E2 med	196	0.252	0.801	8	31	98
RFdiff Beta E2	104	0.313	0.821	11	24	55
RFdiff Cullin small	128	0.240	0.805	8	24	60
RFdiff Cullin med	128	0.278	0.793	7	25	111
BoltzGen (untargeted)	1,840	0.378	0.773	3	418	62
Other/pilot	40	0.174	0.685	0	2	50

Totals

Total RFdiffusion backbones: 9,964

E2 face: 5,649 — Cullin face: 1,709 — Beta model: 110

Total BoltzGen designs: 1,820 complete + 1,400 resuming

Total scored in master: 14,228

Total with OF3 validation: 248

Total with AF3 validation: 32

Hotspot Details

E2 Face (from PDB 4P5O UBC12 interface + ESM-2 DMS):
Core: A44(I), A45(C), A46(R), A51(D), A56(C), A57(Q)
Extended: A54(I*), A79(F), A83(C), A84(I*), A87(W), A95(P), A96(L)
* I54, I84 added from ESM-2 DMS (mutation-sensitive, missed by conservation)

Cullin Face (from PDB 4P5O Cullin-1 interface):
Core: A27(W), A29(A), A30(V), A31(A), A33(W), A35(W)
Extended: A36(D), A73(G), A75(C), A101(W)

BoltzGen: No hotspot conditioning — binder targets determined by model

RFdiffusion Checkpoints

Complex_base_ckpt.pt — standard binder design (mostly helical)

Complex_beta_ckpt.pt — diverse topology binder design (mixed sheet/helix)

Top 100 Candidates — Download

Quality-filtered ranking: (1) designs with AF3 ipTM >= 0.5 (18 designs), then (2) designs with OF3 ipTM >= 0.5 (82 designs). Only designs that pass one of these orthogonal validation thresholds are included.

Download FASTA (100 sequences) Download CSV (all metrics) Download Full Report (PDF)

CSV columns: rank, id, rank_source (AF3/OF3), campaign, pipeline, length, sequence, boltz_iptm, boltz_ptm, boltz_plddt, ipsae, af3_iptm, af3_ptm, af3_flag, of3_iptm, of3_ptm
FASTA headers include all metrics for quick reference.
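The two-tier ranking described above can be sketched as follows. This is a minimal illustration, not the dashboard's actual export code: the field names follow the CSV columns listed above, but the function name `rank_candidates`, the sort order within each tier (descending ipTM), and the toy records are assumptions.

```python
from typing import Optional


def rank_candidates(designs, threshold=0.5, limit=100):
    """Two-tier ranking: AF3-passing designs first, then OF3-passing designs."""

    def passes(value: Optional[float]) -> bool:
        # A missing score (None) never passes the threshold.
        return value is not None and value >= threshold

    af3_tier = [d for d in designs if passes(d.get("af3_iptm"))]
    of3_tier = [
        d for d in designs
        if not passes(d.get("af3_iptm")) and passes(d.get("of3_iptm"))
    ]
    # Assumption: within a tier, designs are ordered by that tier's ipTM.
    af3_tier.sort(key=lambda d: d["af3_iptm"], reverse=True)
    of3_tier.sort(key=lambda d: d["of3_iptm"], reverse=True)

    ranked = [dict(d, rank_source="AF3") for d in af3_tier]
    ranked += [dict(d, rank_source="OF3") for d in of3_tier]
    return [dict(d, rank=i + 1) for i, d in enumerate(ranked[:limit])]


# Toy records with made-up scores (not dashboard data).
toy = [
    {"id": "a", "af3_iptm": 0.62, "of3_iptm": 0.40},
    {"id": "b", "af3_iptm": None, "of3_iptm": 0.71},
    {"id": "c", "af3_iptm": 0.30, "of3_iptm": 0.55},
    {"id": "d", "af3_iptm": 0.20, "of3_iptm": 0.10},
]
top = rank_candidates(toy)
print([(d["rank"], d["id"], d["rank_source"]) for d in top])
```

Designs that fail both thresholds (record "d" in the toy data) never enter the list, which is what keeps the export limited to orthogonally validated candidates.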

Design Naming Convention

Examples:
- beta_e2_100-026: beta campaign, E2 face, 100-design batch, design #26
- sat_e2-0675: satellite E2 campaign, design #675
- g10k_e2_sm_j11_binder_125: gen10k E2 small, job 11, binder #125
- cul_small_10-017: cullin face, small binder, 10-design batch, #17
- e2_med_10-027: E2 face, medium binder, 10-design batch, #27
- BG Small 3: BoltzGen small pilot, design #3

Campaigns:
- beta_e2: Early E2 face designs from beta RFdiffusion checkpoint
- sat_e2: Satellite E2 campaigns (diverse hotspot sampling)
- gen10k / g10k: 10,000-scale generation campaign
- cul_small / cul_med: Cullin face, small (40-65 AA) / medium (90-120 AA) binders
- e2_small / e2_med: E2 face, small / medium binders
- BoltzGen: Designs from BoltzGen generative model (not RFdiffusion)

Cross-Validation: Scoring Bias

The primary competition metric (ipSAE) and Boltz-2 ipTM show no correlation with AF3 ground-truth validation. AF3 does correlate with OF3, suggesting both capture real structural features that Boltz-2 misses.

Figure 0. ipSAE (Boltz-2 competition metric) vs AF3 ipTM for 60 validated designs. Pearson r = -0.044 (p = 0.741), Spearman rho = -0.233. ipSAE has no predictive value for AF3 validation. Marginal boxplots show per-face distributions. Edge color: black = top hit, green = promote, orange = review, gray = discard. This is the central finding: the competition ranking metric does not predict real binding as assessed by AF3.

Three independent structure prediction methods -- Boltz-2 (scoring), OpenFold3, and AlphaFold3 -- show dramatically different agreement patterns. AF3 is the ground-truth orthogonal validation.

Figure 1. Boltz-2 ipTM vs AlphaFold3 ipTM for 60 validated designs. Points colored by target face: E2 (blue, n=45), Cullin (red, n=6), BoltzGen (green, n=9). Marginal boxplots show per-face distributions. Orange line: Ridge regression (r=0.08, rho=-0.12). Dashed diagonal: perfect agreement. Boltz-2 scores show no correlation with AF3 validation (r=0.08), indicating that high Boltz-2 scores do not predict AF3-validated binding.

Figure 2. AF3 ipTM vs OF3 ipTM for 49 designs with both scores. Points colored by target face: E2 (blue, n=40), Cullin (red, n=5), BoltzGen (green, n=4). Ridge regression (r=0.57, rho=0.49) shows moderate positive correlation between the two independent validation methods, suggesting they capture overlapping but distinct structural signals.

Figure 3. Boltz-2 ipTM vs OF3 ipTM for 1,225 designs. Points colored by target face: E2 (blue, n=845), Cullin (red, n=335), BoltzGen (green, n=45). Ridge regression (r=-0.01, rho=-0.04) shows essentially zero correlation between Boltz-2 scoring and OF3 validation at scale, consistent with the AF3 result. Boltz-2 ipTM is uninformative for predicting orthogonal validation scores.
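The Pearson r and Spearman rho values quoted in these captions can be reproduced from paired score lists along the following lines. This is a standard-library sketch; the helper names and the toy scores are illustrative assumptions, and the dashboard's own figures may have been produced with a statistics package instead.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def average_ranks(values):
    """1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1  # mean of ranks start+1 .. end+1
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks


def spearman_rho(x, y):
    """Spearman rho is the Pearson correlation of the rank vectors."""
    return pearson_r(average_ranks(x), average_ranks(y))


# Toy scores (made up): higher Boltz-2 ipTM paired with lower AF3 ipTM.
boltz_iptm = [0.95, 0.90, 0.92, 0.97, 0.88]
af3_iptm = [0.20, 0.60, 0.35, 0.15, 0.70]
print(round(pearson_r(boltz_iptm, af3_iptm), 3), spearman_rho(boltz_iptm, af3_iptm))
```

Reporting both coefficients is useful here because Pearson r measures linear association while Spearman rho only depends on rank order, so the two can disagree when a few extreme designs dominate.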

Performance by Target Face

Designs target three RBX1 surfaces: E2 interface (12,112 designs), Cullin interface (6,791), and BoltzGen full-surface (1,820). ipSAE and structural confidence metrics vary by target.

Figure 4. ipSAE distribution by target face with individual points (downsampled to 300 per group). BoltzGen designs achieve higher mean ipSAE (0.446) than E2 (0.227) or Cullin (0.205). Significance bars show two-sample t-test p-values between groups.

Figure 5. Four key metrics (ipSAE, Boltz ipTM, Boltz pLDDT, Boltz pTM) compared across target faces. BoltzGen shows higher ipSAE but generally comparable Boltz confidence metrics, suggesting that the ipSAE advantage is interface-specific rather than driven by global structural quality.

Performance by Binder Size

Binder lengths range from under 55 to over 100 amino acids. Size buckets reveal whether longer binders achieve better interface quality or structural confidence.

Figure 6. ipSAE by binder size bucket (<55, 55-65, 66-99, 100+ AA). Mean and count annotations per group. No strong size-dependent trend in interface quality.

Figure 7. Boltz-2 ipTM vs binder length colored by target face. Small binders (40-65 AA) dominate the E2 dataset with wide ipTM spread. Cullin and BoltzGen designs span broader length ranges.

Performance by Model Checkpoint

Designs were generated with three model variants: Base (standard RFdiffusion checkpoint, Complex_base_ckpt.pt), Beta (RFdiffusion beta checkpoint, Complex_beta_ckpt.pt), and BoltzGen (generative model). This section compares ipSAE across them.

Figure 8. ipSAE by model checkpoint (Base, Beta, BoltzGen). Mean and count annotations per group. BoltzGen designs achieve higher ipSAE on average, while Base and Beta show similar distributions.

Hit Rate Analysis

Hit rate curves show the fraction of designs exceeding a given quality threshold across the full ipSAE and OF3 ipTM ranges, stratified by target face.

Figure 9. Hit rate curves by target face at ipSAE thresholds. BoltzGen maintains the highest hit rate across most thresholds. E2 and Cullin face designs show similar hit rate profiles.

Figure 10. Hit rate curves by target face at OF3 ipTM thresholds (for the 1,225 designs with OF3 validation). Reveals which target face produces the most designs above stringent OF3 cutoffs.
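A hit rate curve of this kind is just the fraction of designs above each threshold. The sketch below shows the calculation on toy scores and then applies the same ratio to counts copied from the Scoring Results by Campaign table (which is grouped by campaign rather than by target face). The function name and the strict `>` comparison are assumptions.

```python
def hit_rate_curve(scores, thresholds):
    """Fraction of scores strictly above each threshold."""
    n = len(scores)
    return [(t, sum(s > t for s in scores) / n) for t in thresholds]


# Toy ipSAE values (made up) to show the shape of the output.
toy_scores = [0.1, 0.4, 0.6, 0.8]
curve = hit_rate_curve(toy_scores, [0.3, 0.5, 0.7])
print(curve)

# (scored, count > 0.7, count > 0.5) copied from the campaign scoring table.
table_counts = {
    "RFdiff E2 small": (11792, 355, 1790),
    "BoltzGen (untargeted)": (1840, 3, 418),
}
for name, (scored, gt07, gt05) in table_counts.items():
    print(f"{name}: >0.5 = {gt05 / scored:.1%}, >0.7 = {gt07 / scored:.1%}")
```

Normalizing by the number of scored designs matters because the campaigns differ in size by roughly two orders of magnitude, so raw hit counts are not comparable.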

Three-Method Comparison

Side-by-side comparison of Boltz-2, OpenFold3, and AlphaFold3 ipTM scores for top designs ranked by OF3 validation. Highlights consistent Boltz-2 score inflation relative to the other two methods.

Figure 11. Top 20 designs by OF3 ipTM: Boltz-2 (blue, faded), OpenFold3 (red), and AlphaFold3 (green) scores side-by-side. Faint green bars indicate designs without AF3 data. Boltz-2 consistently overestimates compared to both orthogonal validation methods, while OF3 and AF3 show better mutual agreement.

Hotspot Set Analysis

Each RFdiffusion campaign was conditioned on a specific set of RBX1 hotspot residues. Four distinct hotspot sets were used; BoltzGen designs, which have no hotspot conditioning, are listed for comparison:

E2 Enhanced (+DMS): A44, A45, A46, A51, A54, A56, A57, A79, A83, A84, A87, A95, A96, A42, A53 (11,336 designs)

E2 Standard: A44, A45, A46, A51, A56, A57, A79, A83, A87, A95, A96 (400 designs)

E2 Core: A44, A45, A46, A51, A54, A56, A57, A79, A83, A84, A87, A95, A96 (256 designs)

Cullin: A27, A29, A30, A31, A33, A35, A36, A73, A75, A101 (276 designs)

BoltzGen: No hotspot conditioning (1,240 designs)

ipSAE vs ipTM by hotspot set

ipSAE vs ipTM scatter with marginal distributions

Joint distribution of ipSAE and ipTM colored by hotspot set. Marginal histograms show per-set density. Stars mark the best design in each set. The E2 Enhanced set (with ESM-2 DMS-derived residues I54, I84, C42, C53) produces the highest ipSAE values, suggesting that adding mutation-sensitive residues to the hotspot specification improves interface quality.

ipSAE vs binder length by hotspot set

Binder length vs ipSAE by hotspot set. Small binders (40-65 AA) dominate the E2 Enhanced set. BoltzGen designs span a wider length range but achieve lower ipSAE. No strong correlation between length and interface quality within any set, though binders of 50-60 AA appear slightly enriched among top performers.

Hit rate by hotspot set

Fraction of designs exceeding ipSAE thresholds for each hotspot set. The E2 Enhanced set maintains the highest hit rate at the more stringent thresholds. The Cullin set shows competitive hit rates despite targeting a different face, while BoltzGen has the highest fraction above ipSAE=0.3 but drops off sharply above 0.5.

Efficiency frontier: ipSAE vs pLDDT

Pareto front (dashed line) of designs optimizing both ipSAE and pLDDT. Designs on the frontier achieve the best trade-off between interface quality and structural confidence. Most frontier designs are from the E2 Enhanced set, but a few Cullin and E2 Standard designs appear at high pLDDT.
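For two metrics where higher is better, a Pareto front like the one described can be extracted with a single sorted pass. This is a generic sketch; the function name and the toy (ipSAE, pLDDT) pairs are assumptions, not values from the dashboard.

```python
def pareto_front(points):
    """Return the points not dominated in (first, second), higher is better.

    A point is dominated if another point is at least as good on both
    metrics and strictly better on at least one.
    """
    # Sort by the first metric descending, breaking ties by the second.
    ordered = sorted(points, key=lambda p: (-p[0], -p[1]))
    front = []
    best_second = float("-inf")
    for p in ordered:
        # Every earlier point has first >= p[0], so p survives only if it
        # strictly improves on the best second metric seen so far.
        if p[1] > best_second:
            front.append(p)
            best_second = p[1]
    return front


# Toy (ipSAE, pLDDT) pairs, made up for illustration.
toy_points = [(0.80, 0.70), (0.60, 0.85), (0.50, 0.60), (0.75, 0.80), (0.30, 0.89)]
front = pareto_front(toy_points)
print(front)
```

The sort makes the pass O(n log n), which is comfortable even for the full set of roughly twenty thousand scored designs.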

Amino acid composition by ipSAE quality tier

Amino acid frequency comparison between top-tier (ipSAE > 0.7), mid-tier (0.3-0.7), and failed (ipSAE < 0.1) designs. Blue = charged (R,K,D,E), red = hydrophobic (F,L,I,M,V,W), green = polar (S,T,C,N,Q,Y). Top-tier designs show higher glutamate (E) and leucine (L) frequency, while failed designs are enriched in alanine (A) and glycine (G), suggesting that oversimplified sequences with low complexity correlate with poor interface formation.

Evolutionary Analysis of RBX1

Conservation analysis from a 165-sequence MSA (MAFFT) of RBX1 homologs across eukaryotes. 834 raw homologs were collected via HMMER and filtered to 272 non-redundant sequences; the alignment has 1,155 total rows including outgroups.

165

MSA Sequences

834

Raw Homologs

272

Filtered Seqs

108

Target Residues

Zn²⁺ Sites

Interface Residues

Per-Residue Conservation

Shannon entropy-based conservation score (0 = variable, 1 = perfectly conserved). Colored by functional role: E2 interface, Cullin interface, Zinc-binding, other. Hover for details.

Entropy Distribution

Histogram of Shannon entropy values across all 108 residues. Lower entropy = higher conservation.
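A per-column conservation score on a 0 to 1 scale can be derived from Shannon entropy as sketched below. The normalization used here (1 minus entropy divided by log2 of a 20-letter alphabet, gaps excluded from the frequencies) is an assumption; the dashboard does not state its exact formula, and the function name is hypothetical.

```python
import math
from collections import Counter


def column_conservation(column, alphabet_size=20):
    """Return (conservation, entropy_bits, gap_fraction) for one MSA column.

    `column` is a string with one character per aligned sequence and "-"
    for gaps. Conservation is 1 for an invariant column and 0 for a
    column that uses all residues uniformly (assumed normalization).
    """
    residues = [c for c in column if c != "-"]
    if not residues:
        raise ValueError("column contains only gaps")
    total = len(residues)
    counts = Counter(residues)
    entropy = -sum((n / total) * math.log2(n / total) for n in counts.values())
    conservation = 1.0 - entropy / math.log2(alphabet_size)
    gap_fraction = 1.0 - total / len(column)
    return conservation, entropy, gap_fraction


# Toy columns: invariant, split between two residues, and one with a gap.
print(column_conservation("WWWW"))
print(column_conservation("AC"))
print(column_conservation("W-WW"))
```

Tracking the gap fraction alongside the score is helpful because a column can look highly conserved simply because few sequences cover it.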

Interface Conservation Comparison

Mean conservation by functional region.

Gap Fraction by Region

Mean alignment gap fraction. Lower = better coverage in the MSA.

Highly Conserved Residues (Conservation ≥ 0.93)

Residue #	Amino Acid	Conservation	Entropy	Gap Fraction	Functional Role

Zinc Coordination Sites

Zn²⁺ Site 1 (ZN_109)

Ligands: CYS42, CYS45, HIS80, CYS83
Avg Conservation: 0.943
Role: Stabilizes RING-H2 domain fold, part of E2 recruitment surface

Zn²⁺ Site 2 (ZN_110)

Ligands: CYS75, HIS77, CYS94
Avg Conservation: 0.956
Role: Structural zinc, anchors the beta-sheet core between the two interfaces

Zn²⁺ Site 3 (ZN_111)

Ligands: CYS53, CYS56, CYS68, HIS82
Avg Conservation: 0.922
Role: Bridges the E2-binding loop to the core, critical for domain integrity

Target Interface Definitions

E2 Recruitment Face PRIMARY (60%)

35 residues at UBC12/E2 interface (PDB 4P5O, 8Å cutoff)
Residues: 41,42,43,44,45,46,47,48,51,52,53,54,55,56,57,58,59,79,81,82,83,84,85,86,87,88,90,91,93,94,95,96,97,98,99
Mean conservation: 0.839
Hotspots (RFdiffusion): B44-B48, B52, B55-B57, B84-B88, B95-B98
Key residues: R46 (1.0), C94 (0.988), I44 (0.948), C56 (0.966), P95 (0.947)

Cullin Interface SECONDARY (40%)

31 residues at Cullin C-terminal domain interface
Residues: 23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,72,73,74,75,76,89,90,91,92,93,99,100,101,102,103,104
Mean conservation: 0.938
Hotspots (RFdiffusion): B24-B34, B72-B75, B100-B104
Key residues: W33 (1.0), W35 (1.0), G73 (1.0), W101 (0.988), V30 (0.974)

Interpretation

RBX1 is deeply conserved across eukaryotes. The RING-H2 domain (residues ~27–104) shows uniformly high conservation (>0.7 for nearly all positions), with zinc-coordinating cysteines and histidines reaching near-perfect conservation (0.91–1.0). The N-terminal tail (residues 1–20) is more variable, consistent with it being unstructured in the NMR ensemble (2LGV).

The Cullin face is significantly more conserved than the E2 face (mean 0.938 vs 0.839). This makes sense: the Cullin interaction is constitutive (RBX1 is always bound to a Cullin scaffold in vivo), whereas the E2 interface cycles through multiple E2 partners. The higher conservation at the Cullin face means binders targeting it are more likely to disrupt a functionally critical interaction, but the surface may also be harder to compete with due to the tight, conserved binding.

Four perfectly conserved residues stand out: W33, W35, and G73 (all Cullin face), and R46 (E2 face). These are absolutely invariant across all 165 sequences in the MSA. W33 and W35 form a tryptophan pair that likely stacks against the Cullin surface, a classic hot-spot motif. R46 is the catalytic arginine critical for E2 activation.

Gap fraction is low (<2%) for the core domain (residues 36–105), meaning the MSA is well-aligned in the structured region. The N/C-terminal tails show higher gaps (10–27%), reflecting length variation among homologs. This gives confidence that the conservation scores in the core are reliable.

Design implications: The 60/40 E2/Cullin split is well-justified. The Cullin face offers a tighter, more conserved target with multiple tryptophan hot-spots, favoring high-affinity designs. The E2 face is more diverse, potentially offering more epitope options but requiring designs that can outcompete the native E2 partners. The zinc sites should be included in the target structure but not directly targeted—they are buried and structurally critical, not surface-accessible.

RBX1 Sequence — Conservation Colored

Each residue colored by conservation score. Hover for details.

ESM-2 Deep Mutational Scanning

Masked marginal log-likelihood ratios (dLLR) computed with ESM-2 (650M params) for all single-point mutations across the 108-residue RBX1 sequence. More negative dLLR = more deleterious mutation. Sensitivity = mean |dLLR| across all 20 amino acids at each position.

2,160

Mutations Scored

0.581

Spearman r (vs Evo)

0.714

Pearson r (vs Evo)

-18.25

Most Deleterious dLLR

+2.54

Most Beneficial dLLR

Predicted Contacts

Mutation Effect Heatmap

Log-likelihood ratio for each of 20 amino acids at each position. Blue = tolerated/beneficial, red = deleterious. Wild-type residue marked with black dot. Hover for values.

Per-Position Mutation Sensitivity

Mean |dLLR| across all 20 amino acids. Higher = less tolerant of mutations. Colored by functional role.
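Given masked log-probabilities at one position, the dLLR values and the sensitivity score defined in this section reduce to a few lines. The sketch assumes the log-probabilities have already been obtained from the language model (that step is not shown), and the helper names and toy numbers are illustrative.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def dllr_row(log_probs, wild_type):
    """dLLR for each amino acid at one position: log p(mut) - log p(WT).

    `log_probs` maps each amino acid to its masked log-probability at the
    position; the wild-type entry therefore gets a dLLR of exactly 0.
    """
    return {aa: log_probs[aa] - log_probs[wild_type] for aa in AMINO_ACIDS}


def sensitivity(row):
    """Mean |dLLR| across all 20 amino acids (wild type included)."""
    return sum(abs(v) for v in row.values()) / len(row)


# Toy position (made-up values): the model strongly prefers the wild-type C.
toy_log_probs = {aa: -10.1 for aa in AMINO_ACIDS}
toy_log_probs["C"] = -0.1
row = dllr_row(toy_log_probs, "C")
print(min(row.values()), sensitivity(row))
```

Applied to all 108 positions this yields the 108 x 20 = 2,160 scored entries reported above.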

ESM Sensitivity vs Conservation

Spearman r = 0.581, Pearson r = 0.714. Strong agreement between model-predicted and evolutionary constraint.

Predicted Contact Map

ESM-2 attention-derived contact predictions (probability > 0.5). 31 high-confidence contacts.

Top 20 Most Deleterious Single Mutations

Rank	Position	WT → Mut	dLLR	Functional Role	WT Conservation

Top 30 Mutation-Sensitive Residues

Residue	AA	ESM Sensitivity	Evo Conservation	WT Log-Prob	Functional Role

Interpretation

ESM-2 and evolution strongly agree on which residues are critical. The Spearman correlation of 0.581 (p < 10^-10) and Pearson of 0.714 (p < 10^-17) between ESM-2 sensitivity and Shannon entropy conservation confirm that the protein language model has learned genuine structural and functional constraints from sequence alone.

Zinc-coordinating cysteines dominate the sensitivity landscape. C42, C56, C83, C75, C53, C68, and C94 are all among the top 15 most sensitive positions. Any mutation at these sites is catastrophic (dLLR < -10), consistent with their role as structural zinc ligands that maintain the RING-H2 fold. The most deleterious single mutation in the entire protein is I54W (dLLR = -18.25), which substitutes a bulky tryptophan into the hydrophobic core adjacent to C53 and C56.

The Cullin face has the single most sensitive residue (D36, sensitivity = 13.0), which also has high evolutionary conservation (0.945). This aspartate likely forms critical salt bridges in the Cullin interaction. For binder design, targeting residues around D36 could be highly effective at disrupting the complex.

ESM-2 identifies some positions as sensitive that conservation misses. I54 (ESM sensitivity rank 5, conservation only 0.786) and I84 (rank 11, conservation 0.802) are moderately conserved but ESM predicts they are among the most intolerant of mutations — likely because they play critical roles in hydrophobic packing that the MSA alone doesn't fully capture.

Design implications: Binders should maximize contacts with high-sensitivity residues (especially D36, C42, R46, F79, W87) since these positions cannot easily mutate to escape binding. The DMS data also suggests that the N-terminal tail (residues 1–20) has low mutation sensitivity, confirming it is a poor target for binder design.

AF3 Validation Insights

Comprehensive analysis of what predicts AlphaFold3 validation success. 60 designs validated with AF3, analyzed using Lasso regression on sequence and structural features. AF3 is our ground-truth orthogonal validation — Boltz-2 scores are heavily inflated for E2-face RFdiffusion designs.

AF3 Validated

Top Hits (>=0.7)

Promote (0.5-0.7)

Review (0.4-0.5)

Discard (<0.4)

Lasso Regression: What Predicts AF3 Success?

LassoCV regression on 20 sequence and design features to predict AF3 ipTM. Features with non-zero coefficients are the strongest predictors after regularization. Positive = helps AF3 validation, negative = hurts.

Feature correlations with AF3 ipTM. The strongest predictor of AF3 success is being a BoltzGen design (untargeted, diverse topology), followed by sequence features like aromatic content and lower alanine fraction. High Boltz-2 ipTM is actually a weak or negative predictor — designs with inflated Boltz scores tend to fail AF3 validation.

Top Feature Correlations

Top 8 features correlated with AF3 ipTM. Each subplot shows one feature vs AF3 score with Ridge regression line and Pearson r. Points colored by pipeline (blue=RFdiff E2, red=RFdiff Cullin, green=BoltzGen).

What Distinguishes Success from Failure?

Comparison of successful designs (AF3 ipTM >= 0.5) vs failed designs (AF3 ipTM < 0.3) across key features. Successful designs tend to have higher sequence entropy (more diverse amino acid usage), more aromatic residues, and lower alanine content.

Amino Acid Composition

Amino acid frequency comparison between AF3-validated (ipTM >= 0.5) and AF3-failed (ipTM < 0.2) designs. Successful designs are enriched in structurally important residues (F, W, Y aromatics; charged residues) and depleted in simple residues (A, G).

BoltzGen vs RFdiffusion in AF3

BoltzGen designs show dramatically better AF3 validation than RFdiffusion+MPNN designs. BoltzGen mean AF3 ipTM is 2-3x higher, despite lower Boltz-2 scores. This suggests RFdiffusion+MPNN designs may be optimizing for Boltz-2 scoring artifacts rather than genuine binding.

Key Findings

1. Among sequence features, aspartate (D) content is the strongest positive correlate of AF3 ipTM (r=+0.35), while glutamate (E) is the strongest negative one (r=-0.33). BoltzGen designs tend to validate better in AF3 despite having lower Boltz-2 scores; being a BoltzGen design is the single strongest predictor of AF3 success.

2. Sequence diversity matters: designs with higher Shannon entropy (more diverse AA usage) and more unique amino acids tend to validate better. Low-complexity sequences rich in alanine and glycine consistently fail.

3. Aromatic residues (F, W, Y) are enriched in successful designs — these contribute to specific hydrophobic contacts at the interface that AF3 can validate.

4. High Boltz-2 ipTM is NOT predictive of AF3 success — in fact, it may be slightly anti-correlated. Designs with Boltz ipTM > 0.95 mostly fail AF3 validation (delta > 0.7).

5. Cullin-face designs show better AF3 agreement than E2-face designs, though the sample size is small. One Cullin design (design_0357) has AF3 ipTM = 0.78 with low delta.

6. For future campaigns: prioritize BoltzGen-style generation, increase aromatic content in ProteinMPNN sampling, and reduce alanine/glycine bias. Consider Cullin-face targeting which appears more AF3-compatible.

Post-Submission: Deep Analysis with Feature Engineering and Regression

After submission, the ADAPTYV competition independently re-ran Boltz-2 on all 83 submitted designs and scored them with 12 structural metrics. We performed comprehensive feature engineering (64 features from sequence composition, structural propensities, Boltz-2 confidence maps, and our scoring metrics) and regression analysis to understand what predicts competition performance.

Designs Submitted

Survived (ipSAE ≥ 0.6)

Failed (ipSAE < 0.3)

Our ipSAE < 0.5 Submitted

1. Submission Overview

Of 83 submitted designs, 33 survived with their ipSAE ≥ 0.6 and 31 failed below 0.3. The submission included 30 designs with our ipSAE < 0.5 -- some performed surprisingly well in competition scoring. The correlation between our ipSAE and theirs is only r = 0.312, showing that internal scoring was a poor predictor of competition performance. Seven designs we scored below 0.5 actually survived (≥ 0.6) in the competition, while 9 designs we scored above 0.7 failed (< 0.3).

Figure 1. Our ipSAE vs competition ipSAE. Point size = binder length, color = target face. Annotated designs are surprising reversals: low ours but high theirs (upper left) and high ours but low theirs (lower right). The green/red horizontal lines mark survival (0.6) and failure (0.3) thresholds.

2. Feature Engineering (64 Features)

We extracted 64 features per design spanning four categories: (a) amino acid composition (20 individual AA fractions plus grouped fractions for charged, hydrophobic, polar, aromatic residues), (b) sequence properties (Shannon entropy, linguistic complexity, net charge, charge density, hydrophobicity, isoelectric point, molecular weight, max homopolymer repeat), (c) structural propensities (helix, sheet, disorder) and Boltz-2 PAE/pLDDT features from our prediction NPZ files (mean interface PAE, confident contacts, PAE asymmetry, binder pLDDT), and (d) design metadata (wave, face, binder length) plus our scoring metrics (ipSAE, ipTM, pTM, pLDDT, AF3 ipTM, OF3 ipTM). Structural features were extracted for 80 of 81 matched designs.
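A few of the sequence-derived features named above can be computed as in this sketch. The definitions are assumptions where the text does not specify them: net charge is taken as (K + R) minus (D + E) with histidine ignored, aromatic residues as F, W, and Y, and the function and key names are hypothetical.

```python
import math
from collections import Counter


def sequence_features(seq):
    """Compute a handful of composition features for a non-empty sequence."""
    counts = Counter(seq)
    n = len(seq)
    fractions = {aa: c / n for aa, c in counts.items()}
    entropy = -sum(f * math.log2(f) for f in fractions.values())
    # Assumed charge model: K/R = +1, D/E = -1, histidine treated as neutral.
    net_charge = counts["K"] + counts["R"] - counts["D"] - counts["E"]
    # Longest run of a single repeated residue.
    longest = run = 1
    for prev, cur in zip(seq, seq[1:]):
        run = run + 1 if cur == prev else 1
        longest = max(longest, run)
    return {
        "length": n,
        "shannon_entropy": entropy,
        "unique_aa": len(counts),
        "net_charge": net_charge,
        "charge_density": net_charge / n,
        "aromatic_fraction": (counts["F"] + counts["W"] + counts["Y"]) / n,
        "max_homopolymer": longest,
    }


# Toy sequence, made up for illustration.
features = sequence_features("AAEEKLW")
print(features)
```

Keeping each feature a pure function of the sequence makes the table easy to regenerate when new designs are added.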

Figure 2. Full correlation matrix of our features (top-left block) vs their 12 competition metrics (bottom-right block). Black lines separate the two groups. Their metrics cluster tightly (ipSAE, ipTM, LIS all r > 0.9). Our scoring metrics show only weak correlation with theirs (r ~ 0.3). Sequence charge features and structural PAE metrics show moderate associations with competition outcomes.

3. Regression Analysis

We trained three models to predict competition ipSAE from our 64 features. All models show negative cross-validated R², indicating that with only 81 samples and 64 features, none generalizes reliably. Ridge regression (CV R² = -2.22) heavily overfits. ElasticNet CV selects 22 non-zero features (CV R² = -0.69, l1_ratio = 0.10). Random Forest is least negative (CV R² = -0.33) and achieves train R² = 0.78, which may reflect nonlinear patterns or simply overfitting; the two cannot be distinguished at this sample size. The negative CV R² values mean that no model beats predicting the mean: individual competition scores are not predictable from our available features at this sample size.
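To see why a negative R² means "worse than predicting the mean", it helps to write the definition out. The sketch below uses made-up target values; the function name is illustrative and the baseline is the mean of the evaluated targets.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Toy held-out targets (made up).
y = [0.1, 0.3, 0.6, 0.8]
mean_baseline = [sum(y) / len(y)] * len(y)
anti_correlated = [0.8, 0.6, 0.3, 0.1]

print(r_squared(y, y))                # perfect predictions
print(r_squared(y, mean_baseline))    # constant mean prediction
print(r_squared(y, anti_correlated))  # predictions worse than the mean
```

R² is bounded above by 1 but has no lower bound, so a value such as -2.22 indicates held-out squared error roughly three times that of the constant-mean baseline.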

Figure 3. Top 15 features by absolute coefficient magnitude in Ridge (left) and ElasticNet (right) regression. Green = positive coefficient (higher value predicts higher competition ipSAE), red = negative. Contacts_pae_lt5 (number of confident interface contacts) is a strong positive predictor. Molecular weight is negative, suggesting shorter binders perform better.

Figure 4. Random Forest feature importance (MDI) for predicting competition ipSAE. Net charge is the top predictor, followed by serine content (aa_S), isoelectric point, and charge density. Our ipSAE ranks 7th. Structural features (mean_binder_pae, struct_binder_plddt) also contribute.

4. Survivors vs Failures

We compared 33 survivors (their ipSAE ≥ 0.6) against 31 failures (< 0.3) using Welch's t-test on all 64 features to find the most discriminating ones. The top 6 features are: net charge (p = 3.5e-5), isoelectric point (p = 3.4e-4), our ipTM (p = 0.003), our pTM (p = 0.003), charge density (p = 0.003), and our ipSAE (p = 0.004). All are statistically significant, but effect sizes are modest.
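Welch's t-test compares two group means without assuming equal variances. The sketch below computes the t statistic and the Welch-Satterthwaite degrees of freedom with the standard library only; the group values are made up, and a p-value would additionally require the t-distribution CDF (for example via `scipy.stats.ttest_ind` with `equal_var=False`).

```python
import math
from statistics import mean, variance


def welch_t(a, b):
    """Return (t statistic, Welch-Satterthwaite degrees of freedom)."""
    # Squared standard error of each group mean (sample variance / n).
    va = variance(a) / len(a)
    vb = variance(b) / len(b)
    t = (mean(a) - mean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


# Toy net-charge values for two groups (made up, not the submitted designs).
survivors = [-10.0, -8.0, -9.0, -7.0]
failures = [-4.0, -5.0, -3.0, -4.0]
t_stat, dof = welch_t(survivors, failures)
print(round(t_stat, 3), round(dof, 2))
```

Because the same test is run on all 64 features, the smallest p-values are expected to look more impressive than they are; a multiple-testing correction would be a natural follow-up.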

Figure 5. Box + swarm plots of the 6 most discriminating features between survivors (green, their ipSAE ≥ 0.6, n=33) and failures (red, < 0.3, n=31). Net charge is the strongest discriminator: survivors tend to have more negative net charge (mean -8.4 vs -4.0). Our ipTM and ipSAE are significantly higher in survivors but overlap substantially.

5. Cross-Validator Analysis: OpenFold3

We tested whether our OpenFold3 ipTM scores predict competition ipSAE across the 81 designs with OF3 data. OF3 serves as an orthogonal validation method independent of Boltz-2.

Figure 6. Our OpenFold3 ipTM vs competition ipSAE. Color = target face. The dashed line shows the linear trend. OF3 provides some signal for competition performance but the relationship is noisy and face-dependent.

5b. Cross-Validator Analysis: AlphaFold3

We tested whether our AlphaFold3 ipTM scores predict competition ipSAE across the designs with both AF3 and Pb data. AF3 serves as another orthogonal validation method independent of Boltz-2.

Figure 6b. Our AlphaFold3 ipTM vs competition ipSAE with boxplot marginals. Color = target face. Ridge regression line (orange) with Pearson r and Spearman rho shown.

6. Surprising Reversals

The most surprising finding is the extent of score reversals between our predictions and the competition's.

Design	Our ipSAE	Their ipSAE	Surprise
wcm_design_0709	0.42	0.79	We undervalued (+0.37)
wbg_design_1458	0.41	0.74	We undervalued (+0.33)
w2_design_3061	0.19	0.73	We undervalued (+0.54)
w2_design_7713	0.79	0.02	We overvalued (-0.77)
w2_design_7921	0.79	0.09	We overvalued (-0.70)
wbg_design_1817	0.77	0.13	We overvalued (-0.64)
w3_design_2356	0.75	0.05	We overvalued (-0.70)

7. Key Findings

1. Competition ipSAE is not predictable from our features. All regression models yield negative cross-validated R² (Ridge: -2.22, ElasticNet: -0.69, RF: -0.33), meaning they do worse than predicting the mean. With 81 samples and 64 features, overfitting is severe.

2. Net charge is the strongest single discriminator (p = 3.5e-5). Survivors have significantly more negative net charge than failures (mean -8.4 vs -4.0). Since both groups are net negative on average, this suggests the competition scoring favors more strongly acidic binders rather than penalizing positive charge as such.

3. Our ipSAE and ipTM are statistically significant discriminators (p < 0.005) but with weak effect sizes (r ~ 0.3). High internal scores are weakly associated with competition survival but provide no guarantee.

4. Structural features from our Boltz-2 predictions (interface PAE, confident contacts) appear in the top Ridge coefficients, suggesting that PAE-derived interface quality has some predictive power even when overall scores do not correlate.

5. Score reversals are dramatic: w2_design_3061 scored 0.19 in our scoring but 0.73 in theirs (+0.54 reversal), while w2_design_7713 scored 0.79 in ours but only 0.02 in theirs (-0.77 reversal). This indicates that Boltz-2 scores can differ substantially between independent runs.

6. The submission included 30 designs with our ipSAE < 0.5, yet 7 of them survived in competition scoring. This validates the diversity-over-optimization strategy: submitting a broad range of designs captures winners that would be missed by strict internal score cutoffs.

UBC12 Natural Binder Optimization

Trimming, alanine scanning, and mutational optimization of UBC12 (PDB 4P5O chain I) — the natural E2 binding partner of RBX1. All scoring via OpenFold3. Auto-updated: 2026-04-09 09:05

25/25

Trimming Done

27/27

Ala Scan Done

366/369

Mutants Scored

0.860

WT ipTM

0.869

Best Mutant

Beneficial (>WT)

Phase 1: Progressive Trimming (Complete)

Best fragment: res 25-135 (111 AA) with OF3 ipTM = 0.860 (vs full-length 0.745). Trimming both termini raises predicted ipTM by +0.115; trimming the disordered N-terminus alone (res 25-183, ipTM 0.828) accounts for +0.083.

Rk	Variant	Range	Len	Iface%	ipTM	pTM	pLDDT
1	both_25_135	25-135	111	100%	0.8604	0.7470	81.8
2	ntrim_25	25-183	159	100%	0.8284	0.7925	84.4
3	both_29_135	29-135	107	100%	0.8186	0.7239	79.9
4	both_25_130	25-130	106	100%	0.8047	0.7111	78.3
5	ntrim_20	20-183	164	100%	0.7945	0.7799	82.8
6	ntrim_15	15-183	169	100%	0.7851	0.7732	82.9
7	ntrim_29	29-183	155	100%	0.7769	0.7716	82.6
8	ntrim_10	10-183	174	100%	0.7683	0.7593	81.9
9	both_29_130	29-130	102	100%	0.7575	0.6754	77.0
10	full_length	2-183	182	100%	0.7454	0.7401	81.3
11	ctrim_170	2-170	169	100%	0.7451	0.7274	80.5
12	ctrim_160	2-160	159	100%	0.7369	0.7123	78.5
13	ctrim_140	2-140	139	100%	0.7368	0.6973	79.3
14	ctrim_135	2-135	134	100%	0.7234	0.6796	77.5
15	ctrim_150	2-150	149	100%	0.7080	0.6791	77.0
16	both_29_127	29-127	99	100%	0.6750	0.6330	73.1
17	ctrim_130	2-130	129	100%	0.6527	0.6296	73.8
18	core_35_120	35-120	86	55%	0.5834	0.5958	68.3
19	ctrim_127	2-127	126	100%	0.5642	0.5739	69.1
20	minimal_patch23	78-135	58	59%	0.5577	0.5355	69.1
21	minimal_29_100	29-100	72	62%	0.3988	0.4935	70.3
22	core_29_120	29-120	92	76%	0.2418	0.4013	59.6
23	core_35_127	35-127	93	79%	0.2109	0.3961	57.7
24	minimal_patch1	25-50	26	41%	0.1881	0.4512	59.8
25	minimal_80_127	80-127	48	59%	0.0517	0.3776	55.8
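The Length and ipTM columns can be cross-checked directly from the residue ranges. A minimal sketch, assuming inclusive residue numbering; the rows are copied from the table above, while the tuple layout and helper name are illustrative:

```python
# A few rows copied from the trimming table: (variant, first_res, last_res, ipTM).
rows = [
    ("both_25_135", 25, 135, 0.8604),
    ("ntrim_25", 25, 183, 0.8284),
    ("full_length", 2, 183, 0.7454),
    ("ctrim_135", 2, 135, 0.7234),
]

def fragment_length(first_res, last_res):
    """Residue ranges are inclusive, so length is last - first + 1."""
    return last_res - first_res + 1

# ipTM of the untrimmed construct, used as the reference point.
baseline = next(iptm for name, _, _, iptm in rows if name == "full_length")

summary = {
    name: (fragment_length(first, last), round(iptm - baseline, 4))
    for name, first, last, iptm in rows
}

for name, (length, delta) in summary.items():
    print(f"{name}: {length} AA, delta ipTM vs full length {delta:+.4f}")
```

On these rows the two trims do not combine additively: the N-terminal trim alone gives +0.083, the C-terminal trim to residue 135 alone gives -0.022, yet applying both gives +0.115.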

Phase 2: Alanine Scanning

All 27 interface positions were scanned. The predicted interface is robust to single alanine substitutions: no mutation lowers ipTM by more than 0.02, and all 27 are classed neutral. Most tolerant: K120, W119, E117. Most sensitive: R33, D37 (ΔipTM -0.0199 and -0.0184).


Mutation	ipTM	ΔipTM	pTM	Class
ala_R33A	0.8404	-0.0199	0.7403	neutral
ala_D37A	0.8419	-0.0184	0.7397	neutral
ala_I38A	0.8471	-0.0132	0.7340	neutral
ala_L32A	0.8479	-0.0123	0.7386	neutral
ala_P87A	0.8513	-0.0090	0.7445	neutral
ala_I34A	0.8519	-0.0083	0.7420	neutral
ala_S127A	0.8524	-0.0079	0.7436	neutral
ala_I125A	0.8525	-0.0078	0.7420	neutral
ala_H88A	0.8526	-0.0076	0.7429	neutral
ala_L123A	0.8533	-0.0070	0.7433	neutral
ala_Y86A	0.8540	-0.0062	0.7440	neutral
ala_Q31A	0.8547	-0.0055	0.7447	neutral
ala_Q84A	0.8547	-0.0055	0.7478	neutral
ala_N126A	0.8552	-0.0050	0.7472	neutral
ala_K36A	0.8553	-0.0049	0.7472	neutral
ala_G85A	0.8559	-0.0044	0.7480	neutral
ala_Q35A	0.8562	-0.0041	0.7490	neutral
ala_E40A	0.8562	-0.0041	0.7470	neutral
ala_N39A	0.8562	-0.0040	0.7444	neutral
ala_P121A	0.8569	-0.0034	0.7432	neutral
ala_T124A	0.8569	-0.0034	0.7430	neutral
ala_D118A	0.8569	-0.0033	0.7450	neutral
ala_V122A	0.8576	-0.0027	0.7475	neutral
ala_D89A	0.8586	-0.0017	0.7454	neutral
ala_E117A	0.8594	-0.0008	0.7478	neutral
wt_res25-135	0.8603	+0.0000	0.7470	neutral
ala_W119A	0.8606	+0.0003	0.7454	neutral
ala_K120A	0.8610	+0.0007	0.7507	neutral
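The ΔipTM and Class columns can be reproduced from the ipTM values. A minimal sketch; the source does not state how the Class column is assigned, so the 0.02 threshold and the "deleterious" and "beneficial" labels below are assumptions. The table's ΔipTM appears to be computed from unrounded values, so recomputing from the four-decimal ipTM column can differ in the last digit for some rows:

```python
WT_IPTM = 0.8603  # wt_res25-135 row in the table above

# A few rows copied from the alanine scan table: mutation -> ipTM.
scan = {
    "ala_R33A": 0.8404,
    "ala_D37A": 0.8419,
    "ala_I38A": 0.8471,
    "ala_K120A": 0.8610,
}

def classify(delta, threshold=0.02):
    """Label a delta ipTM. The +/-0.02 threshold is an assumption, not from the source."""
    if delta <= -threshold:
        return "deleterious"
    if delta >= threshold:
        return "beneficial"
    return "neutral"

results = {}
for name, iptm in scan.items():
    delta = round(iptm - WT_IPTM, 4)
    results[name] = (delta, classify(delta))

for name, (delta, label) in results.items():
    print(f"{name}: {delta:+.4f} {label}")
```

With this assumed threshold even the most sensitive position, R33A at -0.0199, stays just inside the neutral band, which matches the all-neutral Class column.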

Phase 3: Substitution & Double Mutant Optimization

369 variants: 114 full-scan substitutions at 6 tolerant positions (6 × 19), 90 targeted substitutions at 6 moderate positions, and 165 double mutants. 366 are scored so far; 48 show higher ipTM than WT (0.8603), with a largest gain of +0.0092 (dbl_K120R_W119Y).
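The library size follows from the counts in the paragraph above. A minimal sketch of a full scan, assuming it means all 19 non-wild-type residues at a position and reusing the `sub_` naming seen in the results table; the source does not list the six full-scanned positions, so the three used here are only the positions the alanine scan names as most tolerant:

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues

def full_scan(wt, pos):
    """All 19 non-wild-type substitutions at one position, named like sub_K120R."""
    return [f"sub_{wt}{pos}{aa}" for aa in AMINO_ACIDS if aa != wt]

# Illustrative positions only (an assumption; see the lead-in above).
positions = [("K", 120), ("W", 119), ("E", 117)]
library = [name for wt, pos in positions for name in full_scan(wt, pos)]

# Campaign bookkeeping from the text: 6 positions x 19 + 90 targeted + 165 doubles.
total = 6 * 19 + 90 + 165

print(len(library), "single substitutions for", len(positions), "positions")
print(total, "variants in the full campaign")
```

Three positions give 57 single substitutions, so six give the 114 stated above, and the three groups sum to 369.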

Top 50 Mutants

Rank	Mutation	Type	ipTM	ΔipTM	pTM	pLDDT
1	dbl_K120R_W119Y	Double	0.8695	+0.0092	0.7510	81.4
2	sub_N126W	Single	0.8679	+0.0076	0.7516	81.9
3	dbl_W119H_D89N	Double	0.8676	+0.0073	0.7499	81.8
4	dbl_W119H_E117K	Double	0.8674	+0.0072	0.7500	81.9
5	sub_E40K	Single	0.8660	+0.0057	0.7510	81.4
6	sub_D118G	Single	0.8647	+0.0044	0.7497	81.7
7	dbl_K120Q_D89N	Double	0.8647	+0.0044	0.7505	81.3
8	dbl_K120Q_W119H	Double	0.8646	+0.0043	0.7491	81.9
9	sub_W119H	Single	0.8643	+0.0040	0.7475	81.5
10	dbl_E117K_D89K	Double	0.8634	+0.0031	0.7486	80.9
11	dbl_E117Q_D89K	Double	0.8630	+0.0028	0.7487	81.1
12	sub_E40R	Single	0.8630	+0.0027	0.7515	81.5
13	sub_Q35R	Single	0.8628	+0.0025	0.7459	81.4
14	dbl_K120R_E117D	Double	0.8626	+0.0023	0.7476	81.0
15	sub_V122D	Single	0.8626	+0.0023	0.7490	81.0
16	dbl_W119H_E117Q	Double	0.8625	+0.0022	0.7478	81.6
17	sub_T124V	Single	0.8624	+0.0021	0.7482	81.4
18	sub_D89Q	Single	0.8623	+0.0020	0.7503	81.3
19	sub_D89F	Single	0.8623	+0.0020	0.7484	81.8
20	sub_N126D	Single	0.8621	+0.0019	0.7447	81.2
21	dbl_W119H_D89Q	Double	0.8620	+0.0017	0.7476	81.5
22	sub_Q35S	Single	0.8620	+0.0017	0.7480	81.5
23	sub_D118W	Single	0.8618	+0.0015	0.7480	81.6
24	dbl_W119Y_D89Q	Double	0.8617	+0.0015	0.7462	81.2
25	dbl_K120R_D89K	Double	0.8615	+0.0012	0.7480	81.4
26	sub_E40G	Single	0.8615	+0.0012	0.7480	81.6
27	sub_D89L	Single	0.8614	+0.0011	0.7477	81.7
28	dbl_W119Y_E117N	Double	0.8613	+0.0011	0.7438	81.6
29	dbl_W119H_E117S	Double	0.8612	+0.0009	0.7466	81.6
30	sub_Q35Y	Single	0.8612	+0.0009	0.7456	81.8
31	sub_E40W	Single	0.8611	+0.0008	0.7472	81.7
32	sub_K120A	Single	0.8610	+0.0007	0.7509	81.7
33	dbl_K120Q_E117S	Double	0.8610	+0.0007	0.7495	81.5
34	sub_N126Q	Single	0.8610	+0.0007	0.7511	81.5
35	dbl_K120N_W119F	Double	0.8609	+0.0006	0.7480	81.7
36	dbl_K120R_W119H	Double	0.8609	+0.0006	0.7438	81.3
37	dbl_K120D_D89S	Double	0.8609	+0.0006	0.7508	82.0
38	dbl_K120N_D89E	Double	0.8608	+0.0005	0.7463	81.5
39	dbl_K120H_D89K	Double	0.8607	+0.0005	0.7455	81.4
40	dbl_K120Q_D89K	Double	0.8607	+0.0004	0.7467	81.3
41	sub_T124K	Single	0.8607	+0.0004	0.7464	81.8
42	dbl_K120R_D89Q	Double	0.8606	+0.0004	0.7498	81.5
43	sub_T124E	Single	0.8606	+0.0003	0.7460	80.9
44	sub_D89I	Single	0.8606	+0.0003	0.7484	81.5
45	dbl_W119H_D89E	Double	0.8606	+0.0003	0.7447	81.3
46	dbl_E117D_D89N	Double	0.8605	+0.0002	0.7465	81.1
47	sub_K120M	Single	0.8604	+0.0001	0.7460	81.4
48	sub_W119A	Single	0.8604	+0.0001	0.7451	81.1
49	dbl_W119Y_E117D	Double	0.8601	-0.0002	0.7444	81.2
50	dbl_W119Y_D89K	Double	0.8600	-0.0002	0.7470	81.6

Top 10 Mutant Structures

Rank	Mutation	Type	ΔipTM	OF3 ipTM	OF3 pTM	pLDDT	Ranking
1	dbl_K120R_W119Y	Double	+0.0092	0.8695	0.7510	81.4	0.9120
2	sub_N126W	Single	+0.0076	0.8679	0.7516	81.9	0.9245
3	dbl_W119H_D89N	Double	+0.0073	0.8676	0.7499	81.8	0.9285
4	dbl_W119H_E117K	Double	+0.0072	0.8674	0.7500	81.9	0.9216
5	sub_E40K	Single	+0.0057	0.8660	0.7510	81.4	0.9137
6	sub_D118G	Single	+0.0044	0.8647	0.7497	81.7	0.9193
7	dbl_K120Q_D89N	Double	+0.0044	0.8647	0.7505	81.3	0.9149
8	dbl_K120Q_W119H	Double	+0.0043	0.8646	0.7491	81.9	0.9237
9	sub_W119H	Single	+0.0040	0.8643	0.7475	81.5	0.9208
10	dbl_E117K_D89K	Double	+0.0031	0.8634	0.7486	80.9	0.9135
