RBX1 Binder Design
De novo protein binder design for RBX1 (RING Box Protein 1, 108 AA) — GEM Workshop / ICLR 2026
Last updated: 2026-03-23 (+6,516 designs from boltz2_wcm wave) — 20,724 unique designs
Total Designs: 20,724
Scored: 20,724
Best ipTM: 0.966
Best pLDDT: 0.892
Best ipSAE: 0.877
Top 10 Designs

Ranked by ipSAE (real values from Boltz-2 ipSAE ranking) across all campaigns. Higher ipSAE = better predicted interface contacts.

Rank Name Campaign Length ipTM ipSAE pLDDT pTM

Top 10 Designs Overview

Top 10 designs bar chart

ipTM vs ipSAE

ipTM vs ipSAE scatter plot
Target: RBX1
Sequence (108 AA)
MAAAMDVDTPSGTNSGAGKKRFEVKKWNAVALWAWDIVVDNCAICRNHIMDLCIECQANQASATSEECTVAWGVCNHAFHFHCISRWLKTRQVCPLDNREWEFQKYGH
Key Properties
PDB: 2LGV (NMR), 4P5O (complex)
Zinc sites: 3 Zn2+ (11 ligands: 8 Cys, 3 His)
E2 interface: 35 res, mean conservation 0.839
Cullin interface: 31 res, mean conservation 0.938
Competition: GEM / ICLR 2026, deadline Apr 26
Database Growth
Design database growth over 24 hours
Scored designs accumulated over 24 hours of continuous generation, scoring, and validation. Wave 1 (676 designs) completed in ~2 hours. Wave 2 gen10k campaign added ~9,000 designs over 6 hours. Wave 3 contributed another ~3,500. Growth plateaued at 14,229 as generation jobs finished.
All Designed Structures
Rank Design Campaign Length Boltz ipTM ipSAE Boltz pLDDT Boltz pTM AF3 ipTM AF3 pTM OF3 ipTM OF3 pTM Pipeline Download
All Structure Cards
Campaign Summary
Campaign | Pipeline | Hotspots | Size
RFdiff E2 small | RFdiff+MPNN | A44,A45,A46,A51,A54,A56,A57,A79,A83,A84,A87,A95,A96 | 40-65 AA
RFdiff E2 med | RFdiff+MPNN | A44,A45,A46,A51,A54,A56,A57,A79,A83,A84,A87,A95,A96 | 90-120 AA
RFdiff E2 enhanced | RFdiff+MPNN | Above + A42,A53 (ESM-2 DMS additions) | 40-65 AA
RFdiff Beta E2 | RFdiff+MPNN | A44,A45,A46,A51,A56,A57,A79,A83,A87,A95,A96 | 40-65 AA
RFdiff Cullin small | RFdiff+MPNN | A27,A29,A30,A31,A33,A35,A36,A73,A75,A101 | 40-65 AA
RFdiff Cullin med | RFdiff+MPNN | A27,A29,A30,A31,A33,A35,A36,A73,A75,A101 | 90-120 AA
RFdiff Beta Cullin | RFdiff+MPNN | A27,A29,A30,A31,A33,A35,A36,A73,A75,A101 | 40-65 AA
BoltzGen small | BoltzGen | None (untargeted) | 40-65 AA
BoltzGen medium | BoltzGen | None (untargeted) | 90-120 AA
Scoring Results by Campaign
Campaign | Scored | ipSAE mean | ipSAE max | >0.7 | >0.5 | Best len
RFdiff E2 small | 11,792 | 0.226 | 0.877 | 355 | 1,790 | 51
RFdiff E2 med | 196 | 0.252 | 0.801 | 8 | 31 | 98
RFdiff Beta E2 | 104 | 0.313 | 0.821 | 11 | 24 | 55
RFdiff Cullin small | 128 | 0.240 | 0.805 | 8 | 24 | 60
RFdiff Cullin med | 128 | 0.278 | 0.793 | 7 | 25 | 111
BoltzGen (untargeted) | 1,840 | 0.378 | 0.773 | 3 | 418 | 62
Other/pilot | 40 | 0.174 | 0.685 | 0 | 2 | 50
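The per-campaign counts above convert directly into hit rates. The sketch below shows that arithmetic for four of the RFdiffusion campaigns. The tuples are transcribed from the table (scored, count above 0.7, count above 0.5); treat the transcription as an assumption wherever the source columns are ambiguous. The function name `hit_rates` is illustrative, not part of the project's scripts.

```python
# Counts transcribed from the scoring table: (scored, n_ipsae_gt_0.7, n_ipsae_gt_0.5).
CAMPAIGN_COUNTS = {
    "RFdiff E2 small": (11792, 355, 1790),
    "RFdiff E2 med": (196, 8, 31),
    "RFdiff Cullin small": (128, 8, 24),
    "RFdiff Cullin med": (128, 7, 25),
}


def hit_rates(counts):
    """Map campaign name -> (fraction above 0.7, fraction above 0.5)."""
    return {name: (hi / n, mid / n) for name, (n, hi, mid) in counts.items()}


rates = hit_rates(CAMPAIGN_COUNTS)
for name, (r07, r05) in rates.items():
    print(f"{name:22s} >0.7: {r07:6.1%}   >0.5: {r05:6.1%}")
```

Normalizing by the number of scored designs makes the small Cullin campaigns comparable with the much larger E2 small campaign.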
Totals
Total RFdiffusion backbones: 9,964
E2 face: 5,649 — Cullin face: 1,709 — Beta model: 110
Total BoltzGen designs: 1,820 complete + 1,400 resuming
Total scored in master: 14,228
Total with OF3 validation: 248
Total with AF3 validation: 32
Hotspot Details
E2 Face (from PDB 4P5O UBC12 interface + ESM-2 DMS):
Core: A44(I), A45(C), A46(R), A51(D), A56(C), A57(Q)
Extended: A54(I*), A79(F), A83(C), A84(I*), A87(W), A95(P), A96(L)
* I54, I84 added from ESM-2 DMS (mutation-sensitive, missed by conservation)
Cullin Face (from PDB 4P5O Cullin-1 interface):
Core: A27(W), A29(A), A30(V), A31(A), A33(W), A35(W)
Extended: A36(D), A73(G), A75(C), A101(W)
BoltzGen: No hotspot conditioning — binder targets determined by model
RFdiffusion Checkpoints
Complex_base_ckpt.pt — standard binder design (mostly helical)
Complex_beta_ckpt.pt — diverse topology binder design (mixed sheet/helix)
Design Naming Convention
- beta_e2_100-026: beta campaign, E2 face, 100-design batch, design #26
- sat_e2-0675: satellite E2 campaign, design #675
- g10k_e2_sm_j11_binder_125: gen10k E2 small, job 11, binder #125
- cul_small_10-017: cullin face, small binder, 10-design batch, #17
- e2_med_10-027: E2 face, medium binder, 10-design batch, #27
- BG Small 3: BoltzGen small pilot, design #3
Campaigns:
- beta_e2: Early E2 face designs from beta RFdiffusion checkpoint
- sat_e2: Satellite E2 campaigns (diverse hotspot sampling)
- gen10k / g10k: 10,000-scale generation campaign
- cul_small / cul_med: Cullin face, small (40-65 AA) / medium (90-120 AA) binders
- e2_small / e2_med: E2 face, small / medium binders
- BoltzGen: Designs from BoltzGen generative model (not RFdiffusion)
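The naming convention can be parsed mechanically. The following is a minimal sketch whose regular expressions are inferred only from the example names listed above, so real design names may not all fit. `parse_design_name` and the returned dictionary keys are hypothetical, not taken from the project code.

```python
import re

# Patterns inferred from the example names only (an assumption).
_G10K = re.compile(r"^(?P<campaign>g10k_[a-z0-9_]+?)_j(?P<job>\d+)_binder_(?P<design>\d+)$")
_BATCH = re.compile(r"^(?P<campaign>[a-z0-9_]+?)(?:_(?P<batch>\d+))?-(?P<design>\d+)$")
_BG = re.compile(r"^BG (?P<size>[A-Za-z]+) (?P<design>\d+)$")


def parse_design_name(name):
    """Split a design name into campaign, batch or job, and design number."""
    m = _G10K.match(name)
    if m:
        return {"campaign": m["campaign"], "job": int(m["job"]), "design": int(m["design"])}
    m = _BATCH.match(name)
    if m:
        # The batch size is optional, e.g. sat_e2-0675 has none.
        batch = int(m["batch"]) if m["batch"] else None
        return {"campaign": m["campaign"], "batch": batch, "design": int(m["design"])}
    m = _BG.match(name)
    if m:
        return {"campaign": "BoltzGen " + m["size"].lower(), "design": int(m["design"])}
    raise ValueError(f"unrecognized design name: {name!r}")


print(parse_design_name("beta_e2_100-026"))
```

Raising on unrecognized names, rather than guessing, keeps mislabeled designs from silently landing in the wrong campaign.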
AlphaFold3 Validation

28 top designs validated with AlphaFold3. AF3 ipTM is the ground-truth orthogonal validation. Most RFdiffusion+MPNN E2 face designs show massive Boltz-2 score inflation (delta 0.5-0.86). BoltzGen designs targeting the full RBX1 surface show better AF3 agreement.

Boltz-2 vs AF3 ipTM

Boltz vs AF3 scatter
Each point is one validated design. Dashed line = perfect agreement. BoltzGen designs (green) cluster near the diagonal, while RFdiffusion+MPNN E2 designs (blue) show extreme inflation. Edge color indicates AF3 flag: black = top hit, green = promote, orange = review, gray = discard.

AF3 vs Boltz ipTM per design

AF3 ranked bar chart
Solid bars = AF3 ipTM, faded bars = Boltz ipTM. Sorted by AF3 score. Only design_0673 (AF3=0.85) and design_0357 (AF3=0.78) pass ipTM > 0.5.

Score inflation by pipeline

Delta by pipeline
Delta = Boltz ipTM - AF3 ipTM. BoltzGen: delta 0.05-0.4. RFdiff+MPNN E2: delta 0.5-0.86. Cullin RFdiff (0357): delta 0.165.
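The delta defined above can be summarized per pipeline as a (min, max) range. This sketch shows one way to do that. The records are hypothetical values chosen only to exercise the function; they are not project data. `delta_by_pipeline` is an illustrative name.

```python
from collections import defaultdict


def delta_by_pipeline(records):
    """records: iterable of (pipeline, boltz_iptm, af3_iptm).

    Returns pipeline -> (min delta, max delta), with delta = Boltz ipTM - AF3 ipTM.
    """
    deltas = defaultdict(list)
    for pipeline, boltz, af3 in records:
        deltas[pipeline].append(boltz - af3)
    return {p: (min(v), max(v)) for p, v in deltas.items()}


# Hypothetical values for illustration only.
records = [
    ("BoltzGen", 0.70, 0.60),
    ("BoltzGen", 0.80, 0.45),
    ("RFdiff+MPNN E2", 0.95, 0.20),
    ("RFdiff+MPNN E2", 0.90, 0.35),
]
ranges = delta_by_pipeline(records)
print(ranges)
```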
OpenFold3 Validation (211 designs)

Boltz-2 vs OpenFold3 ipTM

Boltz vs OF3 scatter
OF3 shows much better agreement with Boltz-2 than AF3 does — the cloud sits higher. Top OF3 hits (design_1459 at 0.776, design_0283 at 0.749) still show significant Boltz inflation but the correlation is stronger. E2 face designs (blue) show more inflation than Cullin (red).

AF3 vs OpenFold3 ipTM

AF3 vs OF3 scatter
For the 20 designs with both AF3 and OF3 scores, OF3 is consistently more generous than AF3. Most designs fall above the diagonal. The two methods agree on relative ranking but not absolute values — OF3 ipTM is ~2-3x higher than AF3 ipTM for the same designs.

Three-method comparison (top 20 by OF3)

Boltz vs OF3 vs AF3 bars
Side-by-side Boltz-2 (blue, faded), OpenFold3 (red), and AF3 (green) ipTM for the top 20 OF3 designs. The gap between Boltz and OF3/AF3 is consistent across designs. Faint green bars indicate designs without AF3 data.
Pipeline Performance Comparison

Meta-analysis of design pipelines across campaigns. All plots generated from Python analysis scripts.

Pipeline Comparison (Boxplots)

Pipeline comparison boxplots
Boxplots of ipTM, ipSAE, and pLDDT by sub-campaign. Scale-up and wave2 MPNN designs consistently outperform BoltzGen across all metrics.

Hit Rate Comparison

Hit rate curves
Cumulative hit rate curves showing the fraction of designs above various ipTM thresholds. Scale-up and wave2 campaigns show similar performance profiles.
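A cumulative hit rate curve of this kind is the fraction of designs whose score exceeds each threshold. The sketch below assumes a strict "greater than" comparison, which the page does not specify, and uses made-up scores.

```python
def hit_rate_curve(scores, thresholds):
    """Return [(threshold, fraction of scores strictly above it), ...]."""
    if not scores:
        raise ValueError("need at least one score")
    n = len(scores)
    return [(t, sum(s > t for s in scores) / n) for t in thresholds]


# Hypothetical ipTM values for illustration only.
curve = hit_rate_curve([0.2, 0.4, 0.6, 0.8], [0.3, 0.5, 0.7])
print(curve)
```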

ipTM vs Binder Length

ipTM vs binder length
Relationship between binder length and ipTM score. Small binders (40-65 AA) dominate the dataset and show wide ipTM variation. No clear length-performance correlation observed.
Metric Distributions

ipTM by Campaign

ipTM histogram by campaign
Distribution of ipTM scores stratified by campaign. Both E2 face campaigns (scale-up and wave2) show bimodal distributions with peaks near 0.3 and 0.8.

ipSAE by Campaign

ipSAE histogram by campaign
ipSAE distribution by campaign (real values from Boltz-2 ranking). Higher ipSAE indicates tighter predicted interface contacts.

pLDDT by Campaign

pLDDT histogram by campaign
pLDDT distribution across campaigns. Most scored designs cluster between 0.6 and 0.8 pLDDT, with the best designs reaching 0.892.

Binder Length Distribution

Binder length distribution
Binder length distribution across all designs colored by campaign. The majority are small binders (40-65 AA) from E2 face campaigns.
Hotspot Set Analysis

Each RFdiffusion campaign was conditioned on a specific set of RBX1 hotspot residues. Four distinct hotspot sets were used:

E2 Enhanced (+DMS): A44, A45, A46, A51, A54, A56, A57, A79, A83, A84, A87, A95, A96, A42, A53 (11,336 designs)
E2 Standard: A44, A45, A46, A51, A56, A57, A79, A83, A87, A95, A96 (400 designs)
E2 Core: A44, A45, A46, A51, A54, A56, A57, A79, A83, A84, A87, A95, A96 (256 designs)
Cullin: A27, A29, A30, A31, A33, A35, A36, A73, A75, A101 (276 designs)
BoltzGen: No hotspot conditioning (1,240 designs)

ipSAE vs ipTM by hotspot set

ipSAE vs ipTM scatter with marginal distributions
Joint distribution of ipSAE and ipTM colored by hotspot set. Marginal histograms show per-set density. Stars mark the best design in each set. The E2 Enhanced set (with ESM-2 DMS-derived residues I54, I84, C42, C53) produces the highest ipSAE values, suggesting that adding mutation-sensitive residues to the hotspot specification improves interface quality.

ipSAE vs binder length by hotspot set

ipSAE vs binder length with marginals
Binder length vs ipSAE by hotspot set. Small binders (40-65 AA) dominate the E2 Enhanced set. BoltzGen designs span a wider length range but achieve lower ipSAE. No strong correlation between length and interface quality within any set, though binders of 50-60 AA appear slightly enriched among top performers.

Hit rate by hotspot set

Hit rate by hotspot set
Fraction of designs exceeding ipSAE thresholds for each hotspot set. The E2 Enhanced set maintains the highest hit rate at the stricter thresholds. The Cullin set shows competitive hit rates despite targeting a different face. BoltzGen has the highest fraction above ipSAE=0.3 but drops off sharply above 0.5.

Efficiency frontier: ipSAE vs pLDDT

Pareto front ipSAE vs pLDDT
Pareto front (dashed line) of designs optimizing both ipSAE and pLDDT. Designs on the frontier achieve the best trade-off between interface quality and structural confidence. Most frontier designs are from the E2 Enhanced set, but a few Cullin and E2 Standard designs appear at high pLDDT.
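A two-objective Pareto front like the one described above can be extracted with a single sorted sweep. This is a minimal sketch, assuming both ipSAE and pLDDT are to be maximized. The points are hypothetical and `pareto_front` is an illustrative name.

```python
def pareto_front(points):
    """points: list of (ipsae, plddt). Returns the non-dominated points,
    ordered from highest to lowest ipSAE."""
    front = []
    best_plddt = float("-inf")
    # Sweep from high to low ipSAE; a point survives only if its pLDDT beats
    # every point already seen (all of which have ipSAE at least as high).
    for ipsae, plddt in sorted(points, key=lambda p: (-p[0], -p[1])):
        if plddt > best_plddt:
            front.append((ipsae, plddt))
            best_plddt = plddt
    return front


# Hypothetical (ipSAE, pLDDT) pairs for illustration only.
front = pareto_front([(0.9, 0.5), (0.7, 0.8), (0.6, 0.7), (0.4, 0.9), (0.3, 0.3)])
print(front)
```

The sweep costs one sort, which stays cheap even for tens of thousands of designs.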

Amino acid composition by ipSAE quality tier

Amino acid composition by tier
Amino acid frequency comparison between top-tier (ipSAE > 0.7), mid-tier (0.3-0.7), and failed (ipSAE < 0.1) designs. Blue = charged (R,K,D,E), red = hydrophobic (F,L,I,M,V,W), green = polar (S,T,C,N,Q,Y). Top-tier designs show higher glutamate (E) and leucine (L) frequency, while failed designs are enriched in alanine (A) and glycine (G), suggesting that oversimplified sequences with low complexity correlate with poor interface formation.
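The tiering and frequency comparison described above can be sketched as follows. The tier boundaries are taken from the caption; designs with ipSAE between 0.1 and 0.3 fall in none of the three named tiers, so the sketch returns `None` for them. The function names and the example sequences are hypothetical.

```python
from collections import Counter

AA = "ACDEFGHIKLMNPQRSTVWY"


def tier(ipsae):
    """Assign the quality tier named in the caption, or None if outside all tiers."""
    if ipsae > 0.7:
        return "top"
    if 0.3 <= ipsae <= 0.7:
        return "mid"
    if ipsae < 0.1:
        return "failed"
    return None  # 0.1 <= ipSAE < 0.3 is not assigned to a tier


def composition(seqs):
    """Pooled frequency of each of the 20 standard amino acids."""
    counts = Counter("".join(seqs))
    total = sum(counts[a] for a in AA)
    if total == 0:
        raise ValueError("no standard residues found")
    return {a: counts[a] / total for a in AA}


# Hypothetical toy sequences for illustration only.
freqs = composition(["AAEL", "EELL"])
print(freqs["E"], freqs["L"], freqs["A"])
```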
Evolutionary Analysis of RBX1

Conservation analysis from a 165-sequence MSA (MAFFT) of RBX1 homologs across eukaryotes. HMMER collected 834 raw homologs, which were filtered to 272 non-redundant sequences; the alignment contains 1,155 total rows including outgroups.

MSA Sequences: 165
Raw Homologs: 834
Filtered Seqs: 272
Target Residues: 108
Zn2+ Sites: 3
Interface Residues: 66
Per-Residue Conservation

Shannon entropy-based conservation score (0 = variable, 1 = perfectly conserved). Colored by functional role: E2 interface, Cullin interface, Zinc-binding, other. Hover for details.

Entropy Distribution

Histogram of Shannon entropy values across all 108 residues. Lower entropy = higher conservation.

Interface Conservation Comparison

Mean conservation by functional region.

Gap Fraction by Region

Mean alignment gap fraction. Lower = better coverage in the MSA.
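The per-column quantities used in this section (Shannon entropy, a 0-to-1 conservation score, and gap fraction) can be computed from one alignment column as sketched below. The normalization `1 - H / log2(20)` is an assumption; the page states only that the score is entropy-based with 1 meaning perfectly conserved. Gaps are excluded from the entropy, which is also an assumption.

```python
import math
from collections import Counter

GAPS = "-."


def column_stats(column):
    """column: string with one character per MSA row at a single position."""
    if not column:
        raise ValueError("empty column")
    residues = [c for c in column.upper() if c not in GAPS]
    gap_fraction = 1 - len(residues) / len(column)
    if not residues:
        return {"entropy": None, "conservation": None, "gap_fraction": gap_fraction}
    n = len(residues)
    counts = Counter(residues)
    # Shannon entropy in bits over the observed residues.
    entropy = -sum((k / n) * math.log2(k / n) for k in counts.values())
    # Assumed normalization: divide by the maximum entropy for 20 amino acids.
    conservation = 1 - entropy / math.log2(20)
    return {"entropy": entropy, "conservation": conservation, "gap_fraction": gap_fraction}


print(column_stats("WWWW"))
print(column_stats("AC-A"))
```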

Highly Conserved Residues (Conservation ≥ 0.93)
Residue # Amino Acid Conservation Entropy Gap Fraction Functional Role
Zinc Coordination Sites
Zn2+ Site 1 (ZN_109)
Ligands: CYS42, CYS45, HIS80, CYS83
Avg Conservation: 0.943
Role: Stabilizes RING-H2 domain fold, part of E2 recruitment surface
Zn2+ Site 2 (ZN_110)
Ligands: CYS75, HIS77, CYS94
Avg Conservation: 0.956
Role: Structural zinc, anchors the beta-sheet core between the two interfaces
Zn2+ Site 3 (ZN_111)
Ligands: CYS53, CYS56, CYS68, HIS82
Avg Conservation: 0.922
Role: Bridges the E2-binding loop to the core, critical for domain integrity
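As a consistency check, the ligand lists above can be compared against the 108-AA target sequence given earlier on this page. The sketch below assumes 1-based residue numbering that starts at the first residue of that sequence. `check_ligands` is an illustrative helper.

```python
from collections import Counter

RBX1 = (
    "MAAAMDVDTPSGTNSGAGKKRFEVKKWNAVALWAWDIVVDNCAICRNHIMDLCIECQANQASATSEEC"
    "TVAWGVCNHAFHFHCISRWLKTRQVCPLDNREWEFQKYGH"
)

ZN_SITES = {
    "ZN_109": ["CYS42", "CYS45", "HIS80", "CYS83"],
    "ZN_110": ["CYS75", "HIS77", "CYS94"],
    "ZN_111": ["CYS53", "CYS56", "CYS68", "HIS82"],
}
THREE_TO_ONE = {"CYS": "C", "HIS": "H"}


def check_ligands(seq, sites):
    """Return (site, ligand, found residue) for every ligand that does not match seq."""
    mismatches = []
    for site, ligands in sites.items():
        for lig in ligands:
            expected, pos = THREE_TO_ONE[lig[:3]], int(lig[3:])
            if seq[pos - 1] != expected:  # positions are 1-based
                mismatches.append((site, lig, seq[pos - 1]))
    return mismatches


ligand_counts = Counter(lig[:3] for ligs in ZN_SITES.values() for lig in ligs)
print(len(RBX1), check_ligands(RBX1, ZN_SITES), dict(ligand_counts))
```

With the lists as given, every ligand matches the sequence, and the three sites together use 8 cysteines and 3 histidines.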
Target Interface Definitions
E2 Recruitment Face PRIMARY (60%)
35 residues at UBC12/E2 interface (PDB 4P5O, 8Å cutoff)
Residues: 41,42,43,44,45,46,47,48,51,52,53,54,55,56,57,58,59,79,81,82,83,84,85,86,87,88,90,91,93,94,95,96,97,98,99
Mean conservation: 0.839
Hotspots (RFdiffusion): B44-B48, B52, B55-B57, B84-B88, B95-B98
Key residues: R46 (1.0), C94 (0.988), I44 (0.948), C56 (0.966), P95 (0.947)
Cullin Interface SECONDARY (40%)
31 residues at Cullin C-terminal domain interface
Residues: 23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,72,73,74,75,76,89,90,91,92,93,99,100,101,102,103,104
Mean conservation: 0.938
Hotspots (RFdiffusion): B24-B34, B72-B75, B100-B104
Key residues: W33 (1.0), W35 (1.0), G73 (1.0), W101 (0.988), V30 (0.974)
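The two interface definitions share a few positions, which matters when counting total interface residues or splitting hotspots between campaigns. A short set computation on the residue lists above makes the overlap explicit.

```python
E2_FACE = {
    41, 42, 43, 44, 45, 46, 47, 48, 51, 52, 53, 54, 55, 56, 57, 58, 59,
    79, 81, 82, 83, 84, 85, 86, 87, 88, 90, 91, 93, 94, 95, 96, 97, 98, 99,
}
CULLIN_FACE = {
    23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37,
    72, 73, 74, 75, 76, 89, 90, 91, 92, 93, 99, 100, 101, 102, 103, 104,
}

shared = sorted(E2_FACE & CULLIN_FACE)   # residues assigned to both faces
union_size = len(E2_FACE | CULLIN_FACE)  # distinct interface residues

print(len(E2_FACE), len(CULLIN_FACE), shared, union_size)
```

The faces contain 35 and 31 residues, with positions 90, 91, 93, and 99 in both, so there are 62 distinct interface positions. The figure of 66 interface residues in the summary equals the plain sum 35 + 31.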
Interpretation

RBX1 is deeply conserved across eukaryotes. The RING-H2 domain (residues ~27–104) shows uniformly high conservation (>0.7 for nearly all positions), with zinc-coordinating cysteines and histidines reaching near-perfect conservation (0.91–1.0). The N-terminal tail (residues 1–20) is more variable, consistent with it being unstructured in the NMR ensemble (2LGV).

The Cullin face is significantly more conserved than the E2 face (mean 0.938 vs 0.839). This makes sense: the Cullin interaction is constitutive (RBX1 is always bound to a Cullin scaffold in vivo), whereas the E2 interface cycles through multiple E2 partners. The higher conservation at the Cullin face means binders targeting it are more likely to disrupt a functionally critical interaction, but the surface may also be harder to compete with due to the tight, conserved binding.

Four perfectly conserved residues stand out: W33, W35, and G73 (all Cullin face), and R46 (E2 face). These are invariant across all 165 sequences in the MSA. W33 and W35 form a tryptophan pair that likely stacks against the Cullin surface, a classic hot-spot motif. R46 is the catalytic arginine critical for E2 activation.

Gap fraction is low (<2%) for the core domain (residues 36–105), meaning the MSA is well-aligned in the structured region. The N/C-terminal tails show higher gaps (10–27%), reflecting length variation among homologs. This gives confidence that the conservation scores in the core are reliable.

Design implications: The 60/40 E2/Cullin split is well-justified. The Cullin face offers a tighter, more conserved target with multiple tryptophan hot-spots, favoring high-affinity designs. The E2 face is more diverse, potentially offering more epitope options but requiring designs that can outcompete the native E2 partners. The zinc sites should be included in the target structure but not directly targeted—they are buried and structurally critical, not surface-accessible.

RBX1 Sequence — Conservation Colored

Each residue colored by conservation score. Hover for details.

ESM-2 Deep Mutational Scanning

Masked marginal log-likelihood ratios (dLLR) computed with ESM-2 (650M params) for all single-point mutations across the 108-residue RBX1 sequence. More negative dLLR = more deleterious mutation. Sensitivity = mean |dLLR| across all 20 amino acids at each position.
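Given per-position log-probabilities from a masked language model, the dLLR and sensitivity definitions above reduce to simple arithmetic. The sketch below does not run ESM-2. It assumes the log-probabilities were already obtained by masking each position in turn, and it uses toy numbers. Including the wild-type entry (dLLR = 0) in the mean follows the "all 20 amino acids" wording and matches 108 x 20 = 2,160 scored entries.

```python
AA = "ACDEFGHIKLMNPQRSTVWY"


def dllr_table(seq, log_probs):
    """log_probs[i][aa] = model log-probability of aa at masked position i.

    Returns one dict per position with dLLR = log p(mutant) - log p(wild type).
    """
    if len(seq) != len(log_probs):
        raise ValueError("sequence and log_probs must have the same length")
    return [{aa: lp[aa] - lp[wt] for aa in AA} for wt, lp in zip(seq, log_probs)]


def sensitivity(table):
    """Mean |dLLR| over all 20 amino acids at each position."""
    return [sum(abs(v) for v in row.values()) / len(row) for row in table]


# Toy log-probabilities (hypothetical, not normalized, not ESM-2 output).
uniform = {aa: -3.0 for aa in AA}
peaked = dict(uniform, A=-1.0)  # model strongly prefers the wild-type A
table = dllr_table("AC", [peaked, uniform])
sens = sensitivity(table)
print(sens)
```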

Mutations Scored: 2,160
Spearman r (vs Evo): 0.581
Pearson r (vs Evo): 0.714
Most Deleterious dLLR: -18.25
Most Beneficial dLLR: +2.54
Predicted Contacts: 31
Mutation Effect Heatmap

Log-likelihood ratio for each of 20 amino acids at each position. Blue = tolerated/beneficial, red = deleterious. Wild-type residue marked with black dot. Hover for values.

Per-Position Mutation Sensitivity

Mean |dLLR| across all 20 amino acids. Higher = less tolerant of mutations. Colored by functional role.

ESM Sensitivity vs Conservation

Spearman r = 0.581, Pearson r = 0.714. Strong agreement between model-predicted and evolutionary constraint.
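The two coefficients quoted above measure different things: Pearson captures linear association, while Spearman is Pearson applied to ranks and so captures any monotonic relationship. A dependency-free sketch follows, using toy data rather than the project's values.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def ranks(values):
    """1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    out = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            out[order[k]] = avg
        i = j + 1
    return out


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


# Toy data: y is a monotonic but non-linear function of x.
x = [1, 2, 3, 4, 5]
y = [1, 4, 9, 16, 25]
print(pearson(x, y), spearman(x, y))
```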

Predicted Contact Map

ESM-2 attention-derived contact predictions (probability > 0.5). 31 high-confidence contacts.

Top 20 Most Deleterious Single Mutations
Rank Position WT → Mut dLLR Functional Role WT Conservation
Top 30 Mutation-Sensitive Residues
Residue AA ESM Sensitivity Evo Conservation WT Log-Prob Functional Role
Interpretation

ESM-2 and evolution strongly agree on which residues are critical. The Spearman correlation of 0.581 (p < 10^-10) and Pearson correlation of 0.714 (p < 10^-17) between ESM-2 sensitivity and Shannon-entropy conservation indicate that the protein language model has learned genuine structural and functional constraints from sequence alone.

Zinc-coordinating cysteines dominate the sensitivity landscape. C42, C56, C83, C75, C53, C68, and C94 are all among the top 15 most sensitive positions. Any mutation at these sites is catastrophic (dLLR < -10), consistent with their role as structural zinc ligands that maintain the RING-H2 fold. The most deleterious single mutation in the entire protein is I54W (dLLR = -18.25), which substitutes a bulky tryptophan into the hydrophobic core adjacent to C53 and C56.

The Cullin face has the single most sensitive residue (D36, sensitivity = 13.0), which is also highly conserved evolutionarily (0.945). This aspartate likely forms critical salt bridges in the Cullin interaction. For binder design, targeting residues around D36 could be highly effective at disrupting the complex.

ESM-2 identifies some positions as sensitive that conservation misses. I54 (ESM sensitivity rank 5, conservation only 0.786) and I84 (rank 11, conservation 0.802) are moderately conserved but ESM predicts they are among the most intolerant of mutations — likely because they play critical roles in hydrophobic packing that the MSA alone doesn't fully capture.

Design implications: Binders should maximize contacts with high-sensitivity residues (especially D36, C42, R46, F79, W87) since these positions cannot easily mutate to escape binding. The DMS data also suggests that the N-terminal tail (residues 1–20) has low mutation sensitivity, confirming it is a poor target for binder design.

AF3 Validation Insights

Comprehensive analysis of what predicts AlphaFold3 validation success. ~51 designs validated with AF3, analyzed using Lasso regression on sequence and structural features. AF3 is our ground-truth orthogonal validation — Boltz-2 scores are heavily inflated for E2-face RFdiffusion designs.

AF3 Validated: ~51
Top Hits (>=0.7): 7
Promote (0.5-0.7): 5
Review (0.4-0.5): 4
Discard (<0.4): 35
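The four AF3 flags above amount to a threshold rule on AF3 ipTM. The sketch below assumes each lower bound is inclusive; the page gives ">=" only for the top tier. The two scores in the usage line are the AF3 values reported on this page for design_0673 and design_0357.

```python
def af3_flag(af3_iptm):
    """Map an AF3 ipTM score to the flag used in the validation summary.

    Boundary handling (inclusive lower bounds) is an assumption.
    """
    if af3_iptm >= 0.7:
        return "top hit"
    if af3_iptm >= 0.5:
        return "promote"
    if af3_iptm >= 0.4:
        return "review"
    return "discard"


print(af3_flag(0.85), af3_flag(0.78))
```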
Lasso Regression: What Predicts AF3 Success?

LassoCV regression on 20 sequence and design features to predict AF3 ipTM. Features with non-zero coefficients are the strongest predictors after regularization. Positive = helps AF3 validation, negative = hurts.

Lasso coefficients
Lasso regression coefficients (standardized). The strongest predictor of AF3 success is being a BoltzGen design (untargeted, diverse topology), followed by sequence features like aromatic content and lower alanine fraction. High Boltz-2 ipTM is actually a weak or negative predictor — designs with inflated Boltz scores tend to fail AF3 validation.
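To show why Lasso leaves only a few non-zero coefficients, here is a dependency-free sketch of cyclic coordinate descent with soft-thresholding. It is not the LassoCV procedure used for the analysis: there is no cross-validation, the penalty `alpha` is fixed by hand, and the data are toy values. It assumes standardized feature columns and a centred target.

```python
def soft_threshold(rho, alpha):
    """Shrink rho toward zero by alpha; values within [-alpha, alpha] become 0."""
    if rho > alpha:
        return rho - alpha
    if rho < -alpha:
        return rho + alpha
    return 0.0


def lasso_cd(X, y, alpha, n_iter=200):
    """Minimise (1/2n)*||y - Xw||^2 + alpha*||w||_1 by cyclic coordinate descent."""
    n, p = len(X), len(X[0])
    w = [0.0] * p
    z = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            if z[j] == 0:
                continue  # constant-zero column carries no information
            rho = 0.0
            for i in range(n):
                # Prediction using every feature except j.
                partial = sum(X[i][k] * w[k] for k in range(p) if k != j)
                rho += X[i][j] * (y[i] - partial)
            w[j] = soft_threshold(rho / n, alpha) / z[j]
    return w


# Toy data: feature 0 drives y strongly, feature 1 only weakly.
X = [[1, 1], [-1, 1], [1, -1], [-1, -1]]
y = [2.05, -1.95, 1.95, -2.05]
w = lasso_cd(X, y, alpha=0.1)
print(w)
```

In this toy case the weak feature's coefficient is set exactly to zero while the strong one is shrunk by the penalty, which is the behaviour the coefficient plot relies on.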
Top Feature Correlations
Feature importance scatter
Top 8 features correlated with AF3 ipTM. Each subplot shows one feature vs AF3 score with Ridge regression line and Pearson r. Points colored by pipeline (blue=RFdiff E2, red=RFdiff Cullin, green=BoltzGen).
What Distinguishes Success from Failure?
Success vs failure comparison
Comparison of successful designs (AF3 ipTM >= 0.5) vs failed designs (AF3 ipTM < 0.3) across key features. Successful designs tend to have higher sequence entropy (more diverse amino acid usage), more aromatic residues, and lower alanine content.
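The sequence features named in this comparison can be computed directly from a binder sequence. This is a minimal sketch, with feature names that are illustrative rather than those of the project's analysis scripts.

```python
import math
from collections import Counter


def sequence_features(seq):
    """Shannon entropy (bits), unique residue count, and composition fractions."""
    if not seq:
        raise ValueError("empty sequence")
    n = len(seq)
    counts = Counter(seq)
    entropy = -sum((c / n) * math.log2(c / n) for c in counts.values())
    return {
        "entropy_bits": entropy,
        "n_unique": len(counts),
        "aromatic_frac": sum(counts[a] for a in "FWY") / n,
        "ala_frac": counts["A"] / n,
        "gly_frac": counts["G"] / n,
    }


# Toy sequences for illustration only.
low = sequence_features("AAAA")
mixed = sequence_features("AFWY")
print(low, mixed)
```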
Amino Acid Composition
AA composition
Amino acid frequency comparison between AF3-validated (ipTM >= 0.5) and AF3-failed (ipTM < 0.2) designs. Successful designs are enriched in structurally important residues (F, W, Y aromatics; charged residues) and depleted in simple residues (A, G).
BoltzGen vs RFdiffusion in AF3
Pipeline comparison
BoltzGen designs show dramatically better AF3 validation than RFdiffusion+MPNN designs. BoltzGen mean AF3 ipTM is 2-3x higher, despite lower Boltz-2 scores. This suggests RFdiffusion+MPNN designs may be optimizing for Boltz-2 scoring artifacts rather than genuine binding.
Key Findings

1. BoltzGen designs validate significantly better than RFdiffusion+MPNN designs in AF3, despite having lower Boltz-2 scores. This is the single strongest predictor of AF3 success.

2. Sequence diversity matters: designs with higher Shannon entropy (more diverse AA usage) and more unique amino acids tend to validate better. Low-complexity sequences rich in alanine and glycine consistently fail.

3. Aromatic residues (F, W, Y) are enriched in successful designs — these contribute to specific hydrophobic contacts at the interface that AF3 can validate.

4. High Boltz-2 ipTM is NOT predictive of AF3 success — in fact, it may be slightly anti-correlated. Designs with Boltz ipTM > 0.95 mostly fail AF3 validation (delta > 0.7).

5. Cullin-face designs show better AF3 agreement than E2-face designs, though the sample size is small. The one Cullin design tested (design_0357) has AF3 ipTM = 0.78 with low delta.

6. For future campaigns: prioritize BoltzGen-style generation, increase aromatic content in ProteinMPNN sampling, and reduce alanine/glycine bias. Consider Cullin-face targeting which appears more AF3-compatible.