Sergio E. Mares
Computational Biology Ph.D. Candidate at UC Berkeley

I am a fifth-year Computational Biology Ph.D. Candidate at UC Berkeley, advised by Professor Nilah Ioannidis at the Center for Computational Biology and Professor Joseph Costello at the UCSF Neurosurgery Department.


My research focuses on building machine learning models for cancer immunotherapy. I develop protein language models for predicting peptide-MHC class I binding affinity and use structure-conditioned diffusion models to design novel immunogenic peptide libraries. The goal is to expand the space of targetable tumor antigens, particularly for brain tumors where current therapeutic options are limited.


In Summer 2025 I interned at Ultima Genomics, where I integrated a DNA sequence simulation into the production sequencing pipeline (reducing reagent use by 50%) and built a scalable single-cell ATAC-seq processing pipeline handling up to 100M cells end-to-end.

pMHC-I binding
Sergio E. Mares, Ariel Espinoza, Nilah M. Ioannidis
Machine Learning in Computational Biology (MLCB), 2025
We test whether domain-specific continued pre-training of protein language models is beneficial for pMHC-I binding affinity prediction. Starting from ESM Cambrian (300M parameters), we perform masked-language modeling on HLA-associated peptides and fine-tune for quantitative IC50 binding affinity prediction.
Structure-guided pMHC-I design
Sergio E. Mares, Ariel Espinoza, Nilah M. Ioannidis
ICML Gen AI and Biology Workshop, 2025
We introduce a structure-guided benchmark of pMHC-I peptides designed using diffusion models conditioned on crystal structure interaction distances, spanning twenty high-priority HLA alleles.
Calcium signaling protein structure
Biraj B. Kayastha, A. Kubo, J. Burch-Konda, R. L. Dohmen, J. L. McCoy, R. R. Rogers, Sergio E. Mares, J. Bevere, A. Huckaby, W. Witt, S. Peng, B. Chaudhary, S. Mohanty, M. Barbier, G. Cook, J. Deng, M. Patrauchan
Scientific Reports, 2022
We study the putative Ca²⁺-binding protein EfhP (PA4107) and CalC as proteins involved in the calcium network, elucidating the mechanisms of bacterial Ca²⁺ signaling in Pseudomonas aeruginosa.
Baculovirus invadosome dynamics
Domokos I. Lauko, Taro Ohkawa, Sergio E. Mares, Matthew D. Welch
Molecular Biology of the Cell, 2021
We investigate how the AcMNPV protein actin rearrangement-inducing factor-1 (Arif-1) induces the formation of cortical concentrations of polymerized actin (ventral aggregates) in cultured insect cells.
Pseudomonas aeruginosa
Sergio E. Mares, M. King, A. Kubo, A. Khavov, E. Lutter, N. Youssef, M. Patrauchan
Journal of Microbiology, 2020
We study the conservation of carP sequence and its occurrence in diverse phylogenetic groups, finding that carP and its two paralogues are primarily present in P. aeruginosa and belong to the core genome, demonstrating potential as a biomarker.
Myxococcota swarming
Chelsea L. Murphy, R. Yang, T. Decker, C. Cavalliere, V. Andreev, N. Bircher, J. Cornell, R. Dohmen, C. J. Pratt, A. Grinnell, J. Higgs, C. Jett, E. Gillett, R. Khadka, Sergio E. Mares, C. Meili, J. Liu, H. Mukhtar, Mostafa S. Elshahed, Noha H. Youssef
Environmental Microbiology, 2021
A detailed analysis of 13 distinct pathways crucial to predation and cellular differentiation reveals severely curtailed machineries. We propose that these represent a niche adaptation strategy that evolved circa 500 million years ago.
Teaching a Protein Language Model to Speak "Immune"
February 2026
A walkthrough of our MLCB 2025 paper on continued pre-training of protein language models for pMHC-I binding prediction — why we did it, how it works, and what surprised us.
What If We Could Design Immune Peptides from Scratch — Using Physics Instead of Data?
February 2026
A walkthrough of our ICML 2025 workshop paper on generating pMHC-I libraries with diffusion models — the dataset bias problem, our structure-first approach, and why existing predictors completely failed on our designed peptides.

A collection of informal reviews of papers I find interesting, mostly on protein structure prediction, protein design, and protein language models. These papers come from Sergey Ovchinnikov's lab and related groups. Just my thoughts, nothing too formal.

Protein Diffusion Models as Statistical Potentials
Roney, Ou, Ovchinnikov · bioRxiv 2025
What if we could repurpose protein diffusion models as energy functions? ProteinEBM does exactly that — turning a generative model into a scoring function that can rank structures, predict conformational landscapes, and estimate mutation effects.
Designing Novel Solenoid Proteins with In Silico Evolution
Pretorius, Nikov, Washio, Florent, Taunt, Ovchinnikov, Murray · Communications Chemistry 2025
Solenoid proteins are nature's modular building blocks. This paper uses AlphaFold2 as an oracle inside a genetic algorithm to design entirely new solenoid folds — and 20% of them actually work in the lab.
CIRPIN: Learning Circular Permutation-Invariant Representations to Uncover Putative Protein Homologs
Kolodziej, Abulnaga, Ovchinnikov · bioRxiv 2025
Most structure comparison tools miss proteins that are related by circular permutation. CIRPIN fixes this with a clever graph neural network that doesn't care where the chain starts — uncovering thousands of hidden evolutionary relationships.
Hit or Miss: Understanding Emergence and Absence of Homo-oligomeric Contacts in Protein Language Models
Zhang, Akiyama, Cho, Jajoo, Ovchinnikov · bioRxiv 2025
Protein language models are trained on single chains, yet they somehow learn about protein-protein interfaces. This paper digs into how and why — and finds that bigger models keep getting better at inter-chain contacts even after intra-chain accuracy plateaus.
Assessing the Utility of Coevolution-Based Residue–Residue Contact Predictions in a Sequence- and Structure-Rich Era
Kamisetty, Ovchinnikov, Baker · PNAS 2013
The 2013 paper that helped establish when coevolution-based contact prediction is actually useful. A foundational work that set the stage for everything from direct coupling analysis to AlphaFold.

More papers I find interesting

De Novo Design of Protein Structure and Function with RFdiffusion
Watson, Juergens, Bennett et al. · Nature 2023
The paper that brought diffusion models to protein design in a big way. RFdiffusion generates protein backbones from scratch and can design binders, symmetric assemblies, and enzyme scaffolds — many validated experimentally.
Evolutionary-Scale Prediction of Atomic-Level Protein Structure with a Language Model (ESMFold)
Lin, Akin, Rao, Hie, Zhu et al. · Science 2023
What if you could predict protein structure from a single sequence, no alignment needed? ESMFold does this at AlphaFold-like accuracy with a 15 billion parameter language model, enabling structure prediction for 600+ million metagenomic proteins.
Simulating 500 Million Years of Evolution with a Language Model (ESM3)
Hayes, Rao, Akin et al. · Science 2025
ESM3 is a 98-billion-parameter multimodal model that reasons over protein sequence, structure, and function simultaneously. It designed a novel fluorescent protein (esmGFP) with only 58% sequence identity to the nearest known fluorescent protein, a distance the authors estimate as equivalent to over 500 million years of evolution.
Accurate Structure Prediction of Biomolecular Interactions with AlphaFold 3
Abramson, Adler, Dunger et al. · Nature 2024
AlphaFold 3 moves beyond proteins to predict the structures of complexes involving DNA, RNA, small molecules, and ions — with a diffusion-based architecture that substantially outperforms specialized tools for drug-like interactions.
Protein Language Models Learn Evolutionary Statistics of Interacting Sequence Motifs
Zhang, Wayment-Steele, Brixi, Wang, Kern, Ovchinnikov · PNAS 2024
What do protein language models actually learn? This paper shows ESM-2 stores coevolutionary statistics as motifs of pairwise contacts — bridging the gap between classical coevolution and modern deep learning.
Molecular Modeling and Simulation: An Interdisciplinary Guide
Tamar Schlick
Finished
Pedro Páramo
Juan Rulfo
Currently Reading
On the Origin of Species
Charles Darwin
Currently Reading
Structural Bioinformatics
Philip E. Bourne & Helge Weissig
Currently Reading
Soviet Middlegame Technique
Peter Romanovsky
Currently Reading
Miles de millones
Carl Sagan
Currently Reading
Cien años de soledad
Gabriel García Márquez
Currently Reading
♘ Chess
I've been playing chess since I moved to the US. I mainly play rapid and blitz on Lichess. Feel free to challenge me!
💻 Open Source
Building tools at the intersection of ML and biology. Check out my projects on GitHub.