Generative Foundation Model for Drug Discovery
An open-source, zero-setup Google Colab notebook for running the advanced Boltz2 model. Predict protein structures, complexes, and binding affinities with an interactive UI and automated analysis.
Powerful Features, Simplified
Everything you need for advanced protein modeling, packed into one accessible notebook.
High-Resolution Structure Prediction
Generate detailed 3D structures for proteins, antibodies, and multi-chain complexes using generative diffusion models.
Binding Affinity & Pose Prediction
Predict structures with small molecule ligands (CCD/SMILES) and get binding affinity estimates.
Zero-Setup Colab Environment
Run directly in your browser with free GPU access. All dependencies are installed automatically.
Interactive UI for Parameters
No more manual config files. Use a simple form to input sequences, ligands, and run settings.
Automated Analysis & Reports
Automatically generates confidence plots (pLDDT, PAE), a binding affinity dashboard, and a 3D viewer in a clean report.
Advanced User Controls
Customize MSA options, recycling steps, and diffusion parameters for fine-tuned predictions.
The Boltz2-Notebook Pipeline
Explore the end-to-end process of predicting protein structures, from initial setup to final results, in four distinct phases.
Setup & Configuration
This initial phase prepares the software environment and defines the inputs for the prediction task through a user-friendly interface.
Environment Setup
Installs all CUDA-enabled dependencies and validates the GPU.
Input Configuration
Define protein sequences, ligands, and run parameters via an interactive UI.
YAML Generation
Automatically formats user input into a machine-readable `params.yaml` file.
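As an illustration of what the generated file might contain, here is a minimal sketch that turns form-like inputs into YAML text. The layout (`version`, `sequences`, `properties`) follows the Boltz input schema as we understand it. The helper `build_params_yaml`, the placeholder sequence, and the example ligand are hypothetical, and the notebook's own generator may work differently.

```python
def build_params_yaml(proteins, ligands=(), affinity_binder=None):
    """Return YAML text for a prediction job.

    proteins: list of (chain_id, amino_acid_sequence) tuples.
    ligands: list of (chain_id, kind, value) tuples; kind is "smiles" or "ccd".
    affinity_binder: chain ID of the ligand to score for affinity, or None.
    """
    ids = [p[0] for p in proteins] + [lig[0] for lig in ligands]
    if len(ids) != len(set(ids)):
        raise ValueError("each chain needs a unique ID")

    lines = ["version: 1", "sequences:"]
    for chain_id, sequence in proteins:
        lines += [
            "  - protein:",
            f"      id: {chain_id}",
            f"      sequence: {sequence}",
        ]
    for chain_id, kind, value in ligands:
        if kind not in ("smiles", "ccd"):
            raise ValueError(f"unknown ligand kind: {kind}")
        lines += [
            "  - ligand:",
            f"      id: {chain_id}",
            # Single quotes keep SMILES characters such as '#' or '[' literal.
            f"      {kind}: '{value}'",
        ]
    if affinity_binder is not None:
        lines += [
            "properties:",
            "  - affinity:",
            f"      binder: {affinity_binder}",
        ]
    return "\n".join(lines) + "\n"


params = build_params_yaml(
    proteins=[("A", "MKTAYIAKQRQISFVKSHFSRQ")],          # placeholder sequence
    ligands=[("B", "smiles", "CC(=O)Oc1ccccc1C(=O)O")],  # aspirin, as an example
    affinity_binder="B",
)
print(params)
```

Writing the returned text to `params.yaml` would give a file of the kind described above. The unique-ID check mirrors the requirement that every chain carries its own identifier.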
Core Prediction Engine
This is the automated computational core of the pipeline. Boltz2 takes the configuration file, searches for evolutionary data, and then runs the deep learning model to predict the 3D structure.
MSA Search
Boltz2 connects to online bioinformatics servers to automatically fetch Multiple Sequence Alignments (MSAs), gathering crucial evolutionary context for the model.
Structure Prediction
The deep learning model uses the MSA and a diffusion process to generate 3D coordinates. It iteratively refines the structure through recycling steps for higher accuracy.
Binding Affinity Prediction
A specialized model predicts the binding strength (IC₅₀) and interaction probability for a given ligand, providing key insights for drug discovery.
Analysis & Visualization
Once the prediction is complete, the pipeline generates a rich set of outputs. The notebook then automates the analysis of these results, creating plots and an interactive 3D view to help the user assess the quality and characteristics of the predicted structure.
Output Generation
Generates 3D models (PDB), confidence scores (pLDDT), error matrices (PAE), and ligand affinity predictions (`affinity.json`).
Automated Plotting
Parses data and creates pLDDT confidence plots and PAE heatmaps for each predicted chain. It also generates a dashboard visualizing the binding probability (Hit Discovery) and the calculated IC₅₀ and ΔG values (Lead Optimization).
Interactive Visualization
Renders the predicted 3D structure in a 3Dmol.js viewer and assembles all plots into a final, comprehensive HTML report.
Export & Archiving
The final phase provides convenient options for saving and exporting all generated results. Users can either back up their work to the cloud for persistent storage or download a complete package to their local machine for offline analysis and record-keeping.
Export to Drive
Provides an option to copy the entire results folder—containing PDBs, plots, and logs—directly to a user's Google Drive.
Download Package
Packages all generated results into a single `.zip` file, allowing for easy download to a local computer.
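The packaging step can be sketched with Python's standard library. This is a minimal example, not the notebook's actual code: the function name `package_results`, the folder layout, and the archive name are assumptions chosen for illustration.

```python
import shutil
import tempfile
from pathlib import Path


def package_results(results_dir, archive_stem):
    """Zip everything under results_dir and return the archive path."""
    results_dir = Path(results_dir)
    if not results_dir.is_dir():
        raise FileNotFoundError(f"no results folder at {results_dir}")
    # make_archive appends ".zip" to archive_stem and returns the full path.
    archive = shutil.make_archive(str(archive_stem), "zip", root_dir=results_dir)
    return Path(archive)


# Demo on a throwaway folder so the sketch runs anywhere.
with tempfile.TemporaryDirectory() as tmp:
    results = Path(tmp) / "results"
    results.mkdir()
    (results / "model_0.pdb").write_text("END\n")
    zip_path = package_results(results, Path(tmp) / "boltz2_results")
    print(zip_path.name)  # boltz2_results.zip
```

In Colab, a path like this could then be handed to `google.colab.files.download` for the local download, or copied into a mounted Drive folder for the cloud backup.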
See it in Action
Explore a sample output from a Boltz2-Notebook run. Interact with the 3D model and check the confidence plots.
pLDDT Score per Residue
Predicted Aligned Error (PAE)
Affinity Analysis
A visual report showing binding probability and predicted IC₅₀ values.
Frequently Asked Questions
Have questions? We've got answers.
Do I need a powerful computer to run this?
No! The notebook is designed to run on Google Colab, which provides free access to powerful GPUs in the cloud. All you need is a web browser and a Google account.
Is Boltz2-Notebook free to use?
Yes, this project is completely free and open-source, licensed under the MIT License. It builds upon the open-source Boltz2 model.
Can I predict multi-chain or complex proteins?
Yes. You can define multi-chain complexes by adding multiple "protein" blocks in the user interface. Simply click the "Add Protein" button for each new chain you want to include in the complex. Ensure that each chain has a unique ID (e.g., A, B, C).
What are `recycling_steps` and `sampling_steps`?
Recycling steps control how many times the model's output is fed back into the network for refinement. More steps (e.g., 3-6) can improve accuracy but increase runtime.
Sampling steps refer to the number of steps in the diffusion process that generates the structure from noise. Higher values (e.g., 100-200) can lead to higher-quality structures but also take longer.
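To show where these two settings end up, the sketch below assembles a `boltz predict` command line. The flag names (`--recycling_steps`, `--sampling_steps`, `--use_msa_server`, `--out_dir`) are taken from the Boltz command-line interface as we understand it. The helper `build_predict_command` and its defaults are hypothetical, and how the notebook actually invokes Boltz2 is not shown here.

```python
import shlex


def build_predict_command(params_path, out_dir, recycling_steps=3,
                          sampling_steps=200, use_msa_server=True):
    """Assemble the argument list for a `boltz predict` run (not executed)."""
    if recycling_steps < 0 or sampling_steps < 1:
        raise ValueError("recycling_steps must be >= 0 and sampling_steps >= 1")
    cmd = [
        "boltz", "predict", str(params_path),
        "--out_dir", str(out_dir),
        "--recycling_steps", str(recycling_steps),
        "--sampling_steps", str(sampling_steps),
    ]
    if use_msa_server:
        cmd.append("--use_msa_server")
    return cmd


cmd = build_predict_command("params.yaml", "results", recycling_steps=6)
print(shlex.join(cmd))
```

Building the command as a list rather than a single string keeps paths with spaces intact if the list is later passed to `subprocess.run`.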
What's the difference between "Hit Discovery" and "Lead Optimization"?
Hit Discovery: This is the initial screening stage. It answers the question: "Does this ligand bind to the protein at all?" The output is a probability score that helps identify promising compounds, or "hits."
Lead Optimization: This is the next step. It answers the question: "How strongly does this ligand bind?" The output is a predicted IC₅₀ value, which measures the drug's potency. This stage helps refine a "hit" into a more effective "lead" compound.
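The two stages can be pictured as a filter followed by a ranking. The sketch below uses made-up numbers for three hypothetical ligands. The key `affinity_probability_binary` is our assumption for the name of the binding-probability field, and the 0.5 cutoff is an arbitrary illustrative threshold, not a recommendation from the notebook.

```python
# Made-up results for three hypothetical ligands, one dict per prediction.
results = [
    {"ligand": "lig_1", "affinity_probability_binary": 0.92, "affinity_pred_value": 0.8},
    {"ligand": "lig_2", "affinity_probability_binary": 0.31, "affinity_pred_value": -0.2},
    {"ligand": "lig_3", "affinity_probability_binary": 0.77, "affinity_pred_value": -1.1},
]


def triage(results, hit_threshold=0.5):
    """Keep likely binders (hit discovery), then rank them by potency."""
    # Hit discovery: does the ligand bind at all?
    hits = [r for r in results if r["affinity_probability_binary"] >= hit_threshold]
    # Lead optimization: a lower log10(IC50) means a stronger predicted binder.
    return sorted(hits, key=lambda r: r["affinity_pred_value"])


ranked = triage(results)
print([r["ligand"] for r in ranked])  # ['lig_3', 'lig_1']
```

Here `lig_2` has the second-lowest predicted IC₅₀ but is dropped first because its binding probability is low, which is exactly the distinction between the two questions.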
How are the IC₅₀ and ΔG values calculated?
The Boltz2 model does not predict IC₅₀ or ΔG directly. Instead, it predicts a core value called `affinity_pred_value`, which represents log₁₀(IC₅₀) with the IC₅₀ expressed in micromolar (µM).
The notebook then uses this prediction to derive the final metrics:
- IC₅₀ Calculation: The IC₅₀ value is calculated by taking 10 to the power of the model's prediction.
  IC₅₀ (µM) = 10 ^ affinity_pred_value
- ΔG Calculation: The Gibbs free energy (ΔG), which represents the spontaneity of binding, is derived from the predicted `affinity_pred_value` using the following formula found in the analysis script:
  ΔG (kcal/mol) = (6 - affinity_pred_value) * 1.364
This approach allows the model to predict a logarithmic value, which is often more stable for machine learning, while the notebook provides the final, interpretable values for scientific analysis.
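The two conversions are short enough to check by hand. The sketch below applies them exactly as written above. The function names are ours, and the input values are illustrative rather than real predictions.

```python
def ic50_micromolar(affinity_pred_value):
    """IC50 in µM from the predicted log10(IC50)."""
    return 10 ** affinity_pred_value


def delta_g_kcal_per_mol(affinity_pred_value):
    """ΔG in kcal/mol, using the formula quoted from the analysis script."""
    return (6 - affinity_pred_value) * 1.364


# Illustrative predictions spanning two orders of magnitude in IC50.
for value in (-1.0, 0.0, 1.0):
    ic50 = ic50_micromolar(value)
    dg = delta_g_kcal_per_mol(value)
    print(f"pred={value:+.1f}  IC50={ic50:g} µM  ΔG={dg:.3f} kcal/mol")
```

For example, a prediction of 0 corresponds to an IC₅₀ of 1 µM and a reported value of 6 × 1.364 = 8.184 kcal/mol. Note that under this formula a lower IC₅₀ (stronger predicted binding) gives a larger number, so the reported value grows with binding strength.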
Can I predict the structure of a protein that is not in any database?
Yes. Boltz2, like other modern predictors, uses Multiple Sequence Alignments (MSAs) to learn evolutionary patterns. As long as the MSA search finds related protein sequences (homologs), it can often make a confident prediction even for a novel protein whose structure has never been experimentally determined.
What are pLDDT and PAE?
pLDDT (predicted Local Distance Difference Test) is a per-residue confidence score from 0-100. Higher scores mean the model is more confident about the local structure.
PAE (Predicted Aligned Error) indicates the model's confidence in the relative position of two residues. Lower PAE values (in Ångströms) are better, suggesting high confidence in the overall domain structure.
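As a rough guide to reading these numbers, the sketch below labels pLDDT scores and averages a block of a PAE matrix. The band cutoffs (90, 70, 50) are borrowed from the AlphaFold Database colour convention and are an assumption here, not something the notebook defines. All scores and matrix entries are made up.

```python
def plddt_band(score):
    """Label a 0-100 pLDDT score using the AlphaFold DB confidence bands."""
    if score > 90:
        return "very high"
    if score > 70:
        return "confident"
    if score > 50:
        return "low"
    return "very low"


def mean_pae(pae, rows, cols):
    """Average PAE (in Å) over the block of residue pairs rows x cols."""
    values = [pae[i][j] for i in rows for j in cols]
    return sum(values) / len(values)


plddt = [95.2, 88.1, 64.0, 41.5]   # made-up per-residue scores
pae = [                            # made-up 4x4 PAE matrix
    [0.5, 1.0, 9.0, 12.0],
    [1.0, 0.5, 8.0, 11.0],
    [9.0, 8.0, 0.5, 2.0],
    [12.0, 11.0, 2.0, 0.5],
]

print([plddt_band(s) for s in plddt])
# Block between residues 0-1 and residues 2-3.
print(mean_pae(pae, range(0, 2), range(2, 4)))  # 10.0
```

In this toy matrix the PAE is low within each pair of residues but high between the pairs, the pattern one would expect from two well-folded regions whose relative placement is uncertain.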
Cite Us
If you use Boltz2-Notebook in your research, please cite the following works.
For the Boltz2-Notebook
Tilewale, A. (2025). Boltz2-Notebook: A Google Colab platform for simplified protein structure prediction. GitHub. https://github.com/AtharvaTilewale/Boltz2-Notebook
For the Boltz2 Model
Passaro, S., Corso, G., Wohlwend, J., et al. (2025). Boltz-2: Towards Accurate and Efficient Binding Affinity Prediction. bioRxiv. doi:10.1101/2025.06.14.659707