Generative Foundation Model for Drug Discovery

An open-source, zero-setup Google Colab notebook for running the advanced Boltz2 model. Predict protein structures, complexes, and binding affinities with an interactive UI and automated analysis.

Powerful Features, Simplified

Everything you need for advanced protein modeling, packed into one accessible notebook.

High-Resolution Structure Prediction

Generate detailed 3D structures for proteins, antibodies, and multi-chain complexes using generative diffusion models.

Binding Affinity & Pose Prediction

Predict structures with small-molecule ligands (CCD codes or SMILES strings) and get binding affinity estimates.

Zero-Setup Colab Environment

Run directly in your browser with free GPU access. All dependencies are installed automatically.

Interactive UI for Parameters

No more manual config files. Use a simple form to input sequences, ligands, and run settings.

Automated Analysis & Reports

Automatically generates confidence plots (pLDDT, PAE), a binding affinity dashboard, and a 3D viewer, all collected in a clean report.

Advanced User Controls

Customize MSA options, recycling steps, and diffusion parameters for fine-tuned predictions.

The Boltz2-Notebook Pipeline

Explore the end-to-end process of predicting protein structures, from initial setup to final results, in four distinct phases.

1. Setup & Configuration

This initial phase prepares the software environment and defines the inputs for the prediction task through a user-friendly interface.

Environment Setup

Installs all CUDA-enabled dependencies and validates the GPU.
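Below is a minimal sketch of the GPU check; it is not the notebook's actual setup code. The hypothetical helper `detect_gpu` asks `nvidia-smi` for the name of the first visible GPU. On Colab GPU runtimes, `nvidia-smi` ships with the NVIDIA driver. The helper returns `None` when no GPU is available:

```python
import shutil
import subprocess
from typing import Optional


def detect_gpu() -> Optional[str]:
    """Return the name of the first visible NVIDIA GPU, or None if there is none."""
    # nvidia-smi is installed with the NVIDIA driver; without it, assume no GPU.
    if shutil.which("nvidia-smi") is None:
        return None
    try:
        result = subprocess.run(
            ["nvidia-smi", "--query-gpu=name", "--format=csv,noheader"],
            capture_output=True, text=True, check=True, timeout=30,
        )
    except (OSError, subprocess.SubprocessError):
        return None
    names = [line.strip() for line in result.stdout.splitlines() if line.strip()]
    return names[0] if names else None


if __name__ == "__main__":
    gpu = detect_gpu()
    if gpu is None:
        print("No GPU found: select a GPU runtime (Runtime > Change runtime type).")
    else:
        print(f"Using GPU: {gpu}")
```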

Input Configuration

Define protein sequences, ligands, and run parameters via an interactive UI.

YAML Generation

Automatically formats user input into a machine-readable `params.yaml` file.
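The exact `params.yaml` the notebook writes is not shown here. This sketch assumes the input schema documented in the Boltz repository:

- a `version` field;
- a `sequences` list of `protein`/`ligand` entries carrying `id` plus `sequence` or `smiles`;
- an optional `properties` entry requesting `affinity` for a `binder` chain.

The helper names `build_params` and `write_params` are hypothetical. The sketch writes JSON text, which YAML parsers also accept, so only the standard library is needed:

```python
from __future__ import annotations

import json


def build_params(proteins: dict[str, str],
                 ligand_id: str | None = None,
                 ligand_smiles: str | None = None,
                 predict_affinity: bool = False) -> dict:
    """Build a Boltz-style input spec from chain IDs, sequences and an optional ligand."""
    sequences: list[dict] = [
        {"protein": {"id": chain_id, "sequence": seq}}
        for chain_id, seq in proteins.items()
    ]
    spec: dict = {"version": 1, "sequences": sequences}
    if ligand_id and ligand_smiles:
        sequences.append({"ligand": {"id": ligand_id, "smiles": ligand_smiles}})
        if predict_affinity:
            # Affinity is requested for one binder chain under "properties".
            spec["properties"] = [{"affinity": {"binder": ligand_id}}]
    return spec


def write_params(spec: dict, path: str = "params.yaml") -> None:
    """Write the spec to disk; JSON text is also valid YAML flow syntax."""
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(spec, fh, indent=2)


if __name__ == "__main__":
    # Placeholder protein sequence; the ligand SMILES is aspirin.
    demo = build_params({"A": "MKTAYIAKQRQISFVKSHFSRQ"},
                        ligand_id="B", ligand_smiles="CC(=O)Oc1ccccc1C(=O)O",
                        predict_affinity=True)
    print(json.dumps(demo, indent=2))
```

With PyYAML installed, `yaml.safe_dump(spec, sort_keys=False)` would produce the more familiar block-style YAML instead.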

2. Core Prediction Engine

This is the automated, computational core of the pipeline. Boltz2 takes the configuration file and begins the intensive process of searching for evolutionary data and then running the deep learning model to predict the 3D structure.

MSA Search

Boltz2 sends the input sequences to a remote MMseqs2 server (Boltz's `--use_msa_server` option) to automatically fetch Multiple Sequence Alignments (MSAs), which give the model crucial evolutionary context.

Structure Prediction

The deep learning model combines the input sequences with the MSA and refines its internal representation over several recycling steps for higher accuracy. It then runs a diffusion process that generates the 3D coordinates.
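For reference, a run with these settings can be launched from the Boltz command line. This sketch only assembles the argument list. The flag names follow the Boltz README at the time of writing and should be checked against `boltz predict --help`:

- `--use_msa_server`
- `--recycling_steps`
- `--sampling_steps`
- `--diffusion_samples`
- `--output_format`
- `--out_dir`

`boltz_command` is a hypothetical helper, not the notebook's code:

```python
from __future__ import annotations

import shlex


def boltz_command(input_yaml: str, out_dir: str,
                  recycling_steps: int = 3, sampling_steps: int = 200,
                  diffusion_samples: int = 1, use_msa_server: bool = True) -> list[str]:
    """Assemble a `boltz predict` command line from run settings."""
    cmd = [
        "boltz", "predict", input_yaml,
        "--out_dir", out_dir,
        "--recycling_steps", str(recycling_steps),
        "--sampling_steps", str(sampling_steps),
        "--diffusion_samples", str(diffusion_samples),
        "--output_format", "pdb",
    ]
    if use_msa_server:
        # Ask Boltz to fetch MSAs from the remote MMseqs2 server.
        cmd.append("--use_msa_server")
    return cmd


if __name__ == "__main__":
    print(shlex.join(boltz_command("params.yaml", "results")))
```

On a machine where Boltz is installed, passing the returned list to `subprocess.run(cmd, check=True)` would start the prediction.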

Binding Affinity Prediction

A dedicated affinity module predicts two values for a given ligand: the probability that it binds and its binding strength (reported as log₁₀ IC₅₀). Both provide key insights for drug discovery.

3. Analysis & Visualization

Once the prediction is complete, the pipeline generates a rich set of outputs. The notebook then automates the analysis of these results, creating plots and an interactive 3D view to help the user assess the quality and characteristics of the predicted structure.

Output Generation

Generates 3D models (PDB), confidence scores (pLDDT), error matrices (PAE), and ligand affinity predictions (`affinity.json`).
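Below is a small reader for the affinity file, sketched under two assumptions:

- It uses the two ensemble fields that Boltz documents for its affinity output, `affinity_probability_binary` and `affinity_pred_value`.
- Boltz may name the file after the input rather than plain `affinity.json`, so the path is passed in rather than hard-coded.

`read_affinity` is a hypothetical name:

```python
from __future__ import annotations

import json


def read_affinity(path: str) -> dict[str, float]:
    """Return the ensemble binding probability and log10(IC50 / µM) from an affinity JSON file."""
    with open(path, encoding="utf-8") as fh:
        data = json.load(fh)
    return {
        # Probability that the ligand binds at all (Hit Discovery).
        "binding_probability": float(data["affinity_probability_binary"]),
        # log10 of the predicted IC50 in µM; lower means a stronger binder (Lead Optimization).
        "affinity_pred_value": float(data["affinity_pred_value"]),
    }
```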

Automated Plotting

Parses data and creates pLDDT confidence plots and PAE heatmaps for each predicted chain. It also generates a dashboard visualizing the binding probability (Hit Discovery) and the calculated IC₅₀ and ΔG values (Lead Optimization).
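The per-chain summary behind such plots can be computed with the standard library alone. This sketch assumes that per-residue pLDDT is written to the B-factor column of the PDB file. That is a common convention for predicted structures; check the notebook's output to confirm the column and whether it uses a 0-1 or 0-100 scale. The sketch averages the value over C-alpha atoms, and `plddt_per_chain` is a hypothetical helper:

```python
from __future__ import annotations

from collections import defaultdict


def plddt_per_chain(pdb_text: str) -> dict[str, float]:
    """Mean pLDDT per chain, read from the B-factor column of C-alpha atoms."""
    values: dict[str, list[float]] = defaultdict(list)
    for line in pdb_text.splitlines():
        # Fixed PDB columns: atom name 13-16, chain ID 22, B-factor 61-66 (1-based).
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            values[line[21]].append(float(line[60:66]))
    return {chain: sum(v) / len(v) for chain, v in values.items()}
```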

Interactive Visualization

Renders the predicted 3D structure in a 3Dmol.js viewer and assembles all plots into a final, comprehensive HTML report.
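A minimal, standalone version of such a viewer can be produced by embedding the PDB text in an HTML page that loads 3Dmol.js from its public CDN. The sketch uses the documented `$3Dmol.createViewer`, `addModel`, `setStyle`, `zoomTo`, and `render` calls. The function name `viewer_html` and the rainbow cartoon style are illustrative choices, not the notebook's report template:

```python
import json


def viewer_html(pdb_text: str, width: int = 600, height: int = 450) -> str:
    """Return a standalone HTML page that renders a PDB string with 3Dmol.js."""
    # json.dumps yields a valid JS string literal; escaping "</" keeps it from closing <script>.
    model = json.dumps(pdb_text).replace("</", "<\\/")
    return f"""<!DOCTYPE html>
<html>
<head><script src="https://3Dmol.org/build/3Dmol-min.js"></script></head>
<body>
<div id="viewer" style="width:{width}px;height:{height}px;position:relative;"></div>
<script>
  const viewer = $3Dmol.createViewer(document.getElementById("viewer"), {{backgroundColor: "white"}});
  viewer.addModel({model}, "pdb");
  viewer.setStyle({{}}, {{cartoon: {{color: "spectrum"}}}});
  viewer.zoomTo();
  viewer.render();
</script>
</body>
</html>"""
```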

4. Export & Archiving

The final phase provides convenient options for saving and exporting all generated results. Users can either back up their work to the cloud for persistent storage or download a complete package to their local machine for offline analysis and record-keeping.

Export to Drive

Provides an option to copy the entire results folder—containing PDBs, plots, and logs—directly to a user's Google Drive.
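Here is a sketch of the copy step. It assumes Google Drive has already been mounted in Colab with `from google.colab import drive; drive.mount("/content/drive")`. The helper name `export_results` and the destination folder are hypothetical:

```python
import shutil
from pathlib import Path


def export_results(results_dir: str, drive_dir: str) -> Path:
    """Copy the whole results folder into a (mounted) Drive directory."""
    src = Path(results_dir)
    dest = Path(drive_dir) / src.name
    # Missing parent folders are created; dirs_exist_ok lets a repeat export update an old copy.
    shutil.copytree(src, dest, dirs_exist_ok=True)
    return dest
```

For example, `export_results("results", "/content/drive/MyDrive/Boltz2_runs")` would create `Boltz2_runs/results` in the user's Drive.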

Download Package

Packages all generated results into a single `.zip` file, allowing for easy download to a local computer.
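Packaging can be done with `shutil.make_archive` from the standard library. In Colab, the resulting file can then be sent to the browser with `google.colab.files.download(path)`. The helper name `package_results` is hypothetical:

```python
import shutil
from pathlib import Path


def package_results(results_dir: str, archive_base: str) -> str:
    """Zip results_dir into <archive_base>.zip and return the archive path."""
    src = Path(results_dir).resolve()
    # root_dir/base_dir store entries as "<folder name>/..." inside the zip.
    return shutil.make_archive(archive_base, "zip", root_dir=src.parent, base_dir=src.name)
```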

See it in Action

Explore a sample output from a Boltz2-Notebook run. Interact with the 3D model and check the confidence plots.

[Figure: sample confidence plots with two panels, "pLDDT Score per Residue" and "Predicted Aligned Error (PAE)"]

Affinity Analysis

A visual report showing binding probability and predicted IC₅₀ values.

Sample dashboard (ensemble model):
  • Binding probability: 85% (high confidence)
  • Predicted IC₅₀: 0.18 µM
  • Predicted ΔG ≈ -9.2 kcal/mol
  • Affinity rating (weak to strong scale): strong binder

Frequently Asked Questions

Have questions? We've got answers.

Do I need a powerful computer to run this?

No! The notebook is designed to run on Google Colab, which provides free (usage-limited) access to cloud GPUs. All you need is a web browser and a Google account.

Is Boltz2-Notebook free to use?

Yes, this project is completely free and open-source, licensed under the MIT License. It builds upon the open-source Boltz2 model.

Can I predict multi-chain or complex proteins?

Yes. You can define multi-chain complexes by adding multiple "protein" blocks in the user interface. Simply click the "Add Protein" button for each new chain you want to include in the complex. Ensure that each chain has a unique ID (e.g., A, B, C).
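Because chain IDs must be unique, a quick check before generating the input file can catch duplicates early. This is a minimal sketch with a hypothetical helper name, not the notebook's own validation:

```python
from __future__ import annotations

from collections import Counter


def check_chain_ids(chain_ids: list[str]) -> list[str]:
    """Return a list of problems with the chain IDs; an empty list means they look valid."""
    problems = []
    if any(not cid.strip() for cid in chain_ids):
        problems.append("every chain needs a non-empty ID")
    for cid, count in Counter(chain_ids).items():
        if count > 1:
            problems.append(f"chain ID {cid!r} is used {count} times")
    return problems
```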

What are `recycling_steps` and `sampling_steps`?

Recycling steps control how many times the network's intermediate output is fed back into it for another round of refinement. More steps (e.g., 3-6) can improve accuracy but increase runtime.

Sampling steps refer to the number of steps in the diffusion process that generates the structure from noise. Higher values (e.g., 100-200) can lead to higher-quality structures but also take longer.
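The sketch below makes the runtime trade-off concrete by counting network evaluations. It uses a simplified mental model of AlphaFold3-style predictors such as Boltz2:

- The trunk runs once, plus once more per recycling step.
- The diffusion module is called once per sampling step for each generated sample.

The `diffusion_samples` count is an assumption borrowed from the Boltz command-line options. Real runtimes also depend on sequence length and hardware, so treat this as an illustration only:

```python
from __future__ import annotations


def forward_pass_counts(recycling_steps: int, sampling_steps: int,
                        diffusion_samples: int = 1) -> dict[str, int]:
    """Rough count of network evaluations for one prediction (a mental model, not a profiler)."""
    return {
        # One initial trunk pass plus one extra pass per recycling step.
        "trunk_passes": recycling_steps + 1,
        # Each sample is denoised from noise over `sampling_steps` diffusion steps.
        "denoiser_calls": sampling_steps * diffusion_samples,
    }
```

Under this model, doubling `sampling_steps` doubles the denoiser calls, while each extra recycling step adds one trunk pass.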

What's the difference between "Hit Discovery" and "Lead Optimization"?

Hit Discovery: This is the initial screening stage. It answers the question: "Does this ligand bind to the protein at all?" The output is a probability score that helps identify promising compounds, or "hits."

Lead Optimization: This is the next step. It answers the question: "How strongly does this ligand bind?" The output is a predicted IC₅₀ value, which measures the drug's potency. This stage helps refine a "hit" into a more effective "lead" compound.
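The two stages can be combined into a simple triage. First, keep ligands whose predicted binding probability passes a cutoff; then rank the survivors by predicted IC₅₀. This mirrors the Boltz documentation's advice: use the binding probability to separate binders from non-binders, and use the affinity value to compare binders. The 0.5 cutoff, the function name `triage`, and any example data are hypothetical:

```python
from __future__ import annotations


def triage(predictions: dict[str, tuple[float, float]],
           min_probability: float = 0.5) -> list[tuple[str, float]]:
    """Filter likely binders by probability, then rank them by predicted IC50 (µM).

    predictions maps a ligand name to (binding_probability, ic50_uM).
    """
    # Hit discovery: keep ligands the model thinks bind at all.
    hits = [(name, ic50) for name, (prob, ic50) in predictions.items()
            if prob >= min_probability]
    # Lead optimization: among the hits, a lower IC50 means higher predicted potency.
    return sorted(hits, key=lambda item: item[1])
```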

How are the IC₅₀ and ΔG values calculated?

The Boltz2 model does not directly predict the IC₅₀ or ΔG values. Instead, it predicts a core value called `affinity_pred_value`, which represents the log₁₀(IC₅₀) in micromolar (µM) units.

The notebook then uses this prediction to derive the final metrics:

  • IC₅₀ Calculation: The IC₅₀ value is calculated by taking 10 to the power of the model's prediction.
    IC₅₀ (µM) = 10 ^ affinity_pred_value
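  • ΔG Calculation: The Gibbs free energy of binding (ΔG), whose sign indicates whether binding is spontaneous, is derived from the predicted `affinity_pred_value`. The formula found in the analysis script is (6 - affinity_pred_value) * 1.364, and it gives the magnitude of the binding free energy:
    ◦ 6 - affinity_pred_value is the pIC₅₀, i.e. -log₁₀ of the IC₅₀ in mol/L.
    ◦ 1.364 kcal/mol ≈ RT·ln(10) at 298 K.
    ◦ IC₅₀ is treated as an approximation of the dissociation constant.
    Because favorable binding corresponds to a negative ΔG, the value is reported with a negative sign, as in the sample's ΔG ≈ -9.2 kcal/mol:
    ΔG (kcal/mol) = -(6 - affinity_pred_value) * 1.364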

This approach allows the model to predict a logarithmic value, which is often more stable for machine learning, while the notebook provides the final, interpretable values for scientific analysis.
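The two conversions can be written out as a short sketch. It treats IC₅₀ as an approximation of the dissociation constant and returns ΔG with the conventional negative sign for favorable binding. That is the sign shown in the sample report (ΔG ≈ -9.2 kcal/mol for IC₅₀ = 0.18 µM). The function names are hypothetical:

```python
RT_LN10 = 1.364  # kcal/mol, ≈ R·T·ln(10) at 298 K


def ic50_uM(affinity_pred_value: float) -> float:
    """IC50 in µM from the model's log10(IC50 / µM) output."""
    return 10 ** affinity_pred_value


def delta_g_kcal(affinity_pred_value: float) -> float:
    """Approximate binding free energy in kcal/mol (negative = favorable binding)."""
    pic50 = 6 - affinity_pred_value  # -log10(IC50 in mol/L)
    return -pic50 * RT_LN10
```

Take the sample's IC₅₀ of 0.18 µM. Its `affinity_pred_value` is log₁₀(0.18) ≈ -0.745, so pIC₅₀ ≈ 6.745 and ΔG ≈ -6.745 × 1.364 ≈ -9.2 kcal/mol.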

Can I predict the structure of a protein that is not in any database?

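Yes. Boltz2, like other modern predictors, uses Multiple Sequence Alignments (MSAs) as a source of evolutionary information, so it does not need an experimentally determined structure of your protein. If the MSA search finds related sequences (homologs), the model can often make a confident prediction even for a protein whose structure has never been solved. With few or no homologs, expect lower confidence and check the pLDDT and PAE plots.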
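One way to gauge how much evolutionary information was found is to inspect the alignment itself. This sketch assumes the MSA is available as A3M text, the format returned by MMseqs2/ColabFold servers, where lowercase letters mark insertions relative to the query. It reports the alignment's depth and average query coverage. The function names are hypothetical, and no specific cutoff is implied:

```python
from __future__ import annotations


def parse_a3m(a3m_text: str) -> list[str]:
    """Return aligned rows from A3M text, dropping lowercase insertion characters."""
    rows: list[str] = []
    current: list[str] = []
    for line in a3m_text.splitlines():
        line = line.strip()
        if line.startswith(">"):
            if current:
                rows.append("".join(current))
            current = []
        elif line and not line.startswith("#"):
            current.append("".join(ch for ch in line if not ch.islower()))
    if current:
        rows.append("".join(current))
    return rows


def msa_summary(a3m_text: str) -> dict:
    """Depth (query included) and mean fraction of query positions covered by each homolog."""
    rows = parse_a3m(a3m_text)
    if not rows:
        return {"depth": 0, "mean_homolog_coverage": 0.0}
    length = len(rows[0])
    coverage = [sum(ch != "-" for ch in row) / length for row in rows[1:]]
    return {
        "depth": len(rows),
        "mean_homolog_coverage": sum(coverage) / len(coverage) if coverage else 0.0,
    }
```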

What are pLDDT and PAE?

pLDDT (predicted Local Distance Difference Test) is a per-residue confidence score from 0-100; higher scores mean the model is more confident about the local structure. PAE (Predicted Aligned Error) estimates the expected position error, in Ångströms, of one residue when the predicted structure is aligned on another. Lower PAE values are better: low PAE between two domains or chains indicates high confidence in their relative placement.
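For multi-chain complexes, a common summary is the mean PAE between chains. This sketch assumes the PAE matrix is available as a nested list with chains stored consecutively. It follows the usual convention that row i is the residue used for alignment and column j is the residue whose error is measured. `mean_interchain_pae` is a hypothetical helper:

```python
from __future__ import annotations


def mean_interchain_pae(pae: list[list[float]],
                        chain_lengths: list[int]) -> dict[tuple[int, int], float]:
    """Mean PAE (Å) for every ordered pair of different chains, keyed by chain index."""
    bounds, start = [], 0
    for n in chain_lengths:
        bounds.append((start, start + n))
        start += n
    result = {}
    for a, (a0, a1) in enumerate(bounds):
        for b, (b0, b1) in enumerate(bounds):
            if a == b:
                continue
            # Rows of chain a (alignment frame) against columns of chain b (scored residues).
            block = [pae[i][j] for i in range(a0, a1) for j in range(b0, b1)]
            result[(a, b)] = sum(block) / len(block)
    return result
```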

Cite Us

If you use Boltz2-Notebook in your research, please cite the following works.

For the Boltz2-Notebook

Tilewale, A. (2025). Boltz2-Notebook: A Google Colab platform for simplified protein structure prediction. GitHub. https://github.com/AtharvaTilewale/Boltz2-Notebook


For the Boltz2 Model

Passaro, S., Corso, G., Wohlwend, J., et al. (2025). Boltz-2: Towards Accurate and Efficient Binding Affinity Prediction. bioRxiv. doi:10.1101/2025.06.14.659707


About the Author

Atharva Tilewale

A bioinformatics and computational biology enthusiast passionate about creating accessible tools for scientific research. Currently at Gujarat Biotechnology University.