Getting Started¶
Overview¶
pdb2reaction is a Python CLI toolkit for turning PDB structures into enzymatic reaction pathways using machine-learning interatomic potentials (MLIPs).
In many workflows, a single command like the one below is enough to generate a useful first-pass reaction path:
pdb2reaction -i R.pdb P.pdb -c 'SAM,GPP' --ligand-charge 'SAM:1,GPP:-3'
You can also run MEP search → TS optimization → IRC → thermochemistry → single-point DFT in a single run by adding --tsopt True --thermo True --dft True:
pdb2reaction -i R.pdb P.pdb -c 'SAM,GPP' --ligand-charge 'SAM:1,GPP:-3' --tsopt True --thermo True --dft True
Given (i) two or more full protein–ligand PDB files (R → … → P), or (ii) one PDB with --scan-lists, or (iii) one TS candidate with --tsopt True, pdb2reaction automatically:
extracts an active-site pocket around user‑defined substrates to build a cluster model,
explores minimum‑energy paths (MEPs) with path optimization methods such as the Growing String Method (GSM) and Direct Max Flux (DMF),
optionally optimizes transition states, runs vibrational analysis, IRC calculations, and single‑point DFT calculations.
Important
Treat single-command TS outputs as initial candidates. For enzyme reactions, iterative refinement is common (endpoint quality, pocket definition, constraints, scan targets), and TS validation with both freq and irc is required before interpretation.
At the UMA stage, calculations use Meta’s UMA machine-learning interatomic potential (MLIP).
The CLI is designed to generate multi‑step enzymatic reaction mechanisms with minimal manual intervention. The same workflow also works for small‑molecule systems. When you skip pocket extraction (omit --center/-c and --ligand-charge), you can also use .xyz or .gjf inputs.
On HPC clusters or multi‑GPU workstations, pdb2reaction can scale to large cluster models (and optionally full protein–ligand complexes) by parallelizing UMA inference across nodes. Set workers and workers_per_node to enable multi‑worker inference; see UMA Calculator for configuration details.
Important
Input PDB files must already contain hydrogen atoms.
When you provide multiple PDBs, they must contain the same atoms in the same order (only coordinates may differ); otherwise an error is raised.
Tip
If you are new to the project, read Concepts & Workflow first. If you encounter an error during setup or runtime, refer to Troubleshooting.
CLI conventions¶
Convention |
Example |
|---|---|
Boolean options |
|
Residue selectors |
|
Charge mapping |
|
Atom selectors |
|
For full details, see CLI Conventions.
path-search naming note: the CLI subcommand is path-search, while the documentation filename is path_search.md.
Recommended tools for hydrogen addition¶
If your PDB lacks hydrogen atoms, use one of the following tools before running pdb2reaction:
Tool |
Example Command |
Notes |
|---|---|---|
reduce (Richardson Lab) |
|
Fast, widely used for crystallographic structures |
pdb2pqr |
|
Adds hydrogens and assigns partial charges |
Open Babel |
|
General-purpose cheminformatics toolkit |
To ensure identical atom ordering across multiple PDB inputs, apply the same hydrogen-addition tool with consistent settings to all structures.
Warning
This software is still under development. Please use it at your own risk.
Installation¶
pdb2reaction is intended for Linux environments (local workstations or HPC clusters) with a CUDA‑capable GPU. Several dependencies – notably PyTorch, fairchem‑core (UMA), and gpu4pyscf‑cuda12x – expect a working CUDA installation.
Refer to the upstream projects for additional details:
fairchem / UMA: https://github.com/facebookresearch/fairchem, https://huggingface.co/facebook/UMA
Hugging Face token & security: https://huggingface.co/docs/hub/security-tokens
Quick start¶
Below is a minimal setup example that works on many CUDA 12.9 clusters. Adjust module names and versions to match your system. This example assumes the default GSM MEP mode (no DMF). For DMF, install cyipopt via conda first.
# 1) Install a CUDA-enabled PyTorch build
# 2) Install pdb2reaction from GitHub
# 3) Install a headless Chrome for Plotly figure export
pip install torch --index-url https://download.pytorch.org/whl/cu129
pip install git+https://github.com/t-0hmura/pdb2reaction.git
plotly_get_chrome -y
Finally, log in to Hugging Face Hub so that UMA models can be downloaded. Either:
# Hugging Face CLI
hf auth login --token '<YOUR_ACCESS_TOKEN>' --add-to-git-credential
or
# Classic CLI
huggingface-cli login
You only need to do this once per machine / environment.
If you want to use the Direct Max Flux (DMF) method for MEP search, create a conda environment and install cyipopt before installing pdb2reaction.
# Create and activate a dedicated conda environment conda create -n pdb2reaction python=3.11 -y conda activate pdb2reaction # Install cyipopt (required for the DMF method in MEP search) conda install -c conda-forge cyipopt -y
If you are on an HPC cluster that uses environment modules, load CUDA before installing PyTorch, like this:
module load cuda/12.9
Step-by-step installation¶
If you prefer to build the environment piece by piece:
Load CUDA (if you use environment modules on an HPC cluster)
module load cuda/12.9
Create and activate a conda environment
conda create -n pdb2reaction python=3.11 -y conda activate pdb2reaction
Install cyipopt Required if you want to use the DMF method in MEP search.
conda install -c conda-forge cyipopt -y
Install PyTorch with the right CUDA build
For CUDA 12.9:
pip install torch --index-url https://download.pytorch.org/whl/cu129
(You may use another compatible version if your cluster recommends it.)
Install
pdb2reactionitself and Chrome for visualizationpip install git+https://github.com/t-0hmura/pdb2reaction.git plotly_get_chrome -y
Log in to Hugging Face Hub (UMA model)
huggingface-cli loginSee also:
Verify installation
pdb2reaction --versionThis should display the installed version (e.g.,
{{ version }}).
Command line basics¶
The main entry point is the pdb2reaction command, installed via pip. Internally it uses the Click library, and the default subcommand is all.
That means:
pdb2reaction [OPTIONS] ...
# is equivalent to
pdb2reaction all [OPTIONS] ...
The all command runs the full pipeline—cluster extraction, MEP search, TS optimization, vibrational analysis, and optional DFT—in a single invocation.
All high-level workflows share two important options when you use cluster extraction:
-i/--input: one or more full structures (reactant, intermediate(s), product).-c/--center: how to define the substrate / extraction center (e.g., residue names or residue IDs).
If you omit --center/-c, cluster extraction is skipped and the full input structure is used directly.
Main workflow modes¶
Multi‑structure MEP workflow (reactant → product)¶
Use this when you already have several full PDB structures along a putative reaction coordinate (e.g., R → I1 → I2 → P).
Minimal example
pdb2reaction -i R.pdb P.pdb -c 'SAM,GPP' --ligand-charge 'SAM:1,GPP:-3'
Richer example
pdb2reaction -i R.pdb I1.pdb I2.pdb P.pdb -c 'SAM,GPP' --ligand-charge 'SAM:1,GPP:-3' --out-dir ./result_all --tsopt True --thermo True --dft True
Behavior:
takes two or more full systems in reaction order,
extracts catalytic cluster models for each structure,
performs a recursive MEP search via
path-searchby default (outputs underpath_search/),optionally switches to a single‑pass
path-optrun with--refine-path False,when PDB templates are available, merges the cluster-model MEP back into the full system,
optionally runs TS optimization, vibrational analysis, and single-point DFT calculations for each segment.
This is the recommended mode when you can generate reasonably spaced intermediates (e.g., from docking, MD, or manual modeling).
Important
pdb2reaction assumes that multiple input PDBs contain exactly the same atoms in the same order (only coordinates may differ). If any non-coordinate fields differ across inputs, an error is raised. Input PDB files must also contain hydrogen atoms.
Single‑structure + staged scan (feeds MEP refinement)¶
Use this when you only have one PDB structure, but you know which inter‑atomic distances should change along the reaction.
Provide a single -i together with --scan-lists:
Minimal example
pdb2reaction -i R.pdb -c 'SAM,GPP' --ligand-charge 'SAM:1,GPP:-3' --scan-lists '[("TYR 285 CA","MMT 309 C10",2.20),("TYR 285 CB","MMT 309 C11",1.80)]' '[("TYR 285 CB","MMT 309 C11",1.20)]'
Richer example
pdb2reaction -i SINGLE.pdb -c 'SAM,GPP' --scan-lists '[("TYR 285 CA","MMT 309 C10",2.20),("TYR 285 CB","MMT 309 C11",1.80)]' '[("TYR 285 CB","MMT 309 C11",1.20)]' --mult 1 --out-dir ./result_scan_all --tsopt True --thermo True --dft True
Key points:
--scan-listsdescribes staged distance scans on the extracted cluster model.Each tuple
(i, j, target_Å)is:a PDB atom selector string like
'TYR,285,CA'(delimiters can be: space/comma/slash/backtick/backslash,/`\) or a 1‑based atom index,automatically remapped to the cluster-model indices.
Supplying one
--scan-listsliteral runs a single scan stage; multiple literals run sequential stages. Pass multiple literals after a single flag (repeated flags are not accepted).Each stage writes a
stage_XX/result.pdb, which is treated as a candidate intermediate or product.The default
allworkflow refines the concatenated stages with recursivepath-search.With
--refine-path False, it instead performs a single-passpath-optchain and skips the recursive refiner (no mergedmep_w_ref*.pdb).
This mode is useful for building reaction paths starting from a single structure.
Single‑structure TSOPT‑only mode¶
Use this when you already have a transition state candidate and only want to optimize it and proceed to IRC calculations.
Provide exactly one PDB and enable --tsopt:
Minimal example
pdb2reaction -i TS_CANDIDATE.pdb -c 'SAM,GPP' --ligand-charge 'SAM:1,GPP:-3' --tsopt True
Richer example
pdb2reaction -i TS_CANDIDATE.pdb -c 'SAM,GPP' --ligand-charge 'SAM:1,GPP:-3' --tsopt True --thermo True --dft True --out-dir ./result_tsopt_only
Behavior:
skips the MEP/path search entirely,
optimizes the cluster-model TS with TS optimization,
runs an IRC in both directions and optimizes both ends to relax down to R and P minima,
can then perform
freqanddfton the R/TS/P,produces UMA, Gibbs, and DFT//UMA energy diagrams.
Outputs such as energy_diagram_*_all.png and irc_plot_all.png are mirrored under the top‑level --out-dir.
Important
Single‑input runs require either --scan-lists (staged scan → GSM) or --tsopt True (TSOPT‑only). Supplying only a single -i without one of these will not trigger a full workflow.
Important CLI options and behaviors¶
Below are the most commonly used options across workflows.
Option |
Description |
|---|---|
|
Input structures. ≥ 2 PDBs → MEP search; 1 PDB + |
|
Defines the substrate / extraction center. Supports residue names ( |
|
Charge info: mapping ( |
|
Hard override of total system charge. |
|
Spin multiplicity (e.g., |
|
Staged distance scans for single‑input runs. |
|
Top‑level output directory. |
|
Enable TS optimization and IRC. |
|
Run vibrational analysis and thermochemistry. |
|
Perform single‑point DFT calculations. |
|
Recursive MEP refinement (default) vs single‑pass. |
|
Optimization method: Light (LBFGS/Dimer) or Heavy (RFO/RS-I-RFO). |
|
MEP method: Growing String Method or Direct Max Flux. |
|
Hessian calculation mode. Analytical recommended when VRAM available. |
For a full matrix of options and YAML schemas, see all and YAML Reference.
Run summaries¶
Every pdb2reaction all run writes:
summary.log– formatted summary for quick inspection, andsummary.yaml– YAML version summary.
They typically contain:
the exact CLI command invoked,
global MEP statistics (e.g. maximum barrier, path length),
per‑segment barrier heights and key bond changes,
energies from UMA, thermochemistry, and DFT post‑processing (where enabled).
Each segment directory under path_search/ also gets its own summary.log and summary.yaml, so you can inspect local refinements independently.
CLI commands¶
Most users will primarily call pdb2reaction all. The CLI also exposes individual subcommands—each supports -h/--help.
Subcommand |
Role |
Documentation |
|---|---|---|
|
End-to-end workflow |
|
|
Extract active-site pocket (cluster model) |
|
|
Geometry optimization |
|
|
Transition state optimization |
|
|
MEP optimization (GSM/DMF) |
|
|
Recursive MEP search |
|
|
1D bond-length scan |
|
|
2D distance scan |
|
|
3D distance scan |
|
|
IRC calculation |
|
|
Vibrational analysis |
|
|
Single-point DFT |
|
|
Plot energy profiles |
|
|
Draw state energy diagram from numeric values |
|
|
Repair PDB element columns |
Important
Subcommands (except all) assume cluster models generated by extract. In these models, the atom closest to the Link‑H cap is automatically frozen. If you build a cluster model yourself, set the Link‑H residue name to LKH and atom name to HL, or specify atoms to freeze via --args-yaml → geom.freeze_atoms.
Tip
In all, tsopt, freq and irc, setting --hessian-calc-mode Analytical is strongly recommended when you have enough VRAM.
Quick reference¶
Common command patterns:
# Basic MEP search (2+ structures)
pdb2reaction -i R.pdb P.pdb -c 'SUBSTRATE' --ligand-charge 'SUB:-1'
# Full workflow with post-processing
pdb2reaction -i R.pdb P.pdb -c 'SAM,GPP' --ligand-charge 'SAM:1,GPP:-3' \
--tsopt True --thermo True --dft True
# Single structure with staged scan
pdb2reaction -i SINGLE.pdb -c 'LIG' --scan-lists '[("RES1,100,CA","LIG,200,C1",2.0)]'
# TS-only optimization
pdb2reaction -i TS.pdb -c 'LIG' --tsopt True --thermo True
Essential options:
Option |
Purpose |
|---|---|
|
Input structure(s) |
|
Substrate definition for pocket extraction |
|
Substrate charges (e.g., |
|
Enable TS optimization + IRC |
|
Run vibrational analysis |
|
Run single-point DFT |
|
Output directory |
Getting help¶
For any subcommand:
pdb2reaction <subcommand> --help
This prints the available options, defaults, and a short description. For detailed UMA calculator options, see UMA Calculator.
If you encounter any issues, please open an Issue on the GitHub repository.