Getting Started

Overview

pdb2reaction is a Python CLI toolkit for turning PDB structures into enzymatic reaction pathways using machine-learning interatomic potentials (MLIPs).

In many workflows, a single command like the one below is enough to generate a useful first-pass reaction path:

pdb2reaction -i R.pdb P.pdb -c 'SAM,GPP' --ligand-charge 'SAM:1,GPP:-3'

You can also run MEP search → TS optimization → IRC → thermochemistry → single-point DFT in a single run by adding --tsopt True --thermo True --dft True:

pdb2reaction -i R.pdb P.pdb -c 'SAM,GPP' --ligand-charge 'SAM:1,GPP:-3' --tsopt True --thermo True --dft True

Given (i) two or more full protein–ligand PDB files (R → … → P), or (ii) one PDB with --scan-lists, or (iii) one TS candidate with --tsopt True, pdb2reaction automatically:

  • extracts an active-site pocket around user‑defined substrates to build a cluster model,

  • explores minimum‑energy paths (MEPs) with path optimization methods such as the Growing String Method (GSM) and Direct Max Flux (DMF),

  • optionally optimizes transition states, runs vibrational analysis, IRC calculations, and single‑point DFT calculations.

Important

Treat single-command TS outputs as initial candidates. For enzyme reactions, iterative refinement is common (endpoint quality, pocket definition, constraints, scan targets), and TS validation with both freq and irc is required before interpretation.

At the UMA stage, calculations use Meta’s UMA machine-learning interatomic potential (MLIP).

The CLI is designed to generate multi‑step enzymatic reaction mechanisms with minimal manual intervention. The same workflow also works for small‑molecule systems. When you skip pocket extraction (omit --center/-c and --ligand-charge), you can also use .xyz or .gjf inputs.

On HPC clusters or multi‑GPU workstations, pdb2reaction can scale to large cluster models (and optionally full protein–ligand complexes) by parallelizing UMA inference across nodes. Set workers and workers_per_node to enable multi‑worker inference; see UMA Calculator for configuration details.

Important

  • Input PDB files must already contain hydrogen atoms.

  • When you provide multiple PDBs, they must contain the same atoms in the same order (only coordinates may differ); otherwise an error is raised.

Tip

If you are new to the project, read Concepts & Workflow first. If you encounter an error during setup or runtime, refer to Troubleshooting.

CLI conventions

Convention

Example

Boolean options

--tsopt True, --dft False (case-insensitive; true/1/yes also work, but flag-style --tsopt alone is not allowed)

Residue selectors

'SAM,GPP' or 'A:123,B:456'

Charge mapping

--ligand-charge 'SAM:1,GPP:-3'

Atom selectors

'TYR,285,CA' or 'TYR 285 CA'

For full details, see CLI Conventions.

path-search naming note: the CLI subcommand is path-search, while the documentation filename is path_search.md.


Installation

pdb2reaction is intended for Linux environments (local workstations or HPC clusters) with a CUDA‑capable GPU. Several dependencies – notably PyTorch, fairchem‑core (UMA), and gpu4pyscf‑cuda12x – expect a working CUDA installation.

Refer to the upstream projects for additional details:

Quick start

Below is a minimal setup example that works on many CUDA 12.9 clusters. Adjust module names and versions to match your system. This example assumes the default GSM MEP mode (no DMF). For DMF, install cyipopt via conda first.

# 1) Install a CUDA-enabled PyTorch build
# 2) Install pdb2reaction from GitHub
# 3) Install a headless Chrome for Plotly figure export

pip install torch --index-url https://download.pytorch.org/whl/cu129
pip install git+https://github.com/t-0hmura/pdb2reaction.git
plotly_get_chrome -y

Finally, log in to Hugging Face Hub so that UMA models can be downloaded. Either:

# Hugging Face CLI
hf auth login --token '<YOUR_ACCESS_TOKEN>' --add-to-git-credential

or

# Classic CLI
huggingface-cli login

You only need to do this once per machine / environment.

  • If you want to use the Direct Max Flux (DMF) method for MEP search, create a conda environment and install cyipopt before installing pdb2reaction.

    # Create and activate a dedicated conda environment
    conda create -n pdb2reaction python=3.11 -y
    conda activate pdb2reaction
    
    # Install cyipopt (required for the DMF method in MEP search)
    conda install -c conda-forge cyipopt -y
    
  • If you are on an HPC cluster that uses environment modules, load CUDA before installing PyTorch, like this:

    module load cuda/12.9
    

Step-by-step installation

If you prefer to build the environment piece by piece:

  1. Load CUDA (if you use environment modules on an HPC cluster)

    module load cuda/12.9
    
  2. Create and activate a conda environment

    conda create -n pdb2reaction python=3.11 -y
    conda activate pdb2reaction
    
  3. Install cyipopt Required if you want to use the DMF method in MEP search.

    conda install -c conda-forge cyipopt -y
    
  4. Install PyTorch with the right CUDA build

    For CUDA 12.9:

    pip install torch --index-url https://download.pytorch.org/whl/cu129
    

    (You may use another compatible version if your cluster recommends it.)

  5. Install pdb2reaction itself and Chrome for visualization

    pip install git+https://github.com/t-0hmura/pdb2reaction.git
    plotly_get_chrome -y
    
  6. Log in to Hugging Face Hub (UMA model)

    huggingface-cli login
    

    See also:

  7. Verify installation

    pdb2reaction --version
    

    This should display the installed version (e.g., {{ version }}).


Command line basics

The main entry point is the pdb2reaction command, installed via pip. Internally it uses the Click library, and the default subcommand is all.

That means:

pdb2reaction [OPTIONS] ...
# is equivalent to
pdb2reaction all [OPTIONS] ...

The all command runs the full pipeline—cluster extraction, MEP search, TS optimization, vibrational analysis, and optional DFT—in a single invocation.

All high-level workflows share two important options when you use cluster extraction:

  • -i/--input: one or more full structures (reactant, intermediate(s), product).

  • -c/--center: how to define the substrate / extraction center (e.g., residue names or residue IDs).

If you omit --center/-c, cluster extraction is skipped and the full input structure is used directly.


Main workflow modes

Multi‑structure MEP workflow (reactant → product)

Use this when you already have several full PDB structures along a putative reaction coordinate (e.g., R → I1 → I2 → P).

Minimal example

pdb2reaction -i R.pdb P.pdb -c 'SAM,GPP' --ligand-charge 'SAM:1,GPP:-3'

Richer example

pdb2reaction -i R.pdb I1.pdb I2.pdb P.pdb -c 'SAM,GPP' --ligand-charge 'SAM:1,GPP:-3' --out-dir ./result_all --tsopt True --thermo True --dft True

Behavior:

  • takes two or more full systems in reaction order,

  • extracts catalytic cluster models for each structure,

  • performs a recursive MEP search via path-search by default (outputs under path_search/),

  • optionally switches to a single‑pass path-opt run with --refine-path False,

  • when PDB templates are available, merges the cluster-model MEP back into the full system,

  • optionally runs TS optimization, vibrational analysis, and single-point DFT calculations for each segment.

This is the recommended mode when you can generate reasonably spaced intermediates (e.g., from docking, MD, or manual modeling).

Important

pdb2reaction assumes that multiple input PDBs contain exactly the same atoms in the same order (only coordinates may differ). If any non-coordinate fields differ across inputs, an error is raised. Input PDB files must also contain hydrogen atoms.


Single‑structure + staged scan (feeds MEP refinement)

Use this when you only have one PDB structure, but you know which inter‑atomic distances should change along the reaction.

Provide a single -i together with --scan-lists:

Minimal example

pdb2reaction -i R.pdb -c 'SAM,GPP' --ligand-charge 'SAM:1,GPP:-3' --scan-lists '[("TYR 285 CA","MMT 309 C10",2.20),("TYR 285 CB","MMT 309 C11",1.80)]' '[("TYR 285 CB","MMT 309 C11",1.20)]'

Richer example

pdb2reaction -i SINGLE.pdb -c 'SAM,GPP' --scan-lists '[("TYR 285 CA","MMT 309 C10",2.20),("TYR 285 CB","MMT 309 C11",1.80)]' '[("TYR 285 CB","MMT 309 C11",1.20)]' --mult 1 --out-dir ./result_scan_all --tsopt True --thermo True --dft True

Key points:

  • --scan-lists describes staged distance scans on the extracted cluster model.

  • Each tuple (i, j, target_Å) is:

    • a PDB atom selector string like 'TYR,285,CA' (delimiters can be: space/comma/slash/backtick/backslash , / ` \) or a 1‑based atom index,

    • automatically remapped to the cluster-model indices.

  • Supplying one --scan-lists literal runs a single scan stage; multiple literals run sequential stages. Pass multiple literals after a single flag (repeated flags are not accepted).

  • Each stage writes a stage_XX/result.pdb, which is treated as a candidate intermediate or product.

  • The default all workflow refines the concatenated stages with recursive path-search.

  • With --refine-path False, it instead performs a single-pass path-opt chain and skips the recursive refiner (no merged mep_w_ref*.pdb).

This mode is useful for building reaction paths starting from a single structure.


Single‑structure TSOPT‑only mode

Use this when you already have a transition state candidate and only want to optimize it and proceed to IRC calculations.

Provide exactly one PDB and enable --tsopt:

Minimal example

pdb2reaction -i TS_CANDIDATE.pdb -c 'SAM,GPP' --ligand-charge 'SAM:1,GPP:-3' --tsopt True

Richer example

pdb2reaction -i TS_CANDIDATE.pdb -c 'SAM,GPP' --ligand-charge 'SAM:1,GPP:-3' --tsopt True --thermo True --dft True --out-dir ./result_tsopt_only

Behavior:

  • skips the MEP/path search entirely,

  • optimizes the cluster-model TS with TS optimization,

  • runs an IRC in both directions and optimizes both ends to relax down to R and P minima,

  • can then perform freq and dft on the R/TS/P,

  • produces UMA, Gibbs, and DFT//UMA energy diagrams.

Outputs such as energy_diagram_*_all.png and irc_plot_all.png are mirrored under the top‑level --out-dir.

Important

Single‑input runs require either --scan-lists (staged scan → GSM) or --tsopt True (TSOPT‑only). Supplying only a single -i without one of these will not trigger a full workflow.


Important CLI options and behaviors

Below are the most commonly used options across workflows.

Option

Description

-i, --input PATH...

Input structures. ≥ 2 PDBs → MEP search; 1 PDB + --scan-lists → staged scan → GSM; 1 PDB + --tsopt True → TSOPT‑only mode.

-c, --center TEXT

Defines the substrate / extraction center. Supports residue names ('SAM,GPP'), residue IDs (A:123,B:456), or PDB paths.

--ligand-charge TEXT

Charge info: mapping ('SAM:1,GPP:-3') or single integer.

-q, --charge INT

Hard override of total system charge.

-m, --mult INT

Spin multiplicity (e.g., 1 for singlet). Note: Use --multiplicity in subcommands other than all.

--scan-lists TEXT...

Staged distance scans for single‑input runs.

--out-dir PATH

Top‑level output directory.

--tsopt {True|False}

Enable TS optimization and IRC.

--thermo {True|False}

Run vibrational analysis and thermochemistry.

--dft {True|False}

Perform single‑point DFT calculations.

--refine-path {True|False}

Recursive MEP refinement (default) vs single‑pass.

--opt-mode light|heavy

Optimization method: Light (LBFGS/Dimer) or Heavy (RFO/RS-I-RFO).

--mep-mode gsm|dmf

MEP method: Growing String Method or Direct Max Flux.

--hessian-calc-mode Analytical|FiniteDifference

Hessian calculation mode. Analytical recommended when VRAM available.

For a full matrix of options and YAML schemas, see all and YAML Reference.


Run summaries

Every pdb2reaction all run writes:

  • summary.log – formatted summary for quick inspection, and

  • summary.yaml – YAML version summary.

They typically contain:

  • the exact CLI command invoked,

  • global MEP statistics (e.g. maximum barrier, path length),

  • per‑segment barrier heights and key bond changes,

  • energies from UMA, thermochemistry, and DFT post‑processing (where enabled).

Each segment directory under path_search/ also gets its own summary.log and summary.yaml, so you can inspect local refinements independently.


CLI commands

Most users will primarily call pdb2reaction all. The CLI also exposes individual subcommands—each supports -h/--help.

Subcommand

Role

Documentation

all

End-to-end workflow

all

extract

Extract active-site pocket (cluster model)

extract

opt

Geometry optimization

opt

tsopt

Transition state optimization

tsopt

path-opt

MEP optimization (GSM/DMF)

path_opt

path-search

Recursive MEP search

path_search

scan

1D bond-length scan

scan

scan2d

2D distance scan

scan2d

scan3d

3D distance scan

scan3d

irc

IRC calculation

irc

freq

Vibrational analysis

freq

dft

Single-point DFT

dft

trj2fig

Plot energy profiles

trj2fig

energy-diagram

Draw state energy diagram from numeric values

energy-diagram

add-elem-info

Repair PDB element columns

add_elem_info

Important

Subcommands (except all) assume cluster models generated by extract. In these models, the atom closest to the Link‑H cap is automatically frozen. If you build a cluster model yourself, set the Link‑H residue name to LKH and atom name to HL, or specify atoms to freeze via --args-yamlgeom.freeze_atoms.

Tip

In all, tsopt, freq and irc, setting --hessian-calc-mode Analytical is strongly recommended when you have enough VRAM.


Quick reference

Common command patterns:

# Basic MEP search (2+ structures)
pdb2reaction -i R.pdb P.pdb -c 'SUBSTRATE' --ligand-charge 'SUB:-1'

# Full workflow with post-processing
pdb2reaction -i R.pdb P.pdb -c 'SAM,GPP' --ligand-charge 'SAM:1,GPP:-3' \
    --tsopt True --thermo True --dft True

# Single structure with staged scan
pdb2reaction -i SINGLE.pdb -c 'LIG' --scan-lists '[("RES1,100,CA","LIG,200,C1",2.0)]'

# TS-only optimization
pdb2reaction -i TS.pdb -c 'LIG' --tsopt True --thermo True

Essential options:

Option

Purpose

-i

Input structure(s)

-c

Substrate definition for pocket extraction

--ligand-charge

Substrate charges (e.g., 'SAM:1,GPP:-3')

--tsopt True

Enable TS optimization + IRC

--thermo True

Run vibrational analysis

--dft True

Run single-point DFT

--out-dir

Output directory


Getting help

For any subcommand:

pdb2reaction <subcommand> --help

This prints the available options, defaults, and a short description. For detailed UMA calculator options, see UMA Calculator.

If you encounter any issues, please open an Issue on the GitHub repository.