Getting Started

Overview

[Figure: pdb2reaction workflow overview]

pdb2reaction is a Python CLI toolkit for modeling enzymatic reaction pathways from PDB structures using machine-learning interatomic potentials (MLIPs).

In many workflows, a single command like the one below is enough to generate a useful initial reaction path:

pdb2reaction -i 1.R.pdb 3.P.pdb -c 'SAM,GPP,MG' -l 'SAM:1,GPP:-3'

You can also run Minimum Energy Path (MEP) search → Transition State (TS) optimization → Intrinsic Reaction Coordinate (IRC) → thermochemistry → single-point DFT in a single run by adding --tsopt --thermo --dft:

pdb2reaction -i 1.R.pdb 3.P.pdb -c 'SAM,GPP,MG' -l 'SAM:1,GPP:-3' --tsopt --thermo --dft
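When these runs are launched from a script rather than typed by hand, the same command line can be assembled programmatically. The following is a minimal sketch, not part of pdb2reaction: build_command is a hypothetical helper, and it uses only the flags shown above (-i, -c, -l, --tsopt, --thermo, --dft). Passing the arguments as a list sidesteps shell quoting.

```python
import shlex


def build_command(inputs, center, ligand_charge, tsopt=False, thermo=False, dft=False):
    """Assemble a pdb2reaction argv list (hypothetical helper, not a pdb2reaction API)."""
    cmd = ["pdb2reaction", "-i", *inputs, "-c", center, "-l", ligand_charge]
    # Optional post-processing flags are appended only when requested.
    for enabled, flag in ((tsopt, "--tsopt"), (thermo, "--thermo"), (dft, "--dft")):
        if enabled:
            cmd.append(flag)
    return cmd


cmd = build_command(
    ["1.R.pdb", "3.P.pdb"], "SAM,GPP,MG", "SAM:1,GPP:-3",
    tsopt=True, thermo=True, dft=True,
)
print(shlex.join(cmd))  # printable form, e.g. for a job log
# To launch the run (requires pdb2reaction on PATH):
#   import subprocess; subprocess.run(cmd, check=True)
```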

Working examples: The examples/ directory contains complete scripts for the all workflow applied to GPP C6-methyltransferase BezA (Tsutsumi et al., Angew. Chem. Int. Ed. 2022, 61, e202111217), covering both the multi-structure MEP and scan-based pipelines.

Given (i) two or more full protein–ligand PDB files (R → … → P), or (ii) one PDB with --scan-lists/-s, or (iii) one TS candidate with --tsopt, pdb2reaction automatically:

  • extracts an active site model (binding pocket) around user-defined substrates to build a cluster model,

  • explores minimum-energy paths (MEPs) with path optimization methods such as the Growing String Method (GSM) and Direct Max Flux (DMF),

  • optionally optimizes transition states and runs vibrational analysis, IRC calculations, and single-point DFT calculations.

Calculations use machine-learning interatomic potentials (MLIPs). The default backend is Meta’s UMA, but ORB, MACE, and AIMNet2 are also supported via -b/--backend. Typical use cases include:

  • Trial-and-error exploration of reaction mechanisms at a scale where DFT-level verification would be too slow

  • Generating initial geometries (reactant/TS/product cluster models) for subsequent quantum-chemistry refinement

  • High-throughput screening of reaction pathways across substrate variants or enzyme mutants

The CLI generates multi-step enzymatic reaction mechanisms with minimal manual setup. The same workflow also works for small-molecule systems. When you skip active site model extraction (omit --center/-c and --ligand-charge/-l), you can also use .xyz or .gjf inputs.

On HPC clusters or multi-GPU workstations, pdb2reaction can scale to large cluster models (and optionally full protein–ligand complexes) by parallelizing UMA inference across nodes. Set workers and workers_per_node to enable multi-worker inference; see MLIP Calculator for configuration details.

Pipeline overview

The all subcommand runs the following stages automatically:

PDB (R, P)
  |
  v
[extract]  Active site model extraction (cluster model)
  |
  v
[scan]  (optional, --scan-lists/-s) Staged distance-restraint scans
  |
  v
[path-search]  MEP search (recursive path-search, default; --refine-path False switches to path-opt)
  |
  v
[tsopt]  TS optimization (RS-I-RFO; Dimer as alternative)
  |
  v
[irc]  Intrinsic Reaction Coordinate
  |
  v
[freq]  Vibrational analysis + thermochemistry (R, TS, P)
  |
  v
[dft]  Single-point DFT energy (optional, --dft)

Each stage can also be run as a standalone subcommand. The all command orchestrates them and produces a unified summary.json and summary.log.

Key output files

| File | Description |
|------|-------------|
| summary.json | Machine-readable results (barriers, energies, bond changes, environment) |
| summary.log | Human-readable text summary with directory tree |
| seg_XX/ | IRC-optimized R/TS/P structures per reaction step |
| mep.pdb | Merged MEP trajectory viewable in PyMOL/VMD |
| energy_diagram_*.png | Energy profile plots (electronic / Gibbs-corrected) |
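After a run, it can be handy to confirm that the files listed above were actually produced. The sketch below is illustrative and makes two assumptions: the caller knows the run's output directory, and seg_XX stands for directories whose names start with seg_. Because this page does not say where each file sits inside the output directory, the search is recursive. collect_outputs and missing_outputs are hypothetical helper names.

```python
from pathlib import Path

# Names taken from the table above; their location inside the output
# directory is an assumption, so every pattern is searched recursively.
EXPECTED_PATTERNS = ["summary.json", "summary.log", "mep.pdb",
                     "energy_diagram_*.png", "seg_*"]


def collect_outputs(out_dir):
    """Map each expected pattern to the sorted list of matching paths."""
    root = Path(out_dir)
    return {pat: sorted(root.rglob(pat)) for pat in EXPECTED_PATTERNS}


def missing_outputs(out_dir):
    """Return the patterns that matched nothing under out_dir."""
    return [pat for pat, hits in collect_outputs(out_dir).items() if not hits]
```

Calling missing_outputs on a finished run directory returns an empty list when every pattern matched at least once. Some entries are expected only when the corresponding stage was enabled.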

Important

  • Input PDB files must already contain hydrogen atoms.

  • When you provide multiple PDBs, they must contain the same atoms in the same order (only coordinates may differ); otherwise an error is raised.
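Both requirements can be checked before launching a run. The following is a minimal pre-flight sketch, independent of pdb2reaction's own validation (which may be stricter or differ in detail). It relies on the fixed-column layout of PDB ATOM/HETATM records; atom_keys, has_hydrogens, and first_mismatch are hypothetical helper names, and the hydrogen test falls back to a name-based heuristic when the element column is empty.

```python
def atom_keys(pdb_text):
    """Return one identity tuple per ATOM/HETATM record, in file order.

    Fixed-column PDB layout: atom name (cols 13-16), residue name
    (cols 18-20), chain ID (col 22), residue number (cols 23-26).
    """
    keys = []
    for line in pdb_text.splitlines():
        if line.startswith(("ATOM", "HETATM")):
            keys.append((line[12:16].strip(), line[17:20].strip(),
                         line[21:22].strip(), line[22:26].strip()))
    return keys


def has_hydrogens(pdb_text):
    """True if any atom record looks like hydrogen.

    Uses the element field (cols 77-78); if it is blank, falls back to
    the heuristic that the atom name starts with 'H'.
    """
    for line in pdb_text.splitlines():
        if line.startswith(("ATOM", "HETATM")):
            element = line[76:78].strip().upper()
            name = line[12:16].strip().upper()
            if element == "H" or (not element and name.startswith("H")):
                return True
    return False


def first_mismatch(pdb_a, pdb_b):
    """Index of the first atom whose identity differs, or None if the lists match."""
    a, b = atom_keys(pdb_a), atom_keys(pdb_b)
    for i, (ka, kb) in enumerate(zip(a, b)):
        if ka != kb:
            return i
    # Same prefix: the lists match only if they also have equal length.
    return None if len(a) == len(b) else min(len(a), len(b))
```

To use it, read each input with Path(...).read_text(), call has_hydrogens on every file, and call first_mismatch on each pair of consecutive structures. Coordinates are deliberately ignored, since only they may differ.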

Tip

For symptom-first diagnosis, start with Common Error Recipes. If you encounter an error during setup or runtime, refer to Troubleshooting.

CLI conventions

| Convention | Example | Notes |
|------------|---------|-------|
| Residue selectors | 'SAM,GPP' or 'A:123,B:456' | Quote multi-value strings to prevent shell expansion |
| Charge mapping | -l 'SAM:1,GPP:-3' | Colon separates name and charge; comma separates entries |
| Atom selectors | 'TYR,285,CA' or 'TYR 285 CA' | Delimiters: space, comma, slash, backtick, backslash |

For full details, see CLI Conventions.
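To make the charge-mapping and atom-selector syntax concrete, here is a small parsing sketch. It mirrors the conventions described on this page and is not pdb2reaction's actual parser; parse_charge_map and split_atom_selector are hypothetical names, and only the mapping form of the charge string is handled.

```python
import re


def parse_charge_map(text):
    """Parse 'SAM:1,GPP:-3' into {'SAM': 1, 'GPP': -3}."""
    charges = {}
    for entry in text.split(","):
        # Colon separates residue name and charge within one entry.
        name, _, value = entry.strip().partition(":")
        charges[name] = int(value)
    return charges


def split_atom_selector(text):
    """Split an atom selector such as 'TYR,285,CA' into its three fields.

    Accepted delimiters: space, comma, slash, backtick, backslash.
    """
    fields = [f for f in re.split(r"[ ,/`\\]+", text.strip()) if f]
    if len(fields) != 3:
        raise ValueError(f"expected 3 fields, got {fields!r}")
    return tuple(fields)


charges = parse_charge_map("SAM:1,GPP:-3")
print(charges, "net ligand charge:", sum(charges.values()))
print(split_atom_selector("TYR,285,CA"))
print(split_atom_selector("TYR 285 CA"))
```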



Command line basics

The main entry point is the pdb2reaction command, installed via pip. A shorthand alias, p2r, is registered by the same package (same setuptools entry point), so pip install pdb2reaction gives you both and every command can be run with either name. The CLI is built on the Click library, and the default subcommand is all.

That means:

pdb2reaction [OPTIONS]...
# is equivalent to
pdb2reaction all [OPTIONS]...

The all command runs the full pipeline (cluster extraction, MEP search, TS optimization, vibrational analysis, and optional DFT) in a single invocation.
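The default-subcommand rule can be pictured as a small dispatch step. The sketch below uses plain Python rather than Click and only illustrates the equivalence stated above; it is not how pdb2reaction implements it. The subcommand set is copied from the pipeline diagram on this page and is an assumption, since the real CLI may register additional commands.

```python
# Subcommand names taken from the pipeline diagram on this page; this set
# is an assumption and is not read from the installed tool.
KNOWN_SUBCOMMANDS = {"all", "extract", "scan", "path-search", "path-opt",
                     "tsopt", "irc", "freq", "dft"}


def resolve_subcommand(argv):
    """Return (subcommand, remaining_args), defaulting to 'all'."""
    if argv and argv[0] in KNOWN_SUBCOMMANDS:
        return argv[0], list(argv[1:])
    # No recognized subcommand in first position: treat everything as
    # options for 'all'.
    return "all", list(argv)


implicit = resolve_subcommand(["-i", "1.R.pdb", "3.P.pdb"])
explicit = resolve_subcommand(["all", "-i", "1.R.pdb", "3.P.pdb"])
print(implicit == explicit)
```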

All high-level workflows share two important options when you use cluster extraction:

  • -i/--input: one or more full structures (reactant, intermediate(s), product).

  • -c/--center: how to define the substrate / extraction center (e.g., residue names or residue IDs).

If you omit --center/-c, cluster extraction is skipped and the full input structure is used directly.


Main workflow modes

| Mode | Description | Quickstart |
|------|-------------|------------|
| Multi-structure MEP (≥ 2 PDBs) | Take full PDBs along a putative reaction coordinate (R → … → P), extract cluster models, run recursive MEP search, and optionally refine with TS / IRC / freq / DFT per segment. | Quickstart: pdb2reaction all |
| Single-structure + staged scan (1 PDB + --scan-lists/-s) | Drive one PDB through staged distance-restraint scans on the cluster model; each stage seeds the recursive path-search (or single-pass path-opt with --refine-path False). | Quickstart: single-structure staged scan |
| Single-structure TSOPT-only (1 PDB + --tsopt) | Skip MEP entirely; optimize a TS candidate, run IRC in both directions, optimize endpoints, and optionally run freq / DFT on R/TS/P. | Quickstart: TS optimization |

Important

Single-input runs require either --scan-lists/-s (staged scan → GSM) or --tsopt (TSOPT-only). Supplying only a single -i without one of these will not trigger a full workflow.
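The mode rules above reduce to a short decision function. This is a sketch of the documented rules only; select_mode is a hypothetical helper, and the precedence when a single input is given with both --scan-lists/-s and --tsopt is an assumption (here the scan decides the mode).

```python
def select_mode(n_inputs, scan_lists=False, tsopt=False):
    """Name the workflow mode implied by the inputs (hypothetical helper)."""
    if n_inputs < 1:
        raise ValueError("at least one input structure is required")
    if n_inputs >= 2:
        return "multi-structure MEP"
    if scan_lists:
        # Assumption: with both flags set, the staged scan decides the mode
        # and --tsopt refines the resulting path afterwards.
        return "single-structure staged scan"
    if tsopt:
        return "single-structure TSOPT-only"
    raise ValueError("a single input needs --scan-lists/-s or --tsopt")


print(select_mode(2))
print(select_mode(1, scan_lists=True))
print(select_mode(1, tsopt=True))
```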


Important CLI options and behaviors

Below are the most commonly used options across workflows.

| Option | Description |
|--------|-------------|
| -i, --input PATH... | Input structures. ≥ 2 PDBs → MEP search; 1 PDB + --scan-lists/-s → staged scan → GSM; 1 PDB + --tsopt → TSOPT-only mode. |
| -c, --center TEXT | Defines the substrate / extraction center. Supports residue names ('SAM,GPP'), residue IDs ('A:123,B:456'), or PDB paths. |
| -l, --ligand-charge TEXT | Charge info: mapping ('SAM:1,GPP:-3') or single integer. |
| -q, --charge INT | Hard override of net system charge. |
| -m, --multiplicity INT | Spin multiplicity (e.g., 1 for singlet). |
| --tsopt/--no-tsopt | Enable TS optimization and IRC. |
| -b, --backend TEXT | Select MLIP backend (uma, orb, mace, aimnet2). |

For the complete option matrix, see CLI Conventions and the generated CLI reference.


Run summaries

Every pdb2reaction all run writes:

  • summary.log – text summary, and

  • summary.json – JSON summary.

They typically contain:

  • the exact CLI command invoked,

  • global MEP statistics (e.g. maximum barrier, path length),

  • per-segment barrier heights and key bond changes,

  • energies from the MLIP backend, thermochemistry, and DFT post-processing (where enabled).

Each segment directory under path_search/ (or path_opt/ when --refine-path False is used) also gets its own summary.log and summary.json, so you can inspect local refinements independently.
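Because the top-level and per-segment summaries share the file name summary.json, they can be gathered in one pass. The sketch below makes no assumption about the JSON schema, which is not specified on this page: it loads every summary.json under a run directory and lists each file's top-level keys so you can see what is available. load_summaries and top_level_keys are hypothetical helper names.

```python
import json
from pathlib import Path


def load_summaries(out_dir):
    """Load every summary.json under out_dir, keyed by its relative path."""
    root = Path(out_dir)
    summaries = {}
    for path in sorted(root.rglob("summary.json")):
        with path.open(encoding="utf-8") as fh:
            summaries[path.relative_to(root).as_posix()] = json.load(fh)
    return summaries


def top_level_keys(summaries):
    """List each summary's top-level keys (empty list if it is not a JSON object)."""
    return {name: sorted(data) if isinstance(data, dict) else []
            for name, data in summaries.items()}
```

Inspecting top_level_keys(load_summaries(run_dir)) first is a safer starting point than hard-coding key names, which may change between versions.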


CLI commands

Most users will primarily call pdb2reaction all. The CLI also exposes individual subcommands; each supports -h/--help, and the calculation, scan, extract, and utility commands additionally support --help-advanced for the full option list. For the subcommand index with per-command documentation links, see the documentation home.


Agent Skills

pdb2reaction ships AI-agent instructions under .claude/skills/ covering the CLI subcommands, structure I/O (PDB / XYZ / GJF), backend installation (UMA / Orb / MACE / AIMNet2 / DFT / xtb), canonical workflows, output parsing, and HPC operation. Copy .claude/skills/ into your project repository or ~/.claude/skills/ for agent platforms like Claude Code or Cursor.


Getting help

For any subcommand:

pdb2reaction <subcommand> --help
pdb2reaction <subcommand> --help-advanced
pdb2reaction all --help-advanced

For detailed MLIP backend options, see MLIP Calculator.

If you encounter any issues, please open an Issue on the GitHub repository.