all¶
pdb2reaction all runs the entire workflow end-to-end so you can go from structures to a validated mechanism in one command, instead of chaining extract → scan / path-search → tsopt → irc / freq / dft by hand. Starting from one or more PDB inputs, it extracts an active-site cluster model, runs an optional staged scan, performs an MEP search (single-pass path-opt by default; --refine-path True runs recursive path-search), and optionally chains TS optimization, IRC, vibrational analysis, and single-point DFT. The default MLIP backend is UMA; choose an alternative with -b/--backend.
all runs in one of three modes, chosen by what you pass:
Multi-structure MEP ([mode] all (mep)): give ≥ 2 structures in reaction order plus a substrate definition. all extracts active-site models, runs GSM / DMF MEP search, merges the optimized path back into the full-system template(s), and optionally runs TSOPT + IRC / freq / DFT per reactive segment.
Single-structure staged scan ([mode] all (scan-lists)): give one structure plus one or more --scan-lists/-s literals, each defining a scan stage; the staged scan produces the ordered intermediates that drive the MEP step.
TSOPT-only: give a single input and set --tsopt (no --scan-lists). all skips the MEP / merge stages, runs tsopt + EulerPC IRC on the active-site model (or the full input if extraction is skipped), and identifies the higher-energy endpoint as the reactant.
Important
Without --tsopt, the workflow produces TS candidates (highest-energy images from MEP search). Adding --tsopt refines them into optimized TS structures validated by an imaginary-frequency check, followed by IRC for endpoint validation. Always inspect the results (imaginary-frequency count and IRC endpoint connectivity) before mechanistic interpretation.
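The imaginary-frequency check mentioned above can be reproduced by hand when inspecting results. The following is a minimal sketch, assuming the vibrational frequencies are available as a list of wavenumbers in cm⁻¹ with imaginary modes reported as negative numbers (a common convention, not confirmed here for the files pdb2reaction writes). The function names and the tolerance parameter are hypothetical.

```python
def count_imaginary(freqs_cm1, tol=0.0):
    """Count modes reported as imaginary, i.e. wavenumbers below -tol."""
    return sum(1 for f in freqs_cm1 if f < -tol)


def looks_like_first_order_saddle(freqs_cm1, tol=0.0):
    """A transition state (first-order saddle point) has exactly one imaginary mode."""
    return count_imaginary(freqs_cm1, tol) == 1


# Illustrative values only, not output from a real run.
example = [-512.3, 34.1, 58.9, 102.7]
print(looks_like_first_order_saddle(example))
```

A structure that passes this count still needs the IRC endpoint check, since one imaginary mode does not guarantee that the saddle point connects the intended reactant and product.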
Examples¶
Working examples for GPP C6-methyltransferase BezA (Tsutsumi et al., Angew. Chem. Int. Ed. 2022, 61, e202111217) covering both multi-structure MEP and scan-based pipelines: examples/.
Command form:
pdb2reaction all -i INPUT1 [INPUT2 ...] -c SUBSTRATE [-b uma|orb|mace|aimnet2] [--solvent SOLVENT] [--solvent-model alpb|cpcmx] [options]
Multi-structure MEP with TS + thermo + DFT:
# Multi-structure MEP with TS + thermo + DFT
pdb2reaction all -i 1.R.pdb 3.P.pdb -c 'SAM,GPP,MG' -l 'SAM:1,GPP:-3' \
--tsopt --thermo --dft --out-dir ./result_mep
Single-structure staged scan (two stages):
# Single-structure staged scan (two stages)
pdb2reaction all -i 1.R.pdb -c 'SAM,GPP,MG' -l 'SAM:1,GPP:-3' \
-s '[("CS1 SAM 320","GPP 321 C7",1.60)]' '[("GPP 321 H11","GLU 186 OE2",0.90)]' \
--tsopt --thermo --out-dir ./result_scan
TSOPT-only validation of a single TS candidate:
# TSOPT-only validation of a single TS candidate
pdb2reaction all -i TS_candidate.pdb -c 'SAM,GPP,MG' -l 'SAM:1,GPP:-3' \
--tsopt --thermo --dft
PDB / GJF companion files are generated automatically when reference templates are available; control with --convert-files (on by default).
pdb2reaction all --help shows core options; pdb2reaction all --help-advanced shows the full option list.
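The scan-stage literals in the staged-scan example above are written in Python list-of-tuples syntax, so they can be sanity-checked before a long run. The sketch below uses the standard-library `ast.literal_eval`. It assumes each stage is a list of 3-tuples whose last element is a positive distance in Å. `parse_scan_stage` is a hypothetical helper, not part of pdb2reaction, and the tool's own parser may accept or reject more than this does.

```python
import ast


def parse_scan_stage(literal):
    """Parse one scan-stage literal into (atom_a, atom_b, target_angstrom) tuples."""
    stage = ast.literal_eval(literal)
    if not isinstance(stage, (list, tuple)):
        raise ValueError("a scan stage must be a list of tuples")
    parsed = []
    for item in stage:
        if len(item) != 3:
            raise ValueError(f"expected (i, j, target), got {item!r}")
        a, b, target = item
        target = float(target)
        if target <= 0:
            raise ValueError("target distance must be positive")
        parsed.append((a, b, target))
    return parsed


# The two stages from the staged-scan example.
stages = [
    '[("CS1 SAM 320","GPP 321 C7",1.60)]',
    '[("GPP 321 H11","GLU 186 OE2",0.90)]',
]
for n, literal in enumerate(stages, start=1):
    for a, b, target in parse_scan_stage(literal):
        print(f"stage {n}: {a} -- {b} -> {target:.2f} A")
```

Checking the literal in Python first catches quoting mistakes (for example an unbalanced bracket swallowed by the shell) before any model extraction starts.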
Workflow¶
Full system(s) (PDB / XYZ / GJF)
├─ (optional) active-site extraction `extract` — requires PDB when `-c` is used
│ └─ active-site cluster model(s)
│ ├─ (optional) staged scan `scan` — single-structure workflows
│ │ └─ ordered intermediates
│ └─ MEP search `path-opt` (default) or `path-search` (recursive, `--refine-path True`)
│ └─ MEP trajectory `mep_trj.xyz` + energy diagrams
└─ (optional) TS optimization + IRC `tsopt` → `irc`
├─ (optional) thermochemistry `freq`
└─ (optional) single-point DFT `dft`
all runs the following stages in order. Stages 0 and 1 are automatic preprocessing; the rest fire based on the flags you pass.
0. Preflight (automatic): add-elem-info fills missing element symbols and runs only on PDB inputs that lack them (an empty element field in cols 77–78), and fix-altloc resolves alternate conformations and runs only on PDB inputs that contain them (altLoc). When you invoke individual subcommands (e.g. extract, opt) you must run these manually if needed.
1. Active-site model (binding-pocket) extraction (when -c/--center is set): runs extract to build the active-site cluster from the substrate selection (see Inputs for the -c/--center syntax). Forwarded extractor toggles: --radius, --radius-het2het, --include-h2o, --exclude-backbone, --add-linkh, --selected-resn, --verbose. Per-input PDBs are saved under <out-dir>/_work/models/; when multiple structures are supplied, the active-site models are unioned per residue selection. The first active-site model’s net charge is propagated to scan / MEP / TSOPT.
2. Optional staged scan (single-input only): each --scan-lists/-s literal is a list of (i, j, target_Å) tuples. Atom indices use the original input ordering (1-based) and are remapped to the active-site model ordering. PDB selector strings like 'TYR,285,CA' are also accepted (space / comma / slash / backtick / backslash delimiters; token order is flexible). Stages run sequentially (stage 2 starts from stage 1’s result), and the stage endpoints become the ordered intermediates that feed the MEP step.
3. MEP search: by default runs single-pass path-opt GSM / DMF on each adjacent pair. --refine-path True switches to recursive path-search, which automatically detects multistep reactions and builds a detailed MEP per elementary step (complex multistep mechanisms may need manual trial-and-error to converge a satisfactory pathway). The raw engine output is written under <out-dir>/_work/path_opt/ (or _work/path_search/ with --refine-path True); the merged products (mep.pdb, mep_trj.xyz, energy_diagram_MEP.png) are promoted to the top level. For multi-input runs, full-system PDB templates are forwarded automatically for reference merging.
4. Merge to full systems (with --refine-path True): when reference templates exist, the merged mep_w_ref.pdb is promoted to <out-dir>/, and per-segment mep_w_ref_seg_NN.pdb files remain under <out-dir>/_work/path_search/. The default single-pass path-opt run skips the full-system merge.
5. Per-segment post-processing (reactive segments only; bridge segments without bond changes are skipped):
   --tsopt: TS optimization on each HEI active-site model, followed by EulerPC IRC, then IRC-endpoint re-optimization with --thresh-post (default baker). The endpoint optimization working directory is deleted automatically after completion.
   --thermo: freq on (R, TS, P) for vibrational + thermochemistry data and an MLIP Gibbs diagram.
   --dft: single-point DFT on (R, TS, P) and a DFT diagram. With --thermo, a DFT//MLIP Gibbs diagram (DFT energies + MLIP thermal correction) is also produced.
   Shared overrides: --opt-mode, --opt-mode-post, --flatten, --hessian-calc-mode, --tsopt-max-cycles, --tsopt-out-dir, --freq-*, --dft-*, --dft-engine (GPU-first by default). For Hessian evaluation modes see Hessian evaluation mode.
TSOPT-only mode (single input + --tsopt, no --scan-lists): skips MEP / merge; runs tsopt + EulerPC IRC and generates the same energy diagrams plus optional freq / DFT outputs.
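The two preflight conditions (an empty element field in columns 77–78, or a populated altLoc field) can be checked on a PDB file before running individual subcommands by hand. This is a minimal sketch based on the fixed-column PDB ATOM/HETATM layout, where altLoc is column 17. `preflight_flags` and `atom_line` are hypothetical helpers for illustration; they do not reproduce what add-elem-info or fix-altloc actually do internally.

```python
def preflight_flags(pdb_text):
    """Return (needs_elements, has_altloc) for the ATOM/HETATM records in pdb_text."""
    needs_elements = False
    has_altloc = False
    for line in pdb_text.splitlines():
        if not line.startswith(("ATOM", "HETATM")):
            continue
        # Element symbol: columns 77-78 (0-based slice 76:78).
        if line[76:78].strip() == "":
            needs_elements = True
        # altLoc indicator: column 17 (0-based index 16).
        if len(line) > 16 and line[16] != " ":
            has_altloc = True
    return needs_elements, has_altloc


def atom_line(serial, name, altloc, resname, chain, resseq, element):
    """Build a fixed-column ATOM record with dummy coordinates (demo helper only)."""
    return (
        f"ATOM  {serial:>5} {name:<4}{altloc:1}{resname:>3} {chain:1}{resseq:>4}"
        f"    {0.0:8.3f}{0.0:8.3f}{0.0:8.3f}{1.0:6.2f}{0.0:6.2f}"
        f"          {element:>2}"
    )


records = "\n".join([
    atom_line(1, "N", " ", "GLY", "A", 1, "N"),
    atom_line(2, "CA", "A", "GLY", "A", 1, ""),  # altLoc set, element missing
])
print(preflight_flags(records))
```

Lines shorter than 78 characters are treated as lacking an element symbol, which matches how truncated PDB records usually behave in column-based parsers.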
Outputs¶
The tree has three top-level zones: deliverables at the root, per-segment deliverables under segments/seg_NN/, and pipeline scratch under _work/ (safe to rm -rf once you have the results you need).
out_dir/ (default: ./result_all/)
├─ summary.log # Text summary (authored at the root)
├─ summary.json # JSON results
├─ mep.pdb # Merged MEP path (promoted from the engine)
├─ mep_w_ref.pdb # MEP merged into the full-system template (multi-input runs)
├─ mep_trj.xyz # Full MEP trajectory
├─ energy_diagram_MEP.png # All-segment MEP barriers
├─ energy_diagram_*.png # Aggregated post-processing diagrams (UMA / Gibbs / DFT, with --tsopt etc.)
├─ segments/ # Per-reactive-segment deliverables (bridge segments are skipped)
│ └─ seg_NN/ # 2-digit index, e.g. seg_01, seg_02
│ ├─ reactant.{pdb,xyz,gjf} # Canonical R/TS/P (output format matches input format)
│ ├─ ts.{pdb,xyz,gjf}
│ ├─ product.{pdb,xyz,gjf}
│ ├─ ts/ # TS optimization output + vibrational analysis (--tsopt)
│ ├─ irc/ # IRC trajectories + plots (--tsopt)
│ ├─ freq/{R,TS,P}/ # frequencies_cm-1.txt + thermoanalysis.yaml (--thermo)
│ └─ dft/ # DFT single-point results (--dft)
└─ _work/ # Pipeline scratch (safe to delete)
├─ models/ # Extracted active-site model PDBs (model_<input_stem>.pdb, when extraction runs)
├─ scan/ # Staged scan results (with --scan-lists)
├─ add_elem_info/ # Preflight element-symbol fills
├─ fix_altloc/ # Preflight altLoc resolution
└─ path_opt/ # Raw MEP-engine output (path_search/ with --refine-path True)
In TSOPT-only mode (single input + --tsopt, no --scan-lists) there is no MEP stage: the optimized R/TS/P plus ts/, irc/, freq/, and dft/ land directly under segments/seg_01/, and the MEP work directory (_work/path_opt/) is absent.
Note
The canonical structures are segments/seg_NN/reactant.*, ts.*, product.* — cite these when reporting mechanisms. The ts/, irc/, freq/, and dft/ subdirectories inside the same seg_NN/ hold the per-stage working files (e.g. ts/vib/imag_*_trj.xyz, irc/*_trj.xyz) for debugging a single stage. The raw MEP-search engine output under _work/path_opt/ is scratch — the products you need (mep.pdb, mep_trj.xyz, energy_diagram_MEP.png) are already promoted to the root.
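Collecting the canonical structures from several runs is easy to script against the layout described above. The sketch below walks `segments/seg_NN/` with `pathlib` and lists the `reactant.*`, `ts.*`, and `product.*` files it finds. `collect_segments` is a hypothetical helper and relies only on the directory and file names shown in the output tree.

```python
from pathlib import Path


def collect_segments(out_dir):
    """Map each segments/seg_NN/ directory to its reactant/ts/product files."""
    result = {}
    for seg in sorted(Path(out_dir).glob("segments/seg_*")):
        if not seg.is_dir():
            continue
        entry = {}
        for role in ("reactant", "ts", "product"):
            # Matches reactant.pdb, ts.xyz, product.gjf and so on.
            # The ts/ working directory has no extension and is not matched.
            entry[role] = sorted(p for p in seg.glob(f"{role}.*") if p.is_file())
        result[seg.name] = entry
    return result


if __name__ == "__main__":
    for name, files in collect_segments("./result_all").items():
        print(name, {role: [p.name for p in paths] for role, paths in files.items()})
```

Because the helper never touches `_work/`, it keeps working after the scratch directory has been deleted.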
At -v 2 the console summarises active-site charge resolution, YAML contents, scan stages, MEP progress (GSM / DMF), and per-stage timing; see Verbosity levels.
Plot file naming¶
Energy-diagram filenames encode method and scope:
| File | Generated when | Content |
|---|---|---|
| | | All-segment MEP barriers (raw GSM / DMF values) |
| | per-segment | R → TS → P (MLIP energy) |
| | per-segment thermo completes | R → TS → P (MLIP Gibbs) |
| | per-segment DFT completes | R → TS → P (DFT energy) |
| | per-segment DFT + thermo | R → TS → P (DFT energy + MLIP thermal correction) |
| | all segments aggregated (variants for MLIP / Gibbs / DFT / DFT//MLIP Gibbs) | Combined across all segments |
| | per-segment IRC completes | IRC profile (MLIP energy along the trajectory) |
| | all segments aggregated | IRC profiles concatenated across segments |
Reading summary.log¶
The log is organised into numbered sections:
[1] Global MEP overview — image / segment counts, MEP trajectory plot paths, aggregate MEP energy diagram.
[2] Segment-level MEP summary (UMA path) — per-segment barriers (ΔE‡), reaction energies (ΔE), bond-change summaries.
[3] Per-segment post-processing (TSOPT / Thermo / DFT) — TS imaginary-frequency checks, IRC outputs, MLIP / thermo / DFT energy tables.
[4] Energy diagrams (overview) — diagram tables for MEP / MLIP / Gibbs / DFT plus an optional cross-method summary.
[5] Output directory structure — a compact tree of generated files with inline annotations.
Reading summary.json¶
Top-level keys: out_dir, n_images, n_segments (run metadata and counts); segments (per-segment entries with index, tag, kind, barrier_kcal, delta_kcal, bond_changes); energy_diagrams (optional payloads with labels, energies_kcal, energies_au, ylabel, image paths). summary.json intentionally omits the formatted tables and filesystem tree from summary.log.
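For scripted post-processing, the per-segment entries can be read straight from the parsed JSON. The sketch below uses only key names listed above (`segments`, `index`, `tag`, `barrier_kcal`, `delta_kcal`); it assumes the barrier and reaction-energy values are plain numbers or null. The sample document is made up for illustration and is not output from a real run.

```python
import json


def segment_barriers(summary):
    """Return (index, tag, barrier_kcal, delta_kcal) rows from a parsed summary.json."""
    return [
        (s.get("index"), s.get("tag"), s.get("barrier_kcal"), s.get("delta_kcal"))
        for s in summary.get("segments", [])
    ]


def highest_barrier_segment(summary):
    """Return the segment entry with the largest barrier_kcal, or None if none is set."""
    segs = [s for s in summary.get("segments", []) if s.get("barrier_kcal") is not None]
    return max(segs, key=lambda s: s["barrier_kcal"]) if segs else None


# Placeholder content; tags and numbers are illustrative only.
sample = json.loads("""
{
  "out_dir": "./result_all",
  "n_segments": 2,
  "segments": [
    {"index": 1, "tag": "seg_01", "barrier_kcal": 12.4, "delta_kcal": -3.1},
    {"index": 2, "tag": "seg_02", "barrier_kcal": 18.9, "delta_kcal": -7.6}
  ]
}
""")
for row in segment_barriers(sample):
    print(row)
print(highest_barrier_segment(sample)["tag"])
```

With a real run, replace the inline sample with `json.load(open("result_all/summary.json"))`.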
CLI options¶
Defaults shown are used when the option is not specified. The full flag list is in the generated command reference; the tables below cover the options that need explanation.
Input expectations:
Extraction enabled (-c/--center): inputs must be PDB so residues can be located.
Extraction skipped: inputs may be PDB / XYZ / GJF.
Multi-structure runs require ≥ 2 structures. For full input-file requirements (hydrogens, element columns, atom-order parity), see CLI Conventions.
Charge is resolved via the standard priority chain (see CLI Conventions: Charge specification). In all, the charge derivation from active-site model extraction (when -c is set) acts as an additional priority layer. Spin resolution: --multiplicity CLI → .gjf template → default 1. Always provide --ligand-charge/-l for non-standard substrates so the correct net charge propagates to scan / MEP / TSOPT / DFT.
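To make the two resolution rules concrete, the sketch below parses a `--ligand-charge` value of the form used in the examples (`'SAM:1,GPP:-3'`, or a single net charge) and applies the spin fallback chain `--multiplicity` → .gjf template → 1. Both functions are hypothetical illustrations; the tool's own parser may accept additional forms, and the full charge priority chain is described in CLI Conventions rather than reproduced here.

```python
def parse_ligand_charge(spec):
    """Parse 'SAM:1,GPP:-3' into {'SAM': 1, 'GPP': -3}; a bare integer is a net charge."""
    spec = spec.strip()
    try:
        return int(spec)
    except ValueError:
        pass
    mapping = {}
    for item in spec.split(","):
        resname, sep, charge = item.partition(":")
        if not sep or not resname.strip():
            raise ValueError(f"expected RESNAME:CHARGE, got {item!r}")
        mapping[resname.strip()] = int(charge)
    return mapping


def resolve_multiplicity(cli_value=None, gjf_value=None):
    """Spin fallback chain: CLI value, then .gjf template value, then 1."""
    if cli_value is not None:
        return cli_value
    if gjf_value is not None:
        return gjf_value
    return 1


charges = parse_ligand_charge("SAM:1,GPP:-3")
print(charges, "ligand sum:", sum(charges.values()))
print(resolve_multiplicity(None, 2))
```

The ligand sum printed here covers only the listed residues; the net charge of an extracted model also depends on the protein residues and ions it contains.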
Input / output¶
| Option | Description | Default |
|---|---|---|
| | Two or more full structures in reaction order (single input allowed only with | Required |
| | Reference PDB for topology when | None |
| | Top-level output directory. | |
| | Global toggle for XYZ / TRJ → PDB / GJF companions when templates are available. | |
| | Dump MEP (GSM / DMF) trajectories. Always forwarded to | |
| | Base YAML applied first. | None |
| | Print resolved configuration before execution. | |
| | Validate and print plan without running stages. | |
Charge / spin¶
| Option | Description | Default |
|---|---|---|
| | Net charge or per-resname mapping used when | None |
| | Force the net system charge (overrides | None |
| | Spin multiplicity forwarded to all downstream steps. | |
Extraction¶
| Option | Description | Default |
|---|---|---|
| | Substrate specification (PDB path, residue IDs, or residue names). | Required for extraction |
| | Active-site model inclusion cutoff (Å). | |
| | Independent hetero–hetero cutoff (Å). | |
| | Include waters (HOH / WAT / TIP3 / SOL). | |
| | Remove backbone atoms on non-substrate amino acids. | |
| | Add cap hydrogens for severed bonds. | |
| | Residues to force include. Despite the name, accepts residue IDs (colon-separated integers with optional chains / insertion codes, e.g. | |
| | Comma-separated residue names (with optional charge) to treat as amino acids for backbone truncation and charge assignment (e.g. | |
| | Freeze cap parents in active-site model PDBs. | |
MEP search¶
| Option | Description | Default |
|---|---|---|
| | MEP algorithm: GSM (Growing String Method) or DMF (Direct Max Flux). | |
| | MEP internal nodes per segment. GSM: total images = | |
| | MEP maximum optimization cycles. | |
| | Enable climbing image for standard GSM segments (bridge segments always disable climbing). | |
| | Workflow preset ( | |
| | Convergence preset ( | |
| | Pre-optimize active-site model endpoints before MEP search. Standalone | |
| | | |
MLIP calculator¶
| Option | Description | Default |
|---|---|---|
| | MLIP predictor parallelism (workers > 1 disables analytic Hessians; UMA backend only). See workers > 1 disables analytical Hessians (UMA backend). | |
| | Shared MLIP Hessian engine. | |
| | MLIP backend. | |
| | Implicit solvent name for xTB correction (e.g. | |
| | xTB solvent model. | |
Post-processing¶
| Option | Description | Default |
|---|---|---|
| | Run TS optimization + IRC per reactive segment. | |
| | Run vibrational analysis ( | |
| | Run single-point DFT on R / TS / P. | |
| | Optimizer preset for TSOPT + post-IRC ( | |
| | Convergence preset for post-IRC endpoint optimizations. | |
| | Enable surplus-imaginary-mode flattening in | |
Warning
--dft single-point calculations (PySCF / GPU4PySCF) are very expensive for models above ~300 atoms — HPC clusters with high-end GPUs (e.g. A100, H200) are typically required.
TSOPT optimizer selection order: --opt-mode-post (if set) → --opt-mode (only when explicitly provided) → TSOPT default (hess → rsirfo). Example: --opt-mode grad --opt-mode-post hess uses L-BFGS for path optimization and RS-I-RFO for TS refinement.
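The selection order reads as a simple fallback chain. In the sketch below, `None` stands for "flag not given on the command line", and the function returns the mode name rather than an optimizer, since only the hess → rsirfo mapping is stated here. `resolve_tsopt_mode` is a hypothetical name used for illustration.

```python
def resolve_tsopt_mode(opt_mode_post=None, opt_mode=None, default="hess"):
    """Pick the TSOPT mode: --opt-mode-post, then an explicit --opt-mode, then the default."""
    if opt_mode_post is not None:
        return opt_mode_post
    if opt_mode is not None:
        return opt_mode
    return default


# --opt-mode grad --opt-mode-post hess: TS refinement uses the 'hess' preset.
print(resolve_tsopt_mode(opt_mode_post="hess", opt_mode="grad"))
# Only --opt-mode grad given: TSOPT inherits it.
print(resolve_tsopt_mode(opt_mode="grad"))
# Neither given: TSOPT default.
print(resolve_tsopt_mode())
```

The practical consequence is that setting --opt-mode alone also changes the TSOPT stage, so pass --opt-mode-post explicitly when the two stages should differ.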
TSOPT / freq / DFT / scan overrides¶
| Option | Description | Default |
|---|---|---|
| | Override | |
| | Custom tsopt subdirectory. | None |
| | Base directory override for freq outputs. | None |
| | Maximum modes to write. | |
| | Mode animation amplitude (Å). | |
| | Frames per mode animation. | |
| | Mode sorting behavior. | |
| | Thermochemistry temperature (K). | |
| | Thermochemistry pressure (atm). | |
| | DFT backend (GPU4PySCF or PySCF). In | |
| | DFT outputs base directory override. | None |
| | Functional / basis pair. | |
| | Maximum SCF iterations. | |
| | SCF convergence tolerance. | |
| | PySCF grid level. | |
| | Staged scans: | None |
| | Override the scan output directory. | None |
| | Force scan indexing. | None |
| | Maximum step size (Å). | |
| | Harmonic bias strength (eV · Å⁻²). | |
| | Relaxation max cycles per step. | |
| | Override the scan preoptimization toggle. | None |
| | Override the scan end-of-stage optimization toggle. | None |
YAML configuration¶
all supports layered YAML — base settings via --config FILE, with the precedence defaults < config < CLI. The effective YAML is forwarded to every invoked subcommand, and each subcommand reads its own sections:
Subcommand |
YAML sections |
|---|---|
|
|
|
|
|
|
|
|
|
|
|
|
|
# Minimal example
calc:
model: uma-s-1p1 # uma-s-1p1 | uma-m-1p1
hessian_calc_mode: Analytical # recommended when VRAM permits
gs:
max_nodes: 12
climb: true
dft:
grid_level: 6
Full schema: YAML Reference.
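The precedence defaults < config < CLI can be pictured as successive dictionary overlays. The sketch below merges nested sections recursively, which is an assumption: whether pdb2reaction merges section by section or key by key is not stated here. The default values are placeholders, not the tool's real defaults, and `layer` is a hypothetical helper.

```python
import copy


def layer(base, override):
    """Return base overlaid with override; nested dicts are merged key by key."""
    merged = copy.deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = layer(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


defaults = {"gs": {"max_nodes": 10, "climb": True}}            # placeholder defaults
config = {"gs": {"max_nodes": 12}, "dft": {"grid_level": 6}}   # from --config FILE
cli = {"gs": {"climb": False}}                                 # from command-line flags

effective = layer(layer(defaults, config), cli)
print(effective)
```

Applying the layers in this order means a value set on the command line always wins, while keys that only appear in the YAML file are kept unchanged.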
Notes¶
Tip
For large active-site models, the single-structure scan workflow tends to produce more reliable reaction barriers than the multi-structure MEP workflow. When multiple full PDB structures are provided, structural differences in regions unrelated to the reaction coordinate can accumulate along the path and inflate the computed barrier. The scan workflow avoids this by driving only the relevant coordinates from a single starting structure. The effect becomes more pronounced as the model size grows.
Reference PDB templates for merging are derived automatically from the original inputs; the explicit --ref-full-pdb option of path-search is hidden in this wrapper.
Extraction radii: passing 0 to --radius or --radius-het2het is internally clamped to 0.001 Å by the extractor.
Energies in diagrams are reported relative to the first state (reactant) in kcal/mol.
Omitting -c/--center skips extraction and feeds the entire input structures directly to MEP / tsopt / freq / dft; single-structure runs still require either --scan-lists/-s or --tsopt.
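The convention that diagram energies are relative to the first state can be reproduced from absolute energies in Hartree, for example when cross-checking a diagram by hand. This is a minimal sketch using the standard conversion factor of about 627.5095 kcal/mol per Hartree. `relative_kcal` is a hypothetical helper and the input values are illustrative, not results from a real calculation.

```python
HARTREE_TO_KCAL = 627.509474  # kcal/mol per Hartree


def relative_kcal(energies_au):
    """Convert absolute energies (Hartree) to kcal/mol relative to the first entry."""
    ref = energies_au[0]
    return [(e - ref) * HARTREE_TO_KCAL for e in energies_au]


# R, TS, P absolute energies in Hartree (made-up numbers).
profile = relative_kcal([-100.000, -99.980, -100.010])
barrier = max(profile) - profile[0]
reaction_energy = profile[-1] - profile[0]
print([round(x, 2) for x in profile], round(barrier, 2), round(reaction_energy, 2))
```

Because the reactant is the reference, the barrier is simply the highest value in the profile and the reaction energy is the last value.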
See Also¶
Installation · Getting Started · extract · scan · path-opt · path-search · tsopt · irc · freq · dft · Common Error Recipes · Troubleshooting · YAML Reference · Glossary.