`scan`¶

Overview¶

Summary: Drive a reaction coordinate by scanning bond distances with harmonic restraints. Use --scan-lists/-s to define targets as either a YAML/JSON spec file path (recommended) or inline Python literals.

At a glance¶

Use when: A single structure needs specific distances driven to explore a plausible path (often before path-search/path-opt). Input is one structure plus either -s/--scan-lists scan.yaml (recommended) or one or more inline literals after --scan-lists/-s (each literal = one stage). YAML/JSON spec files avoid shell-quoting pitfalls and are easier to keep under version control; inline literals are fine for simple single-stage scans.
Method: MLIP backend (UMA by default; selectable via -b/--backend) with harmonic restraints E = Σ ½ k (|ri − rj| − target)² and LBFGS (--opt-mode grad) or RFOptimizer (--opt-mode hess) per step.
Outputs: Per-stage result.xyz (+ optional .pdb/.gjf), and concatenated scan trajectories (scan_trj.xyz/scan.pdb). --dump controls per-step optimizer trajectory files only.
Defaults: --opt-mode grad (LBFGS), --no-preopt, --no-endopt, --max-step-size 0.20 Å, --bias-k 300 eV·Å⁻², --thresh gau, --out-dir ./result_scan/.
Next step: Feed the staged endpoints (stage_XX/result.pdb) to path-search/path-opt for MEP refinement, or use pdb2reaction all -s ... to chain scan → MEP → TSOPT/IRC/freq/DFT in one command.

pdb2reaction scan performs a staged, bond-length–driven scan using an MLIP backend (UMA by default) and harmonic restraints. At each step, the temporary targets are updated, restraint wells are applied, and the structure is relaxed with LBFGS (--opt-mode grad) or RFOptimizer (--opt-mode hess).

For XYZ/GJF inputs, --ref-pdb supplies a reference PDB topology while keeping XYZ coordinates, enabling format-aware PDB/GJF output conversion.

Minimal example¶

pdb2reaction scan -i input.pdb -q 0 -m 1 -s scan.yaml -o ./result_scan

Output checklist¶

result_scan/stage_01/result.pdb (or result.xyz)
result_scan/stage_02/result.pdb (or result.xyz)
result_scan/stage_*/scan_trj.xyz (always written) and scan.pdb (PDB inputs with conversion enabled); --dump controls per-step optimizer trajectory files only

Common examples¶

Run from a YAML spec.

pdb2reaction scan -i input.pdb -q 0 -m 1 -s scan.yaml

Use literal input.

pdb2reaction scan -i input.pdb -q 0 -m 1 -s '[("TYR,285,CA","SAM,309,C10",1.35)]'

Dump trajectories for stage-by-stage inspection.

pdb2reaction scan -i input.pdb -q 0 -m 1 -s scan.yaml --dump -o ./result_scan_dump

Note: Add --print-parsed when you want to verify parsed stage targets from --scan-lists/-s.
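
When the scan is launched from a Python script rather than a shell, passing the arguments as a list avoids shell quoting of the inline literals altogether. The following is a minimal sketch, assuming `pdb2reaction` is on `PATH`; the helper name `build_scan_argv` is hypothetical and only flags documented on this page are used.

```python
import subprocess

def build_scan_argv(input_path, stages, charge=0, mult=1, out_dir=None):
    """Build an argv list for `pdb2reaction scan` (hypothetical helper).

    `stages` is a list of stages; each stage is a list of (i, j, target) tuples.
    """
    argv = ["pdb2reaction", "scan", "-i", input_path,
            "-q", str(charge), "-m", str(mult), "-s"]
    # One inline Python literal per stage, all after a single -s flag.
    argv += [repr(list(stage)) for stage in stages]
    if out_dir is not None:
        argv += ["-o", out_dir]
    return argv

argv = build_scan_argv(
    "input.pdb",
    [[("TYR,285,CA", "SAM,309,C10", 1.35)]],
    out_dir="./result_scan",
)
print(argv)
# To actually run the scan (requires pdb2reaction to be installed):
# subprocess.run(argv, check=True)
```

Because no shell is involved, the literal needs no extra quoting; `repr` emits a valid Python literal for each stage.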

Usage¶

pdb2reaction scan -i INPUT.{pdb|xyz|trj|...} [-q CHARGE] [-l, --ligand-charge <number|'RES:Q,...'>] [-m MULT] \
 [-b/--backend uma|orb|mace|aimnet2] [--solvent SOLVENT] [--solvent-model alpb|cpcmx] \
 [-s/--scan-lists scan.yaml | '[(i,j,targetÅ),...]'] [options] \
 [--convert-files/--no-convert-files] [--ref-pdb FILE]

Examples¶

# Recommended: YAML/JSON spec file
cat > scan.yaml << 'YAML'
one_based: true
stages:
 - [["TYR,285,CA", "SAM,309,C10", 1.35]]
 - [["TYR,285,CA", "SAM,309,C10", 2.20], ["TYR,285,CB", "SAM,309,C11", 1.80]]
YAML
pdb2reaction scan -i input.pdb -q 0 -s scan.yaml

# Alternative: inline Python literal
pdb2reaction scan -i input.pdb -q 0 -s '[("TYR,285,CA","SAM,309,C10",1.35)]'

# Two stages with LBFGS relaxations, unbiased pre-/end-optimization, and trajectory dumping
pdb2reaction scan -i input.pdb -q 0 -s \
 '[("TYR,285,CA","SAM,309,C10",1.35)]' \
 '[("TYR,285,CA","SAM,309,C10",2.20),("TYR,285,CB","SAM,309,C11",1.80)]' \
 --max-step-size 0.20 --dump -o ./result_scan/ --opt-mode grad \
 --preopt --endopt

# Supply multiple stage literals after a single -s/--scan-lists
pdb2reaction scan -i input.pdb -q 0 -s \
 '[("TYR,285,CA","SAM,309,C10",1.35)]' \
 '[("TYR,285,CA","SAM,309,C10",2.20),("TYR,285,CB","SAM,309,C11",1.80)]'
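
The spec file can also be generated programmatically. The sketch below writes a JSON file with the same `one_based` and `stages` keys as the YAML example above; that the JSON form mirrors the YAML keys one-to-one is an assumption here, and the file name `scan.json` is arbitrary.

```python
import json

# Same two stages as the YAML example: each stage is a list of
# [atom_i, atom_j, target_in_angstrom] entries.
spec = {
    "one_based": True,
    "stages": [
        [["TYR,285,CA", "SAM,309,C10", 1.35]],
        [["TYR,285,CA", "SAM,309,C10", 2.20],
         ["TYR,285,CB", "SAM,309,C11", 1.80]],
    ],
}

with open("scan.json", "w", encoding="utf-8") as fh:
    json.dump(spec, fh, indent=2)
```

The resulting file would then be passed as `-s scan.json`; adding `--print-parsed` is a quick way to confirm the stages were read as intended.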

Scan-list spec¶

For the YAML/JSON file format, inline Python literal syntax, atom selectors, and quoting rules, see CLI Conventions: Scan-list spec.
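
As a rough illustration of the flexible selector delimiters, the sketch below tokenizes a selector string on spaces, commas, slashes, backticks, and backslashes and then applies the positional fallback order (resname, resseq, atom). The function names are hypothetical, and the tool's actual handling of unordered tokens is not reproduced here.

```python
import re

# Space, comma, slash, backtick, or backslash (one or more in a row).
_DELIMS = re.compile(r"[ ,/`\\]+")

def split_selector(selector):
    """Tokenize an atom selector such as 'TYR,285,CA' (illustrative only)."""
    tokens = [t for t in _DELIMS.split(selector.strip()) if t]
    if len(tokens) != 3:
        raise ValueError(f"expected 3 fields, got {tokens!r}")
    return tokens

def interpret_positional(selector):
    """Positional fallback: fields are read as resname, resseq, atom."""
    resname, resseq, atom = split_selector(selector)
    return {"resname": resname, "resseq": int(resseq), "atom": atom}

print(interpret_positional("TYR,285,CA"))
print(split_selector("SAM/309 C10"))
```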

Multiple stages¶

Pass multiple literals after a single --scan-lists/-s flag. Each literal becomes one stage:

# Stage 1: drive one bond to 1.35 Å
# Stage 2: drive two bonds simultaneously
-s \
 '[("TYR,285,CA","SAM,309,C10",1.35)]' \
 '[("TYR,285,CA","SAM,309,C10",2.20),("TYR,285,CB","SAM,309,C11",1.80)]'

Stages run sequentially; each starts from the previous stage’s relaxed result.
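
To make the "one literal = one stage" rule concrete, here is a small sketch that parses several inline literals with `ast.literal_eval` and shifts integer indices to 0-based. Whether the CLI itself uses `ast.literal_eval` is an assumption; the function name is hypothetical, and only 3-tuples are handled.

```python
import ast

def parse_stage_literals(literals, one_based=True):
    """Parse inline literals into stages of (i, j, target) tuples.

    Integer indices are shifted to 0-based; selector strings are kept as-is.
    """
    def norm(idx):
        if isinstance(idx, int):
            return idx - 1 if one_based else idx
        return idx  # selector string, to be resolved against the PDB later

    stages = []
    for text in literals:
        stage = []
        for i, j, target in ast.literal_eval(text):
            if target <= 0:
                raise ValueError(f"target must be positive, got {target}")
            stage.append((norm(i), norm(j), float(target)))
        stages.append(stage)
    return stages

stages = parse_stage_literals([
    '[(12, 45, 1.35)]',
    '[("TYR,285,CA","SAM,309,C10",2.20),(1,2,1.80)]',
])
print(len(stages), stages)
```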

Bidirectional scan (4-tuple)¶

Instead of a 3-tuple (i, j, target), you can pass a 4-tuple (i, j, start, end) to scan in both directions from the current geometry. The CLI automatically expands each 4-tuple into two stages:

Pass 1: Drive i–j from the current distance toward start.
Pass 2: Restore the initial geometry and drive i–j toward end.

The concatenated trajectory is assembled as start → initial → end, giving a continuous path through the starting structure.

# Bidirectional scan: drive bond 12-45 from current geometry
# toward 1.35 Å (pass 1) and toward 2.50 Å (pass 2)
pdb2reaction scan -i input.pdb -q 0 -s '[(12, 45, 1.35, 2.50)]'

This is equivalent to two manual stages with a geometry reset between them, but avoids the need to script it yourself. Mixed 3-tuples and 4-tuples are accepted in the same literal.

Note

Stage counter with 4-tuples. A 4-tuple expands into two stages in the output tree: the start pass is written under stage_NN/ and the end pass under stage_NN+1/. So if you pass a single 4-tuple as your first literal, you will see stage_01/ and stage_02/, not one combined stage_01/. When mixing 3-tuples and 4-tuples, the counter advances by +1 per 3-tuple and +2 per 4-tuple.
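
The start → initial → end ordering of the concatenated trajectory can be pictured with a short sketch. It assumes that each pass stores the initial geometry as its first frame and that the shared initial frame is kept only once; both points are assumptions, and the function name is hypothetical.

```python
def assemble_bidirectional(pass1_frames, pass2_frames):
    """Join two passes that both start at the initial geometry.

    pass1_frames runs initial -> start; pass2_frames runs initial -> end.
    Returns frames ordered start -> initial -> end.
    """
    if not pass1_frames or not pass2_frames:
        raise ValueError("both passes need at least one frame")
    # Reverse pass 1 so it runs start -> initial, then append pass 2
    # without repeating the shared initial frame.
    return list(reversed(pass1_frames)) + list(pass2_frames[1:])

# Frame labels stand in for actual geometries.
path = assemble_bidirectional(["init", "a1", "start"], ["init", "b1", "end"])
print(path)
```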

Workflow¶

Load the structure through geom_loader. Charge is resolved via the standard priority chain (see CLI Conventions: Charge specification for details).
Optionally run an unbiased preoptimization (--preopt) before any biasing so the starting point is relaxed.
Parse stage targets from --scan-lists/-s (YAML/JSON file or inline literal), then normalize the (i, j) indices (1-based by default). When the input is a PDB, each entry may be either an integer index or an atom selector string like 'TYR,285,CA'; selector fields can be separated by spaces, commas, slashes, backticks, or backslashes and may be in any order (fallback assumes resname, resseq, atom). Compute the per-bond displacement Δ = target − current and split it into N = ceil(max(|Δ|) / h) steps using h = --max-step-size. Every bond receives its own δ = Δ / N increment.
March through all steps, updating the temporary targets, applying the harmonic wells E = Σ ½ k (|ri − rj| − target)², and minimizing with the MLIP backend. Optimizer cycles are capped by --relax-max-cycles unless YAML specifies opt.max_cycles.
After the last step of each stage, optionally run an unbiased relaxation (--endopt) before reporting covalent bond changes and writing the result.* files.
Repeat for every stage. Concatenated scan trajectories (scan_trj.xyz, plus scan.pdb for PDB inputs with conversion enabled) are always written; --dump controls per-step optimizer trajectory files only.
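
The step-splitting rule in the workflow (N = ceil(max(|Δ|) / h), δ = Δ / N per bond) can be sketched as follows. This is an illustration of the arithmetic only, with a hypothetical function name; it is not the tool's own code.

```python
import math

def step_schedule(current, targets, max_step_size=0.20):
    """Return the temporary targets for every step of one stage.

    current, targets: distances in Å, one entry per scanned bond.
    """
    deltas = [t - c for c, t in zip(current, targets)]
    largest = max(abs(d) for d in deltas)
    n_steps = math.ceil(largest / max_step_size)
    if n_steps == 0:
        return []  # already at the targets
    # Every bond gets its own increment, so all bonds arrive together.
    increments = [d / n_steps for d in deltas]
    return [
        [c + inc * step for c, inc in zip(current, increments)]
        for step in range(1, n_steps + 1)
    ]

# Two bonds: 1.80 -> 1.35 Å and 2.50 -> 1.80 Å with h = 0.20 Å.
schedule = step_schedule([1.80, 2.50], [1.35, 1.80])
print(len(schedule))
for targets in schedule:
    print([round(t, 4) for t in targets])
```

In this example the largest displacement is 0.70 Å, so the stage is split into 4 steps; the bond with the smaller displacement moves by a proportionally smaller increment each step.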

CLI options¶

Option	Description	Default
`-i, --input PATH`	Structure file accepted by `geom_loader`.	Required
`-q, --charge INT`	Total charge (CLI > template). When omitted, charge can be inferred from `--ligand-charge/-l`; explicit `-q` overrides any derived value.	Required unless a `.gjf` template or `--ligand-charge/-l` supplies it
`-l, --ligand-charge TEXT`	Per-residue charge mapping (e.g., `GPP:-3,SAM:1`). Automatically derives the total system charge from PDB residue charges — no manual counting needed. Used when `-q` is omitted (PDB inputs or XYZ/GJF with `--ref-pdb`).	None
`--workers`, `--workers-per-node`	MLIP predictor parallelism (workers > 1 disables analytic Hessians; UMA backend only; `workers_per_node` forwarded to the parallel predictor). See workers > 1 silent FD downgrade for diagnostic notes.	`1`, `1`
`-m, --multiplicity INT`	Spin multiplicity 2S+1. Inherits the `.gjf` template value when available; defaults to `1` when omitted.	`.gjf` template value or `1`
`-s, --scan-lists TEXT`	Scan targets: a YAML/JSON spec file path (recommended) or inline Python literal with `(i,j,targetÅ)` triples or `(i,j,start,end)` 4-tuples for bidirectional scans. Each inline literal is one stage; supply multiple literals after a single flag. `i`/`j` can be integer indices or PDB atom selectors like `'TYR,285,CA'`.	Required
`--one-based/--zero-based`	Interpret atom indices as 1- or 0-based. These are mutually exclusive toggle aliases for the same flag (`--one-based` sets it to `True`, `--zero-based` sets it to `False`).	`True`
`--print-parsed/--no-print-parsed`	Print parsed stage tuples after `--scan-lists/-s` resolution.	`False`
`--max-step-size FLOAT`	Maximum change in any scanned bond per step (Å). Determines the number of scan steps in each stage.	`0.20`
`--bias-k FLOAT`	Harmonic bias strength `k` in eV·Å⁻².	`300`
`--relax-max-cycles INT`	Cap on optimizer cycles during preopt, each biased step, and end-of-stage cleanups. Used unless YAML sets `opt.max_cycles`.	`10000`
`--opt-mode TEXT`	`grad` → LBFGS, `hess` → RFOptimizer. See --opt-mode (subcommand-dependent) for how the same token maps to different optimizers under `tsopt`.	`grad`
`--freeze-links/--no-freeze-links`	When the input is PDB, freeze the parents of link hydrogens.	`True`
`--freeze-atoms TEXT`	Comma-separated 1-based atom indices to freeze explicitly (e.g., `'1,3,5'`). Complements `--freeze-links`; applies to any input format.	None
`--dump/--no-dump`	Dump per-step optimizer trajectories. Note: `scan_trj.xyz`/`scan.pdb` are always written regardless of this flag.	`False`
`--convert-files/--no-convert-files`	Toggle XYZ/TRJ → PDB/GJF companions for PDB/Gaussian inputs (trajectory conversion only writes PDB).	`True`
`--ref-pdb FILE`	Reference PDB topology to use when the input is XYZ/GJF (keeps XYZ coordinates).	None
`-o, --out-dir TEXT`	Output directory root.	`./result_scan/`
`--thresh TEXT`	Convergence preset override (`gau_loose`, `gau`, `gau_tight`, `gau_vtight`, `baker`, `never`).	`gau`
`--config FILE`	Base YAML configuration file (applied first).	None
`-b, --backend {uma,orb,mace,aimnet2}`	MLIP backend.	`uma`
`--solvent TEXT`	Implicit solvent name for xTB correction (e.g. `water`). `none` to disable.	`none`
`--solvent-model {alpb,cpcmx}`	xTB solvent model.	`alpb`
`--preopt/--no-preopt`	Run an unbiased optimization before scanning. Scope-dependent default: `False` standalone; flipped to `True` when invoked via `pdb2reaction all` (see `all` → Scan Options).	`False`
`--endopt/--no-endopt`	Run an unbiased optimization after each stage.	`False`
`--out-json/--no-out-json`	Write a machine-readable `result.json` to `out_dir`. See JSON Output Schema for the schema.	`False`
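
For reference, the two accepted shapes of a `--ligand-charge/-l` value (a bare number or a `'RES:Q,...'` mapping) can be told apart as in the sketch below. The function name is hypothetical, and how the tool combines these values with PDB residue charges to obtain the total charge is not shown.

```python
def parse_ligand_charge(text):
    """Parse a --ligand-charge value (illustrative only).

    Returns an int for a bare number, or a {resname: charge} dict
    for a 'RES:Q,...' mapping.
    """
    text = text.strip()
    if ":" not in text:
        return int(text)
    mapping = {}
    for item in text.split(","):
        resname, charge = item.split(":")
        mapping[resname.strip()] = int(charge)
    return mapping

print(parse_ligand_charge("GPP:-3,SAM:1"))
print(parse_ligand_charge("-2"))
```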

Shared YAML sections¶

geom, calc, opt, lbfgs, rfo: identical keys to those documented in YAML Reference. opt.dump can be set in YAML for optimizer dumps. --dump controls only the per-step optimizer trajectory files; the concatenated scan trajectories are written regardless.
--relax-max-cycles takes effect only when it is passed explicitly and the YAML does not set opt.max_cycles; otherwise the cap comes from opt.max_cycles, which defaults to 10000.
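
The precedence between `--relax-max-cycles` and `opt.max_cycles` can be summarized in a few lines. This is a sketch of the documented behavior under a hypothetical function name, with `cli_value=None` standing for "flag not given".

```python
DEFAULT_MAX_CYCLES = 10000

def resolve_max_cycles(cli_value=None, yaml_opt=None):
    """Pick the optimizer cycle cap (sketch of the documented precedence)."""
    yaml_opt = yaml_opt or {}
    if yaml_opt.get("max_cycles") is not None:
        return yaml_opt["max_cycles"]  # YAML opt.max_cycles wins
    if cli_value is not None:
        return cli_value               # explicit --relax-max-cycles
    return DEFAULT_MAX_CYCLES

print(resolve_max_cycles(500, {"max_cycles": 2000}))
print(resolve_max_cycles(500))
print(resolve_max_cycles())
```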

Section `bias`¶

k (300): Harmonic strength in eV·Å⁻².
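
To show what the bias term contributes, the sketch below evaluates E = Σ ½ k (|ri − rj| − target)² and its Cartesian gradient in plain Python. It covers the restraint only (the MLIP energy is not included), and the function name is hypothetical.

```python
import math

def harmonic_bias(coords, restraints, k=300.0):
    """Return (energy, gradient) of E = sum 0.5*k*(|ri - rj| - target)^2.

    coords: list of [x, y, z] in Å; restraints: (i, j, target) with 0-based
    indices; k in eV/Å^2. Energy is in eV, gradient in eV/Å.
    """
    energy = 0.0
    grad = [[0.0, 0.0, 0.0] for _ in coords]
    for i, j, target in restraints:
        diff = [a - b for a, b in zip(coords[i], coords[j])]
        dist = math.sqrt(sum(d * d for d in diff))
        dev = dist - target
        energy += 0.5 * k * dev * dev
        for axis in range(3):
            g = k * dev * diff[axis] / dist  # dE/dr_i along this axis
            grad[i][axis] += g
            grad[j][axis] -= g               # equal and opposite on atom j
    return energy, grad

# Two atoms 2.0 Å apart, restrained toward 1.5 Å.
energy, grad = harmonic_bias([[0.0, 0.0, 0.0], [2.0, 0.0, 0.0]], [(0, 1, 1.5)])
print(energy, grad)
```

With the default k = 300 eV·Å⁻², a 0.5 Å deviation costs 37.5 eV, which illustrates why the targets are moved in small steps rather than set to the final value at once.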

Section `bond`¶

MLIP-based bond-change detection shared with path-search:

device ("auto"): MLIP device for bond analysis.
bond_factor (1.20): Covalent-radius scaling for cutoff.
margin_fraction (0.05): Fractional tolerance for comparisons.
delta_fraction (0.05): Minimum relative change to flag formation/breaking.
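
One plausible reading of `bond_factor` and `delta_fraction` is sketched below: a pair counts as bonded when its distance is within `bond_factor` times the sum of covalent radii, and a change is flagged only if the bonded state flips and the distance changed by at least `delta_fraction` in relative terms. This is an assumption about the logic, not the tool's implementation; `margin_fraction` is deliberately left out, and the radii are approximate values for illustration.

```python
# Approximate covalent radii in Å for a few elements (illustration only).
COVALENT_RADII = {"H": 0.31, "C": 0.76, "N": 0.71, "O": 0.66}

def is_bonded(elem_i, elem_j, distance, bond_factor=1.20):
    """Bonded if the distance is within the scaled sum of covalent radii."""
    cutoff = bond_factor * (COVALENT_RADII[elem_i] + COVALENT_RADII[elem_j])
    return distance <= cutoff

def classify_change(elem_i, elem_j, d_before, d_after,
                    bond_factor=1.20, delta_fraction=0.05):
    """Return 'formed', 'broken', or None for one atom pair."""
    before = is_bonded(elem_i, elem_j, d_before, bond_factor)
    after = is_bonded(elem_i, elem_j, d_after, bond_factor)
    relative_change = abs(d_after - d_before) / d_before
    if before == after or relative_change < delta_fraction:
        return None
    return "formed" if after else "broken"

print(classify_change("C", "C", 2.50, 1.55))
print(classify_change("C", "C", 1.55, 2.50))
print(classify_change("C", "C", 1.50, 1.55))
```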

Outputs¶

out_dir/ (default: ./result_scan/)
├─ preopt/ # Present when --preopt is True
│ ├─ result.xyz
│ ├─ result.pdb # PDB companion for PDB inputs when conversion is enabled
│ └─ result.gjf # When a Gaussian template exists and conversion is enabled
├─ stage_XX/ # One folder per stage
│ ├─ result.xyz
│ ├─ result.pdb # PDB mirror of the final structure (conversion enabled)
│ ├─ result.gjf # Gaussian mirror when templates exist and conversion is enabled
│ ├─ scan_trj.xyz # Always written (concatenated biased trajectory)
│ └─ scan.pdb # Always written for PDB inputs when conversion is enabled (no scan.gjf is produced)
├─ scan_trj.xyz # Combined trajectory across all stages
└─ scan.pdb # Combined PDB trajectory (when conversion is enabled)

Console summaries of the resolved geom, calc, opt, bias, bond, and optimizer blocks plus per-stage bond-change reports.
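
A quick way to inspect a scan is to read the concatenated trajectory and print the driven distance frame by frame. The sketch below assumes `scan_trj.xyz` is a standard multi-frame XYZ file (atom count, comment line, then one `symbol x y z` line per atom); the function names are hypothetical.

```python
import math

def read_xyz_frames(text):
    """Parse multi-frame XYZ text into a list of (symbols, coords) frames."""
    lines = text.splitlines()
    frames, pos = [], 0
    while pos < len(lines):
        if not lines[pos].strip():
            pos += 1  # skip blank lines between frames
            continue
        natoms = int(lines[pos].split()[0])
        symbols, coords = [], []
        # lines[pos + 1] is the comment line and is skipped.
        for line in lines[pos + 2 : pos + 2 + natoms]:
            parts = line.split()
            symbols.append(parts[0])
            coords.append(tuple(float(x) for x in parts[1:4]))
        frames.append((symbols, coords))
        pos += 2 + natoms
    return frames

def distance_profile(frames, i, j):
    """Distance between 0-based atoms i and j in every frame (Å)."""
    return [math.dist(coords[i], coords[j]) for _, coords in frames]

example = """2
frame 0
C 0.0 0.0 0.0
C 1.8 0.0 0.0
2
frame 1
C 0.0 0.0 0.0
C 1.6 0.0 0.0
"""
frames = read_xyz_frames(example)
print(distance_profile(frames, 0, 1))
```

For a real run, the text would come from a file such as `result_scan/scan_trj.xyz`, read with `open(...).read()`.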

Notes¶

For symptom-first diagnosis, start with Common Error Recipes, then use Troubleshooting for detailed fixes.
Provide multiple literals after a single --scan-lists/-s flag. Tuples must have positive targets. Atom indices are normalized to 0-based internally for computation. For PDB inputs, i/j can be selector strings with flexible delimiters (space/comma/slash/backtick/backslash) and unordered tokens.
When --freeze-links is active, link-hydrogen parent atoms are automatically frozen (see Link hydrogen and frozen atoms).
Stage results (result.xyz plus optional PDB/GJF companions) are always written. Concatenated scan trajectories (scan_trj.xyz and scan.pdb for PDB inputs with conversion enabled) are also always written. The --dump flag controls only per-step optimizer trajectory files.

Example YAML configuration with the sections described above:

geom:
 coord_type: cart # coordinate type: cartesian vs dlc internals
 freeze_atoms: [] # 1-based frozen atoms merged with CLI/link detection
calc:
 charge: 0 # total charge (CLI/template override)
 spin: 1 # spin multiplicity 2S+1
 model: uma-s-1p1 # uma-s-1p1 | uma-m-1p1
 task_name: omol # UMA task name
 device: auto # MLIP device selection
 max_neigh: null # maximum neighbors for graph construction
 radius: null # cutoff radius for neighbor search
 r_edges: false # store radial edges
 out_hess_torch: true # request torch-form Hessian
 freeze_atoms: null # calculator-level frozen atoms
 hessian_calc_mode: FiniteDifference # Hessian mode selection
 return_partial_hessian: true # partial Hessian over active DOFs
opt:
 thresh: gau # convergence preset (Gaussian/Baker-style)
 max_cycles: 10000 # optimizer cycle cap
 print_every: 100 # logging stride
 min_step_norm: 1.0e-08 # minimum norm for step acceptance
 assert_min_step: true # stop if steps fall below threshold
 rms_force: null # explicit RMS force target
 rms_force_only: false # rely only on RMS force convergence
 max_force_only: false # rely only on max force convergence
 force_only: false # skip displacement checks
 converge_to_geom_rms_thresh: 0.05 # geom RMS threshold when converging to ref
 overachieve_factor: 0.0 # factor to tighten thresholds
 check_eigval_structure: false # validate Hessian eigenstructure
 line_search: true # enable line search
 dump: false # dump trajectory/restart data
 dump_restart: false # dump restart checkpoints
 prefix: "" # filename prefix
 out_dir: ./result_scan/ # output directory
lbfgs:
 thresh: gau # LBFGS convergence preset
 max_cycles: 10000 # iteration limit
 print_every: 100 # logging stride
 min_step_norm: 1.0e-08 # minimum accepted step norm
 assert_min_step: true # assert when steps stagnate
 rms_force: null # explicit RMS force target
 rms_force_only: false # rely only on RMS force convergence
 max_force_only: false # rely only on max force convergence
 force_only: false # skip displacement checks
 converge_to_geom_rms_thresh: 0.05 # RMS threshold when targeting geometry
 overachieve_factor: 0.0 # tighten thresholds
 check_eigval_structure: false # validate Hessian eigenstructure
 line_search: true # enable line search
 dump: false # dump trajectory/restart data
 dump_restart: false # dump restart checkpoints
 prefix: "" # filename prefix
 out_dir: ./result_scan/ # output directory
 keep_last: 7 # history size for LBFGS buffers
 beta: 1.0 # initial damping beta
 gamma_mult: false # multiplicative gamma update toggle
 max_step: 0.3 # maximum step length
 control_step: true # control step length adaptively
 double_damp: true # double damping safeguard
 mu_reg: null # regularization strength
 max_mu_reg_adaptions: 10 # cap on mu adaptations
rfo:
 thresh: gau # RFOptimizer convergence preset
 max_cycles: 10000 # iteration cap
 print_every: 100 # logging stride
 min_step_norm: 1.0e-08 # minimum accepted step norm
 assert_min_step: true # assert when steps stagnate
 rms_force: null # explicit RMS force target
 rms_force_only: false # rely only on RMS force convergence
 max_force_only: false # rely only on max force convergence
 force_only: false # skip displacement checks
 converge_to_geom_rms_thresh: 0.05 # RMS threshold when targeting geometry
 overachieve_factor: 0.0 # tighten thresholds
 check_eigval_structure: false # validate Hessian eigenstructure
 line_search: true # enable line search
 dump: false # dump trajectory/restart data
 dump_restart: false # dump restart checkpoints
 prefix: "" # filename prefix
 out_dir: ./result_scan/ # output directory
 trust_radius: 0.10 # trust-region radius
 trust_update: true # enable trust-region updates
 trust_min: 0.0001 # minimum trust radius
 trust_max: 0.10 # maximum trust radius
 max_energy_incr: null # allowed energy increase per step
 hessian_update: bfgs # Hessian update scheme
 hessian_init: calc # Hessian initialization source
 hessian_recalc: 500 # rebuild Hessian every N steps
 hessian_recalc_adapt: null # adaptive Hessian rebuild factor
 small_eigval_thresh: 1.0e-08 # eigenvalue threshold for stability
 alpha0: 1.0 # initial micro step
 max_micro_cycles: 50 # micro-iteration limit
 rfo_overlaps: false # enable RFO overlaps
 gediis: false # enable GEDIIS
 gdiis: true # enable GDIIS
 gdiis_thresh: 0.0025 # GDIIS acceptance threshold
 gediis_thresh: 0.01 # GEDIIS acceptance threshold
 gdiis_test_direction: true # test descent direction before DIIS
 adapt_step_func: true # adaptive step scaling toggle
bias:
 k: 300 # harmonic bias strength (eV·Å⁻²)
bond:
 device: auto # MLIP device for bond analysis
 bond_factor: 1.2 # covalent-radius scaling
 margin_fraction: 0.05 # tolerance margin for comparisons
 delta_fraction: 0.05 # minimum relative change to flag bonds