scan

Overview

Summary: Drive a reaction coordinate by scanning bond distances with harmonic restraints. Use --scan-lists to define target distances. Multiple stages run sequentially, each starting from the previous stage’s relaxed result.

At a glance

  • Use when: You have a single structure and want to push specific distances to explore a plausible path (often before path-search/path-opt).

  • Input: One structure + one or more --scan-lists literals (each literal = one stage).

  • Defaults: --opt-mode light (LBFGS), --preopt True, --endopt True, --max-step-size 0.20 Å.

  • Outputs: Per-stage result.xyz (+ optional .pdb/.gjf), and optional concatenated trajectories when --dump True.

  • Note: --scan-lists is parsed as a Python literal; quoting/escaping matters (see examples).

pdb2reaction scan performs a staged, bond-length–driven scan using the UMA calculator and harmonic restraints. At each step, the temporary targets are updated, restraint wells are applied, and the structure is relaxed with LBFGS (--opt-mode light) or RFOptimizer (--opt-mode heavy).

When you provide multiple --scan-lists literals after a single flag, stages run sequentially and each stage starts from the previous stage’s relaxed structure. Two optional unbiased optimizations clean up geometries: one before the biased walk (--preopt) and one after each stage (--endopt), before result.* is written to disk.

For XYZ/GJF inputs, --ref-pdb supplies a reference PDB topology while keeping XYZ coordinates, enabling format-aware PDB/GJF output conversion.

Usage

pdb2reaction scan -i INPUT.{pdb|xyz|trj|...} [-q CHARGE] [--ligand-charge <number|'RES:Q,...'>] [-m MULT] \
                  --scan-lists '[(i,j,targetÅ), ...]' [options] \
                  [--convert-files {True|False}] [--ref-pdb FILE]

Examples

# Single-stage, minimal inputs
pdb2reaction scan -i input.pdb -q 0 --scan-lists '[("TYR,285,CA","MMT,309,C10",1.35)]'

# Two stages, LBFGS relaxations, and trajectory dumping
pdb2reaction scan -i input.pdb -q 0 --scan-lists \
    '[("TYR,285,CA","MMT,309,C10",1.35)]' \
    '[("TYR,285,CA","MMT,309,C10",2.20),("TYR,285,CB","MMT,309,C11",1.80)]' \
    --max-step-size 0.20 --dump True --out-dir ./result_scan/ --opt-mode light \
    --preopt True --endopt True

# Supply multiple stage literals after a single --scan-lists
pdb2reaction scan -i input.pdb -q 0 --scan-lists \
    '[("TYR,285,CA","MMT,309,C10",1.35)]' \
    '[("TYR,285,CA","MMT,309,C10",2.20),("TYR,285,CB","MMT,309,C11",1.80)]'
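
When the scan is launched from a Python driver script instead of an interactive shell, passing an argument list to subprocess avoids shell quoting altogether. The sketch below assumes pdb2reaction is on PATH; build_scan_argv is a hypothetical helper, and only the flags shown above are used. Each stage literal becomes one argv element after a single --scan-lists flag, and repr() of a list of tuples is itself a valid Python literal.

```python
import ast

def build_scan_argv(input_path, charge, stages, extra=()):
    """Assemble argv for `pdb2reaction scan` (hypothetical helper, not part of the tool)."""
    argv = ["pdb2reaction", "scan", "-i", str(input_path), "-q", str(charge), "--scan-lists"]
    for stage in stages:
        literal = repr(list(stage))
        # Sanity check: the string must round-trip as a Python literal.
        if ast.literal_eval(literal) != list(stage):
            raise ValueError(f"stage does not round-trip as a literal: {stage!r}")
        argv.append(literal)  # one argv element per stage, all after one --scan-lists flag
    argv.extend(extra)
    return argv

stages = [
    [("TYR,285,CA", "MMT,309,C10", 1.35)],
    [("TYR,285,CA", "MMT,309,C10", 2.20), ("TYR,285,CB", "MMT,309,C11", 1.80)],
]
argv = build_scan_argv("input.pdb", 0, stages, ["--dump", "True", "--out-dir", "./result_scan/"])
# import subprocess; subprocess.run(argv, check=True)  # launches without any shell quoting
```

Because no shell parses the list, the inner quotes of each literal reach the CLI unchanged.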

--scan-lists format

--scan-lists accepts Python literal strings evaluated by the CLI. Shell quoting matters.

Basic structure

Each literal is a Python list of triples (atom1, atom2, target_Å):

--scan-lists '[(atom1, atom2, target_Å), ...]'
  • Wrap the entire literal in single quotes so the shell does not interpret parentheses or spaces.

  • Each triple drives the distance between atom1 and atom2 toward target_Å.

  • One literal = one stage. For multiple stages, pass multiple literals after a single --scan-lists flag (do not repeat the flag).
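
A stage literal can be checked before launching a long run. This sketch assumes ast.literal_eval-style parsing (the page only states that the CLI evaluates Python literals) and mirrors the rules given here and in the Notes: triples only, positive targets, and integer indices shifted to 0-based when --one-based is True. Selector strings are passed through for a later topology lookup. The names parse_stage and _normalize are hypothetical.

```python
import ast

def _normalize(atom, one_based):
    # Integer indices become 0-based; PDB selector strings are kept for a later lookup.
    if isinstance(atom, int):
        idx = atom - 1 if one_based else atom
        if idx < 0:
            raise ValueError(f"atom index {atom} is out of range for one_based={one_based}")
        return idx
    if isinstance(atom, str):
        return atom
    raise TypeError(f"atom must be an int index or a selector string, got {atom!r}")

def parse_stage(literal, one_based=True):
    """Parse one --scan-lists literal into (atom1, atom2, target_Å) triples (sketch)."""
    entries = ast.literal_eval(literal)
    if not isinstance(entries, (list, tuple)):
        raise ValueError("a stage literal must be a list of triples")
    stage = []
    for entry in entries:
        if not isinstance(entry, (list, tuple)) or len(entry) != 3:
            raise ValueError(f"expected (atom1, atom2, target), got {entry!r}")
        a, b, target = entry
        if float(target) <= 0:
            raise ValueError(f"target distance must be positive, got {target!r}")
        stage.append((_normalize(a, one_based), _normalize(b, one_based), float(target)))
    return stage

print(parse_stage("[(1, 5, 2.0)]"))  # → [(0, 4, 2.0)]
```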

Specifying atoms

Atoms can be given as integer indices or PDB selector strings:

Method          Example                                   Notes
Integer index   (1, 5, 2.0)                               1-based by default (--one-based True)
PDB selector    ("TYR,285,CA", "MMT,309,C10", 2.0)        Residue name, residue number, atom name

PDB selector tokens can be separated by a comma (,), space, slash (/), backtick (`), or backslash (\). Token order is flexible.

# All of these specify the same atom:
"TYR,285,CA"
"TYR 285 CA"
"TYR/285/CA"
"285,TYR,CA"   # order is flexible

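Because tokens may appear in any order, one way to resolve a selector is to compare its token set against each atom's (residue name, residue number, atom name) triple. The sketch below does that; the atoms list is a hypothetical stand-in for parsed PDB records, and order-free set matching is an illustrative choice, not necessarily how the CLI resolves selectors (the Workflow section mentions a resname, resseq, atom fallback order).

```python
import re

# Separators listed on this page: comma, whitespace, slash, backtick, backslash.
_SEPARATORS = re.compile(r"[,\s/`\\]+")

def split_selector(selector):
    """Split a selector such as "TYR 285 CA" or "285/TYR/CA" into three tokens."""
    tokens = [tok for tok in _SEPARATORS.split(selector.strip()) if tok]
    if len(tokens) != 3:
        raise ValueError(f"expected residue name, residue number, atom name: {selector!r}")
    return tokens

def find_atom(selector, atoms):
    """Return the 0-based index of the single atom whose fields match the tokens in any order.

    atoms: hypothetical list of (resname, resseq, atom_name) records parsed from a PDB.
    """
    wanted = sorted(split_selector(selector))
    hits = [idx for idx, (resname, resseq, name) in enumerate(atoms)
            if sorted([resname, str(resseq), name]) == wanted]
    if len(hits) != 1:
        raise LookupError(f"{selector!r} matched {len(hits)} atoms")
    return hits[0]

atoms = [("TYR", 285, "CA"), ("TYR", 285, "CB"), ("MMT", 309, "C10")]
print(find_atom("285,TYR,CA", atoms), find_atom("MMT`309`C10", atoms))  # → 0 2
```

An unknown or ambiguous selector raises LookupError instead of silently picking an atom.
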
Quoting rules

# Correct: single-quote the list, double-quote selector strings inside
--scan-lists '[("TYR,285,CA","MMT,309,C10",1.35)]'

# Correct: integer indices need no inner quotes
--scan-lists '[(1, 5, 2.0)]'

# Avoid: double-quoting the outer literal requires escaping inner quotes
--scan-lists "[(\"TYR,285,CA\",\"MMT,309,C10\",1.35)]"
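
Python's shlex module applies POSIX shell quoting rules, so it is a convenient way to preview the argument the CLI actually receives. The single-quoted and escaped double-quoted forms above yield the same string, while a bare literal loses its inner quotes (and an interactive shell such as bash would reject the unquoted parentheses before the CLI ever ran).

```python
import shlex

single_quoted = """--scan-lists '[("TYR,285,CA","MMT,309,C10",1.35)]'"""
escaped = r'''--scan-lists "[(\"TYR,285,CA\",\"MMT,309,C10\",1.35)]"'''
bare = '--scan-lists [("TYR,285,CA","MMT,309,C10",1.35)]'

for label, command in (("single", single_quoted), ("escaped", escaped), ("bare", bare)):
    # shlex.split follows POSIX quoting and returns the resulting argv tokens.
    print(f"{label:8s}", shlex.split(command))
```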

Multiple stages

Pass multiple literals after a single --scan-lists flag. Each literal becomes one stage:

# Stage 1: drive one bond to 1.35 Å
# Stage 2: drive two bonds simultaneously
--scan-lists \
    '[("TYR,285,CA","MMT,309,C10",1.35)]' \
    '[("TYR,285,CA","MMT,309,C10",2.20),("TYR,285,CB","MMT,309,C11",1.80)]'

Stages run sequentially; each starts from the previous stage’s relaxed result. Do not repeat the --scan-lists flag — supply all stage literals after a single flag.

Workflow

  1. Load the structure through geom_loader, resolving charge/spin from the CLI overrides, the embedded Gaussian template (if present), or defaults. If -q is omitted but --ligand-charge is provided, the input is treated as an enzyme–substrate complex and extract.py’s charge summary derives the total charge before any scans.

  2. Optionally run an unbiased preoptimization (--preopt True) before any biasing so the starting point is relaxed.

  3. For each stage literal supplied via --scan-lists, parse and normalize the (i, j) indices (1-based by default). When the input is a PDB, each entry may be either an integer index or an atom selector string like 'TYR,285,CA'; selector fields can be separated by spaces, commas, slashes, backticks, or backslashes and may be in any order (fallback assumes resname, resseq, atom). Compute the per-bond displacement Δ = target − current and split it into N = ceil(max(|Δ|) / h) steps using h = --max-step-size. Every bond receives its own δ = Δ / N increment.

  4. March through all steps, updating the temporary targets, applying the harmonic wells E = Σ ½ k (|r_i − r_j| − target)², and minimizing with UMA. Optimizer cycles are capped by --relax-max-cycles unless YAML specifies opt.max_cycles.

  5. After the last step of each stage, optionally run an unbiased relaxation (--endopt True) before reporting covalent bond changes and writing the result.* files.

  6. Repeat for every stage; optional trajectories are dumped only when --dump is True.
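
Steps 3 and 4 can be sketched as a schedule of per-step targets. This illustrates the stated formulas N = ceil(max(|Δ|) / h) and δ = Δ / N and is not the tool's code; step_targets is a hypothetical name, distances are plain floats in Å, and the max(1, ...) guard for a stage whose bonds already sit at their targets is an added assumption.

```python
import math

def step_targets(current, targets, max_step=0.20):
    """Per-step restraint targets for one stage (hypothetical helper).

    current, targets: distances in Å for each scanned bond, in the same order.
    Returns N target lists; the last one equals `targets` up to rounding.
    """
    deltas = [t - c for c, t in zip(current, targets)]          # Δ = target − current
    # N = ceil(max|Δ| / h); the max(1, ...) guard is an added assumption.
    n_steps = max(1, math.ceil(max(abs(d) for d in deltas) / max_step))
    increments = [d / n_steps for d in deltas]                  # δ = Δ / N, one per bond
    return [[c + step * inc for c, inc in zip(current, increments)]
            for step in range(1, n_steps + 1)]

schedule = step_targets(current=[3.0, 2.0], targets=[1.5, 2.5])
```

Here the first bond shortens by 1.5 Å with h = 0.20 Å, so N = 8 and the bonds move by −0.1875 Å and +0.0625 Å per step; both reach their targets on the same final step.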

CLI options

Defaults are shown in parentheses after each option.

  • -i, --input PATH (required): Structure file accepted by geom_loader.

  • -q, --charge INT (required unless a .gjf template or --ligand-charge supplies it): Total charge (CLI > template). When omitted, charge can be inferred from --ligand-charge; explicit -q overrides any derived value.

  • --ligand-charge TEXT (None): Total charge or per-resname mapping used when -q is omitted. Triggers extract-style charge derivation on the full complex (PDB inputs or XYZ/GJF with --ref-pdb).

  • --workers, --workers-per-node (1, 1): UMA predictor parallelism (workers > 1 disables analytic Hessians; workers_per_node is forwarded to the parallel predictor).

  • -m, --multiplicity INT (.gjf template value or 1): Spin multiplicity 2S+1. Inherits the .gjf template value when available; defaults to 1 when omitted.

  • --scan-lists, --scan-list TEXT (required): Python literal with (i,j,targetÅ) tuples. Each literal is one stage; supply multiple literals after a single flag. i/j can be integer indices or PDB atom selectors like 'TYR,285,CA'.

  • --one-based {True|False} (True): Interpret atom indices as 1- or 0-based.

  • --max-step-size FLOAT (0.20): Maximum change in any scanned bond per step (Å). Controls the number of scan steps.

  • --bias-k FLOAT (300): Harmonic bias strength k in eV·Å⁻².

  • --relax-max-cycles INT (10000): Cap on optimizer cycles during preopt, each biased step, and end-of-stage cleanups. Used unless YAML sets opt.max_cycles.

  • --opt-mode TEXT (light): light → LBFGS, heavy → RFOptimizer.

  • --freeze-links {True|False} (True): When the input is PDB, freeze the parents of link hydrogens.

  • --dump {True|False} (False): Dump concatenated biased trajectories (scan.trj/scan.pdb).

  • --convert-files {True|False} (True): Toggle XYZ/TRJ → PDB/GJF companions for PDB/Gaussian inputs (trajectory conversion only writes PDB).

  • --ref-pdb FILE (None): Reference PDB topology to use when the input is XYZ/GJF (keeps XYZ coordinates).

  • --out-dir TEXT (./result_scan/): Output directory root.

  • --thresh TEXT (gau): Convergence preset override (gau_loose, gau, gau_tight, gau_vtight, baker, never).

  • --args-yaml FILE (None): YAML overrides for geom, calc, opt, lbfgs, rfo, bias, bond.

  • --preopt {True|False} (True): Run an unbiased optimization before scanning.

  • --endopt {True|False} (True): Run an unbiased optimization after each stage.

Shared YAML sections

  • geom, calc, opt, lbfgs, rfo: identical keys to those documented in YAML Reference. opt.dump can be set in YAML for optimizer dumps; use --dump to control scan-stage trajectories.

  • --relax-max-cycles applies only when explicitly provided and YAML does not set opt.max_cycles (default 10000).
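
The precedence for the optimizer cycle cap can be written as a small resolver. The function below is a hypothetical illustration of the rule stated above: it takes the parsed opt section of the YAML as a dict (or None) and None for a flag that was not given on the command line.

```python
def resolve_max_cycles(yaml_opt, relax_max_cycles=None, default=10000):
    """Optimizer cycle cap: YAML opt.max_cycles, else --relax-max-cycles, else the default."""
    if yaml_opt and "max_cycles" in yaml_opt:
        return int(yaml_opt["max_cycles"])   # YAML setting always wins
    if relax_max_cycles is not None:         # flag explicitly provided on the command line
        return int(relax_max_cycles)
    return default
```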

Section bias

  • k (300): Harmonic strength in eV·Å⁻².
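
To get a feel for k = 300 eV·Å⁻², the energy and restoring force of a single well ½ k (d − target)² can be evaluated for a few deviations. This is plain arithmetic on the bias formula from the Workflow section (the total bias is the sum over all scanned bonds); harmonic_bias is a hypothetical helper name.

```python
def harmonic_bias(distance, target, k=300.0):
    """One well ½ k (d − target)²: returns (energy in eV, |force| in eV/Å) for k in eV·Å⁻²."""
    deviation = distance - target
    return 0.5 * k * deviation ** 2, k * abs(deviation)

for deviation in (0.01, 0.05, 0.10):
    energy, force = harmonic_bias(1.50 + deviation, 1.50)
    print(f"{deviation:.2f} Å off target: E = {energy:.3f} eV, |F| = {force:.1f} eV/Å")
```

At 0.05 Å from its target a single well costs 0.375 eV and pushes back with 15 eV/Å, so even small deviations from the step targets are strongly penalized.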

Section bond

UMA-based bond-change detection shared with path-search:

  • device ("cuda"): UMA device for graph analysis.

  • bond_factor (1.20): Covalent-radius scaling for cutoff.

  • margin_fraction (0.05): Fractional tolerance for comparisons.

  • delta_fraction (0.05): Minimum relative change to flag formation/breaking.
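
The page lists the bond-detection parameters but not their exact formulas. The sketch below is one plausible reading, stated as an assumption: a pair counts as bonded below bond_factor × (sum of covalent radii), and a change is reported only if the distance clearly crosses that cutoff (outside a ±margin_fraction band) and changes by at least delta_fraction relative to its starting value. The radii table covers only a few elements with approximate single-bond covalent radii, classify_bond_change is a hypothetical name, and this is not the UMA-based implementation.

```python
# Approximate single-bond covalent radii in Å for a few elements (assumed values; extend as needed).
COVALENT_RADII = {"H": 0.31, "C": 0.76, "N": 0.71, "O": 0.66, "S": 1.05}

def classify_bond_change(el_a, el_b, d_before, d_after,
                         bond_factor=1.20, margin_fraction=0.05, delta_fraction=0.05):
    """Return "formed", "broken", or None for one atom pair (assumed criteria)."""
    cutoff = bond_factor * (COVALENT_RADII[el_a] + COVALENT_RADII[el_b])
    # Ignore small fluctuations relative to the starting distance.
    if abs(d_after - d_before) / d_before < delta_fraction:
        return None
    lower = cutoff * (1.0 - margin_fraction)
    upper = cutoff * (1.0 + margin_fraction)
    if d_before > upper and d_after < lower:
        return "formed"
    if d_before < lower and d_after > upper:
        return "broken"
    return None  # no clear crossing of the margin band
```

For a C–C pair this gives a cutoff of 1.824 Å, so a distance dropping from 3.0 Å to 1.55 Å is flagged as a formed bond.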

Outputs

out_dir/ (default: ./result_scan/)
├─ preopt/                   # Present when --preopt is True
│  ├─ result.xyz
│  ├─ result.pdb             # PDB companion for PDB inputs when conversion is enabled
│  └─ result.gjf             # When a Gaussian template exists and conversion is enabled
└─ stage_XX/                 # One folder per stage
    ├─ result.xyz
    ├─ result.pdb             # PDB mirror of the final structure (conversion enabled)
    ├─ result.gjf             # Gaussian mirror when templates exist and conversion is enabled
    ├─ scan.trj               # Written when --dump is True
    └─ scan.pdb               # Trajectory companion for PDB inputs when conversion is enabled (no scan.gjf is produced)
  • Console summaries of the resolved geom, calc, opt, bias, bond, and optimizer blocks plus per-stage bond-change reports.
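
For post-processing, the per-stage results can be collected by globbing the stage_XX folders. The sketch assumes the directory layout shown above with zero-padded stage numbers (so lexical sorting matches stage order) and standard XYZ files (atom count, comment line, then symbol x y z rows); stage_results and parse_xyz are hypothetical helpers.

```python
from pathlib import Path

def stage_results(out_dir="./result_scan/"):
    """result.xyz paths for every stage_XX folder, sorted by folder name."""
    return sorted(Path(out_dir).glob("stage_*/result.xyz"))

def parse_xyz(text):
    """Minimal single-frame XYZ parser returning (symbols, coordinates in Å)."""
    lines = text.splitlines()
    n_atoms = int(lines[0].split()[0])
    symbols, coords = [], []
    for line in lines[2:2 + n_atoms]:
        sym, x, y, z = line.split()[:4]
        symbols.append(sym)
        coords.append((float(x), float(y), float(z)))
    return symbols, coords

# Final geometry of the last finished stage, if the output directory exists.
out_dir = Path("./result_scan/")
paths = stage_results(out_dir) if out_dir.is_dir() else []
if paths:
    symbols, coords = parse_xyz(paths[-1].read_text())
```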

Notes

  • Provide multiple literals after a single --scan-lists flag; repeated flags are not accepted. Tuples must have positive targets. Atom indices are normalized to 0-based internally. For PDB inputs, i/j can be selector strings with flexible delimiters (space/comma/slash/backtick/backslash) and unordered tokens.

  • --freeze-links augments user freeze_atoms by adding parents of link-H atoms in PDB files so pockets stay rigid.

  • Charge inherits Gaussian template metadata when available. For non-.gjf inputs, -q/--charge is required unless --ligand-charge is provided (supported for PDB inputs or XYZ/GJF with --ref-pdb); explicit -q still overrides. Multiplicity inherits .gjf metadata when available, otherwise defaults to 1.

  • Stage results (result.xyz plus optional PDB/GJF companions) are written regardless of --dump; trajectories are written only when --dump is True and converted to scan.pdb (PDB inputs only) when conversion is enabled.

YAML configuration (--args-yaml)

The YAML root must be a mapping. YAML parameters override CLI values. Shared sections reuse the definitions documented in YAML Reference.

geom:
  coord_type: cart           # coordinate type: cartesian vs dlc internals
  freeze_atoms: []           # 0-based frozen atoms merged with CLI/link detection
calc:
  charge: 0                  # total charge (CLI/template override)
  spin: 1                    # spin multiplicity 2S+1
  model: uma-s-1p1           # UMA model tag
  task_name: omol            # UMA task name
  device: auto               # UMA device selection
  max_neigh: null            # maximum neighbors for graph construction
  radius: null               # cutoff radius for neighbor search
  r_edges: false             # store radial edges
  out_hess_torch: true       # request torch-form Hessian
  freeze_atoms: null         # calculator-level frozen atoms
  hessian_calc_mode: FiniteDifference   # Hessian mode selection
  return_partial_hessian: false         # full Hessian (avoids shape mismatches)
opt:
  thresh: gau                # convergence preset (Gaussian/Baker-style)
  max_cycles: 10000          # optimizer cycle cap
  print_every: 100           # logging stride
  min_step_norm: 1.0e-08     # minimum norm for step acceptance
  assert_min_step: true      # stop if steps fall below threshold
  rms_force: null            # explicit RMS force target
  rms_force_only: false      # rely only on RMS force convergence
  max_force_only: false      # rely only on max force convergence
  force_only: false          # skip displacement checks
  converge_to_geom_rms_thresh: 0.05   # geom RMS threshold when converging to ref
  overachieve_factor: 0.0    # factor to tighten thresholds
  check_eigval_structure: false   # validate Hessian eigenstructure
  line_search: true          # enable line search
  dump: false                # dump trajectory/restart data
  dump_restart: false        # dump restart checkpoints
  prefix: ""                 # filename prefix
  out_dir: ./result_scan/    # output directory
lbfgs:
  thresh: gau                # LBFGS convergence preset
  max_cycles: 10000          # iteration limit
  print_every: 100           # logging stride
  min_step_norm: 1.0e-08     # minimum accepted step norm
  assert_min_step: true      # assert when steps stagnate
  rms_force: null            # explicit RMS force target
  rms_force_only: false      # rely only on RMS force convergence
  max_force_only: false      # rely only on max force convergence
  force_only: false          # skip displacement checks
  converge_to_geom_rms_thresh: 0.05   # RMS threshold when targeting geometry
  overachieve_factor: 0.0    # tighten thresholds
  check_eigval_structure: false   # validate Hessian eigenstructure
  line_search: true          # enable line search
  dump: false                # dump trajectory/restart data
  dump_restart: false        # dump restart checkpoints
  prefix: ""                 # filename prefix
  out_dir: ./result_scan/    # output directory
  keep_last: 7               # history size for LBFGS buffers
  beta: 1.0                  # initial damping beta
  gamma_mult: false          # multiplicative gamma update toggle
  max_step: 0.3              # maximum step length
  control_step: true         # control step length adaptively
  double_damp: true          # double damping safeguard
  mu_reg: null               # regularization strength
  max_mu_reg_adaptions: 10   # cap on mu adaptations
rfo:
  thresh: gau                # RFOptimizer convergence preset
  max_cycles: 10000          # iteration cap
  print_every: 100           # logging stride
  min_step_norm: 1.0e-08     # minimum accepted step norm
  assert_min_step: true      # assert when steps stagnate
  rms_force: null            # explicit RMS force target
  rms_force_only: false      # rely only on RMS force convergence
  max_force_only: false      # rely only on max force convergence
  force_only: false          # skip displacement checks
  converge_to_geom_rms_thresh: 0.05   # RMS threshold when targeting geometry
  overachieve_factor: 0.0    # tighten thresholds
  check_eigval_structure: false   # validate Hessian eigenstructure
  line_search: true          # enable line search
  dump: false                # dump trajectory/restart data
  dump_restart: false        # dump restart checkpoints
  prefix: ""                 # filename prefix
  out_dir: ./result_scan/    # output directory
  trust_radius: 0.1          # trust-region radius
  trust_update: true         # enable trust-region updates
  trust_min: 0.0             # minimum trust radius
  trust_max: 0.1             # maximum trust radius
  max_energy_incr: null      # allowed energy increase per step
  hessian_update: bfgs       # Hessian update scheme
  hessian_init: calc         # Hessian initialization source
  hessian_recalc: 200        # rebuild Hessian every N steps
  hessian_recalc_adapt: null # adaptive Hessian rebuild factor
  small_eigval_thresh: 1.0e-08   # eigenvalue threshold for stability
  alpha0: 1.0                # initial micro step
  max_micro_cycles: 50       # micro-iteration limit
  rfo_overlaps: false        # enable RFO overlaps
  gediis: false              # enable GEDIIS
  gdiis: true                # enable GDIIS
  gdiis_thresh: 0.0025       # GDIIS acceptance threshold
  gediis_thresh: 0.01        # GEDIIS acceptance threshold
  gdiis_test_direction: true # test descent direction before DIIS
  adapt_step_func: true      # adaptive step scaling toggle
bias:
  k: 300                     # harmonic bias strength (eV·Å⁻²)
bond:
  device: cuda               # UMA device for bond analysis
  bond_factor: 1.2           # covalent-radius scaling
  margin_fraction: 0.05      # tolerance margin for comparisons
  delta_fraction: 0.05       # minimum relative change to flag bonds
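
Since the YAML root must be a mapping and its values take precedence, applying an --args-yaml file amounts to a recursive dictionary merge. The sketch below assumes the file has already been loaded into a dict (for example with a YAML library) and illustrates merge semantics only; apply_yaml_overrides and its exact behavior for nested sections are assumptions, not the tool's code.

```python
import copy

def apply_yaml_overrides(settings, overrides):
    """Recursively merge a parsed YAML mapping onto resolved settings; YAML values win."""
    if not isinstance(overrides, dict):
        raise TypeError("the YAML root must be a mapping")
    merged = copy.deepcopy(settings)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = apply_yaml_overrides(merged[key], value)  # merge nested sections
        else:
            merged[key] = copy.deepcopy(value)  # scalars and lists replace outright
    return merged

resolved = {"opt": {"thresh": "gau", "max_cycles": 10000}, "bias": {"k": 300}}
merged = apply_yaml_overrides(resolved, {"opt": {"thresh": "gau_tight"}, "bias": {"k": 500}})
```

Keys absent from the YAML (here opt.max_cycles) keep their resolved values, and the input dict is left untouched.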

See Also

  • all — End-to-end workflow with --scan-lists for single-structure inputs

  • path-search — MEP search using scan endpoints as intermediates

  • extract — Generate pocket PDBs before scanning

  • YAML Reference — Full bias and bond configuration options

  • Glossary — Definitions of MEP, Segment