scan

Overview

Summary: Drive a reaction coordinate by scanning bond distances with harmonic restraints using the ML/MM calculator. Use -s/--scan-lists to define targets as a YAML/JSON spec file (recommended) or as inline Python literals.

mlmm scan performs a staged, bond-length-driven scan using the ML/MM calculator (mlmm.mlmm_calc.mlmm) with harmonic restraints. At each step, the temporary targets are updated, restraint wells are applied, and the structure is relaxed with LBFGS. The ML/MM calculator couples an MLIP backend (selected via -b/--backend; default: UMA) and hessian_ff.

Minimal example

mlmm scan -i pocket.pdb --parm real.parm7 --model-pdb ml_region.pdb \
 -q 0 -s scan.yaml --print-parsed -o ./result_scan

Output checklist

  • result_scan/stage_01/result.pdb (or result.xyz)

  • result_scan/stage_02/result.pdb (or result.xyz)

  • result_scan/stage_*/scan_trj.xyz and scan.pdb (always generated)

Common examples

  1. First validate parsed scan targets from YAML spec.

mlmm scan -i pocket.pdb --parm real.parm7 --model-pdb ml_region.pdb \
 -q 0 -s scan.yaml --print-parsed
  1. Use inline literal input.

mlmm scan -i pocket.pdb --parm real.parm7 --model-pdb ml_region.pdb \
 -q 0 -s "[(12,45,2.20)]"
  1. Dump trajectories for stage-by-stage inspection.

mlmm scan -i pocket.pdb --parm real.parm7 --model-pdb ml_region.pdb \
 -q 0 -s scan.yaml --dump -o ./result_scan_dump

Note: Add --print-parsed when you want to verify parsed stage targets from -s/--scan-lists.

Usage

mlmm scan -i INPUT.pdb --parm real.parm7 --model-pdb ml_region.pdb \
 -q CHARGE [-m MULT] \
 [-s scan.yaml | -s "[(I,J,TARGET_ANG)]"] [options]

Examples

# Recommended: YAML/JSON spec
cat > scan.yaml << 'YAML'
one_based: true
stages:
 - [[12, 45, 2.20]]
 - [[10, 55, 1.35], [23, 34, 1.80]]
YAML
mlmm scan -i pocket.pdb --parm real.parm7 --model-pdb ml_region.pdb \
 -q 0 -s scan.yaml --print-parsed

# Alternative: inline Python literal
mlmm scan -i pocket.pdb --parm real.parm7 --model-pdb ml_region.pdb \
 -q 0 -s "[(12,45,2.20)]"

# Two stages with dumps, frozen atoms, and YAML overrides
mlmm scan -i pocket.pdb --parm real.parm7 --model-pdb ml_region.pdb \
 -q -1 -m 1 --freeze-atoms "1,3,5" -s "[(12,45,2.20)]" \
 "[(10,55,1.35),(23,34,1.80)]" --max-step-size 0.20 --dump

Inline literal format

When -s/--scan-lists receives a value that is not a file path, it is treated as a Python literal string evaluated by the CLI. Shell quoting matters.

Basic structure

Each literal is a Python list of triples (atom1, atom2, target_A):

-s '[(atom1, atom2, target_A),...]'
  • Wrap the entire literal in single quotes so the shell does not interpret parentheses or spaces.

  • Each triple drives the distance between atom1atom2 toward target_A.

  • One literal = one stage. For multiple stages, pass multiple literals after a single -s/--scan-lists flag (do not repeat the flag).

Specifying atoms

Atoms can be given as integer indices or PDB selector strings:

Method

Example

Notes

Integer index

(1, 5, 2.0)

1-based by default (--one-based)

PDB selector

("TYR,285,CA", "MMT,309,C10", 2.0)

Residue name, residue number, atom name

PDB selector tokens can be separated by any of: comma ,, space, slash /, backtick `, or backslash \. Token order is flexible.

# All of these specify the same atom:
"TYR,285,CA"
"TYR 285 CA"
"TYR/285/CA"
"285,TYR,CA" # order is flexible

Quoting rules

# Correct: single-quote the list, double-quote selector strings inside
-s '[("TYR,285,CA","MMT,309,C10",1.35)]'

# Correct: integer indices need no inner quotes
-s '[(1, 5, 2.0)]'

# Avoid: double-quoting the outer literal requires escaping inner quotes
-s "[(\"TYR,285,CA\",\"MMT,309,C10\",1.35)]"

Multiple stages

Pass multiple literals after a single -s/--scan-lists flag. Each literal becomes one stage:

# Stage 1: drive one bond to 1.35 Å
# Stage 2: drive two bonds simultaneously
-s \
 '[("TYR,285,CA","MMT,309,C10",1.35)]' \
 '[("TYR,285,CA","MMT,309,C10",2.20),("TYR,285,CB","MMT,309,C11",1.80)]'

Stages run sequentially; each starts from the previous stage’s relaxed result. Do not repeat the -s/--scan-lists flag – supply all stage literals after a single flag.

Bidirectional scan (4-tuple)

Instead of a 3-tuple (i, j, target), you can pass a 4-tuple (i, j, start, end) to scan in both directions from the current geometry. The CLI automatically expands each 4-tuple into two stages:

  1. Pass 1: Drive ij from the current distance toward start.

  2. Pass 2: Restore the initial geometry and drive ij toward end.

The concatenated trajectory is assembled as start initial end, giving a continuous path through the starting structure.

# Bidirectional scan: drive bond 12--45 from current geometry
# toward 1.35 Å (pass 1) and toward 2.50 Å (pass 2)
mlmm scan -i pocket.pdb --parm real.parm7 --model-pdb ml_region.pdb \
 -q 0 -s '[(12, 45, 1.35, 2.50)]'

This is equivalent to two manual stages with a geometry reset between them, but avoids the need to script it yourself. Mixed 3-tuples and 4-tuples are accepted in the same literal.

Workflow

  1. Load the structure through geom_loader, resolving charge/spin from the CLI or defaults. Provide --parm, --model-pdb, -q/--charge, and optionally -m/--multiplicity for the ML/MM calculator.

  2. Optionally run an unbiased preoptimization (--preopt) before any biasing so the starting point is relaxed.

  3. Parse stage targets from -s/--scan-lists (YAML/JSON spec file or inline literal), then normalize the (i, j) indices (1-based by default). When the input is a PDB, each entry may be either an integer index or an atom selector string like 'TYR,285,CA'; selector fields can be separated by spaces, commas, slashes, backticks, or backslashes and may be in any order.

  4. Compute the per-bond displacement and split into steps:

  • For scan tuples [(i, j, target_A)], compute delta = target - current_distance_A.

  • With --max-step-size = h, the stage takes N = ceil(max(|delta|) / h) biased relaxations.

  • Each pair’s incremental change is delta_k = delta_k / N (Å). At step s, the temporary target is r_k(s) = r_k(0) + s * delta_k.

  1. March through all steps, applying the harmonic wells E_bias = sum 1/2 * k * (|r_i - r_j| - target_k)^2 and minimizing with LBFGS. k comes from --bias-k (eV/Ų) and is converted once to Hartree/Bohr^2. Coordinates are stored in Bohr for PySisyphus and converted internally for reporting.

  2. After the last step of each stage, optionally run an unbiased relaxation (--endopt) before reporting covalent bond changes and writing the result.* files.

  3. Repeat for every stage; optional trajectories are dumped only when --dump is True.

CLI options

Option

Description

Default

-i, --input PATH

Input PDB (or XYZ with --ref-pdb for topology).

Required

--parm PATH

Amber prmtop for the full REAL system.

Required

--model-pdb PATH

PDB defining the ML region (atom IDs). Optional when --detect-layer is enabled or --model-indices is provided.

None

--model-indices TEXT

Comma-separated ML-region atom indices (ranges allowed).

None

--model-indices-one-based / --model-indices-zero-based

Interpret --model-indices as 1-based or 0-based.

True (1-based)

--detect-layer / --no-detect-layer

Detect ML/MM layers from input PDB B-factors.

True

-q, --charge INT

Total ML-region charge.

None (required unless -l is given)

-l, --ligand-charge TEXT

Per-resname charge mapping (e.g., GPP:-3,SAM:1). Derives total charge when -q is omitted.

None

-m, --multiplicity INT

Spin multiplicity (2S+1).

1

--freeze-atoms TEXT

Comma-separated 1-based atom indices to freeze (merged with YAML geom.freeze_atoms).

None

--hess-cutoff FLOAT

Distance cutoff (Å) from ML region for MM atoms to include in Hessian calculation. Can be combined with --detect-layer.

None

--movable-cutoff FLOAT

Movable-MM distance cutoff (Å); providing this disables --detect-layer.

None

-s, --scan-lists TEXT

Scan targets: a YAML/JSON spec file path (auto-detected) or inline Python literal(s) with (i, j, target_A) triples or (i, j, start, end) 4-tuples for bidirectional scans. Each literal is one stage; supply multiple literals after a single flag. i/j can be integer indices or PDB atom selectors like "TYR,285,CA".

Required

--one-based/--zero-based

Interpret atom indices as 1-based (default) or 0-based.

True (1-based)

--print-parsed/--no-print-parsed

Print parsed stage tuples after -s/--scan-lists resolution.

False

--max-step-size FLOAT

Maximum change in any scanned bond per step (Å). Controls the number of integration steps.

0.20

--bias-k FLOAT

Harmonic bias strength k in eV/Ų.

300

--opt-mode {grad,hess,lbfgs,rfo,light,heavy}

Compatibility option for mlmm all forwarding. Current scan relaxations use LBFGS regardless of mode.

None

--max-cycles INT

Maximum LBFGS cycles per biased step and per pre/end optimization stage.

10000

--relax-max-cycles INT

Compatibility alias of --max-cycles (overrides it when provided).

None

--preopt/--no-preopt

Run an unbiased optimization before scanning.

False

--endopt/--no-endopt

Run an unbiased optimization after each stage.

False

--dump/--no-dump

Dump per-step optimizer trajectory files. Note: scan_trj.xyz/scan.pdb are always written regardless of this flag.

False

--out-dir TEXT

Output directory root.

./result_scan/

--thresh TEXT

Convergence preset (gau_loose|gau|gau_tight|gau_vtight|baker|never).

None (inherits gau)

--config FILE

Base YAML configuration file (applied first).

None

--ref-pdb FILE

Reference PDB topology when --input is XYZ.

None

-b, --backend CHOICE

MLIP backend for the ML region: uma (default), orb, mace, aimnet2.

None (uma applied internally)

--embedcharge/--no-embedcharge

Enable xTB point-charge embedding correction for MM-to-ML environmental effects.

False

--embedcharge-cutoff FLOAT

Cutoff radius (Å) for embed-charge MM atoms.

12.0

--dry-run/--no-dry-run

Validate options and print the execution plan without running the scan. Shown in --help-advanced.

False

--convert-files/--no-convert-files

Toggle XYZ/TRJ to PDB companions when a PDB template is available.

True

Outputs

out_dir/ (default: ./result_scan/)
├─ scan_trj.xyz              # Combined trajectory across all stages (always written)
├─ scan.pdb                  # Combined PDB companion (PDB inputs only; always written)
├─ preopt/                   # Present when --preopt is True
│  ├─ result.xyz
│  └─ result.pdb             # Only for PDB inputs
└─ stage_XX/                 # One folder per stage (k = 01..K)
   ├─ result.xyz             # Final (possibly endopt) geometry
   ├─ result.pdb             # If input was PDB
   ├─ scan_trj.xyz           # Per-stage biased step frames (always written)
   └─ scan.pdb               # PDB version of scan_trj.xyz (PDB inputs only; always written)

YAML configuration

  • coord_type: Coordinate type (cartesian vs dlc internals).

  • freeze_atoms: 0-based frozen atoms merged with CLI --freeze-atoms.

Section calc / mlmm

  • ML/MM calculator setup: charge, spin, backend, embedcharge, UMA-specific model/task_name, device, neighbor radii, Hessian options, etc.

Section opt / lbfgs

  • Optimizer settings: thresh, max_cycles, print_every, step controls, line search, dumping flags.

Section bias

  • k (300): Harmonic strength in eV/Ų.

Section bond

  • MLIP-based bond-change detection:

  • device ("cuda"): MLIP device for graph analysis.

  • bond_factor (1.20): Covalent-radius scaling for cutoff.

  • margin_fraction (0.05): Fractional tolerance for comparisons.

  • delta_fraction (0.05): Minimum relative change to flag formation/breaking.

See Also

  • Common Error Recipes – Symptom-first failure routing

  • Troubleshooting – Detailed troubleshooting guide

  • scan2d – 2D distance grid scan

  • scan3d – 3D distance grid scan

  • opt – Single-structure geometry optimization

  • all – End-to-end workflow with --scan-lists for single-structure inputs

  • path-search – MEP search using scan endpoints as intermediates