mlmm-toolkit Documentation
Version: v0.2.8
mlmm-toolkit is a Python CLI toolkit for automated enzymatic reaction-path modeling using ML/MM (machine learning / molecular mechanics) methods.
Where to start
| Goal | Page |
|---|---|
| Install and run a first end-to-end pipeline | |
| Concepts (3-layer ONIOM, microiteration, link atoms) | |
| End-to-end pipeline from a PDB ( | |
| Single-structure staged scan ( | |
| TS validation ( | |
| CLI conventions and input requirements | |
| Symptom-first failure routing | |
| Common errors and fixes | |
| Abbreviations and terminology | |
CLI Subcommands
Main Workflow
| Subcommand | Description |
|---|---|
| | End-to-end workflow: ML/MM model setup -> MEP search -> TS optimization -> IRC -> freq -> DFT |
Structure Preparation
| Subcommand | Description |
|---|---|
| | Define ML region (QM region) from protein-ligand complex |
| | Repair PDB element columns (77-78) |
| | Build AMBER topology (parm7/rst7) with tleap + GAFF2 |
| | Define 3-layer ML/MM regions via B-factor annotation |
Geometry Optimization
Path Search & Optimization
| Subcommand | Description |
|---|---|
| | MEP optimization via GSM or DMF (two structures) |
| | Recursive MEP search with automatic refinement (2+ structures) |
Scans
Analysis & Post-processing
| Subcommand | Description |
|---|---|
| | Intrinsic Reaction Coordinate calculation |
| | Vibrational frequency analysis & thermochemistry |
| | Single-point DFT calculations (GPU4PySCF / PySCF) |
| | Plot energy profiles from XYZ trajectories |
| | Build an energy diagram from numeric input values |
| | Compare structures and report bond changes |
Utilities
| Subcommand | Description |
|---|---|
| | Resolve PDB alternate conformations (altloc) |
| | Check GPU device information on HPC |
Export
| Subcommand | Description |
|---|---|
| | Export to Gaussian ONIOM / ORCA QM/MM ( |
| | Import Gaussian/ORCA ONIOM input and reconstruct XYZ + layered PDB |
Configuration & Reference
| Topic | Page |
|---|---|
| CLI command reference | |
| YAML schema | |
| YAML configuration options | |
| ML/MM calculator architecture | |
| Terminology | |
System Requirements
Hardware
OS: Linux (Ubuntu 20.04+ or CentOS 8+ tested)
GPU: CUDA 12.x compatible
VRAM: Minimum 8 GB (16 GB+ recommended for 1000+ atoms)
RAM: 16 GB+ recommended
Software
Python >= 3.11
PyTorch with CUDA support
CUDA 12.x toolkit
AmberTools (for mm-parm)
Quick Examples
Basic ML/MM MEP search
mlmm -i R.pdb P.pdb -c 'SAM,GPP' -l 'SAM:1,GPP:-3'
Full workflow with TS optimization
mlmm -i R.pdb P.pdb -c 'SAM,GPP' -l 'SAM:1,GPP:-3' \
--tsopt --thermo --dft
Single-structure scan mode
mlmm scan -i pocket.pdb --parm real.parm7 --model-pdb ml_region.pdb \
-q 0 -s scan.yaml --print-parsed
TS-only optimization
mlmm -i TS_candidate.pdb -c 'SAM,GPP' -l 'SAM:1,GPP:-3' \
--tsopt
Key Concepts
ML/MM 3-Layer System
mlmm uses a 3-layer partitioning scheme encoded via PDB B-factors:
ML region (B=0.0): Treated with the selected MLIP backend (default: UMA)
Movable-MM (B=10.0): MM atoms that move during optimization
Frozen (B=20.0): Fixed MM atoms that provide a static potential field. Their coordinates do not change during optimization, but their non-bonded interactions (electrostatics and van der Waals) with the Movable-MM and ML regions are still included in the MM energy evaluation.
Hessian-target MM atoms are selected by calculator options (hess_cutoff / explicit lists), not by a dedicated B-factor layer.
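As a rough illustration of the B-factor encoding above, the following sketch counts atoms per layer in a layered PDB. It is not part of mlmm-toolkit: the function names, the 0.5 matching tolerance, and the `make_atom_line` fixture are assumptions made for this example. It relies only on the standard PDB layout, in which the temperature factor occupies columns 61-66.

```python
from collections import Counter

# Layer labels keyed by the B-factor values listed above.
LAYER_BY_BFACTOR = {0.0: "ML", 10.0: "Movable-MM", 20.0: "Frozen"}


def layer_of(pdb_line: str) -> str:
    """Return the layer name for one ATOM/HETATM record."""
    # PDB columns 61-66 (1-based) hold the temperature factor.
    bfactor = float(pdb_line[60:66])
    for value, name in LAYER_BY_BFACTOR.items():
        if abs(bfactor - value) < 0.5:  # tolerance is an assumption
            return name
    return "unassigned"


def count_layers(pdb_text: str) -> Counter:
    """Count atoms per layer, ignoring non-atom records."""
    counts = Counter()
    for line in pdb_text.splitlines():
        if line.startswith(("ATOM", "HETATM")):
            counts[layer_of(line)] += 1
    return counts


def make_atom_line(serial: int, name: str, resname: str, bfactor: float) -> str:
    """Hypothetical fixture: build a column-aligned ATOM record at the origin."""
    return (
        f"ATOM  {serial:5d} {name:<4s} {resname:>3s} A{1:4d}    "
        f"{0.0:8.3f}{0.0:8.3f}{0.0:8.3f}{1.0:6.2f}{bfactor:6.2f}"
    )


if __name__ == "__main__":
    demo = "\n".join(
        [
            "REMARK layered PDB demo",
            make_atom_line(1, "C1", "SAM", 0.0),
            make_atom_line(2, "CA", "ALA", 10.0),
            make_atom_line(3, "CA", "GLY", 20.0),
        ]
    )
    print(dict(count_layers(demo)))
```

A count like this can serve as a quick sanity check that a layered PDB carries the intended ML, Movable-MM, and Frozen assignments before a run.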
Charge and spin
Use --ligand-charge to specify unknown residue charges: 'SAM:1,GPP:-3'
Use -q/--charge to set the ML-region net charge
Spin multiplicity is set with -m/--multiplicity (default 1)
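The ligand-charge string is a comma-separated list of RESNAME:charge pairs. The sketch below shows how such a string can be read into a mapping. `parse_ligand_charges` is a hypothetical helper written for this example, not the toolkit's own parser, and the validation rules are assumptions.

```python
def parse_ligand_charges(spec: str) -> dict[str, int]:
    """Parse a string such as 'SAM:1,GPP:-3' into {'SAM': 1, 'GPP': -3}."""
    charges: dict[str, int] = {}
    for item in spec.split(","):
        resname, sep, charge = item.strip().partition(":")
        if not sep or not resname or not charge:
            raise ValueError(f"expected RESNAME:charge, got {item!r}")
        charges[resname] = int(charge)  # int() accepts signed values like '-3'
    return charges


if __name__ == "__main__":
    charges = parse_ligand_charges("SAM:1,GPP:-3")
    print(charges)                # {'SAM': 1, 'GPP': -3}
    print(sum(charges.values()))  # -2, the combined charge of the listed residues
```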
Boolean options
Boolean CLI options use toggle form (--flag / --no-flag):
--tsopt --thermo --no-dft
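For readers unfamiliar with the toggle form, the sketch below reproduces the same --flag / --no-flag behavior with Python's standard argparse.BooleanOptionalAction. It only illustrates the convention: the CLI framework mlmm actually uses is not stated here, and the default value of False for each option is an assumption.

```python
import argparse

# Stand-in parser; "toggle-demo" is a hypothetical program name.
parser = argparse.ArgumentParser(prog="toggle-demo")
for name in ("tsopt", "thermo", "dft"):
    # BooleanOptionalAction registers both --<name> and --no-<name>.
    parser.add_argument(
        f"--{name}",
        action=argparse.BooleanOptionalAction,
        default=False,  # assumed default, for illustration only
    )

args = parser.parse_args(["--tsopt", "--thermo", "--no-dft"])
print(args.tsopt, args.thermo, args.dft)  # True True False
```

With this form, a stage that is enabled by default can be switched off explicitly without needing a separate option name.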
YAML configuration
See the YAML Reference for detailed options.
Output Structure
Typical mlmm all output:
result_all/
├── ml_region.pdb               # ML-region definition
├── summary.log                 # Human-readable summary
├── summary.json                # Machine-readable summary
├── pockets/                    # ML region structures determined by extract
├── mm_parm/                    # AMBER topology files
├── scan/                       # (Optional) scan results
├── path_search/                # MEP trajectories and diagrams (--no-refine-path uses path_opt/ instead)
│   ├── mep_trj.xyz             # MEP trajectory
│   ├── mep.pdb                 # MEP in PDB format
│   └── seg_*/                  # Per-segment details
└── path_search/post_seg_*/     # Post-processing outputs (--no-refine-path uses path_opt/post_seg_*/)
    ├── tsopt/                  # TS optimization results
    ├── irc/                    # IRC trajectories
    ├── freq/                   # Vibrational modes
    └── dft/                    # DFT results
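When post-processing many runs, it can help to walk this layout programmatically. The sketch below is one possible approach, assuming only the directory names shown above. `collect_outputs` is a hypothetical helper, and because the keys inside summary.json are not documented here, it reports only the top-level key names.

```python
import json
from pathlib import Path

POST_STAGES = ("tsopt", "irc", "freq", "dft")


def collect_outputs(result_dir: str) -> dict:
    """Summarize which post-processing stages exist for each segment."""
    root = Path(result_dir)

    # Load summary.json if present; its schema is not assumed.
    summary_path = root / "summary.json"
    summary = json.loads(summary_path.read_text()) if summary_path.is_file() else {}
    summary_keys = sorted(summary) if isinstance(summary, dict) else []

    # With --no-refine-path the results live under path_opt/ instead.
    path_dir = root / "path_search"
    if not path_dir.is_dir():
        path_dir = root / "path_opt"

    stages = {}
    if path_dir.is_dir():
        for seg in sorted(path_dir.glob("post_seg_*")):
            stages[seg.name] = [s for s in POST_STAGES if (seg / s).is_dir()]

    return {
        "summary_keys": summary_keys,
        "path_dir": path_dir.name,
        "stages": stages,
    }
```

Calling `collect_outputs("result_all")` returns, for each post_seg_* directory found, the subset of tsopt, irc, freq, and dft that is present, which makes incomplete runs easy to spot.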
Citation
If you use this software in your research, please cite:
[1] Ohmura, T., Inoue, S., Terada, T. (2025). ML/MM toolkit – Towards Accelerated Mechanistic Investigation of Enzymatic Reactions. ChemRxiv. https://doi.org/10.26434/chemrxiv-2025-jft1k
License
mlmm-toolkit is distributed under the GNU General Public License version 3 (GPL-3.0).
Getting Help
# General help
mlmm --help
# Command help
mlmm <subcommand> --help
Agent Skills
mlmm-toolkit ships AI-agent instructions under .claude/skills/ so your agent can drive ML/MM ONIOM mechanism studies via Claude Code, Cursor, etc. The bundle covers design overview, the 22 CLI subcommands with canonical recipes, structure I/O (PDB B-factor encoding, XYZ / GJF / Amber parm7+rst7), backend installation (UMA / Orb / MACE / AIMNet2 / AmberTools / DFT / xTB), workflows and summary.json parsing, and HPC operation (PBS / SLURM, dynamic dispatch). Copy .claude/skills/ into your project repository or home directory.