mlmm-toolkit Documentation
Version: v0.2.4
mlmm-toolkit is a Python CLI toolkit for automated enzymatic reaction-path modeling using ML/MM (machine-learning / molecular mechanics) methods.
Quick Start by Goal
| Objectives | Command | Guide |
|---|---|---|
| First run (end-to-end) | | |
| Single-structure staged scan ( | | |
| TS validation ( | | |
| Run complete reaction path search from PDB | | |
| View current configuration | | |
| Extract QM region from protein-ligand complex | | |
| Build MM topology (parm7/rst7) | | |
| Define ML/MM layers | | |
| Optimize a single structure | | |
| Find and optimize a transition state | | |
| Search for minimum energy path | | |
| Run IRC from a transition state | | |
| Visualize energy profile | | |
| Export to Gaussian ONIOM / ORCA QM/MM | | |
| Rebuild XYZ/layered PDB from ONIOM input | | |
| Draw state energy diagram from numeric values | | |
| Follow worked tutorials | – | |
| Diagnose failures by symptom | – | |
| Understand the big picture (concepts & terms) | – | |
| Resolve common errors | – | |
| Look up abbreviations and terms | – | |
Documentation Guide
| Topic | Page |
|---|---|
| Installation & first run | |
| Key terms & workflow overview | |
| Symptom-first failure routing | |
| Common errors & fixes | |
| CLI conventions & input requirements | |
CLI Subcommands
Main Workflow
Structure Preparation
| Subcommand | Description |
|---|---|
| | Extract active-site pocket (cluster model) from protein-ligand complex |
| | Repair PDB element columns (77-78) |
| | Build AMBER topology (parm7/rst7) with tleap + GAFF2 |
| | Define 3-layer ML/MM regions via B-factor annotation |
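As an illustration of what "element columns (77-78)" refers to, the sketch below scans PDB text and reports atoms whose element field is blank. It is not the toolkit's repair routine; `atoms_missing_element` and `make_atom_line` are hypothetical helpers written for this example, and the only assumption is the standard fixed-width PDB ATOM/HETATM layout.

```python
def make_atom_line(serial: int, name: str, element: str) -> str:
    """Build a fixed-width ATOM record (helper for this example only)."""
    return (f"ATOM  {serial:>5} {name:<4} ALA A   1    "
            f"{0.0:8.3f}{0.0:8.3f}{0.0:8.3f}{1.0:6.2f}{0.0:6.2f}          {element:>2}")


def atoms_missing_element(pdb_text: str) -> list[int]:
    """Return serial numbers of ATOM/HETATM records with a blank element field."""
    missing = []
    for line in pdb_text.splitlines():
        if not line.startswith(("ATOM", "HETATM")):
            continue
        # Columns 77-78 (1-based) hold the element symbol; slicing past the
        # end of a short line simply yields an empty string.
        element = line[76:78].strip()
        if not element:
            missing.append(int(line[6:11]))
    return missing


complete = make_atom_line(1, "N", "N")
truncated = make_atom_line(2, "CA", "C")[:66]  # line cut off after the B-factor
missing = atoms_missing_element("\n".join([complete, truncated]))
print(missing)
```

A check like this can help decide whether a structure needs its element columns repaired before further preparation.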
Geometry Optimization
Path Search & Optimization
| Subcommand | Description |
|---|---|
| | MEP optimization via GSM or DMF (two structures) |
| | Recursive MEP search with automatic refinement (2+ structures) |
Scans
Analysis & Post-processing
| Subcommand | Description |
|---|---|
| | Intrinsic Reaction Coordinate calculation |
| | Vibrational frequency analysis & thermochemistry |
| | Single-point DFT calculations (GPU4PySCF / PySCF) |
| | Plot energy profiles from XYZ trajectories |
| | Build an energy diagram from numeric input values |
Export
| Subcommand | Description |
|---|---|
| | Export to Gaussian ONIOM / ORCA QM/MM (`--mode g16 |
| | Import Gaussian/ORCA ONIOM input and reconstruct XYZ + layered PDB |
Configuration & Reference
| Topic | Page |
|---|---|
| CLI command reference | |
| YAML schema | |
| YAML configuration options | |
| ML/MM calculator architecture | |
| Terminology | |
System Requirements
Hardware
OS: Linux (Ubuntu 20.04+ or CentOS 8+ tested)
GPU: CUDA 12.x compatible
VRAM: Minimum 8 GB (16 GB+ recommended for 1000+ atoms)
RAM: 16 GB+ recommended
Software
Python >= 3.11
PyTorch with CUDA support
CUDA 12.x toolkit
AmberTools (for mm-parm)
Quick Examples
Basic ML/MM MEP search
mlmm -i R.pdb P.pdb -c 'SAM,GPP' -l 'SAM:1,GPP:-3'
Full workflow with TS optimization
mlmm -i R.pdb P.pdb -c 'SAM,GPP' -l 'SAM:1,GPP:-3' \
--tsopt --thermo --dft
Single-structure scan mode
mlmm scan -i pocket.pdb --parm real.parm7 --model-pdb ml_region.pdb \
-q 0 -s scan.yaml --print-parsed
TS-only optimization
mlmm -i TS_candidate.pdb -c 'SAM,GPP' -l 'SAM:1,GPP:-3' \
--tsopt
Key Concepts
ML/MM 3-Layer System
mlmm uses a 3-layer partitioning scheme encoded via PDB B-factors:
ML region (B=0.0): Treated with the selected MLIP backend (default: UMA)
Movable-MM (B=10.0): MM atoms that move during optimization
Frozen (B=20.0): Fixed MM atoms
Hessian-target MM atoms are selected by calculator options (hess_cutoff / explicit lists), not by a dedicated B-factor layer.
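The B-factor encoding above can be inspected with a few lines of Python. The following is a minimal sketch, not the toolkit's own parser: `classify_layers` and `make_atom_line` are hypothetical helpers, and the only assumptions are the standard fixed-width PDB layout (B-factor in columns 61-66) and the three B-factor values listed above.

```python
# Layer labels keyed by the B-factor values documented above.
LAYER_BY_BFACTOR = {0.0: "ML", 10.0: "Movable-MM", 20.0: "Frozen"}


def make_atom_line(serial: int, name: str, b_factor: float, element: str) -> str:
    """Build a fixed-width ATOM record (helper for this example only)."""
    return (f"ATOM  {serial:>5} {name:<4} ALA A   1    "
            f"{0.0:8.3f}{0.0:8.3f}{0.0:8.3f}{1.0:6.2f}{b_factor:6.2f}          {element:>2}")


def classify_layers(pdb_text: str) -> list[tuple[int, str]]:
    """Return (atom serial, layer label) for each ATOM/HETATM record."""
    layers = []
    for line in pdb_text.splitlines():
        if not line.startswith(("ATOM", "HETATM")):
            continue
        serial = int(line[6:11])
        b_factor = float(line[60:66])  # PDB columns 61-66 (1-based)
        # Any other B-factor value is reported as "unassigned".
        label = LAYER_BY_BFACTOR.get(round(b_factor, 1), "unassigned")
        layers.append((serial, label))
    return layers


example = "\n".join([
    make_atom_line(1, "N", 0.0, "N"),
    make_atom_line(2, "CA", 10.0, "C"),
    make_atom_line(3, "C", 20.0, "C"),
])
layers = classify_layers(example)
for serial, label in layers:
    print(serial, label)
```

Counting the labels in a layered PDB this way is a quick sanity check that the ML region has the size you expect before starting a calculation.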
Charge and spin
Use --ligand-charge to specify unknown residue charges: 'SAM:1,GPP:-3'
Use -q/--charge to set the ML-region total charge
Spin multiplicity is set with -m/--multiplicity (default 1)
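To make the `'SAM:1,GPP:-3'` format concrete, here is a small sketch that splits such a string into a residue-name to charge mapping. `parse_ligand_charges` is a hypothetical helper for illustration; it is not how the toolkit itself reads the option, and the summed value is shown only as arithmetic on the example string.

```python
def parse_ligand_charges(spec: str) -> dict[str, int]:
    """Parse 'RES:charge,RES:charge' into {residue name: integer charge}."""
    charges = {}
    for item in spec.split(","):
        resname, _, charge = item.strip().partition(":")
        if not resname or not charge:
            raise ValueError(f"expected 'RESNAME:CHARGE', got {item!r}")
        charges[resname] = int(charge)  # int() accepts signed values like '-3'
    return charges


charges = parse_ligand_charges("SAM:1,GPP:-3")
ligand_total = sum(charges.values())
print(charges, ligand_total)
```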
Boolean options
Boolean CLI options use toggle form (--flag / --no-flag):
--tsopt --thermo --no-dft
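The toggle form behaves like the paired flags that Python's standard `argparse.BooleanOptionalAction` generates. The sketch below reproduces that behavior in a stand-alone parser; the parser name and the `False` defaults are assumptions for illustration, and nothing here is taken from the toolkit's actual CLI implementation.

```python
import argparse

# Stand-alone demo parser; "toggle-demo" and the defaults are illustrative only.
parser = argparse.ArgumentParser(prog="toggle-demo")
for name in ("tsopt", "thermo", "dft"):
    # BooleanOptionalAction registers both --<name> and --no-<name>.
    parser.add_argument(f"--{name}", action=argparse.BooleanOptionalAction, default=False)

args = parser.parse_args(["--tsopt", "--thermo", "--no-dft"])
print(args.tsopt, args.thermo, args.dft)
```

With paired flags, passing `--no-dft` explicitly records the choice on the command line even when the default is already off, which keeps scripted runs unambiguous if defaults change.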
YAML configuration
See the YAML Reference for all options.
Output Structure
Typical mlmm all output:
result_all/
├── ml_region.pdb # ML-region definition
├── summary.log # Human-readable summary
├── summary.yaml # Machine-readable summary
├── pockets/ # Extracted cluster models
├── mm_parm/ # AMBER topology files
├── scan/ # (Optional) scan results
├── path_search/ # MEP trajectories and diagrams
│   ├── mep_trj.xyz # MEP trajectory
│   ├── mep.pdb # MEP in PDB format
│   └── seg_*/ # Per-segment details
└── path_search/post_seg_*/ # Post-processing outputs
    ├── tsopt/ # TS optimization results
    ├── irc/ # IRC trajectories
    ├── freq/ # Vibrational modes
    └── dft/ # DFT results
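Given the layout above, a short script can report which post-processing stages exist for each segment. This is a sketch under the assumption that the directories follow the `path_search/post_seg_*/` pattern shown in the tree; `list_post_processing` is a hypothetical helper, and the demo builds a throwaway directory (with the made-up segment name `post_seg_01`) rather than reading real results.

```python
import tempfile
from pathlib import Path


def list_post_processing(result_dir: str) -> dict[str, list[str]]:
    """Map each post_seg_* directory name to the stage subdirectories it contains."""
    root = Path(result_dir) / "path_search"
    found = {}
    for seg_dir in sorted(root.glob("post_seg_*")):
        if seg_dir.is_dir():
            found[seg_dir.name] = sorted(p.name for p in seg_dir.iterdir() if p.is_dir())
    return found


# Demo on a temporary directory that mimics part of the tree above.
with tempfile.TemporaryDirectory() as tmp:
    for stage in ("tsopt", "irc"):
        (Path(tmp) / "path_search" / "post_seg_01" / stage).mkdir(parents=True)
    demo = list_post_processing(tmp)

print(demo)
```

Pointing the helper at an actual `result_all/` directory would show, per segment, which of `tsopt/`, `irc/`, `freq/`, and `dft/` were produced.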
Citation
If you use this software in your research, please cite:
[1] Ohmura, T., Inoue, S., Terada, T. (2025). ML/MM toolkit – Towards Accelerated Mechanistic Investigation of Enzymatic Reactions. ChemRxiv. https://doi.org/10.26434/chemrxiv-2025-jft1k
License
mlmm-toolkit is distributed under the GNU General Public License version 3 (GPL-3.0).
Getting Help
# General help
mlmm --help
# Command help
mlmm <subcommand> --help
Note: This documentation is under active development. Some sections may be incomplete or subject to change.