dft
Overview
Summary: Run a single-point DFT calculation on the ML region using PySCF/GPU4PySCF, then recombine with MM energies to obtain the ML(dft)/MM total energy. Results include energy and population analysis (Mulliken, meta-Lowdin, IAO charges).
mlmm dft extracts the ML region from the full enzyme PDB, appends link hydrogens, and runs a single-point PySCF (or GPU4PySCF) calculation. After the DFT evaluation, the script recomputes the ML(dft)/MM total energy by combining the PySCF high-level energy with MM evaluations of the full system (REAL-low) and the ML subset (MODEL-low):
E_total = E_REAL_low + E_ML(DFT) - E_MODEL_low
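The recombination above is plain arithmetic once the three energies are known. The following is a minimal sketch, not the tool's own code: the function name and the numeric energies are hypothetical, and it assumes all three inputs are already expressed in Hartree.

```python
HARTREE_TO_KCAL_PER_MOL = 627.509474  # standard Hartree -> kcal/mol factor


def recombine_energy(e_real_low: float, e_ml_dft: float, e_model_low: float) -> float:
    """Subtractive scheme: E_total = E_REAL_low + E_ML(DFT) - E_MODEL_low (Hartree)."""
    return e_real_low + e_ml_dft - e_model_low


# Hypothetical energies in Hartree, chosen only for illustration.
e_total = recombine_energy(e_real_low=-12.50, e_ml_dft=-1520.25, e_model_low=-0.75)
print(f"E_total = {e_total:.6f} Eh = {e_total * HARTREE_TO_KCAL_PER_MOL:.2f} kcal/mol")
```

The MODEL-low term is subtracted so that the ML region is not counted twice: once at the MM level inside REAL-low and once at the DFT level.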
The default --engine is gpu (GPU4PySCF); use --engine cpu for CPU-only PySCF. The gpu engine raises an error if GPU4PySCF is unavailable. The default functional/basis is wb97m-v/def2-tzvpd. Closed-shell GPU runs use GPU4PySCF’s low-memory rks_lowmem.RKS by default (--lowmem/--no-lowmem); open-shell or CPU paths fall back to standard RKS/UKS automatically.
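The engine and low-memory rules in this paragraph can be summarized as a small decision function. This is a sketch under stated assumptions: `select_backend` is a hypothetical helper, "closed-shell" is approximated as multiplicity 1, and the returned strings mirror the backend labels reported in result.yaml.

```python
def select_backend(engine: str, multiplicity: int, lowmem: bool = True) -> str:
    """Return a backend label following the rules described above."""
    if engine not in ("gpu", "cpu"):
        raise ValueError(f"unknown engine: {engine!r}")
    if engine == "cpu":
        return "pyscf(cpu)"  # CPU path: standard RKS/UKS
    closed_shell = multiplicity == 1  # assumption: singlet means closed-shell
    if closed_shell and lowmem:
        return "gpu4pyscf(rks_lowmem)"  # low-memory RKS on the GPU
    return "gpu4pyscf"  # open-shell or --no-lowmem: standard GPU RKS/UKS


for args in [("gpu", 1, True), ("gpu", 2, True), ("gpu", 1, False), ("cpu", 1, True)]:
    print(args, "->", select_backend(*args))
```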
Minimal example
mlmm dft -i enzyme.pdb --parm real.parm7 --model-pdb ml_region.pdb \
-q 0 -m 1 --out-dir ./result_dft
Output checklist
result_dft/ml_region_with_linkH.xyz
result_dft/result.yaml
Standard output block with ML(dft)/MM combined energy
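Common examples
Set the functional/basis explicitly for the single point (the pair shown is the default; substitute another FUNC/BASIS string to change the level of theory).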
mlmm dft -i enzyme.pdb --parm real.parm7 --model-pdb ml_region.pdb \
-q 0 -m 1 --func-basis "wb97m-v/def2-tzvpd" --out-dir ./result_dft_tz
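The value passed to --func-basis is a single string that is split into a functional and a basis set. A minimal sketch of such a split, assuming the separator is the first "/" (the helper name is hypothetical and this is not the tool's own parser):

```python
def split_func_basis(func_basis: str) -> tuple[str, str]:
    """Split a 'FUNC/BASIS' string into (functional, basis)."""
    functional, sep, basis = func_basis.strip().partition("/")
    functional, basis = functional.strip(), basis.strip()
    if not sep or not functional or not basis:
        raise ValueError(f"expected 'FUNC/BASIS', got {func_basis!r}")
    return functional, basis


functional, basis = split_func_basis("wb97m-v/def2-tzvpd")
print(functional, basis)
```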
Freeze selected atoms in the ML/MM setup before DFT.
mlmm dft -i enzyme.pdb --parm real.parm7 --model-pdb ml_region.pdb \
-q -1 -m 2 --freeze-atoms "1,3,5" --out-dir ./result_dft_freeze
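The --freeze-atoms value is a comma-separated list of 1-based indices. When post-processing in Python, where arrays are 0-based, a conversion along these lines can help. The helper is hypothetical, and the 0-based output convention is an assumption of this sketch rather than documented behaviour of the tool.

```python
def parse_freeze_atoms(spec: str) -> list[int]:
    """Convert '1,3,5' (1-based) into sorted, de-duplicated 0-based indices."""
    indices = set()
    for token in spec.split(","):
        token = token.strip()
        if not token:
            continue  # tolerate stray commas
        value = int(token)
        if value < 1:
            raise ValueError(f"indices are 1-based, got {value}")
        indices.add(value - 1)
    return sorted(indices)


print(parse_freeze_atoms("1,3,5"))
```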
Tighten SCF convergence and allow more cycles.
mlmm dft -i enzyme.pdb --parm real.parm7 --model-pdb ml_region.pdb \
-q 0 -m 1 --conv-tol 1e-10 --max-cycle 200 --out-dir ./result_dft_tight
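For batch runs it can be convenient to assemble these command lines programmatically. The sketch below only builds and prints the argument list; it uses flags that appear in the examples above, and the helper name is hypothetical.

```python
import shlex


def build_dft_command(pdb, parm7, model_pdb, charge, mult, out_dir,
                      conv_tol=None, max_cycle=None):
    """Assemble an `mlmm dft` argument list from the options shown above."""
    argv = ["mlmm", "dft", "-i", pdb, "--parm", parm7, "--model-pdb", model_pdb,
            "-q", str(charge), "-m", str(mult), "--out-dir", out_dir]
    if conv_tol is not None:
        argv += ["--conv-tol", str(conv_tol)]
    if max_cycle is not None:
        argv += ["--max-cycle", str(max_cycle)]
    return argv


argv = build_dft_command("enzyme.pdb", "real.parm7", "ml_region.pdb", 0, 1,
                         "./result_dft_tight", conv_tol="1e-10", max_cycle=200)
print(shlex.join(argv))
```

Passing the list to `subprocess.run(argv, check=True)` would launch the calculation; keeping the arguments as a list avoids shell-quoting problems.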
Workflow
Input handling – The full enzyme PDB (-i), Amber topology (--parm), and ML-region definition (--model-pdb or --model-indices or B-factor detection via --detect-layer) are loaded. Link hydrogens are appended automatically (C/N parents within 1.7 Å) unless explicit link_mlmm pairs are provided via YAML.
SCF build – --func-basis is parsed into functional and basis. The GPU4PySCF backend is used when available; closed-shell GPU runs additionally use the low-memory gpu4pyscf.dft.rks_lowmem.RKS SCF when --lowmem is on (default). Use --engine cpu to force CPU mode. mlmm dft does not call density_fit() on the SCF object; on the standard GPU/CPU paths the SCF inherits whatever default JK pipeline the backend ships with, and on the lowmem path the SCF uses the memory-efficient direct-JK pipeline of rks_lowmem.RKS. When --embedcharge is enabled, MM point charges from the Amber topology are embedded into the QM Hamiltonian via pyscf.qmmm.mm_charge(), so the DFT wavefunction is self-consistently polarized by the MM environment.
ML(dft)/MM recombination – After the DFT converges, MM evaluations of the full system (REAL-low) and the ML subset (MODEL-low) are computed. The combined energy is reported in Hartree and kcal/mol.
Population analysis & outputs – Mulliken, meta-Lowdin, and IAO charges and spin densities (UKS only) are written alongside the combined energy block in result.yaml.
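The automatic link-hydrogen step in the input-handling stage can be pictured as a distance search across the ML/MM boundary. The sketch below is illustrative only. It assumes that "parent" means an ML-side C or N atom with an MM atom within 1.7 Å, and it places the hydrogen on the ML-to-MM axis at a typical C-H or N-H bond length. The function name, the bond lengths, and the placement rule are all assumptions, and the tool's own link-atom position modes may differ.

```python
import math

CUTOFF_ANGSTROM = 1.7              # cutoff quoted in the workflow description
XH_BOND = {"C": 1.09, "N": 1.01}   # assumed typical C-H / N-H bond lengths (Å)


def find_link_hydrogens(elements, coords, ml_indices):
    """Return (ml_atom, mm_atom, h_position) for each detected boundary bond."""
    ml = set(ml_indices)
    links = []
    for i in sorted(ml):
        if elements[i] not in XH_BOND:
            continue  # only C/N parents are considered
        for j, xyz in enumerate(coords):
            if j in ml:
                continue  # both atoms inside the ML region: not a boundary bond
            dist = math.dist(coords[i], xyz)
            if 0.0 < dist <= CUTOFF_ANGSTROM:
                scale = XH_BOND[elements[i]] / dist
                h_pos = tuple(a + scale * (b - a) for a, b in zip(coords[i], xyz))
                links.append((i, j, h_pos))
    return links


# Toy system: atom 0 is the ML region, atom 1 its bonded MM neighbour, atom 2 is far away.
elements = ["C", "C", "N"]
coords = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (5.0, 0.0, 0.0)]
links = find_link_hydrogens(elements, coords, ml_indices=[0])
print(links)
```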
CLI options
| Option | Description | Default |
|---|---|---|
| -i | Full enzyme structure (PDB or XYZ). If XYZ, use … | Required |
|  | Reference PDB topology when input is XYZ. | None |
| --parm | Amber parm7 topology for the full system. | Required |
| --model-pdb | PDB defining the ML region (atom IDs must match the enzyme PDB). Optional when … | None |
| --model-indices | Comma-separated atom indices for the ML region (ranges allowed, e.g. … | None |
|  | Interpret … |  |
| --detect-layer | Detect ML/MM layers from input PDB B-factors (B=0/10/20). |  |
| -q | Charge of the ML region. | Required |
| -m | Spin multiplicity (2S+1) for the ML region. |  |
| --freeze-atoms | Comma-separated 1-based indices to freeze (e.g. 1,3,5). | None |
| --func-basis | Functional/basis pair as FUNC/BASIS. | wb97m-v/def2-tzvpd |
| --max-cycle | Maximum SCF iterations. | 100 |
| --conv-tol | SCF convergence tolerance (Hartree). | 1e-9 |
|  | DFT integration grid level (0=coarse, 3=default, 5=fine, 9=very fine). | 3 |
| --engine | Force GPU4PySCF (gpu) or CPU-only PySCF (cpu). | gpu |
| --lowmem/--no-lowmem | Use GPU4PySCF's low-memory rks_lowmem.RKS for closed-shell GPU runs. | on |
| --out-dir | Output directory. | ./result_dft/ |
| --config | Base YAML configuration file applied before explicit CLI options. | None |
|  | Print resolved configuration and continue execution. |  |
|  | MLIP backend used for the low-level ONIOM recombination: … |  |
| --embedcharge | Enable electrostatic embedding: MM point charges from the Amber topology are added to the PySCF QM Hamiltonian so the DFT wavefunction is polarized by the MM environment. |  |
|  | Cutoff radius (Å) for embed-charge MM atoms. |  |
|  | Link-atom position mode: … |  |
|  | MM backend used by the ONIOM low-level evaluation: … |  |
|  | Enable CMAP (backbone cross-map dihedral correction) in model parm7. Default: disabled (consistent with Gaussian ONIOM). |  |
| --out-json | Write a machine-readable result.json. |  |
|  | Validate options and print execution plan without running DFT. Shown in … |  |
|  | Toggle XYZ/TRJ to PDB companions when a PDB template is available. |  |
Outputs
out_dir/ (default: ./result_dft/)
├── ml_region_with_linkH.xyz # ML-region coordinates (with link-H) used for DFT
├── result.yaml # DFT + ML(dft)/MM energy summary, charges, spin densities
├── result.json # only when --out-json is passed
└── (stdout) # Pretty-printed configuration blocks and energies
result.yaml contains:
energy: Hartree/kcal/mol values, convergence flag, wall time, backend info (engine: gpu4pyscf(rks_lowmem) / gpu4pyscf / pyscf(cpu); used_gpu; used_lowmem).
mlmm_energy: REAL-low / MODEL-low MM evaluations and the recombined E_total = E_REAL_low + E_ML(DFT) - E_MODEL_low in Hartree and kcal/mol.
charges: Mulliken, meta-Lowdin, and IAO atomic charges (null when a method fails).
spin_densities: Mulliken, meta-Lowdin, and IAO spin densities (UKS only).
It also summarizes charge, multiplicity, spin (2S), functional, basis, convergence knobs, and the resolved output directory.
YAML configuration
Accepts a mapping root; the dft section (and optional geom, calc/mlmm) is applied when present. Merge order is:
defaults
--config
explicit CLI options
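The merge order amounts to layering three dictionaries, with later layers overriding earlier ones. The sketch below illustrates this for the dft section, using the default values documented for the dft keys. The function name is hypothetical, and treating a CLI value of None as "not given on the command line" is an assumption of this sketch.

```python
DFT_DEFAULTS = {
    "func_basis": "wb97m-v/def2-tzvpd",
    "conv_tol": 1e-9,
    "max_cycle": 100,
    "grid_level": 3,
    "verbose": 4,
    "out_dir": "./result_dft/",
}


def resolve_dft_config(yaml_dft=None, cli=None):
    """Layer defaults < YAML (--config) < explicit CLI options."""
    resolved = dict(DFT_DEFAULTS)
    resolved.update(yaml_dft or {})
    # Assumption: None marks a CLI option the user did not pass.
    resolved.update({k: v for k, v in (cli or {}).items() if v is not None})
    return resolved


resolved = resolve_dft_config(
    yaml_dft={"max_cycle": 200, "conv_tol": 1e-10},
    cli={"max_cycle": 300, "conv_tol": None},
)
print(resolved)
```

Here max_cycle comes from the CLI, conv_tol from the YAML file, and every other key keeps its default.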
dft keys (defaults in parentheses):
func_basis ("wb97m-v/def2-tzvpd"): Combined FUNC/BASIS string.
conv_tol (1e-9): SCF convergence threshold (Hartree).
max_cycle (100): Maximum SCF iterations.
grid_level (3): PySCF grids.level.
verbose (4): PySCF verbosity (0-9).
out_dir ("./result_dft/"): Output directory root.
geom:
  coord_type: cart                 # optional geom_loader settings
calc:
  model_charge: 0                  # ML region charge
  model_mult: 1                    # spin multiplicity 2S+1
mlmm:
  real_parm7: real.parm7           # Amber parm7 topology
  model_pdb: ml_region.pdb         # ML-region definition
dft:
  func_basis: wb97m-v/def2-tzvpd   # exchange-correlation functional / basis set
  conv_tol: 1.0e-09                # SCF convergence tolerance (Hartree)
  max_cycle: 100                   # maximum SCF iterations
  grid_level: 3                    # PySCF grid level
  verbose: 4                       # PySCF verbosity (0-9)
  out_dir: ./result_dft/           # output directory root
Notes
Blackwell-architecture GPUs (RTX 50xx): GPU4PySCF may fail with out-of-memory errors even for small systems (~100 atoms). Use --engine cpu or an external DFT program (ORCA, Gaussian) for production calculations on these GPUs.
Out-of-memory with def2-TZVPD: The default basis set def2-tzvpd is large and may cause OOM for systems with >150 atoms on 16–24 GB GPUs. Use --func-basis 'wb97m-v/def2-svp' as a practical alternative; barrier heights computed with def2-SVP and def2-TZVPD typically differ by 1–3 kcal/mol.
Compiled GPU4PySCF wheels may not support non-x86 systems; build from source in that case (see https://github.com/pyscf/gpu4pyscf).
See Also
Common Error Recipes – Symptom-first failure routing
Troubleshooting – Detailed troubleshooting guide
freq – Vibrational frequency analysis (often precedes DFT refinement)
opt – Single-structure geometry optimization
all – End-to-end workflow with --dft
YAML Reference – Full dft configuration options
Glossary – Definitions of DFT, SP (Single Point)