add-elem-info

mlmm add-elem-info adds or repairs PDB element symbols (columns 77-78) using Biopython. Use it to add element columns to a PDB that lacks them, or to correct existing ones, before running downstream tools that require them. By default, existing element values are kept; pass --overwrite to re-infer and replace them when they are unreliable. The command parses the input PDB with Biopython (PDBParser), assigns atom.element using residue context and atom-name heuristics, and writes the structure via PDBIO. It handles ATOM and HETATM records across all models, chains, and residues, and does not alter coordinates.

Examples

Command form:

mlmm add-elem-info -i INPUT [-o OUTPUT] [--overwrite]

Add or repair element columns, overwriting the input file in place:

mlmm add-elem-info -i 1abc.pdb

Write the result to a separate output file:

mlmm add-elem-info -i 1abc.pdb -o 1abc_fixed.pdb

Re-infer and overwrite existing element fields:

mlmm add-elem-info -i 1abc.pdb --overwrite

Workflow

  1. Parse the input PDB with Bio.PDB.PDBParser, mirroring the residue definitions used in extract.py (AMINO_ACIDS, WATER_RES, ION).

  2. For each atom, guess the element by combining the atom name, residue name, and whether the record is HETATM:

  • Ion residues: Prefers residue-derived elements; polyatomic ions (e.g., NH4, H3O+) are assigned per atom (H/N/O).

  • Proteins, nucleic acids, water: Maps H/D to H; water atoms to O/H; first-letter mapping for P/N/O/S; recognizes Se; carbon labels (CA/CB/CG/…) to C.

  • Ligands/cofactors: Uses atom-name prefixes (C*/P*, excluding CL) and two-letter/one-letter normalization; recognizes halogens (Cl/Br/I/F).

  3. Write the structure through PDBIO:

  • No -o/--out given: overwrites the input file.

  • -o/--out given: writes to the specified path.

  4. Print a summary reporting total atoms, newly assigned, kept existing, overwritten (when --overwrite), per-element counts, and up to 50 unresolved atoms (model/chain/residue/atom/serial).
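The element-guessing step can be illustrated with a small standalone sketch. This is not the tool's actual code: the function name guess_element, the residue tables, and the order of the checks are assumptions chosen to mirror the rules listed above, and the real tables in extract.py are likely larger.

```python
from typing import Optional

# Assumed, abbreviated residue tables (hypothetical; the real ones live in extract.py).
WATER_RES = {"HOH", "WAT", "H2O", "DOD"}
ION = {"NA": "Na", "K": "K", "CL": "Cl", "MG": "Mg", "ZN": "Zn", "CA": "Ca", "FE": "Fe"}
TWO_LETTER = {"CL": "Cl", "BR": "Br", "SE": "Se"}  # two-letter symbols checked before one-letter ones


def guess_element(atom_name: str, res_name: str, is_hetatm: bool) -> Optional[str]:
    """Guess an element symbol from atom name, residue name, and record type."""
    # Keep only letters so names such as "1HB" or "CL1" reduce to "HB" and "CL".
    name = "".join(ch for ch in atom_name.upper() if ch.isalpha())
    res = res_name.strip().upper()
    if not name:
        return None
    # Monatomic ion residues: trust the residue name over the atom name.
    if res in ION:
        return ION[res]
    # Water: oxygen or hydrogen only.
    if res in WATER_RES:
        return "O" if name.startswith("O") else "H"
    # Hydrogen and deuterium both map to H. Polyatomic ions such as NH4
    # are not in ION, so they are resolved per atom from here on.
    if name[0] in "HD":
        return "H"
    if is_hetatm:
        # Ligands/cofactors: halogens and Se first, so "CL1" is not read as carbon.
        if name[:2] in TWO_LETTER:
            return TWO_LETTER[name[:2]]
        if name[0] in "CNOPSFI":
            return name[0]
        return None
    # Polymer residues: Se, then first-letter mapping (CA/CB/CG/... become C).
    if name[:2] == "SE":
        return "Se"
    if name[0] in "CNOPS":
        return name[0]
    return None
```

In this sketch the same atom name resolves differently depending on residue context: guess_element("CA", "ALA", False) gives carbon, while guess_element("CA", "CA", True) gives calcium. Atoms that match no rule return None, which corresponds to the unresolved atoms listed in the summary.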

Outputs

  • PDB file with element columns (77-78) populated or corrected

  • Console report with totals for processed/assigned atoms, per-element counts, and up to 50 unresolved atoms
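To spot-check an output file independently of the console report, the element field can be read directly from the fixed-width columns. The sketch below uses only the standard library; the helper names element_field and element_counts are hypothetical and not part of mlmm.

```python
from collections import Counter


def element_field(line: str) -> str:
    """Return the element symbol in columns 77-78 of an ATOM/HETATM line ('' if blank)."""
    if not line.startswith(("ATOM", "HETATM")):
        return ""
    # Columns are 1-based in the PDB format, so 77-78 is the slice [76:78].
    return line[76:78].strip()


def element_counts(pdb_text: str) -> Counter:
    """Count element symbols over all ATOM/HETATM lines; blanks are tallied as '(blank)'."""
    counts: Counter = Counter()
    for line in pdb_text.splitlines():
        if line.startswith(("ATOM", "HETATM")):
            counts[element_field(line) or "(blank)"] += 1
    return counts
```

Calling element_counts(open("1abc_fixed.pdb").read()) on a processed file should show no "(blank)" entry unless some atoms were left unresolved. Note that this reads the symbols exactly as written, so capitalization follows whatever the writer produced.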

CLI options

| Option | Description | Default |
| --- | --- | --- |
| -i, --input PATH | Input PDB file. | Required |
| -o, --out PATH | Output PDB path. When omitted, the input file is overwritten. | None (overwrites input) |
| --overwrite/--no-overwrite | Re-infer and overwrite element fields even if already present (by default, existing values are preserved). | False |

The full flag list is in the generated command reference.

See Also