Status (2026-04-10)

Three files in coot-utils/:

coot-coord-utils-gemmi.hh — header
coot-coord-utils-gemmi.cc — implementations
test-coord-utils-gemmi.cc — 26 comparison tests

Added to Makefile.am: .cc in library sources, .hh in installed headers, test binary in check_PROGRAMS.

All 26 tests pass. Build playground: build-for-claude/coot-utils/

Critical discovery: PDBv3 atom names

mmdb uses 4-character padded atom names ( CA , ` N , C ). gemmi uses unpadded names (CA, N, C). The codebase has ~90 // PDBv3 FIXME` markers at sites using padded names.

The problem with copy_from_mmdb(): gemmi’s copy_from_mmdb() copies m_atom.label_atom_id into atom.name. Despite being documented as “not aligned” in mmdb_atom.h, mmdb pads label_atom_id identically to name when reading PDB files. Verified with pure mmdb (no coot code involved) — both fields are 4 chars padded.

Current workaround: coot::trim_atom_names(gemmi::Structure &st) in coot-coord-utils-gemmi.{hh,cc}. Strips leading/trailing spaces from all atom names. Must be called after every copy_from_mmdb(). The test file wraps this in a helper:

gemmi::Structure gemmi_from_mmdb(mmdb::Manager *mol) {
   gemmi::Structure st = gemmi::copy_from_mmdb(mol);
   coot::trim_atom_names(st);
   return st;
}

Proper fix (TODO): This should be fixed in gemmi’s copy_from_mmdb() itself — propose the change upstream to Marcin.

All gemmi functions use unpadded atom names ("CA", "N", "C", "CB", etc.). This is the correct style going forward. The // PDBv3 FIXME markers in the codebase (~90 sites) map the work needed when migrating each file.

The hardest areas for the PDBv3 transition will be ideal/ and geometry/ — padded atom names are deeply embedded in the restraint matching system.

mmdb symmetry setup

mmdb needs syminfo.lib to resolve spacegroup operations. Without it, GetTMatrix() silently fails and reports no symmetry. The test calls setup_syminfo() (from utils/setup-syminfo.hh) in main() before running tests. Without this, a P 21 21 21 structure was incorrectly reported as having no symmetry by the mmdb version.

What was converted (25 functions + trim_atom_names)

Utility

Atom-level (4, in `coot::`)

Function	Notes
`distance(gemmi::Atom&, gemmi::Atom&)`	Manual sqrt, matches mmdb overload
`angle(gemmi::Atom&, gemmi::Atom&, gemmi::Atom&)`	Returns degrees, clamps cos_theta
`co(gemmi::Atom&)`	Returns `clipper::Coord_orth`
`is_hydrogen_atom(gemmi::Atom&)`	Delegates to `at.is_hydrogen()`

Residue-level (6, in `coot::util::`)

Function	Notes
`get_residue_centre`	Mean of all atom positions
`get_CA_position_in_residue`	Finds atom named `"CA"` (unpadded)
`get_CB_position_in_residue`	Finds atom named `"CB"` (unpadded)
`is_nucleotide`	Hardcoded residue name list
`residue_has_hydrogens_p`	Uses `at.is_hydrogen()`
`residue_has_hetatms`	Checks `res.het_flag == 'H'`

Chain-level (5, in `coot::util::`)

Function	Notes
`min_and_max_residues`	Uses `res.seqid.num.value`
`min_resno_in_chain`	Pair<bool,int> return
`max_resno_in_chain`	Pair<bool,int> return
`residue_types_in_chain`	Returns sorted unique names via set
`get_number_of_protein_or_nucleotides`	Nucleotide via name; protein via `entity_type == Polymer`

Structure-level (15, mix of `coot::` and `coot::util::`)

Function	Notes
`centre_of_molecule`	First model only
`radius_of_gyration`	First model only
`mol_has_symmetry`	Uses `find_spacegroup()`, checks `operations().order() > 1`
`mol_is_anisotropic`	Checks any `at.aniso.nonzero()`
`get_position_hash`	Sum of x-differences (matches mmdb algorithm, no TER atoms)
`residue_types_in_molecule`	Set-based unique types
`non_standard_residue_types_in_molecule`	Uses `standard_residue_types()` from coot-coord-utils.hh
`chains_in_molecule`	Chain name list
`number_of_residues_in_molecule`	Sum of chain.residues.size()
`max_number_of_residues_in_chain`	Max chain length
`number_of_chains`	models[0].chains.size()
`max_resno_in_molecule`	Max seqid across all chains
`max_min_max_residue_range`	Max (max-min+1) across chains (inclusive, matching mmdb)
`alt_confs_in_molecule`	Includes `""` for non-alt atoms (matching mmdb)
`extents`	Min/max bounding box
`median_position`	Sort-based median per axis
`count_cis_peptides`	Omega torsion from geometry, C-N distance < 3.0 A check

Bugs found and fixed during testing

Issue	Root cause	Fix
`get_CA_position_in_residue` failed	Atom names padded by mmdb, gemmi searched for `"CA"` not `" CA "`	`trim_atom_names()`
`get_position_hash` mismatch (57 vs 178516)	Gemmi version summed absolute positions, mmdb sums x-differences	Rewrote to match mmdb algorithm
`mol_has_symmetry` mismatch (mmdb=0, gemmi=1)	mmdb lacked syminfo.lib; gemmi correctly found P 21 21 21	Added `setup_syminfo()`, changed gemmi impl to use `find_spacegroup()`
`max_min_max_residue_range` off by 1	mmdb uses inclusive range (max-min+1), gemmi used (max-min)	Added +1
`alt_confs_in_molecule` mismatch	mmdb includes empty string for non-alt atoms, gemmi skipped them	Include `""` for `altloc == '\0'`
`count_cis_peptides` mismatch	Gemmi used `\\|omega\\| < 30 deg`, mmdb uses `\\|180-pos_torsion\\| > 90` + distance check	Matched mmdb’s distortion/distance logic
Segfault on all tests	`replace_all` on `copy_from_mmdb` caught the helper’s own body → infinite recursion	Fixed `gemmi_from_mmdb()` to call `gemmi::copy_from_mmdb()`
Missing includes	`<fstream>`, `<set>`, `"coot-coord-utils.hh"`, `"utils/setup-syminfo.hh"`	Added

Remaining issues to watch

get_number_of_protein_or_nucleotides: Uses res.entity_type == gemmi::EntityType::Polymer — may not be populated by copy_from_mmdb(). No test for this yet.
residue_has_hetatms: Checks res.het_flag == 'H'. copy_from_mmdb() does set het_flag (verified in gemmi source line 345), so this should be OK.
is_nucleotide: Hardcoded name list. The mmdb version may use a different heuristic. Should verify they agree on a nucleotide-containing structure.
get_position_hash: mmdb and gemmi hashes differ numerically (TER atoms affect mmdb’s x-difference chain). Test verifies gemmi self-consistency rather than cross-comparing. This is fine — the hash just needs to detect when atoms move.
First-model-only pattern: Structure-level functions use for (...) { ... break; }. Works but could use st.models[0] directly with an empty check.

What was NOT converted (~75+ functions remaining in coot-coord-utils.hh)

Residue operations (high value for migration)

get_residue(chain_id, resno, ins_code, mol) — residue lookup
get_atom(atom_spec, mol) — atom lookup by spec
deep_copy_residue — residue duplication
get_residue_name(chain_id, resno, ins_code, mol) — name lookup
residues_near_residue — spatial queries
residues_near_position — spatial queries
get_waters — water identification

Atom selection / searching

atoms_with_zero_occ — find zero-occupancy atoms
fill_residue_alt_confs — enumerate alt confs per residue
contact_info — atom contacts

Whole-molecule operations

sort_chains, sort_residues — reordering
shift_to_origin — coordinate transformation
copy_and_delete_hydrogens — hydrogen stripping
move_waters_to_last_chain

Selection API (~1,450 calls across codebase)

NewSelection / SelectAtoms / GetSelIndex / DeleteSelection — hardest migration target
gemmi equivalent: range-based for loops with conditionals, or gemmi::Selection

Functions in coot-map-utils.hh — separate from coord-utils migration

Approach notes

gemmi::Structure created on the fly via gemmi::copy_from_mmdb(mol) + trim_atom_names() — no persistent dual representation
Tests compare mmdb and gemmi results on the same molecule, tolerance-based for floats
copy_from_mmdb() will need extension for full migration (missing: secondary structure, title/header, resolution, biomolecule info)
The mmdb selection API is the biggest migration challenge — ~1,450 call sites
SSM (structure superposition) may not have a gemmi equivalent — can be side-lined
TER atoms: gemmi doesn’t have them. They only matter for PDB file output. Skip them in all logic.

Next steps: Prioritize converting residue-lookup functions (get_residue, get_atom) and spatial queries (residues_near_residue/position) — most commonly used across the codebase.