MoleculesToTriangles → gemmi migration plan
MoleculesToTriangles → gemmi migration plan
A focused conversion plan for the MoleculesToTriangles/ rendering subsystem, as a
self-contained pilot of the larger mmdb2 → gemmi backend migration. This subsystem is a
good test bed precisely because it is standalone: nothing inside it reaches into other
Coot code, and it returns no mmdb objects — only coot::simple_mesh_t meshes.
Reference trees: gemmi at ~/gemmi/include/gemmi/, mmdb at ~/compile/mmdb2-2.0.22/.
1. Scope by the numbers
~70 files reference mmdb across two libraries:
| Library | Dir | Role |
|---|---|---|
libMoleculesToTrianglesCXXClasses |
CXXClasses/ |
molecule wrapper, selections, colour, representations, primitives |
libMoleculesToTrianglesCXXSurface |
CXXSurface/ |
molecular / electrostatic surface generation |
mmdb symbol frequency (top): mmdb::Atom ×264, mmdb::Manager ×132, mmdb::Residue ×34,
mmdb::Chain ×21, SELECTION_KEY ×16, mmdb::Model ×13, SSE_* ×27, UDR_ATOM ×6,
getVdWaalsRadius ×5.
mmdb-heaviest files: CXXClasses/MolecularRepresentation.cpp (107), CXXClasses/MyMolecule.cpp
(72), CXXClasses/CompoundSelection.cpp (61), CXXSurface/CXXSurfaceMaker.cpp (37),
CXXClasses/tubes.cc (37), CXXSurface/CXXUtils.cpp (21), CXXSurface/CXXCreator.cpp (21).
2. The external API contract (must keep working)
The rest of Coot calls into M2T from api/coot-molecule-moltris.cc,
src/molecular-mesh-generator.cc, src/molecular-mesh-generator-mol-tris.cc,
src/molecule-class-info.h, docking/intermolecular-energy.cc. The surface it depends on:
| Entry point | Currently takes | Returns |
|---|---|---|
MyMolecule(mmdb::Manager*, int ssFlag) |
mmdb manager | wrapped molecule |
MolecularRepresentationInstance::create(MyMolecule, ColorScheme, selStr, style) |
selection string | representation → primitives |
ColorScheme::colorByElementScheme() / …Secondary / …BFactor / …ChainsScheme… |
— | colour scheme |
CompoundSelection(std::string) |
mmdb-style CID string | selection object |
CXXCreator(mmdb::Manager*, int selHnd) |
manager + mmdb selHnd | clipper NXmap (electrostatics) |
CXXUtils::assignCharge(mmdb::Manager*, int selHnd, CXXChargeTable*) |
manager + mmdb selHnd | writes charges |
| output | — | coot::simple_mesh_t (pos/normal/colour + triangles) |
Two of these — CXXCreator and CXXUtils::assignCharge — take an mmdb selection handle
from the caller (atom_sel.SelectionHandle). They are the most entangled with the caller and
are sequenced last (the electrostatics path). Everything else takes a selection string or no
mmdb at all, so the boundary is clean.
Selection strings in use are mmdb CID, e.g. "//A/*.*/*:*", "/*/*/*.*/N,CA,C,O,H",
"/*/*/(WAT,HOH)".
3. Strategy
Gemmi-native internals, mmdb adapter at the boundary. MyMolecule becomes the owner of a
gemmi::Structure. The existing MyMolecule(mmdb::Manager*) constructor stays as a thin adapter
that calls gemmi::copy_from_mmdb() once; a new MyMolecule(gemmi::Structure) constructor is
added for the future when callers feed gemmi directly. Every internal consumer (selections,
colour, representations, surface) is rewritten against gemmi. The external signatures above are
preserved; only the insides change.
This matches the established migration pattern (on-the-fly copy_from_mmdb, see the coot-utils
plan) and keeps the rest of Coot building unchanged throughout.
4. The three hard subsystems (and their gemmi replacements)
mmdb gives M2T three things “for free” that gemmi must be made to provide. All three have a gemmi path — confirmed present in the headers — so the migration is feasible end-to-end.
4A. Selection engine — built on gemmi::Selection
mmdb provides: CID string parsing (Select(h, STYPE_ATOM, "/*/*/...", SKEY_NEW)), numbered
selection handles (int selHnd), per-atom membership (atom->isInSelection(h)), boolean
combination (SKEY_NEW/AND/OR/CLR), and inversion. The whole colour/representation machinery is
built on this: ColorScheme::prepareForMMDB turns each colour rule into a handle, then
colorForAtom asks every atom isInSelection(ruleHandle); representation draw-loops gate each
atom/bond on isInSelection(selHnd).
gemmi already supplies most of this — use it rather than rebuild it:
gemmi::Selection(select.hpp) parses a CID string in its constructor (Selection(const std::string&), CCP4 pdbcur syntax — the same family as mmdb’s) and exposesmatches(const Atom&)/matches(const CRA&)plus filtered iteration (models()/chains()/residues()/atoms()). It supports comma-separated name lists with*and!inversion for chain ids, residue names (the/mdl/chn/*(res).ic/at[el]:alocform, i.e.(WAT,HOH)and(!ALA,…)), atom names, elements and altlocs, sequence ranges, and occupancy/B-factor inequalities (q/b).- Per-atom membership needs no handle.
isInSelection(selHnd)becomessel.matches(atom). A colour rule simply holds agemmi::Selectionand tests each atom — noNewSelection,MakeSelIndex, orDeleteSelectionlifecycle to manage. pymol_select.hppadditionally provides a full boolean AST (AndNode/OrNode/NotNode) overFlatAtom, if a PyMOL-style path is ever wanted.
The only genuine residual is that a single gemmi::Selection is conjunctive (its fields AND
together) — it cannot OR two different selections. So CompoundSelection’s &/|/!/{}
combinator stays, but its job shrinks to combining per-atom booleans:
CompoundSelection’s string tokeniser (setSelectionString, the& | ! {}parser) is already mmdb-independent — keep it.- Each leaf becomes a
gemmi::Selection:MMDBStringPrimitive(CID) →gemmi::Selection(selectionString)with no translation. Empirically verified (probe overreference-structures/1e9g.pdb, 10448 atoms): gemmi parses the raw mmdb CID strings M2T uses and returns identical counts —//A/*.*/*:*→ 4672 (chain A),/*/*/(WAT,HOH,OH2,H2O)→ 1023 (= HOH atoms),/*/*/*/*/""→ 10448. gemmi accepts the*.*residue wildcard and(resname,…)group lists; list inversion!N,C,O,Hworks too. (The only forms that fail are gemmi-native shapes with an empty residue field, e.g.///N,CA,C,O,H.)MMDBSubsetTypePrimitiveMAIN/SIDE/WATER/AMINOACIDS/NUCLEICACIDS/MONOMERS/ALL → the existing CID strings inCompoundSelection.cpp:388feed straight intogemmi::Selection, unchanged.MMDBSecondaryTypePrimitiveSSE_Helix/Strand/None → the one leaf not expressible in CID; test our gemmi per-residue SS (4B) in a small custom leaf predicate.
- The leaf
handleInMMDBbecomesevaluate(structure) -> std::vector<bool>(membership keyed by the stablegemmi::Atom::serialthatcopy_from_mmdbpopulates); NEW/AND/OR/CLR become bitset ops, invert flips the bitset. Where laziness is preferred, hold thegemmi::Selection(s) and evaluatematches()per atom on demand instead of materialising the bitset.
This layer is self-contained and unit-testable in isolation against a gemmi::Structure
(per-clause atom counts must match the current mmdb path), so it is built and verified first.
4B. Secondary structure
mmdb: model->CalcSecStructure(true) fills residue->SSE (SSE_Helix/Strand/Bulge/None); the
HELIX/SHEET header path is secondary_structure_header_to_residue_sse() in MyMolecule.cpp.
drawRibbon switches ribbon width/primitive on residue->SSE.
gemmi: dssp.hpp DsspCalculator::calculate_secondary_structure(NeighborSearch&,
Topo::ChainInfo&) and the free calculate_dssp(...), producing gemmi::SecondaryStructure
(Helix/Strand/Loop/…). Heavier than mmdb (needs a Topo and a NeighborSearch, optionally
hydrogens) but gives better assignments.
Plan: add a gemmi_sse.hh/.cc helper storing per-residue SS in a side-map keyed by
{chain, seqid} (or in gemmi::Residue::flag, a spare char). Provide the same three buckets
drawRibbon needs (helix / strand / other). Keep the HELIX/SHEET header path by reading
gemmi::Structure::helices / sheets (gemmi parses these). USE_HEADER_INFO /
CALC_SECONDARY_STRUCTURE / DONT_USE flag semantics are preserved.
4C. Bonds and VdW radius
mmdb: MakeBonds(true) then atom->GetBonds(AtomBond*, n); MyMolecule::identifyBonds also adds
explicit peptide bonds. mmdb::getVdWaalsRadius(element) for radii.
gemmi: vdw_radius(El) / Element::vdw_r() is a direct radius replacement. For connectivity,
neighbor.hpp NeighborSearch + a covalent-radius distance test reproduces mmdb’s distance-based
MakeBonds; bond_idx.hpp BondIndex gives are_linked. (mmdb MakeBonds is distance+element
based, so a NeighborSearch + covalent_radius sum tolerance is the faithful equivalent and does
not need the monomer library.)
Plan: a gemmi_bonds.hh/.cc helper building a std::vector<std::pair<int,int>> (atom-serial
pairs) via NeighborSearch, plus a bonded_atoms(serial) adjacency for the stick/cylinder draw
loops. Peptide-bond augmentation is ported directly (backbone CA/N/C distance test).
5. Smaller corners
| mmdb usage | site | gemmi replacement |
|---|---|---|
AtomStat centre of selection |
MyMolecule::centreOfSelectionHandle |
mean of selected atom pos |
SeekContacts / SelectNeighbours |
MolecularRepresentation H-bonds; CXXCreator 35 Å context |
NeighborSearch / ContactSearch (contact.hpp) |
UDD float "PerAtomRadius" |
CXXUtils.cpp:57/103/114 etc. |
side-table serial → float, or gemmi::Atom::fraction spare |
UDD int "tmp_atom_int" |
CXXSurface.cpp:509 |
use atom.serial directly |
atom->charge write (partial charge) |
CXXUtils::assignCharge |
side-table serial → float (gemmi charge is signed char formal charge — cannot hold it) |
atom->serNum ordering |
CXXNewHood, CXXCircleNode, CXXQADSurface |
atom.serial |
segID |
MyMolecule::identifySegments |
gemmi has no segID; use subchain or empty |
isAminoacid/isNucleotide/isNucleotideChain |
several | gemmi::find_tabulated_residue / ResidueInfo::is_amino_acid() / is_nucleic_acid() |
WritePDBASCII / ReadCoorFile |
MyMolecule |
gemmi::write_pdb / gemmi::read_structure |
6. Phase plan (dependency-ordered)
Phase 0 — leaf replacements (no architecture commitment). gemmi_atom_radius (vdw +
united-atom table port of CXXUtils::assignUnitedAtomRadius), residue-type predicates, centre-of-
atoms. Unit-testable.
Phase 1 — selection engine (4A). New gemmi-native CompoundSelection::evaluate(structure)
returning membership bitsets; CID-dialect translator; MAIN/SIDE/WATER/… predicates. Unit test
against a known PDB (counts per clause must match the current mmdb path exactly).
Phase 2 — MyMolecule core. Hold gemmi::Structure; mmdb-adapter constructor via
copy_from_mmdb; port processCoords, identifyBonds (4C), identifySegments,
identifyDishyBases, centre/write. SS via 4B.
Phase 3 — colour. ColorScheme / ColorRule / AtomPropertyRampColorRule /
SecondaryColorScheme / SolidColorRule: colorForAtom(const gemmi::Atom*), prepareForMMDB
→ prepare(structure) building rule bitsets.
Phase 4 — MolecularRepresentation. All draw* methods onto gemmi: spheres, sticks,
cylinders, new-sticks, ribbon (uses 4B SS), calphas, dishy/stick bases, H-bonds, surface chunking.
Phase 5 — surfaces (CXXSurface/). CXXSurfaceMaker, CXXUtils, CXXSurface::assignAtom,
CXXQADSurface, CXXNewHood/CXXCircle* (use atom.serial, pos, element). Replace the
“PerAtomRadius” UDD with the side-table.
Phase 6 — electrostatics (most entangled, last). CXXCreator + CXXUtils::assignCharge
take an mmdb selHnd from the caller (coot-molecule-moltris.cc). Either (a) keep these on mmdb
initially behind a converted façade, or (b) change the two call sites to pass a gemmi selection.
Partial charges go to the serial → float side-table.
Each phase keeps the build green; the mmdb constructor remains the live entry until Phase 6.
7. Decisions (settled)
- Boundary strategy — “feed gemmi now” (decided).
MyMoleculetakes agemmi::Structuredirectly; the external callers (api/coot-molecule-moltris.cc,src/molecular-mesh-generator*.cc) change to pass gemmi. No mmdb left at the seam; the mmdb→gemmi conversion (if any) lives in the caller, not inside M2T. - Secondary structure — stub for now (decided). Do not implement an SS calculator yet.
Replace
model->CalcSecStructure(true)/residue->SSEwith a clearly-marked stub + TODO note;drawRibbonand the SSE selection leaf consume that placeholder until the real source (gemmidssp.hppor a CA-geometry calc) is filled in later. See// TODO GEMMI-SSEmarkers. - Selection membership — evaluate
gemmi::Selection::matches(CRA)per atom during the draw iteration (noselHnd, no stored bit); materialise astd::vector<bool>keyed byatom.serialonly where a whole-model membership set is convenient. - Build —
CMakeLists-gemmi.txtonly (not Makefile.am). New M2T sources are added to thecootmoleculestotriangleslibrary;gemmi::gemmi_cppmust be added to that target’starget_link_libraries(currently gemmi is linked only at thecootapilevel).-DUSE_GEMMIis already global. - Electrostatics path (Phase 6) — convert the two mmdb-
selHndcall sites (CXXCreator,CXXUtils::assignCharge) as part of “feed gemmi now”; partial charges →serial → floatside-table.