Coot Structure Validation Best Practices
Overview
When performing structure validation in Coot with both a model and a map, you need to analyze the structure from three complementary perspectives:
- Model-to-Map Validation: How well does the existing model fit the density?
- Map-to-Model Validation: Where is there significant density that is NOT explained by the model?
- Atom Overlap Validation: Are there steric clashes between atoms in the model?
All three perspectives are essential for comprehensive validation.
The Three Types of Validation
Model-to-Map: Finding Problems in Your Model
These functions analyze how well your current model fits the density. They lead you to places in the model that need attention:
- Poor density correlation
- Ramachandran outliers
- Rotamer outliers
- Geometry violations
Atom Overlaps: Finding Steric Clashes
Atom overlap detection identifies clashes between atoms that may not be caught by local geometry validation. These reveal packing problems such as:
- Clashes between distant residues
- Side-chain/side-chain clashes
- Backbone/side-chain clashes
- Clashes with symmetry mates
Critical insight: Ramachandran and rotamer validation catch local geometry problems (within a residue or its immediate neighbors), while atom overlap detection catches global packing problems between any atoms in the structure.
Map-to-Model: Finding Missing Features
The blob-finding function identifies regions of significant density that are not explained by your current model. It leads you to places in the map where you might be missing:
- Waters
- Ligands
- Alternative conformations
- Metal ions
- Other small molecules
- Missing residues or loops
Critical Function: find_blobs_py()
Always include blob detection when performing structure validation with a map.
blobs = coot.find_blobs_py(
imol_model=0, # your protein model
imol_map=1, # the map to search (often difference map)
cut_off_density_level=3.0 # sigma threshold (typically 2.5-4.0)
)
# Returns: list of (position, score) tuples
# [(clipper::Coord_orth, float), ...]
Parameters
- imol_model: The model molecule; density explained by this model is excluded
- imol_map: The map to search for blobs (usually a difference map, but can be a regular map)
- cut_off_density_level: Sigma threshold for blob detection
  - 3.0 sigma: Standard threshold for significant features
  - 2.5 sigma: More sensitive, finds weaker features
  - 4.0 sigma: Conservative, only strong features
Understanding the Results
for position, score in blobs:
    x = position.x()
    y = position.y()
    z = position.z()
    print(f"Blob at ({x:.2f}, {y:.2f}, {z:.2f}) - score: {score:.2f}")
The score represents the strength/volume of the unmodeled density. Higher scores indicate more significant features that should be investigated.
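Since higher scores deserve attention first, it can help to flatten the blob list into plain tuples and sort it. The following is a minimal sketch, not part of the Coot API: rank_blobs and the Point class are hypothetical names, and Point only stands in for the position object so that the example runs outside Coot.

```python
class Point:
    """Hypothetical stand-in for a blob position exposing x(), y(), z()."""

    def __init__(self, x, y, z):
        self._xyz = (x, y, z)

    def x(self):
        return self._xyz[0]

    def y(self):
        return self._xyz[1]

    def z(self):
        return self._xyz[2]


def rank_blobs(blobs):
    """Return (x, y, z, score) tuples sorted by descending score."""
    rows = [(pos.x(), pos.y(), pos.z(), score) for pos, score in blobs]
    return sorted(rows, key=lambda row: row[3], reverse=True)


# Made-up sample data in the (position, score) shape described above
sample = [(Point(1.0, 2.0, 3.0), 12.5), (Point(4.0, 5.0, 6.0), 60.0)]
for x, y, z, score in rank_blobs(sample):
    print(f"Blob at ({x:.2f}, {y:.2f}, {z:.2f}) - score: {score:.2f}")
```

Inside Coot, the list returned by find_blobs_py() would be passed in place of the sample.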
Recentering the View
The user likes to see what you are considering and how you change the model. Where possible, use coot.set_rotation_centre() or coot.set_go_to_atom_chain_residue_atom_name() (or a similar function) to bring the issue currently under consideration to the centre of the screen.
Complete Validation Workflow
1. Model-to-Map Validation
# Ramachandran outliers
rama_outliers = coot.all_molecule_ramachandran_score_py(0)
# Rotamer outliers
rotamer_outliers = coot.rotamer_graphs_py(0)
# Per-residue density correlation
correlation_stats = coot.map_to_model_correlation_stats_per_residue_range_py(
0, # imol_model
"A", # chain_id
1, # start_resno
100, # end_resno
1 # imol_map
)
# Geometry validation
chiral = coot.chiral_volume_errors_py(0)
2. Atom Overlap Validation
# Get worst 30 atom overlaps
overlaps = coot.molecule_atom_overlaps_py(0, 30)
# Check for severe clashes
severe_clashes = [o for o in overlaps if o['overlap-volume'] > 5.0]
if severe_clashes:
    print(f"WARNING: {len(severe_clashes)} severe clashes found!")
# For full analysis (caution: can be very large!)
# all_overlaps = coot.molecule_atom_overlaps_py(0, -1)
3. Map-to-Model Validation (Blobs)
# Find unmodeled density in difference map
diff_map_blobs = coot.find_blobs_py(
imol_model=0,
imol_map=2, # difference map
cut_off_density_level=3.0
)
# Find features in regular map (alternative approach)
regular_map_blobs = coot.find_blobs_py(
imol_model=0,
imol_map=1, # 2mFo-DFc map
cut_off_density_level=1.0 # Lower threshold for fitted map
)
4. Comprehensive Validation Report
def comprehensive_validation(imol_model, imol_map, imol_diff_map=None):
    """
    Perform complete structure validation combining model and map analysis.
    Returns dictionary with all validation metrics.
    """
    results = {}
    # Model-to-map validation
    results['ramachandran'] = coot.all_molecule_ramachandran_score_py(imol_model)
    results['rotamers'] = coot.rotamer_graphs_py(imol_model)
    # Atom overlap validation
    results['atom_overlaps'] = coot.molecule_atom_overlaps_py(imol_model, 30)
    severe_clashes = [o for o in results['atom_overlaps'] if o['overlap-volume'] > 5.0]
    results['severe_clash_count'] = len(severe_clashes)
    # Per-residue correlation (requires chain info)
    import coot_utils
    chains = coot_utils.chain_ids(imol_model)
    results['correlation_by_chain'] = {}
    for chain in chains:
        n_residues = coot.chain_n_residues(chain, imol_model)
        if n_residues > 0:
            stats = coot.map_to_model_correlation_stats_per_residue_range_py(
                imol_model, chain, 1, 9999, imol_map
            )
            results['correlation_by_chain'][chain] = stats
    # Map-to-model validation (blobs)
    if imol_diff_map is not None:
        results['diff_map_blobs'] = coot.find_blobs_py(
            imol_model, imol_diff_map, 3.0
        )
    # The regular map is searched whether or not a difference map was given
    results['map_blobs'] = coot.find_blobs_py(
        imol_model, imol_map, 1.0
    )
    return results

# Usage
validation = comprehensive_validation(
    imol_model=0,
    imol_map=1,
    imol_diff_map=2
)
Interpreting Blob Results
What Different Maps Tell You
Difference Map (mFo-DFc) Blobs:
- Positive blobs (>3σ): Missing atoms/features - something should be added here
- Negative blobs (<-3σ): Incorrectly modeled atoms - something should be removed/moved
- Most reliable for finding genuine missing features
Regular Map (2mFo-DFc) Blobs:
- Less sensitive to model bias
- Good for finding larger missing features (domains, ligands)
- Use lower sigma threshold (0.5-1.5σ)
Common Blob Interpretations
blobs = coot.find_blobs_py(0, 2, 3.0) # diff map, 3 sigma
# Large score (>50): Likely missing ligand, metal, or several waters
# Medium score (10-50): Likely 1-3 waters or alternative conformation
# Small score (3-10): Likely single water or weak alternative conformation
for position, score in blobs:
    if score > 50:
        print(f"Large feature at {position} - investigate for ligand/metal")
    elif score > 10:
        print(f"Medium feature at {position} - likely waters")
    else:
        print(f"Small feature at {position} - check carefully")
Critical Function: molecule_atom_overlaps_py()
Always include atom overlap checking when validating structure geometry.
# Get worst 30 atom overlaps (default behavior after API update)
overlaps = coot.molecule_atom_overlaps_py(
imol=0,
n_pairs=30 # Number of worst overlaps to return (default: 30)
)
# Get ALL overlaps (use with caution - can be hundreds!)
all_overlaps = coot.molecule_atom_overlaps_py(
imol=0,
n_pairs=-1 # -1 means return all overlaps
)
# Each overlap is a dict with:
# {
# 'atom-1-spec': [imol, chain, resno, inscode, atom_name, altconf],
# 'atom-2-spec': [imol, chain, resno, inscode, atom_name, altconf],
# 'overlap-volume': float, # in Å³
# 'radius-1': float,
# 'radius-2': float
# }
Understanding Overlap Results
Overlap volume indicates severity:
- >5.0 Å³: Severe clash - atoms are deeply interpenetrating
- 2.0-5.0 Å³: Moderate clash - needs immediate attention
- 0.5-2.0 Å³: Minor clash - may be acceptable in some contexts
- <0.5 Å³: Very minor overlap - often acceptable
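The severity bands above can be captured in one small helper so that every report uses the same cut-offs. This is a sketch only; classify_overlap is a hypothetical name, and how the band edges are treated is an assumption: a value of exactly 2.0 or 5.0 falls into the lower band (matching the `volume > 5.0` and `volume > 2.0` tests used in this document's code), while exactly 0.5 counts as minor.

```python
def classify_overlap(volume):
    """Map an overlap volume onto the severity bands listed above."""
    if volume > 5.0:
        return "severe"
    if volume > 2.0:
        return "moderate"
    if volume >= 0.5:
        return "minor"
    return "very minor"


# Illustrative volumes only
for volume in (7.45, 2.07, 1.0, 0.2):
    print(f"{volume:.2f} -> {classify_overlap(volume)}")
```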
Common clash patterns:
overlaps = coot.molecule_atom_overlaps_py(0, 30)
for overlap in overlaps:
    atom1 = overlap['atom-1-spec']
    atom2 = overlap['atom-2-spec']
    volume = overlap['overlap-volume']
    # Spec layout: [imol, chain, resno, inscode, atom_name, altconf]
    chain1, res1, atom_name1 = atom1[1], atom1[2], atom1[4]
    chain2, res2, atom_name2 = atom2[1], atom2[2], atom2[4]
    if volume > 5.0:
        print(f"SEVERE: {chain1}/{res1} {atom_name1} ↔ {chain2}/{res2} {atom_name2}: {volume:.2f} Å³")
    elif volume > 2.0:
        print(f"MODERATE: {chain1}/{res1} {atom_name1} ↔ {chain2}/{res2} {atom_name2}: {volume:.2f} Å³")
Why Overlaps Are Essential
Example from tutorial data:
- Ramachandran validation found outliers at A/41-42
- Overlap validation revealed A/41 O ↔ A/43 N: 2.07 Å³ backbone clash
- BUT also found A/2 ↔ A/89 clashes (7.45, 6.40 Å³) between distant residues that had PERFECT local geometry!
Key lesson: A model can have perfect Ramachandran and rotamer scores but catastrophic packing problems. You need both local geometry validation (Rama/rotamer) AND global packing validation (overlaps).
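One way to surface the clashes that local checks cannot see is to split the overlap list by sequence separation. The sketch below assumes the overlap dictionaries have the layout described earlier in this document. The helper names and the min_separation=5 cut-off are illustrative assumptions, and the atom names given for the A/2 and A/89 pair are placeholders because the text does not state them.

```python
def sequence_separation(overlap):
    """Residue-number gap between the two atoms, or None across chains."""
    spec1, spec2 = overlap['atom-1-spec'], overlap['atom-2-spec']
    if spec1[1] != spec2[1]:
        return None
    return abs(spec1[2] - spec2[2])


def long_range_overlaps(overlaps, min_separation=5):
    """Keep overlaps between different chains or sequence-distant residues."""
    selected = []
    for overlap in overlaps:
        separation = sequence_separation(overlap)
        if separation is None or separation >= min_separation:
            selected.append(overlap)
    return selected


# Sample data modelled on the tutorial example; atom names for the
# A/2 and A/89 pair are placeholders
sample_overlaps = [
    {'atom-1-spec': [0, "A", 41, "", " O  ", ""],
     'atom-2-spec': [0, "A", 43, "", " N  ", ""],
     'overlap-volume': 2.07},
    {'atom-1-spec': [0, "A", 2, "", " CA ", ""],
     'atom-2-spec': [0, "A", 89, "", " CA ", ""],
     'overlap-volume': 7.45},
]
for overlap in long_range_overlaps(sample_overlaps):
    print(overlap['atom-1-spec'][2], overlap['atom-2-spec'][2],
          overlap['overlap-volume'])
```

With this split, the A/41 to A/43 clash stays with the local geometry problems, while the A/2 to A/89 clash is reported as a packing problem.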
Prioritizing Validation Fixes
1. Address High-Confidence Issues First
- Severe atom overlaps (>5 Å³) - atoms deeply interpenetrating, fix immediately
- Rotamer score = 0% with poor density correlation - side-chain is almost certainly wrong
- Ramachandran outliers with poor density correlation - likely wrong
- Large difference map blobs (>4σ) - definitely missing something
- Moderate atom overlaps (2-5 Å³) between distant residues - packing problems
Note on 0% rotamer scores: A rotamer score of 0% is a severe issue - the side-chain is in a conformation rarely seen in nature. However, if the density correlation for that residue is good, it may be a genuine unusual conformation. Always check the density fit before “fixing” a 0% rotamer with good correlation.
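The rule in this note can be written as a small decision helper. This is a sketch under stated assumptions: triage_rotamer is a hypothetical name, the rotamer score is taken to be a percentage, and the 0.7 correlation cut-off is borrowed from the poor-fit threshold used in this document's automated example rather than from any Coot default.

```python
def triage_rotamer(rotamer_score_percent, correlation, poor_correlation=0.7):
    """Suggest an action for a residue from its rotamer score and density fit."""
    if rotamer_score_percent > 0.0:
        return "not a 0% rotamer"
    if correlation < poor_correlation:
        return "fix: side-chain is almost certainly wrong"
    return "inspect density first: may be a genuine unusual conformation"


# Illustrative values only
print(triage_rotamer(0.0, 0.45))
print(triage_rotamer(0.0, 0.92))
print(triage_rotamer(35.0, 0.80))
```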
2. Investigate Moderate Issues
- Medium difference map blobs (3-4σ) - probably real features
- Rotamer outliers with poor correlation - likely wrong rotamer
- Minor atom overlaps (0.5-2 Å³) - may need adjustment
- Moderate geometry outliers - may need refinement
3. Review Low-Priority Items
- Small blobs near model - might be noise or minor adjustments
- Very minor overlaps (<0.5 Å³) - often acceptable
- Isolated geometry outliers with good density - may be genuine
- Borderline Ramachandran outliers - check context
Automated Validation Example
def validate_and_fix_chain(imol_model, chain_id, imol_map, imol_diff_map):
    """
    Automated validation and suggested fixes for a chain.
    """
    issues = []
    # 1. Check for atom overlaps
    overlaps = coot.molecule_atom_overlaps_py(imol_model, 50)
    for overlap in overlaps:
        atom1 = overlap['atom-1-spec']
        atom2 = overlap['atom-2-spec']
        volume = overlap['overlap-volume']
        # Only report if at least one atom is in this chain
        if atom1[1] == chain_id or atom2[1] == chain_id:
            severity = 'high' if volume > 5.0 else ('medium' if volume > 2.0 else 'low')
            issues.append({
                'type': 'atom_overlap',
                'atom1': f"{atom1[1]}/{atom1[2]} {atom1[4]}",
                'atom2': f"{atom2[1]}/{atom2[2]} {atom2[4]}",
                'severity': severity,
                'value': volume
            })
    # 2. Check correlation for each residue
    stats = coot.map_to_model_correlation_stats_per_residue_range_py(
        imol_model, chain_id, 1, 9999, imol_map
    )
    for residue_spec, correlation in stats:
        if correlation < 0.7:  # Poor fit threshold
            issues.append({
                'type': 'poor_correlation',
                'residue': residue_spec,
                'severity': 'high',
                'value': correlation
            })
    # 3. Report unmodeled density (every blob in the difference map,
    #    not only those near this chain)
    blobs = coot.find_blobs_py(imol_model, imol_diff_map, 3.0)
    for position, score in blobs:
        issues.append({
            'type': 'unmodeled_density',
            'position': (position.x(), position.y(), position.z()),
            'severity': 'high' if score > 50 else 'medium',
            'score': score
        })
    # 4. Check Ramachandran
    rama = coot.all_molecule_ramachandran_score_py(imol_model)
    for outlier in rama:
        if outlier[4] == 'OUTLIER':  # Ramachandran region
            issues.append({
                'type': 'ramachandran_outlier',
                'residue': outlier[0:3],  # chain, resno, inscode
                'severity': 'high'
            })
    # Most severe issues first
    return sorted(issues, key=lambda x: {'high': 0, 'medium': 1, 'low': 2}[x['severity']])

# Usage
issues = validate_and_fix_chain(0, "A", 1, 2)
for issue in issues[:10]:  # Top 10 issues
    print(f"{issue['type']}: {issue}")
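Step 3 of the function above collects every blob in the difference map. To relate blobs to a particular poorly fitting residue, a simple distance filter can be applied to the (x, y, z) tuples stored in the issue list. The following is a minimal sketch: blobs_near is a hypothetical helper, the 5.0 Å radius is an arbitrary assumption, and the residue centre would have to come from the model by whatever means is available.

```python
import math


def blobs_near(blob_points, centre, radius=5.0):
    """Return the ((x, y, z), score) entries within radius of centre."""
    return [
        (xyz, score)
        for xyz, score in blob_points
        if math.dist(xyz, centre) <= radius
    ]


# Made-up coordinates for illustration
blob_points = [((0.0, 0.0, 0.0), 20.0), ((10.0, 0.0, 0.0), 8.0)]
residue_centre = (1.0, 1.0, 1.0)
for xyz, score in blobs_near(blob_points, residue_centre):
    print(f"Blob at {xyz} (score {score:.1f}) lies near the residue")
```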
Common Patterns
Water Placement from Blobs
# Find blobs in difference map
blobs = coot.find_blobs_py(0, 2, 3.0)
# Add waters at blob positions
for position, score in blobs:
    if 5 < score < 30:  # Typical water blob size
        # Check if appropriate for water
        x, y, z = position.x(), position.y(), position.z()
        # Move the pointer (rotation centre) to the blob, then add the water there
        coot.set_rotation_centre(x, y, z)
        coot.place_typed_atom_at_pointer("HOH")
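The "check if appropriate for water" step is left open in the code above. One plausible check is a distance test against nearby model atoms, sketched below. The helper name is hypothetical, the 2.4-3.2 Å window is an assumed hydrogen-bonding range that should be tuned to your own criteria, and the neighbouring atom coordinates are assumed to have been extracted from the model separately.

```python
import math


def plausible_water_site(site, atom_coords, min_dist=2.4, max_dist=3.2):
    """True if the nearest model atom lies within [min_dist, max_dist] of site."""
    if not atom_coords:
        return False  # nothing nearby to hydrogen-bond with
    nearest = min(math.dist(site, atom) for atom in atom_coords)
    return min_dist <= nearest <= max_dist


# Made-up coordinates for illustration
site = (0.0, 0.0, 0.0)
print(plausible_water_site(site, [(2.8, 0.0, 0.0)]))  # within the assumed window
print(plausible_water_site(site, [(1.5, 0.0, 0.0)]))  # too close: would clash
print(plausible_water_site(site, [(6.0, 0.0, 0.0)]))  # too far from any atom
```

A site that fails this test may still be real density, for example part of a ligand or an alternative conformation, so it is worth inspecting rather than discarding.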
Missing Residue Detection
# Look for large blobs that might be missing residues
blobs = coot.find_blobs_py(0, 2, 3.0)
missing_residue_candidates = [
    (pos, score) for pos, score in blobs
    if score > 100  # Large feature
]
for position, score in missing_residue_candidates:
    print(f"Large unmodeled density at {position} - check for missing residues")
Key Takeaways
- Always check atom overlaps - local geometry can be perfect while global packing is catastrophic
- Always run blob detection when you have both model and map
- Use difference maps (mFo-DFc) for most sensitive blob detection
- Combine all three validation types (model-to-map, overlaps, map-to-model) for complete picture
- Prioritize by severity - fix severe clashes and high-confidence issues first
- Iterate - fixing one issue may reveal others
- Document - keep track of what you fixed and why
Function Reference
Essential Validation Functions
# Atom overlap detection
overlaps = coot.molecule_atom_overlaps_py(imol, n_pairs=30) # Default: 30 worst
all_overlaps = coot.molecule_atom_overlaps_py(imol, n_pairs=-1) # All overlaps
# Blob detection (map-to-model)
blobs = coot.find_blobs_py(imol_model, imol_map, sigma_cutoff)
# Ramachandran validation
rama = coot.all_molecule_ramachandran_score_py(imol)
# Rotamer validation
rotamers = coot.rotamer_graphs_py(imol)
# Density correlation (model-to-map)
corr = coot.map_to_model_correlation_stats_per_residue_range_py(
imol, chain, start, end, imol_map
)
# Geometry validation
chiral = coot.chiral_volume_errors_py(imol)
Remember: Model-to-map tells you what’s wrong with your model. Atom overlaps tell you about packing problems. Map-to-model tells you what you’re missing.