Coot Structure Validation Best Practices
Coot Structure Validation Best Practices
Overview
When performing structure validation in Coot with both a model and a map, you need to analyze the structure from three complementary perspectives:
- Model-to-Map Validation: How well does the existing model fit the density?
- Map-to-Model Validation: Where is there significant density that is NOT explained by the model?
- Atom Overlap Validation: Are there steric clashes between atoms in the model?
All three perspectives are essential for comprehensive validation.
The Three Types of Validation
Model-to-Map: Finding Problems in Your Model
These functions analyze how well your current model fits the density. They lead you to places in the model that need attention:
- Poor density correlation
- Ramachandran outliers
- Rotamer outliers
- Geometry violations
Understanding Rotamer Validation: Two Different Scores
CRITICAL: rotamer_graphs_py() and score_rotamers_py() report different things
There are two functions that report rotamer information, and they measure fundamentally different aspects:
1. rotamer_graphs_py(imol) - Continuous Probability Density
Returns the probability density at the exact chi angles of the current conformation.
- Measures: “How likely is THIS specific chi1, chi2, chi3… combination?”
- Scale: 0-100%, where 100% = peak of the probability distribution
- Use for: Primary validation metric - this is what you should check
rotamers = coot.rotamer_graphs_py(0)
# Returns: [[chain_id, resno, ins_code, score_percentage, resname], ...]
# score_percentage is the continuous probability density at the actual chi angles
Interpretation guidelines:
- > 50%: Excellent - in high-density region
- 20-50%: Good - acceptable conformation
- 5-20%: Marginal - check density fit carefully
- < 5%: Poor - likely wrong (but check density!)
- < 1%: Very poor - almost certainly wrong
2. score_rotamers_py(...) - Discrete Bin Probabilities
Returns the discrete rotamer library showing what % of structures have each named rotamer.
- Measures: “How common is rotamer ‘m-85’ vs ‘t80’ vs ‘p90’ across all proteins?”
- Scale: Probabilities sum to ~100% across all discrete bins
- Use for: Understanding alternatives - what other conformations exist?
rotamers = coot.score_rotamers_py(0, "A", 42, "", "", 1, 1, 0.001)
# Returns: [[name, probability, density_score, atom_list, richardson_score], ...]
# probability is the discrete bin frequency (e.g., m-85 appears in 43% of structures)
Why the scores differ:
- Discrete bins: “43% of TYR have m-85 rotamer” (which bin?)
- Continuous density: “83% probability at chi1=-62.3°, chi2=-85.1°” (exact angles)
- The continuous score can exceed discrete bin probabilities!
Rotamer Complexity and Expected Scores
Number of chi angles determines score expectations:
| Residue Type | Chi Angles | Typical # Rotamers | Best Rotamer % | Good Score |
|---|---|---|---|---|
| VAL, THR, SER | 1 | 3 | 70-75% | > 40% |
| PHE, TYR, ASP, ASN | 2 | 4-9 | 35-45% | > 20% |
| GLU, GLN, MET, ILE, LEU | 3 | 9-15 | 20-35% | > 10% |
| LYS, ARG | 4 | 30-35 | 9-10% | > 5% |
Key insight: More chi angles = more rotamers = lower individual probabilities
Examples from validation:
- LEU (3 chi): Best possible = 59%, so 30% is good, 5% is poor
- ARG (4 chi): Best possible = 9%, so 5% is good, 0.5% is poor
- VAL (1 chi): Best possible = 73%, so 40% is good, 5% is terrible
Proper Rotamer Validation Workflow
DON’T: Use arbitrary cutoffs like “< 10% is bad”
DO: Use context-aware validation:
# Step 1: Get current rotamer scores
rotamers = coot.rotamer_graphs_py(0)
# Step 2: For flagged residues, get alternatives
for chain, resno, inscode, score, resname in rotamers:
if score < 20: # Preliminary flag
# Get all possible rotamers to understand context
alternatives = coot.score_rotamers_py(0, chain, resno, "", "", 1, 1, 0.001)
if len(alternatives) == 0:
continue # GLY, ALA - no rotamers
# Sort by density fit
sorted_alts = sorted(alternatives, key=lambda x: x[2], reverse=True)
best_density = sorted_alts[0][2]
current_density = sorted_alts[0][2] # Approximate
# Check how many rotamers exist
n_rotamers = len(alternatives)
# Decision logic
if n_rotamers < 5 and score < 10:
# Few rotamers (VAL, PHE, etc.) and low score = likely wrong
print(f"PROBLEM: {chain}/{resno} {resname}: {score:.1f}% (few alternatives)")
elif n_rotamers > 20 and score < 2:
# Many rotamers (LYS, ARG) and very low score = likely wrong
print(f"PROBLEM: {chain}/{resno} {resname}: {score:.1f}% (many alternatives)")
elif best_density - current_density > 3.0:
# Alternative has much better density fit
print(f"PROBLEM: {chain}/{resno} {resname}: better rotamer available")
Combined Validation: Rotamers + Density + Clashes
A residue needs fixing if:
- Rotamer score is low FOR THAT RESIDUE TYPE (bottom 10% of possibilities), AND
- Density correlation is poor (< 0.7), AND/OR
- Causes steric clashes (> 2.0 ų overlap)
A low rotamer score alone is NOT sufficient - always check:
- Is this low for this residue type? (compare to alternatives)
- Does the density support this conformation?
- Are there clashes that would be resolved by changing rotamers?
Atom Overlaps: Finding Steric Clashes
Atom overlap detection identifies clashes between atoms that may not be caught by local geometry validation. These reveal packing problems such as:
- Clashes between distant residues
- Side-chain/side-chain clashes
- Backbone/side-chain clashes
- Clashes with symmetry mates
Critical insight: Ramachandran and rotamer validation catch local geometry problems (within a residue or its immediate neighbors), while atom overlap detection catches global packing problems between any atoms in the structure.
Map-to-Model: Finding Missing Features
The blob-finding function identifies regions of significant density that are not explained by your current model. It leads you to places in the map where you might be missing:
- Waters
- Ligands
- Alternative conformations
- Metal ions
- Other small molecules
- Missing residues or loops
Critical Function: find_blobs_py()
Always include blob detection when performing structure validation with a map.
blobs = coot.find_blobs_py(
imol_model=0, # your protein model
imol_map=1, # the map to search (often difference map)
cut_off_density_level=3.0 # sigma threshold (typically 2.5-4.0)
)
# Returns: list of (position, score) tuples
# [(clipper::Coord_orth, float), ...]
Parameters
imol_model: The model molecule - density explained by this model will be excludedimol_map: The map to search for blobs (usually a difference map, but can be regular map)cut_off_density_level: Sigma threshold for blob detection- 3.0 sigma: Standard threshold for significant features
- 2.5 sigma: More sensitive, finds weaker features
- 4.0 sigma: Conservative, only strong features
Understanding the Results
for position, score in blobs:
x = position.x()
y = position.y()
z = position.z()
print(f"Blob at ({x:.2f}, {y:.2f}, {z:.2f}) - score: {score:.2f}")
The score represents the strength/volume of the unmodeled density. Higher scores indicate more significant features that should be investigated.
Recentering the View
The user likes to see what you are considering and how you change the model, so, if you can, try to use coot.set_rotation_centre() or coot.set_go_to_atom_chain_residue_atom_name() or some such to bring the currently interesting issue to the centre of the screen.
Complete Validation Workflow
1. Model-to-Map Validation
# Ramachandran outliers
rama_outliers = coot.all_molecule_ramachandran_score_py(0)
# Rotamer outliers
rotamer_outliers = coot.rotamer_graphs_py(0)
# Per-residue density correlation
correlation_stats = coot.map_to_model_correlation_stats_per_residue_range_py(
0, # imol_model
"A", # chain_id
1, # start_resno
100, # end_resno
1 # imol_map
)
# Geometry validation
chiral = coot.chiral_volume_errors_py(0)
2. Atom Overlap Validation
# Get worst 30 atom overlaps
overlaps = coot.molecule_atom_overlaps_py(0, 30)
# Check for severe clashes
severe_clashes = [o for o in overlaps if o['overlap-volume'] > 5.0]
if severe_clashes:
print(f"WARNING: {len(severe_clashes)} severe clashes found!")
# For full analysis (caution: can be very large!)
# all_overlaps = coot.molecule_atom_overlaps_py(0, -1)
3. Map-to-Model Validation (Blobs)
# Find unmodeled density in difference map
diff_map_blobs = coot.find_blobs_py(
imol_model=0,
imol_map=2, # difference map
cut_off_density_level=3.0
)
# Find features in regular map (alternative approach)
regular_map_blobs = coot.find_blobs_py(
imol_model=0,
imol_map=1, # 2mFo-DFc map
cut_off_density_level=1.0 # Lower threshold for fitted map
)
3. Comprehensive Validation Report
def comprehensive_validation(imol_model, imol_map, imol_diff_map=None):
"""
Perform complete structure validation combining model and map analysis.
Returns dictionary with all validation metrics.
"""
results = {}
# Model-to-map validation
results['ramachandran'] = coot.all_molecule_ramachandran_score_py(imol_model)
results['rotamers'] = coot.rotamer_graphs_py(imol_model)
# Atom overlap validation
results['atom_overlaps'] = coot.molecule_atom_overlaps_py(imol_model, 30)
severe_clashes = [o for o in results['atom_overlaps'] if o['overlap-volume'] > 5.0]
results['severe_clash_count'] = len(severe_clashes)
# Per-residue correlation (requires chain info)
import coot_utils
chains = coot_utils.chain_ids(imol_model)
results['correlation_by_chain'] = {}
for chain in chains:
n_residues = coot.chain_n_residues(chain, imol_model)
if n_residues > 0:
stats = coot.map_to_model_correlation_stats_per_residue_range_py(
imol_model, chain, 1, 9999, imol_map
)
results['correlation_by_chain'][chain] = stats
# Map-to-model validation (blobs)
if imol_diff_map is not None:
results['diff_map_blobs'] = coot.find_blobs_py(
imol_model, imol_diff_map, 3.0
)
results['map_blobs'] = coot.find_blobs_py(
imol_model, imol_map, 1.0
)
return results
# Usage
validation = comprehensive_validation(
imol_model=0,
imol_map=1,
imol_diff_map=2
)
Interpreting Blob Results
What Different Maps Tell You
Difference Map (mFo-DFc) Blobs:
- Positive blobs (>3σ): Missing atoms/features - something should be added here
- Negative blobs (<-3σ): Incorrectly modeled atoms - something should be removed/moved
- Most reliable for finding genuine missing features
Regular Map (2mFo-DFc) Blobs:
- Less sensitive to model bias
- Good for finding larger missing features (domains, ligands)
- Use lower sigma threshold (0.5-1.5σ)
Common Blob Interpretations
blobs = coot.find_blobs_py(0, 2, 3.0) # diff map, 3 sigma
# Large score (>50): Likely missing ligand, metal, or several waters
# Medium score (10-50): Likely 1-3 waters or alternative conformation
# Small score (3-10): Likely single water or weak alternative conformation
for position, score in blobs:
if score > 50:
print(f"Large feature at {position} - investigate for ligand/metal")
elif score > 10:
print(f"Medium feature at {position} - likely waters")
else:
print(f"Small feature at {position} - check carefully")
Critical Function: molecule_atom_overlaps_py()
Always include atom overlap checking when validating structure geometry.
# Get worst 30 atom overlaps (default behavior after API update)
overlaps = coot.molecule_atom_overlaps_py(
imol=0,
n_pairs=30 # Number of worst overlaps to return (default: 30)
)
# Get ALL overlaps (use with caution - can be hundreds!)
all_overlaps = coot.molecule_atom_overlaps_py(
imol=0,
n_pairs=-1 # -1 means return all overlaps
)
# Each overlap is a dict with:
# {
# 'atom-1-spec': [imol, chain, resno, inscode, atom_name, altconf],
# 'atom-2-spec': [imol, chain, resno, inscode, atom_name, altconf],
# 'overlap-volume': float, # in Ų
# 'radius-1': float,
# 'radius-2': float
# }
Understanding Overlap Results
Overlap volume indicates severity:
- >5.0 ų: Severe clash - atoms are deeply interpenetrating
- 2.0-5.0 ų: Moderate clash - needs immediate attention
- 0.5-2.0 ų: Minor clash - may be acceptable in some contexts
- <0.5 ų: Very minor overlap - often acceptable
Common clash patterns:
overlaps = coot.molecule_atom_overlaps_py(0, 30)
for overlap in overlaps:
atom1 = overlap['atom-1-spec']
atom2 = overlap['atom-2-spec']
volume = overlap['overlap-volume']
chain1, res1, atom_name1 = atom1[1], atom1[2], atom1[4]
chain2, res2, atom_name2 = atom2[1], atom2[2], atom2[4]
if volume > 5.0:
print(f"SEVERE: {chain1}/{res1} {atom_name1} ↔ {chain2}/{res2} {atom_name2}: {volume:.2f} Ų")
elif volume > 2.0:
print(f"MODERATE: {chain1}/{res1} {atom_name1} ↔ {chain2}/{res2} {atom_name2}: {volume:.2f} Ų")
Why Overlaps Are Essential
Example from tutorial data:
- Ramachandran validation found outliers at A/41-42
- Overlap validation revealed A/41 O ↔ A/43 N: 2.07 ų backbone clash
- BUT also found A/2 ↔ A/89 clashes (7.45, 6.40 ų) between distant residues that had PERFECT local geometry!
Key lesson: A model can have perfect Ramachandran and rotamer scores but catastrophic packing problems. You need both local geometry validation (Rama/rotamer) AND global packing validation (overlaps).
Prioritizing Validation Fixes
Understanding Rotamer Scores in Context
CRITICAL: Never use absolute rotamer score thresholds without considering residue type
Before prioritizing rotamer fixes, understand what’s “bad” for each residue:
def assess_rotamer_severity(chain, resno, score, resname):
"""
Determine if a rotamer score is actually problematic.
Returns: 'critical', 'moderate', 'acceptable', or 'good'
"""
# Get all possible rotamers to understand the distribution
alternatives = coot.score_rotamers_py(0, chain, resno, "", "", 1, 1, 0.001)
n_rotamers = len(alternatives)
# Context-aware thresholds based on number of possible rotamers
if n_rotamers <= 3: # VAL, THR, SER (1 chi)
if score < 10: return 'critical'
elif score < 30: return 'moderate'
else: return 'acceptable'
elif n_rotamers <= 9: # PHE, TYR, etc. (2 chi)
if score < 5: return 'critical'
elif score < 15: return 'moderate'
else: return 'acceptable'
elif n_rotamers <= 15: # GLU, GLN, MET (3 chi)
if score < 3: return 'critical'
elif score < 10: return 'moderate'
else: return 'acceptable'
else: # LYS, ARG (4 chi, 30+ rotamers)
if score < 1: return 'critical'
elif score < 5: return 'moderate'
else: return 'acceptable'
1. Address High-Confidence Issues First
Priority 1: Combined problems (multiple red flags)
- Severe atom overlaps (>5 ų) - atoms deeply interpenetrating
- Poor rotamer + poor density + clashes - triple failure
- Example: Score < 5% for 2-chi residue, correlation < 0.5, clashes > 2 ų
- Ramachandran outliers with poor density correlation - backbone is wrong
- Large difference map blobs (>4σ) - definitely missing features
Priority 2: Single severe issues
- Context-aware rotamer outliers with poor density:
- VAL/THR/SER < 10% AND correlation < 0.7
- PHE/TYR/ASP < 5% AND correlation < 0.7
- GLU/GLN/MET < 3% AND correlation < 0.7
- LYS/ARG < 1% AND correlation < 0.7
- Moderate atom overlaps (2-5 ų) between distant residues
Important: A low rotamer score with GOOD density correlation (>0.8) may be correct - it could be a genuine unusual but real conformation. Don’t “fix” it unless there’s supporting evidence (clashes, poor density, chemical implausibility).
2. Investigate Moderate Issues
- Medium difference map blobs (3-4σ) - probably real features
- Context-appropriate moderate rotamer scores with marginal density:
- Check if alternative rotamer has much better density fit
- Compare alternatives with
score_rotamers_py()
- Minor atom overlaps (0.5-2 ų) - may need adjustment
- Moderate geometry outliers - may need refinement
3. Review Low-Priority Items
- Small blobs near model - might be noise or minor adjustments
- Very minor overlaps (<0.5 ų) - often acceptable
- Isolated geometry outliers with good density - may be genuine
- Borderline Ramachandran outliers - check context
Automated Validation Example
def validate_and_fix_chain(imol_model, chain_id, imol_map, imol_diff_map):
"""
Automated validation and suggested fixes for a chain.
"""
issues = []
# 1. Check for atom overlaps
overlaps = coot.molecule_atom_overlaps_py(imol_model, 50)
for overlap in overlaps:
atom1 = overlap['atom-1-spec']
atom2 = overlap['atom-2-spec']
volume = overlap['overlap-volume']
# Only report if at least one atom is in this chain
if atom1[1] == chain_id or atom2[1] == chain_id:
severity = 'high' if volume > 5.0 else ('medium' if volume > 2.0 else 'low')
issues.append({
'type': 'atom_overlap',
'atom1': f"{atom1[1]}/{atom1[2]} {atom1[4]}",
'atom2': f"{atom2[1]}/{atom2[2]} {atom2[4]}",
'severity': severity,
'value': volume
})
# 2. Check correlation for each residue
stats = coot.map_to_model_correlation_stats_per_residue_range_py(
imol_model, chain_id, 1, 9999, imol_map
)
for residue_spec, correlation in stats:
if correlation < 0.7: # Poor fit threshold
issues.append({
'type': 'poor_correlation',
'residue': residue_spec,
'severity': 'high',
'value': correlation
})
# 3. Find nearby blobs that might explain poor correlation
blobs = coot.find_blobs_py(imol_model, imol_diff_map, 3.0)
for position, score in blobs:
issues.append({
'type': 'unmodeled_density',
'position': (position.x(), position.y(), position.z()),
'severity': 'high' if score > 50 else 'medium',
'score': score
})
# 4. Check Ramachandran
rama = coot.all_molecule_ramachandran_score_py(imol_model)
for outlier in rama:
if outlier[4] == 'OUTLIER': # Ramachandran region
issues.append({
'type': 'ramachandran_outlier',
'residue': outlier[0:3], # chain, resno, inscode
'severity': 'high'
})
return sorted(issues, key=lambda x: {'high': 0, 'medium': 1, 'low': 2}[x['severity']])
# Usage
issues = validate_and_fix_chain(0, "A", 1, 2)
for issue in issues[:10]: # Top 10 issues
print(f"{issue['type']}: {issue}")
Common Patterns
Water Placement from Blobs
# Find blobs in difference map
blobs = coot.find_blobs_py(0, 2, 3.0)
# Add waters at blob positions
for position, score in blobs:
if 5 < score < 30: # Typical water blob size
# Check if appropriate for water
x, y, z = position.x(), position.y(), position.z()
# Add water at this position
coot.place_typed_atom_at_pointer("HOH")
Missing Residue Detection
# Look for large blobs that might be missing residues
blobs = coot.find_blobs_py(0, 2, 3.0)
missing_residue_candidates = [
(pos, score) for pos, score in blobs
if score > 100 # Large feature
]
for position, score in missing_residue_candidates:
print(f"Large unmodeled density at {position} - check for missing residues")
Key Takeaways
- Always check atom overlaps - local geometry can be perfect while global packing is catastrophic
- Always run blob detection when you have both model and map
- Use difference maps (mFo-DFc) for most sensitive blob detection
- Combine all three validation types (model-to-map, overlaps, map-to-model) for complete picture
- Context-aware rotamer validation - never use absolute thresholds:
- Understand the difference between
rotamer_graphs_py()(continuous density) andscore_rotamers_py()(discrete bins) - Use
rotamer_graphs_py()scores for validation (measures actual chi angles) - Consider number of possible rotamers: 5% is terrible for VAL but good for ARG
- Always check density correlation and available alternatives before “fixing”
- Understand the difference between
- Prioritize by severity AND confidence - fix issues with multiple red flags first
- Iterate - fixing one issue may reveal others
- Document - keep track of what you fixed and why
Function Reference
Essential Validation Functions
# Rotamer validation - primary metric (continuous probability density)
rotamers = coot.rotamer_graphs_py(imol)
# Returns: [[chain_id, resno, ins_code, score_percentage, resname], ...]
# Rotamer alternatives - for understanding context
alternatives = coot.score_rotamers_py(imol, chain, resno, "", "", imol_map, 1, 0.001)
# Returns: [[name, probability, density_score, atom_list, richardson_score], ...]
# Atom overlap detection
overlaps = coot.molecule_atom_overlaps_py(imol, n_pairs=30) # Default: 30 worst
all_overlaps = coot.molecule_atom_overlaps_py(imol, n_pairs=-1) # All overlaps
# Blob detection (map-to-model)
blobs = coot.find_blobs_py(imol_model, imol_map, sigma_cutoff)
# Ramachandran validation
rama = coot.all_molecule_ramachandran_score_py(imol)
# Density correlation (model-to-map)
corr = coot.map_to_model_correlation_stats_per_residue_range_py(
imol, chain, imol_map, n_per_range, exclude_NOC_flag
)
# Geometry validation
chiral = coot.chiral_volume_errors_py(imol)
Remember:
- Model-to-map tells you what’s wrong with your model
- Atom overlaps tell you about packing problems
- Map-to-model tells you what you’re missing
- Rotamer scores must be interpreted in context of residue type