Gene Scoring System¶
Custom Panel scores each gene by combining clinical evidence, source reliability, and expert judgment, and uses that score to decide which genes enter the final panel.
Overview¶
The scoring system evaluates each gene across multiple dimensions:
- Evidence Quality - How strong is the gene-disease association?
- Source Reliability - How trustworthy is the data source?
- Source Consensus - How many independent sources support the gene?
- Clinical Priority - Are there clinical guidelines or expert overrides?
Core Scoring Formula¶
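Each source contributes the product of three factors, and a gene's score is the sum of its source contributions:

    score = Σ (source_evidence_score × internal_confidence × source_group_weight)

Where:
- source_evidence_score: Base score from the data source (0.0-1.5)
- internal_confidence: Confidence based on supporting evidence (0.0-1.0)
- source_group_weight: Final multiplier for source priority (0.7-1.5)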
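As a minimal sketch, assuming each source contributes the product of the three factors and a gene's score is the sum of those contributions (which is how the worked examples later on this page are calculated), the formula could be written in Python as follows. The names `SourceEvidence`, `contribution`, and `gene_score` are illustrative and not part of Custom Panel.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SourceEvidence:
    """One source's evidence for a gene (hypothetical container)."""
    source_evidence_score: float  # base score, 0.0-1.5
    internal_confidence: float    # confidence, 0.0-1.0
    source_group_weight: float    # priority multiplier, 0.7-1.5


def contribution(ev: SourceEvidence) -> float:
    """Product of the three factors for a single source."""
    return ev.source_evidence_score * ev.internal_confidence * ev.source_group_weight


def gene_score(evidence: list[SourceEvidence]) -> float:
    """Sum of the per-source contributions."""
    return sum(contribution(ev) for ev in evidence)


# Two sources: 1.5 x 1.0 x 1.5 = 2.25 and 0.8 x 0.5 x 0.8 = 0.32
example = [SourceEvidence(1.5, 1.0, 1.5), SourceEvidence(0.8, 0.5, 0.8)]
print(round(gene_score(example), 2))  # 2.57
```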
Component Breakdown¶
1. Source Evidence Scores¶
Each source has a base evidence score reflecting its clinical reliability:
Source | Base Score | Rationale |
---|---|---|
ACMG Incidental Findings | 1.5 | Clinical guidelines with actionable interventions |
ClinGen/TheGenCC | 1.2 | Expert curation with validity classifications |
In-house Panels | 1.2 | Validated in clinical practice |
Manual Curation | 1.2 | Expert reviewed for specific context |
PanelApp | 1.0 | Community consensus standard |
COSMIC Germline | 0.9 | Established cancer gene catalog |
Commercial Panels | 0.8 | Market consensus, variable validation |
HPO Neoplasm | 0.7 | Automated associations, broader coverage |
2. Classification Multipliers¶
Sources with classification systems apply additional multipliers:
ClinGen Gene-Disease Validity¶
"Definitive": ×1.5 # Replicated evidence, expert consensus
"Strong": ×1.2 # Compelling evidence, multiple studies
"Moderate": ×1.0 # Several studies, moderate evidence
"Limited": ×0.3 # Few studies, emerging evidence
"Disputed": ×0.3 # Conflicting evidence
"Refuted": ×0.0 # Excluded from scoring
PanelApp Evidence Levels¶
Green (Level 3): ×1.0 # High confidence, clinical use
Amber (Level 2): ×0.5 # Moderate confidence, evaluation
Red (Level 1): ×0.0 # Low confidence, not recommended
COSMIC Tier System¶
Tier 1: ×1.0 # Well-documented cancer genes
Tier 2: ×0.8 # Strong evidence, fewer studies
Other: ×0.4 # Limited classification
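The three multiplier lists above can be read as lookup tables applied to a source's base score. The sketch below hard-codes the values from this page; the dictionary layout and the `classified_score` helper are assumptions made for illustration, as is the choice to map any unlisted COSMIC classification to `Other`.

```python
# Multipliers copied from the lists above; the structure is illustrative.
CLASSIFICATION_MULTIPLIERS = {
    "ClinGen": {
        "Definitive": 1.5, "Strong": 1.2, "Moderate": 1.0,
        "Limited": 0.3, "Disputed": 0.3, "Refuted": 0.0,
    },
    "PanelApp": {"Green": 1.0, "Amber": 0.5, "Red": 0.0},
    "COSMIC": {"Tier 1": 1.0, "Tier 2": 0.8, "Other": 0.4},
}


def classified_score(source: str, base_score: float, classification: str) -> float:
    """Scale a base score by the source's classification multiplier."""
    table = CLASSIFICATION_MULTIPLIERS[source]
    if source == "COSMIC":
        # Assumption: any unlisted COSMIC tier falls back to "Other".
        multiplier = table.get(classification, table["Other"])
    else:
        multiplier = table[classification]  # KeyError on unknown labels
    return base_score * multiplier


print(classified_score("PanelApp", 1.0, "Amber"))   # 0.5
print(classified_score("ClinGen", 1.2, "Refuted"))  # 0.0
```

A multiplier of 0.0 (ClinGen "Refuted", PanelApp Red) removes that source's contribution entirely, whatever its base score.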
3. Internal Confidence Scoring¶
Confidence increases with supporting evidence using normalization functions:
Logistic Normalization¶
Used for sources where confidence plateaus:

    confidence = 1 / (1 + exp(-k * (count - x0)))

Where:
- k: Steepness of the curve (how quickly confidence increases)
- x0: Midpoint (the count at which confidence reaches 50%)

Examples:
- Commercial Panels: k=0.25, x0=5 (50% confidence at 5 panels)
- PanelApp: k=0.35, x0=5 (steeper curve, so confidence rises faster)
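To make the effect of `k` and `x0` concrete, here is a small sketch of the logistic normalization using the Commercial Panels parameters quoted above. The function name `logistic_confidence` is an illustrative choice, not the pipeline's own.

```python
import math


def logistic_confidence(count: int, k: float, x0: float) -> float:
    """Confidence in (0, 1) that rises with count and plateaus near 1."""
    return 1.0 / (1.0 + math.exp(-k * (count - x0)))


# Commercial Panels parameters from the examples above: k=0.25, x0=5
for panels in (5, 10):
    print(panels, round(logistic_confidence(panels, k=0.25, x0=5), 2))
# 5 0.5
# 10 0.78
```

At the midpoint the confidence is exactly 0.5 whatever the value of `k`; a larger `k` only changes how quickly the curve moves away from it.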
Linear Normalization¶
Used for sources where confidence grows in proportion to the count, up to a cap:

    confidence = min(count / max_count, 1.0)

Example:
- In-house Panels: max_count=3 (100% confidence at 3 panels)
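The linear case is simpler. A sketch with the In-house Panels setting from above is shown below; the guard against a non-positive `max_count` is an added assumption, and `linear_confidence` is an illustrative name.

```python
def linear_confidence(count: int, max_count: int) -> float:
    """Confidence proportional to count, capped at 1.0."""
    if max_count <= 0:
        raise ValueError("max_count must be positive")
    return min(count / max_count, 1.0)


# In-house Panels example from above: max_count=3
print([round(linear_confidence(n, max_count=3), 2) for n in range(5)])
# [0.0, 0.33, 0.67, 1.0, 1.0]
```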
4. Source Group Weights¶
Final multipliers reflect clinical decision-making priorities:
ACMG_Incidental_Findings: 1.5 # Highest - clinical guidelines
ClinGen: 1.2 # Expert validity assessment
TheGenCC: 1.2 # Multi-group curation
Inhouse_Panels: 1.2 # Local clinical validation
Manual_Curation: 1.1 # Expert judgment
PanelApp: 1.0 # Reference standard
COSMIC_Germline: 0.9 # Research database
Commercial_Panels: 0.8 # Market consensus
HPO_Neoplasm: 0.7 # Automated associations
Scoring Examples¶
Example 1: Well-Supported Gene (BRCA1)¶
Sources:
- ACMG: 1.5 × 1.0 × 1.5 = 2.25
- PanelApp (Green): 1.0 × 1.0 × 1.0 = 1.0
- ClinGen (Definitive): 1.2 × 1.5 × 1.2 = 2.16
- Commercial (8 panels): 0.8 × 0.67 × 0.8 = 0.43
- In-house: 1.2 × 1.0 × 1.2 = 1.44
Total Score: 7.28 → INCLUDED (well above 1.5 threshold)
Example 2: Emerging Gene¶
Sources:
- Manual Curation: 1.2 × 1.0 × 1.1 = 1.32
- Commercial (2 panels): 0.8 × 0.32 × 0.8 = 0.20
- HPO: 0.7 × 1.0 × 0.7 = 0.49
Total Score: 2.01 → INCLUDED (above 1.5 threshold)
Example 3: Low Evidence Gene¶
Sources:
- Commercial (1 panel): 0.8 × 0.27 × 0.8 = 0.17
- HPO: 0.7 × 1.0 × 0.7 = 0.49
Total Score: 0.66 → EXCLUDED (below 1.5 threshold)
Decision Thresholds¶
Primary Threshold¶
Genes must score at least 1.5 to be included in the panel.
Watch List Threshold¶
Genes scoring between 1.0 and 1.5 fall short of inclusion but can be tracked for future consideration.
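The two thresholds split scores into three bands. The sketch below assumes the 1.5 and 1.0 values used on this page and treats both boundaries as inclusive. The constant names and the labels `include`, `watch_list`, and `exclude` are hypothetical.

```python
INCLUSION_THRESHOLD = 1.5   # primary threshold used in the examples on this page
WATCH_LIST_THRESHOLD = 1.0  # lower bound of the watch-list band


def classify_score(score: float) -> str:
    """Map a gene score to an illustrative decision label."""
    if score >= INCLUSION_THRESHOLD:
        return "include"
    if score >= WATCH_LIST_THRESHOLD:
        return "watch_list"  # assumption: a score of exactly 1.0 is tracked
    return "exclude"


for s in (3.2, 1.2, 0.4):
    print(s, classify_score(s))
# 3.2 include
# 1.2 watch_list
# 0.4 exclude
```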
Additional Criteria¶
min_sources: 1 # At least one source required
max_evidence_score: 5.0 # Cap to prevent score inflation
Veto System¶
The veto system allows critical sources to override scoring:
How Veto Works¶
    if gene in veto_sources:
        include_gene = True  # Bypass score check
    else:
        include_gene = (score >= threshold)
Sources with Veto Power¶
ACMG Incidental Findings
- Reason: Professional guidelines for reporting
- Use Case: Ensure guideline compliance
- Example: TP53 included even if only in ACMG

Manual Curation
- Reason: Expert clinical judgment
- Use Case: Include emerging or locally important genes
- Example: Novel gene from recent literature
Veto Configuration¶
    data_sources:
      ACMG_Incidental_Findings:
        veto:
          enabled: true
          reason: "ACMG recommended for reporting of incidental findings"
      Manual_Curation:
        veto:
          enabled: true
          reason: "Manually curated and reviewed by clinical experts"
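To show how such a configuration might drive the veto check, the sketch below mirrors the YAML as a plain Python dict (the shape it would presumably have after parsing) and derives the set of veto-enabled sources from it. The helper names, the `PanelApp` entry without a veto block, and the default threshold of 1.5 are assumptions made for illustration.

```python
# Mirrors the YAML above as a plain dict (assumed shape after parsing).
data_sources = {
    "ACMG_Incidental_Findings": {"veto": {"enabled": True}},
    "Manual_Curation": {"veto": {"enabled": True}},
    "PanelApp": {},  # no veto block, so no veto power
}


def veto_enabled_sources(config: dict) -> set[str]:
    """Names of sources whose veto block is enabled."""
    return {
        name for name, cfg in config.items()
        if cfg.get("veto", {}).get("enabled", False)
    }


def include_gene(gene_sources: set[str], score: float,
                 veto_sources: set[str], threshold: float = 1.5) -> bool:
    """Include if any veto source lists the gene, else compare to the threshold."""
    if gene_sources & veto_sources:
        return True  # bypass score check
    return score >= threshold


vetoes = veto_enabled_sources(data_sources)
print(include_gene({"ACMG_Incidental_Findings"}, 0.4, vetoes))  # True
print(include_gene({"PanelApp"}, 0.4, vetoes))                  # False
```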
Veto Statistics¶
The system tracks veto usage:
    {
      "veto_stats": {
        "total_vetoed": 15,
        "by_source": {
          "ACMG_Incidental_Findings": 12,
          "Manual_Curation": 3
        }
      }
    }
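A structure like this could be produced by tallying which source triggered each veto. The sketch below assumes every vetoed gene is attributed to exactly one source, which matches the totals above (12 + 3 = 15). The input mapping and the placeholder gene `GENE_X` are hypothetical.

```python
from collections import Counter


def veto_stats(vetoed: dict[str, str]) -> dict:
    """Summarize vetoes given a mapping of gene -> vetoing source."""
    by_source = Counter(vetoed.values())
    return {"veto_stats": {"total_vetoed": len(vetoed), "by_source": dict(by_source)}}


# Hypothetical input: which source vetoed each gene in.
stats = veto_stats({
    "TP53": "ACMG_Incidental_Findings",
    "BRCA1": "ACMG_Incidental_Findings",
    "GENE_X": "Manual_Curation",
})
print(stats["veto_stats"]["total_vetoed"])  # 3
```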
Score Calculation Process¶
Step 1: Source Aggregation¶
For each gene, collect evidence from all sources:
    gene_sources = {
        "BRCA1": [
            {"source": "ACMG", "evidence_score": 1.5},
            {"source": "PanelApp", "evidence_score": 1.0, "classification": "Green"},
            {"source": "Commercial", "panel_count": 8}
        ]
    }
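One way to arrive at a mapping like `gene_sources` is to group flat per-source records by gene symbol. The record layout below, with a `gene` key on each row, is an assumption for illustration; the page does not say how sources deliver their data.

```python
from collections import defaultdict

# Hypothetical flat records, one per (gene, source) pair.
records = [
    {"gene": "BRCA1", "source": "ACMG", "evidence_score": 1.5},
    {"gene": "BRCA1", "source": "PanelApp", "evidence_score": 1.0,
     "classification": "Green"},
    {"gene": "BRCA1", "source": "Commercial", "panel_count": 8},
    {"gene": "TP53", "source": "ACMG", "evidence_score": 1.5},
]


def aggregate_by_gene(rows: list[dict]) -> dict[str, list[dict]]:
    """Group source records under their gene symbol."""
    grouped: dict[str, list[dict]] = defaultdict(list)
    for row in rows:
        entry = {k: v for k, v in row.items() if k != "gene"}
        grouped[row["gene"]].append(entry)
    return dict(grouped)


aggregated = aggregate_by_gene(records)
print(len(aggregated["BRCA1"]), len(aggregated["TP53"]))  # 3 1
```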
Step 2: Apply Classifications¶
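Multiply each base score by the classification multiplier for its source, for example 1.2 × 1.5 = 1.8 for a ClinGen "Definitive" gene. Sources without a classification system keep their base score.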
Step 3: Calculate Confidence¶
Apply normalization based on source type:
    from math import exp

    if normalization == "logistic":
        confidence = 1 / (1 + exp(-k * (count - x0)))
    elif normalization == "linear":
        confidence = min(count / max_count, 1.0)
Step 4: Apply Group Weights¶
Multiply each source's result by its source group weight.
Step 5: Sum and Decide¶
Sum all source contributions and compare the total with the inclusion threshold. Genes listed by a veto source are included regardless of the total.
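Steps 2 to 5 can be tied together in a short end-to-end sketch. It assumes that the classification multiplier and the confidence value both multiply the base score and default to 1.0 when a source has neither, an assumption that reproduces the BRCA1 total from Example 1. The table layout, the short source keys, and the function names are illustrative and not the pipeline's actual code.

```python
import math

# Values copied from the tables on this page; the layout is illustrative.
BASE = {"ACMG": 1.5, "ClinGen": 1.2, "Inhouse": 1.2, "PanelApp": 1.0, "Commercial": 0.8}
WEIGHT = {"ACMG": 1.5, "ClinGen": 1.2, "Inhouse": 1.2, "PanelApp": 1.0, "Commercial": 0.8}
CLASSIFICATION = {"ClinGen": {"Definitive": 1.5}, "PanelApp": {"Green": 1.0}}
NORMALIZATION = {
    "Commercial": ("logistic", {"k": 0.25, "x0": 5}),
    "Inhouse": ("linear", {"max_count": 3}),
}


def confidence(source, count):
    """Step 3: normalize a count into a confidence; 1.0 if not applicable."""
    if source not in NORMALIZATION or count is None:
        return 1.0
    kind, p = NORMALIZATION[source]
    if kind == "logistic":
        return 1 / (1 + math.exp(-p["k"] * (count - p["x0"])))
    return min(count / p["max_count"], 1.0)


def score_gene(evidence, threshold=1.5):
    """Steps 2, 4, 5: classify, weight, sum, and compare to the threshold."""
    total = 0.0
    for ev in evidence:
        src = ev["source"]
        multiplier = CLASSIFICATION.get(src, {}).get(ev.get("classification"), 1.0)
        total += BASE[src] * multiplier * confidence(src, ev.get("panel_count")) * WEIGHT[src]
    return total, total >= threshold


brca1 = [
    {"source": "ACMG"},
    {"source": "PanelApp", "classification": "Green"},
    {"source": "ClinGen", "classification": "Definitive"},
    {"source": "Commercial", "panel_count": 8},
    {"source": "Inhouse", "panel_count": 3},
]
total, included = score_gene(brca1)
print(round(total, 2), included)  # 7.28 True
```

Veto handling is left out here to keep the sketch short; under the rules in the Veto System section it would be checked before the threshold comparison.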
Customizing the Scoring¶
Adjusting Thresholds¶
For stricter panels, raise the inclusion threshold above 1.5.
For broader inclusion, lower the inclusion threshold below 1.5.
Modifying Weights¶
Prioritize specific sources:
    scoring:
      source_group_weights:
        Inhouse_Panels: 2.0      # Double weight for local panels
        Commercial_Panels: 0.5   # Reduce commercial influence
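The effect of such an override can be sketched as a merge of user-supplied weights over the defaults listed under Source Group Weights. This assumes that sources not mentioned in the override keep their default weight, which this page does not state explicitly. The `effective_weights` helper and its rejection of unknown source names are hypothetical.

```python
DEFAULT_WEIGHTS = {
    "ACMG_Incidental_Findings": 1.5, "ClinGen": 1.2, "TheGenCC": 1.2,
    "Inhouse_Panels": 1.2, "Manual_Curation": 1.1, "PanelApp": 1.0,
    "COSMIC_Germline": 0.9, "Commercial_Panels": 0.8, "HPO_Neoplasm": 0.7,
}


def effective_weights(overrides: dict[str, float]) -> dict[str, float]:
    """Defaults with user overrides applied; rejects unknown source names."""
    unknown = set(overrides) - set(DEFAULT_WEIGHTS)
    if unknown:
        raise KeyError(f"unknown source group(s): {sorted(unknown)}")
    return {**DEFAULT_WEIGHTS, **overrides}


weights = effective_weights({"Inhouse_Panels": 2.0, "Commercial_Panels": 0.5})
print(weights["Inhouse_Panels"], weights["PanelApp"])  # 2.0 1.0
```

With these overrides, a fully confident in-house contribution rises from 1.2 × 1.0 × 1.2 = 1.44 to 1.2 × 1.0 × 2.0 = 2.4, enough on its own to clear a 1.5 threshold.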
Changing Classifications¶
Adjust classification multipliers:
    data_sources:
      ClinGen:
        classification_scores:
          "Definitive": 2.0   # Increase definitive weight
          "Limited": 0.1      # Decrease limited evidence
Scoring Outputs¶
Summary Statistics¶
    {
      "scoring_summary": {
        "total_unique_genes": 450,
        "genes_above_threshold": 285,
        "genes_vetoed": 15,
        "final_panel_size": 300,
        "score_distribution": {
          "0-1": 45,
          "1-2": 120,
          "2-3": 180,
          "3+": 105
        }
      }
    }
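The `score_distribution` block is a histogram over final scores. A sketch of how such bins might be computed follows; it assumes half-open intervals (a score of exactly 1.0 lands in "1-2"), which the page does not specify. The gene names and scores are made up for the demonstration.

```python
from collections import Counter


def score_bin(score: float) -> str:
    """Assign a score to one of the summary's bins (boundaries assumed half-open)."""
    if score < 1:
        return "0-1"
    if score < 2:
        return "1-2"
    if score < 3:
        return "2-3"
    return "3+"


def score_distribution(scores: dict[str, float]) -> dict[str, int]:
    """Count genes per bin, keeping the bin order used in the summary."""
    counts = Counter(score_bin(s) for s in scores.values())
    return {label: counts.get(label, 0) for label in ("0-1", "1-2", "2-3", "3+")}


# Hypothetical scores for three genes
print(score_distribution({"GENE_A": 0.5, "GENE_B": 1.9, "GENE_C": 3.4}))
# {'0-1': 1, '1-2': 1, '2-3': 0, '3+': 1}
```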
Gene-Level Details¶
Each gene includes:
- score: Final calculated score
- source_count: Number of supporting sources
- source_details: Breakdown by source
- veto_status: Whether veto was applied
- include_decision: Final inclusion status
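For downstream code it can help to treat these fields as a typed record. The dataclass below uses the field names listed above, but the types, the defaults, and the class name `GeneResult` are assumptions, since the page does not describe the actual output schema.

```python
from dataclasses import dataclass, field, asdict


@dataclass
class GeneResult:
    """Per-gene output record; field names follow the list above, types are assumed."""
    score: float
    source_count: int
    source_details: dict[str, float] = field(default_factory=dict)
    veto_status: bool = False
    include_decision: bool = False


result = GeneResult(
    score=1.0,
    source_count=1,
    source_details={"PanelApp": 1.0},
    include_decision=False,
)
print(sorted(asdict(result)))
# ['include_decision', 'score', 'source_count', 'source_details', 'veto_status']
```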
Quality Assurance¶
Score Validation¶
- Consistency Checks: Ensure scores match source data
- Range Validation: Verify scores within expected bounds
- Audit Trail: Track all scoring decisions
Regular Review¶
- Threshold Tuning: Adjust based on panel performance
- Weight Optimization: Refine based on clinical feedback
- Source Evaluation: Monitor source quality over time
Next Steps¶
- Configuration Guide - Customize scoring parameters
- Data Sources - Understand source contributions
- Running Pipeline - Execute with custom scoring