Running the Pipeline¶
Learn how to use Custom Panel's command-line interface to create gene panels.
Quick Start¶
1. Basic Pipeline Run¶
Run the complete pipeline with default settings:
custom-panel run
This will:
- Fetch data from all enabled sources
- Merge and score genes using the default configuration
- Generate output files in multiple formats
2. Check Configuration¶
Before running, verify your configuration:
3. Search Available Panels¶
Find relevant panels in PanelApp:
Main Commands¶
run - Execute the Complete Pipeline¶
Key Options:
- -c, --config-file TEXT: Custom configuration file
- -o, --output-dir TEXT: Output directory (default: results)
- --score-threshold FLOAT: Override score threshold
- --log-level TEXT: Log level (DEBUG, INFO, WARNING, ERROR)
- --dry-run: Preview what would be executed
Examples:
# Basic run with custom output directory
custom-panel run --output-dir my_results
# Use custom configuration
custom-panel run -c my_config.yml --output-dir results
# Debug mode with lower threshold
custom-panel run --log-level DEBUG --score-threshold 1.0
# Preview mode (no files created)
custom-panel run --dry-run
fetch - Fetch Data from Individual Sources¶
Available Sources:
- panelapp: UK Genomics England PanelApp
- inhouse: Local gene panel files
- acmg: ACMG incidental findings
- manual: Manual curation lists
- hpo: HPO/OMIM neoplasm genes
- cosmic: COSMIC Cancer Gene Census
- clingen: ClinGen gene validity
- gencc: GenCC gene-disease associations
- commercial: Commercial panel data
Examples:
# Fetch PanelApp data only
custom-panel fetch panelapp --output-dir results/panelapp
# Fetch with specific format
custom-panel fetch acmg --format csv --output-dir results
Output Files¶
The pipeline generates several output formats:
Final Panel Files¶
- master_panel.xlsx: Comprehensive Excel file with all data
- master_panel.csv: CSV format for programmatic use
- master_panel.parquet: Efficient binary format
- germline_panel.bed: BED file for included genes
- panel_report.html: Interactive HTML report
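After a run, it can be handy to confirm that every final panel file was actually written. The following is a minimal sketch, not part of Custom Panel itself: the helper name `missing_outputs` is hypothetical, and it assumes that all five formats are enabled and that the files are written directly into the output directory rather than into subfolders.

```python
from pathlib import Path

# File names taken from the list above. Whether they sit directly in the
# output directory is an assumption; adjust the paths if your run uses subfolders.
EXPECTED_OUTPUTS = [
    "master_panel.xlsx",
    "master_panel.csv",
    "master_panel.parquet",
    "germline_panel.bed",
    "panel_report.html",
]


def missing_outputs(output_dir="results"):
    """Return the expected final panel files that are absent from output_dir."""
    base = Path(output_dir)
    return [name for name in EXPECTED_OUTPUTS if not (base / name).is_file()]


if __name__ == "__main__":
    missing = missing_outputs("results")
    if missing:
        print("Missing files:", ", ".join(missing))
    else:
        print("All expected panel files are present.")
```

If some formats are disabled in your configuration, trim `EXPECTED_OUTPUTS` to match before relying on the check.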
Intermediate Files (if enabled)¶
- Raw source data
- Standardized data
- Merged data before scoring
- Scored data before final filtering
Configuration Options¶
Score Thresholds¶
Adjust inclusion criteria:
# Lower threshold for more genes
custom-panel run --score-threshold 1.0
# Higher threshold for stricter filtering
custom-panel run --score-threshold 2.5
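To make the effect of the threshold concrete, here is a toy illustration in Python. It is a sketch under stated assumptions, not the pipeline's actual filtering code: the gene names and scores are invented, and it assumes a gene is included when its score is greater than or equal to the threshold.

```python
# Hypothetical gene scores, purely for illustration; real scores come from
# the pipeline's merge-and-score step.
scores = {"GENE_A": 3.0, "GENE_B": 2.0, "GENE_C": 1.2, "GENE_D": 0.5}


def genes_passing(scores, threshold):
    """Return genes whose score meets the threshold (assumes a >= comparison)."""
    return sorted(gene for gene, score in scores.items() if score >= threshold)


print(genes_passing(scores, 1.0))  # ['GENE_A', 'GENE_B', 'GENE_C']
print(genes_passing(scores, 2.5))  # ['GENE_A']
```

The lower threshold admits three of the four genes, while the higher one keeps only the top-scoring gene.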
Output Formats¶
Control which formats are generated by editing your configuration file:
output:
  formats:
    - "excel"    # Comprehensive Excel file
    - "csv"      # CSV for programmatic use
    - "parquet"  # Efficient binary format
    - "bed"      # BED files for genome browsers
Logging¶
Enable detailed logging:
custom-panel run --log-level DEBUG
Advanced Usage¶
Custom Configuration¶
Create a custom configuration file:
cp custom_panel/config/default_config.yml my_config.yml
# Edit my_config.yml as needed
custom-panel run -c my_config.yml
Batch Processing¶
Process multiple configurations:
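#!/bin/bash
# Run the pipeline once per configuration file in configs/
for config in configs/*.yml; do
    # Name the output directory after the configuration file (without .yml)
    output_dir="results/$(basename "$config" .yml)"
    custom-panel run -c "$config" --output-dir "$output_dir"
done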
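The same loop can be driven from Python when batch runs are part of a larger script. This is a minimal sketch rather than an official helper: the function names `build_run_command` and `run_all` are hypothetical, and only the `-c` and `--output-dir` options documented above are used. Unlike the shell loop, it stops at the first failing run because of `check=True`.

```python
import subprocess
from pathlib import Path


def build_run_command(config_path, results_root="results"):
    """Build the `custom-panel run` argument list for one configuration file."""
    config_path = Path(config_path)
    # Mirror the shell script: output directory is named after the config file.
    output_dir = Path(results_root) / config_path.stem
    return [
        "custom-panel", "run",
        "-c", str(config_path),
        "--output-dir", str(output_dir),
    ]


def run_all(config_dir="configs", results_root="results"):
    """Run the pipeline once per *.yml file, stopping at the first failure."""
    for config_path in sorted(Path(config_dir).glob("*.yml")):
        subprocess.run(build_run_command(config_path, results_root), check=True)


# Call run_all() to execute every configuration found in configs/.
```

Keeping command construction separate from execution makes it easy to print or inspect the commands before anything is run.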
Performance Tuning¶
Adjust performance settings in your configuration:
performance:
  max_workers: 4        # Parallel workers for API calls
  batch_size: 300       # Genes per batch for coordinate lookup
  enable_caching: true  # Cache API responses
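As a rough picture of what a setting like `batch_size` controls, the sketch below splits a gene list into consecutive batches of at most 300 entries. It illustrates the general batching idea only and is not Custom Panel's implementation; the `batched` helper and the gene symbols are hypothetical.

```python
def batched(genes, batch_size=300):
    """Yield consecutive slices of at most batch_size genes."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(genes), batch_size):
        yield genes[start:start + batch_size]


# Hypothetical gene symbols, for illustration only.
genes = [f"GENE{i}" for i in range(650)]
print([len(batch) for batch in batched(genes)])  # [300, 300, 50]
```

With 650 genes and a batch size of 300, a lookup would be issued three times; a smaller batch size means more, smaller requests.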
Next Steps¶
- Configuration Guide - Customize scoring and data sources
- Data Source Setup - Configure external data access
- API Reference - Complete CLI documentation