Running the Pipeline

Learn how to use Custom Panel's command-line interface to create gene panels.

Quick Start

1. Basic Pipeline Run

Run the complete pipeline with default settings:

custom-panel run --output-dir results

This will:

- Fetch data from all enabled sources
- Merge and score genes using the default configuration
- Generate output files in multiple formats

2. Check Configuration

Before running, verify your configuration:

custom-panel config-check
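The check and the run can be chained so that the pipeline only starts when the configuration passes. The following is a minimal sketch, not part of Custom Panel itself: `checked_run` and the `CUSTOM_PANEL_BIN` override variable are hypothetical names, and it assumes that `config-check` returns a non-zero exit status when it finds a problem.

```shell
#!/usr/bin/env bash
# Sketch: validate the configuration, then run the pipeline only if it passes.
# CUSTOM_PANEL_BIN is a hypothetical override (not a Custom Panel feature)
# for pointing at a different executable; it defaults to `custom-panel`.

checked_run() {
    local bin="${CUSTOM_PANEL_BIN:-custom-panel}"
    local output_dir="${1:-results}"

    # Assumption: config-check exits non-zero when the configuration is invalid.
    if ! "$bin" config-check; then
        echo "Configuration check failed; pipeline not started." >&2
        return 1
    fi

    "$bin" run --output-dir "$output_dir"
}
```

After sourcing the function, `checked_run my_results` would run the check first and then write to `my_results`.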

3. Search Available Panels

Find relevant panels in PanelApp:

custom-panel search-panels "cancer"

Main Commands

`run` - Execute the Complete Pipeline

custom-panel run [OPTIONS]

Key Options:

- -c, --config-file TEXT - Custom configuration file
- -o, --output-dir TEXT - Output directory (default: results)
- --score-threshold FLOAT - Override score threshold
- --log-level TEXT - Log level (DEBUG, INFO, WARNING, ERROR)
- --dry-run - Preview what would be executed

Examples:

# Basic run with custom output directory
custom-panel run --output-dir my_results

# Use custom configuration
custom-panel run -c my_config.yml --output-dir results

# Debug mode with lower threshold
custom-panel run --log-level DEBUG --score-threshold 1.0

# Preview mode (no files created)
custom-panel run --dry-run

`fetch` - Fetch Data from Individual Sources

custom-panel fetch SOURCE [OPTIONS]

Available Sources:

- panelapp - UK Genomics England PanelApp
- inhouse - Local gene panel files
- acmg - ACMG incidental findings
- manual - Manual curation lists
- hpo - HPO/OMIM neoplasm genes
- cosmic - COSMIC Cancer Gene Census
- clingen - ClinGen gene validity
- gencc - GenCC gene-disease associations
- commercial - Commercial panel data

Examples:

# Fetch PanelApp data only
custom-panel fetch panelapp --output-dir results/panelapp

# Fetch with specific format
custom-panel fetch acmg --format csv --output-dir results
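To fetch several sources in one go, the `fetch` command can be wrapped in a loop that gives each source its own subdirectory. This is a sketch under stated assumptions: `fetch_sources` and `CUSTOM_PANEL_BIN` are hypothetical names introduced here, and the function relies on `fetch` returning a non-zero exit status on failure.

```shell
#!/usr/bin/env bash
# Sketch: fetch a list of sources, one output subdirectory per source.
# CUSTOM_PANEL_BIN is a hypothetical override; it defaults to `custom-panel`.

fetch_sources() {
    local bin="${CUSTOM_PANEL_BIN:-custom-panel}"
    local base_dir="$1"
    shift

    local failed=0
    local src
    for src in "$@"; do
        # Keep going after a failure so one bad source does not block the rest.
        if ! "$bin" fetch "$src" --output-dir "$base_dir/$src"; then
            echo "Fetch failed for source: $src" >&2
            failed=1
        fi
    done

    # Non-zero if any source failed.
    return "$failed"
}
```

For example, `fetch_sources results panelapp acmg cosmic` would write to `results/panelapp`, `results/acmg`, and `results/cosmic`.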

Output Files

The pipeline generates several output formats:

Final Panel Files

master_panel.xlsx - Comprehensive Excel file with all data
master_panel.csv - CSV format for programmatic use
master_panel.parquet - Efficient binary format
germline_panel.bed - BED file for included genes
panel_report.html - Interactive HTML report
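After a run, a quick check that the expected files exist can catch a silently incomplete run. The sketch below uses the file names listed above; `check_outputs` is a hypothetical helper, and it assumes that every format is enabled and that the files are written directly into the output directory rather than into a subdirectory.

```shell
#!/usr/bin/env bash
# Sketch: verify that the final panel files exist and are non-empty.
# Assumption: files sit directly inside the output directory.

check_outputs() {
    local output_dir="${1:-results}"
    local missing=0
    local name

    for name in master_panel.xlsx master_panel.csv master_panel.parquet \
                germline_panel.bed panel_report.html; do
        # -s is true only when the file exists and has a size greater than zero.
        if [ ! -s "$output_dir/$name" ]; then
            echo "Missing or empty: $output_dir/$name" >&2
            missing=$((missing + 1))
        fi
    done

    # Succeed only when nothing is missing.
    [ "$missing" -eq 0 ]
}
```

If some formats are disabled in your configuration, trim the file list accordingly before calling `check_outputs results`.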

Intermediate Files (if enabled)

Raw source data
Standardized data
Merged data before scoring
Scored data before final filtering

Configuration Options

Score Thresholds

Adjust inclusion criteria:

# Lower threshold for more genes
custom-panel run --score-threshold 1.0

# Higher threshold for stricter filtering  
custom-panel run --score-threshold 2.5
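When choosing a threshold, it can help to run the pipeline at several values and compare the resulting panels side by side. A minimal sketch, assuming only the `--score-threshold` and `--output-dir` options shown above; `sweep_thresholds`, the `threshold_<value>` directory naming, and `CUSTOM_PANEL_BIN` are hypothetical choices made for this example.

```shell
#!/usr/bin/env bash
# Sketch: run the pipeline once per threshold, each into its own directory.
# CUSTOM_PANEL_BIN is a hypothetical override; it defaults to `custom-panel`.

sweep_thresholds() {
    local bin="${CUSTOM_PANEL_BIN:-custom-panel}"
    local base_dir="$1"
    shift

    local threshold
    local out_dir
    for threshold in "$@"; do
        out_dir="$base_dir/threshold_$threshold"
        # Stop at the first failing run so later results are not misread.
        "$bin" run --score-threshold "$threshold" --output-dir "$out_dir" || return 1
    done
}
```

For example, `sweep_thresholds results 1.0 2.5` would produce `results/threshold_1.0` and `results/threshold_2.5`. Each run repeats the full pipeline, so this is only practical for a handful of values.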

Output Formats

Control which formats are generated by editing your configuration file:

output:
  formats:
    - "excel"    # Comprehensive Excel file
    - "csv"      # CSV for programmatic use
    - "parquet"  # Efficient binary format
    - "bed"      # BED files for genome browsers

Logging

Enable detailed logging:

# Debug mode
custom-panel run --log-level DEBUG

# Save logs to files
custom-panel run --log-to-file

Advanced Usage

Custom Configuration

Create a custom configuration file:

cp custom_panel/config/default_config.yml my_config.yml
# Edit my_config.yml as needed
custom-panel run -c my_config.yml

Batch Processing

Process multiple configurations:

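#!/bin/bash
# Run the pipeline once per configuration file in configs/
for config in configs/*.yml; do
    # Skip the literal pattern if no .yml files match
    [ -e "$config" ] || continue
    # Name each output directory after its config file (without .yml)
    output_dir="results/$(basename "$config" .yml)"
    custom-panel run -c "$config" --output-dir "$output_dir"
done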

Performance Tuning

Adjust performance settings in your configuration:

performance:
  max_workers: 4      # Parallel workers for API calls
  batch_size: 300     # Genes per batch for coordinate lookup
  enable_caching: true # Cache API responses

Next Steps

Configuration Guide - Customize scoring and data sources
Data Source Setup - Configure external data access
API Reference - Complete CLI documentation