Running the Pipeline

Learn how to use Custom Panel's command-line interface to create gene panels.

Quick Start

1. Basic Pipeline Run

Run the complete pipeline with default settings:

custom-panel run --output-dir results

This will:

  • Fetch data from all enabled sources
  • Merge and score genes using the default configuration
  • Generate output files in multiple formats

2. Check Configuration

Before running, verify your configuration:

custom-panel config-check

3. Search Available Panels

Find relevant panels in PanelApp:

custom-panel search-panels "cancer"

Main Commands

run - Execute the Complete Pipeline

custom-panel run [OPTIONS]

Key Options:

  • -c, --config-file TEXT - Custom configuration file
  • -o, --output-dir TEXT - Output directory (default: results)
  • --score-threshold FLOAT - Override score threshold
  • --log-level TEXT - Log level (DEBUG, INFO, WARNING, ERROR)
  • --dry-run - Preview what would be executed

Examples:

# Basic run with custom output directory
custom-panel run --output-dir my_results

# Use custom configuration
custom-panel run -c my_config.yml --output-dir results

# Debug mode with lower threshold
custom-panel run --log-level DEBUG --score-threshold 1.0

# Preview mode (no files created)
custom-panel run --dry-run
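
The commands shown so far can be chained so that a full run starts only after validation and a preview have passed. The following is a minimal sketch, not part of Custom Panel itself: `guarded_run` and the `PANEL_CMD` override are hypothetical names introduced for illustration, and the sketch assumes that `config-check` and `run --dry-run` exit with a non-zero status on failure, which this page does not state.

```shell
# Validate, preview, then run. Stops at the first failing step.
# Assumption: config-check and --dry-run signal problems via exit status.
# PANEL_CMD is a hypothetical hook for swapping the executable.
guarded_run() {
    local cmd out_dir
    cmd="${PANEL_CMD:-custom-panel}"
    out_dir="${1:-results}"

    "$cmd" config-check || { echo "config-check failed; aborting" >&2; return 1; }
    "$cmd" run --dry-run --output-dir "$out_dir" || return 1
    "$cmd" run --output-dir "$out_dir"
}
```

Calling `guarded_run my_results` would then run the three steps in order against the `my_results` directory.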

fetch - Fetch Data from Individual Sources

custom-panel fetch SOURCE [OPTIONS]

Available Sources:

  • panelapp - UK Genomics England PanelApp
  • inhouse - Local gene panel files
  • acmg - ACMG incidental findings
  • manual - Manual curation lists
  • hpo - HPO/OMIM neoplasm genes
  • cosmic - COSMIC Cancer Gene Census
  • clingen - ClinGen gene validity
  • gencc - GenCC gene-disease associations
  • commercial - Commercial panel data

Examples:

# Fetch PanelApp data only
custom-panel fetch panelapp --output-dir results/panelapp

# Fetch with specific format
custom-panel fetch acmg --format csv --output-dir results
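
To fetch several sources in one go, the single-source command can be wrapped in a loop. This is a rough sketch under stated assumptions: `fetch_sources` and `PANEL_CMD` are hypothetical names, and the per-source directory layout simply extends the `results/panelapp` pattern from the example above to the other sources.

```shell
# Fetch each named source into its own subdirectory of results/.
# A failing source is reported on stderr and the loop continues.
# PANEL_CMD is a hypothetical hook for swapping the executable.
fetch_sources() {
    local src
    for src in "$@"; do
        "${PANEL_CMD:-custom-panel}" fetch "$src" --output-dir "results/$src" \
            || echo "fetch failed for $src" >&2
    done
}
```

For example, `fetch_sources panelapp acmg clingen` would issue three separate `fetch` calls, one per source.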

Output Files

The pipeline generates several output formats:

Final Panel Files

  • master_panel.xlsx - Comprehensive Excel file with all data
  • master_panel.csv - CSV format for programmatic use
  • master_panel.parquet - Efficient binary format
  • germline_panel.bed - BED file for included genes
  • panel_report.html - Interactive HTML report
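
After a run, it can be handy to confirm that the files listed above were actually written. The helper below is a small sketch rather than a built-in feature: `missing_outputs` is a hypothetical name, and it assumes the files sit directly in the directory you pass in, which may not hold if your configuration disables some formats or writes into subdirectories.

```shell
# Print the name of every expected final panel file that is absent from $1.
# Assumption: output files are written directly into that directory.
missing_outputs() {
    local dir name
    dir="$1"
    for name in master_panel.xlsx master_panel.csv master_panel.parquet \
                germline_panel.bed panel_report.html; do
        [ -f "$dir/$name" ] || printf '%s\n' "$name"
    done
}
```

Running `missing_outputs results` prints nothing when all five files are present.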

Intermediate Files (if enabled)

  • Raw source data
  • Standardized data
  • Merged data before scoring
  • Scored data before final filtering

Configuration Options

Score Thresholds

Adjust inclusion criteria:

# Lower threshold for more genes
custom-panel run --score-threshold 1.0

# Higher threshold for stricter filtering
custom-panel run --score-threshold 2.5
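
To compare how the threshold affects panel size, the pipeline can be run once per candidate value, each into its own output directory. The sketch below is illustrative only: `sweep_thresholds`, `PANEL_CMD`, and the `results/threshold_<value>` directory naming are assumptions, not conventions defined by Custom Panel.

```shell
# Run the pipeline once for each threshold given as an argument.
# Each run writes to results/threshold_<value>; stops on the first failure.
# PANEL_CMD is a hypothetical hook for swapping the executable.
sweep_thresholds() {
    local threshold
    for threshold in "$@"; do
        "${PANEL_CMD:-custom-panel}" run \
            --score-threshold "$threshold" \
            --output-dir "results/threshold_${threshold}" || return 1
    done
}
```

For instance, `sweep_thresholds 1.0 1.5 2.5` would produce three result directories that can be compared side by side.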

Output Formats

Control which formats are generated by editing your configuration file:

output:
  formats:
    - "excel"    # Comprehensive Excel file
    - "csv"      # CSV for programmatic use
    - "parquet"  # Efficient binary format
    - "bed"      # BED files for genome browsers

Logging

Enable detailed logging:

# Debug mode
custom-panel run --log-level DEBUG

# Save logs to files
custom-panel run --log-to-file

Advanced Usage

Custom Configuration

Create a custom configuration file:

cp custom_panel/config/default_config.yml my_config.yml
# Edit my_config.yml as needed
custom-panel run -c my_config.yml

Batch Processing

Process multiple configurations:

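#!/bin/bash
# Run the pipeline once per config file, writing each run to its own directory
for config in configs/*.yml; do
    # Quote expansions so paths containing spaces are handled correctly
    output_dir="results/$(basename "$config" .yml)"
    custom-panel run -c "$config" --output-dir "$output_dir"
done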

Performance Tuning

Adjust performance settings in your configuration:

performance:
  max_workers: 4      # Parallel workers for API calls
  batch_size: 300     # Genes per batch for coordinate lookup
  enable_caching: true # Cache API responses

Next Steps