
hardnormly

VCF normalization and hard filtering toolkit for whole-exome sequencing variant processing.

Quick Start

conda env create -f conda/hardnormly_environment.yml
conda activate hardnormly

./hardnormly.sh run-pipeline \
  -v input.vcf.gz \
  -f reference.fasta \
  --caller gatk \
  -o output.vcf.gz

This normalizes your VCF (splitting multiallelic records and left-aligning indels) and applies GATK hard filters. Variants that fail a filter are tagged in the FILTER column rather than removed. No variants are dropped unless you add --only-pass.
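Soft filtering means every record survives and only its FILTER value changes. The bash/awk sketch below is not hardnormly's code. It contrasts a tagged VCF with a PASS-only view of the same file, using a toy sites-only VCF. The helper names and the `LowQD` tag are hypothetical. The PASS-only selection only approximates what `--only-pass` might do. bcftools offers a comparable selection with `bcftools view -f PASS`.

```shell
#!/usr/bin/env bash
# Sketch only: contrasts soft filtering (all records kept, failures tagged)
# with a PASS-only view. keep_pass_only, count_records and the "LowQD" tag
# are hypothetical, not part of hardnormly.
set -euo pipefail

# Print header lines plus records whose FILTER column (7) is exactly "PASS".
keep_pass_only() {
  awk -F'\t' '/^#/ || $7 == "PASS"' "$@"
}

# Count data records (lines not starting with '#') read from stdin.
count_records() {
  grep -vc '^#' || true   # grep exits 1 when the count is 0; still print "0"
}

toy_vcf=$(mktemp)
trap 'rm -f "$toy_vcf"' EXIT

# Toy sites-only VCF: the record at POS 200 carries a failing filter tag.
printf '%s\n' \
  '##fileformat=VCFv4.2' \
  '##FILTER=<ID=LowQD,Description="Hypothetical QD hard filter">' \
  $'#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO' \
  $'chr1\t100\t.\tA\tG\t50\tPASS\t.' \
  $'chr1\t200\t.\tC\tT\t12\tLowQD\t.' \
  $'chr1\t300\t.\tG\tA\t60\tPASS\t.' > "$toy_vcf"

all=$(count_records < "$toy_vcf")
passed=$(keep_pass_only "$toy_vcf" | count_records)
echo "soft-filtered output: $all records; PASS-only view: $passed records"
```

The soft-filtered file keeps all three records. The PASS-only view keeps the two at positions 100 and 300.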

How It Works

flowchart TD
    A[Input VCF] --> B[Region Annotation<br>BED files]
    B --> C[Strip Annotations<br>optional]
    C --> D[Normalize<br>bcftools norm]
    D --> E[Hard Filter<br>soft-filter tags]
    E --> F[Output VCF]
    F --> G[Stats & Plots<br>optional]

See Pipeline for a detailed explanation of each step.
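The Normalize step is delegated to bcftools norm. The awk sketch below shows what multiallelic splitting means at the record level. It is deliberately simplified: the `split_multiallelic` helper is hypothetical, and it assumes sites-only input with 8 columns. It does not replace bcftools norm, which also splits per-allele INFO and genotype fields and left-aligns indels against the reference FASTA.

```shell
#!/usr/bin/env bash
# Simplified illustration of multiallelic splitting for a sites-only VCF
# (8 columns). NOT equivalent to bcftools norm: INFO is copied unchanged
# (per-allele values such as AC/AF are not split) and nothing is left-aligned.
set -euo pipefail

split_multiallelic() {
  awk -F'\t' -v OFS='\t' '
    /^#/ { print; next }                # header lines pass through untouched
    {
      n = split($5, alts, ",")          # ALT (column 5) may list several alleles
      for (i = 1; i <= n; i++)          # emit one biallelic record per ALT allele
        print $1, $2, $3, $4, alts[i], $6, $7, $8
    }' "$@"
}

# One site with two ALT alleles becomes two records.
printf 'chr1\t100\t.\tA\tG,T\t50\tPASS\t.\n' | split_multiallelic
```

The demo prints two records at chr1:100. One has ALT `G` and the other has ALT `T`.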

Documentation

Guide               What you'll learn
Pipeline            What each step does and why
Options Reference   Every flag, categorized by purpose
Filters             Filter system, file format, writing custom filters
Examples            Copy-paste recipes for common workflows
Snakemake           Batch processing multiple VCFs
Architecture        Module structure and development guide

Subcommands

Command                  Purpose
run-pipeline             Normalize and filter a VCF (default)
generate-inclusion-bed   Merge BED files into a combined inclusion region
generate-exclusion-bed   Merge BED files into a combined exclusion region
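Both BED subcommands combine several region files into one. A common approach is to concatenate the files, sort by chromosome and start, and merge overlapping intervals. This is what `bedtools merge` does. The sketch below approximates that approach with sort and awk. `merge_beds` is a hypothetical name, and this reading of "merge" is an assumption, not hardnormly's implementation. Like bedtools merge's default behavior, it also joins book-ended intervals.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: combine several BED files into one set of sorted,
# non-overlapping intervals. Only chrom/start/end are kept; names, scores,
# and strands are discarded.
set -euo pipefail

merge_beds() {
  cat "$@" \
    | awk -F'\t' -v OFS='\t' '!/^(#|track|browser)/ && NF >= 3 { print $1, $2, $3 }' \
    | LC_ALL=C sort -t $'\t' -k1,1 -k2,2n \
    | awk -F'\t' -v OFS='\t' '
        # New chromosome, or a gap before this interval: flush the current block.
        $1 != chrom || $2 + 0 > end + 0 {
          if (chrom != "") print chrom, start, end
          chrom = $1; start = $2 + 0; end = $3 + 0
          next
        }
        # Overlapping or book-ended (start == end) interval: extend the block.
        { if ($3 + 0 > end) end = $3 + 0 }
        END { if (chrom != "") print chrom, start, end }'
}

a=$(mktemp); b=$(mktemp)
trap 'rm -f "$a" "$b"' EXIT
printf 'chr1\t100\t200\nchr1\t500\t600\n' > "$a"
printf 'chr1\t150\t250\nchr1\t250\t300\nchr2\t10\t20\n' > "$b"

merged=$(merge_beds "$a" "$b")
printf '%s\n' "$merged"
```

With these toy inputs, the three overlapping or touching chr1 intervals collapse into chr1 100-300. The chr1 500-600 and chr2 10-20 intervals pass through unchanged.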

Running hardnormly.sh with flags but no subcommand (e.g., hardnormly.sh -v input.vcf.gz ...) implicitly routes to run-pipeline. Running without any arguments shows help.
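The routing rule above can be sketched as a small bash `case` dispatch. Everything in this sketch is hypothetical: the `dispatch` and `show_help` names, the handlers (which only echo), and the error for an unrecognized first word. The description above does not specify that error behavior.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of subcommand routing; the real hardnormly.sh may be
# structured differently. Handlers only echo what they would run.
set -euo pipefail

show_help() { echo "usage: hardnormly.sh <subcommand> [options]"; }

dispatch() {
  if [ "$#" -eq 0 ]; then
    show_help                         # no arguments at all: print help
    return 0
  fi
  case "$1" in
    run-pipeline|generate-inclusion-bed|generate-exclusion-bed)
      local cmd="$1"; shift           # explicit subcommand: strip it off
      echo "subcommand=$cmd args=$*" ;;
    -*)
      # Flags without a subcommand are routed to run-pipeline.
      echo "subcommand=run-pipeline args=$*" ;;
    *)
      # Assumed behavior: reject anything else.
      echo "unknown subcommand: $1" >&2
      return 1 ;;
  esac
}

dispatch -v input.vcf.gz -o output.vcf.gz
```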