Examples
Copy-paste recipes for common workflows. All examples assume you have bcftools, bedtools, and htslib installed.
Basic GATK Filtering
The 80% use case — normalize and apply GATK hard filters:
./hardnormly.sh run-pipeline \
-v sample.vcf.gz \
-f reference.fasta \
--caller gatk \
-o sample.filtered.vcf.gz
Output: a soft-filtered VCF in which failing variants carry tags such as DPu10het and gatkSNPhard in the FILTER column.
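To see which filters fired most often, the FILTER column can be tallied with standard tools. The sketch below is not part of hardnormly: `count_filter_tags` is a hypothetical helper name, and the toy VCF only stands in for sample.filtered.vcf.gz.

```shell
# Hypothetical helper (not part of hardnormly): tally FILTER tags in a VCF.
# FILTER is column 7; multiple tags on one record are separated by ';'.
count_filter_tags() {
    # gzip -cdf decompresses .gz/bgzip input and passes plain text through unchanged
    gzip -cdf "$1" | awk -F'\t' '
        !/^#/ {
            n = split($7, tags, ";")
            for (i = 1; i <= n; i++) count[tags[i]]++
        }
        END { for (t in count) printf "%s\t%d\n", t, count[t] }
    ' | sort
}

# Toy input standing in for sample.filtered.vcf.gz
toy_vcf="$(mktemp)"
{
    printf '##fileformat=VCFv4.2\n'
    printf '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n'
    printf '1\t100\t.\tA\tG\t50\tPASS\t.\n'
    printf '1\t200\t.\tC\tT\t12\tDPu10het;gatkSNPhard\t.\n'
    printf '1\t300\t.\tG\tA\t8\tgatkSNPhard\t.\n'
} > "$toy_vcf"

# Prints one "tag<TAB>count" line per distinct tag
count_filter_tags "$toy_vcf"
```

With bcftools available, `bcftools query -f '%FILTER\n' sample.filtered.vcf.gz` should yield the same column without the awk step.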
Freebayes with Exclusion Regions
Filter Freebayes calls while excluding problematic genomic regions:
./hardnormly.sh run-pipeline \
-v sample.freebayes.vcf.gz \
-f reference.fasta \
--caller freebayes \
-e ref/exclude_files/hg19_exclusion.bed \
-o sample.filtered.vcf.gz
Variants inside exclusion regions get the IN_EXCLUDE_REGION tag.
Include and Exclude Regions Together
Use a capture kit BED (include) and a blacklist BED (exclude):
./hardnormly.sh run-pipeline \
-v sample.vcf.gz \
-f reference.fasta \
-b capture_kit.bed \
-e blacklist.bed \
-g hg19.genome \
--caller gatk \
--slop 50 \
-o sample.filtered.vcf.gz
Variants outside the capture kit (padded by 50 bp) get NOT_IN_INCLUDE_REGION. Variants inside the blacklist get IN_EXCLUDE_REGION.
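If you do not already have a genome file for -g, one can usually be derived from the FASTA index. The sketch below assumes that -g expects a bedtools-style genome file (two tab-separated columns: contig name and length). This page does not state the format, so treat that as an assumption. The toy .fai stands in for the index that `samtools faidx reference.fasta` would create.

```shell
# Assumption: -g takes a bedtools-style genome file (contig<TAB>length).
workdir="$(mktemp -d)"

# Toy index standing in for reference.fasta.fai
# (.fai columns: name, length, offset, linebases, linewidth)
printf 'chr1\t249250621\t52\t60\t61\n'        >  "$workdir/reference.fasta.fai"
printf 'chr2\t243199373\t253404903\t60\t61\n' >> "$workdir/reference.fasta.fai"

# Keep only the contig name and length
cut -f1,2 "$workdir/reference.fasta.fai" > "$workdir/hg19.genome"

cat "$workdir/hg19.genome"
```

Building the genome file from the same FASTA you pass to -f keeps contig names and lengths consistent with the reference used for normalization.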
Custom Inline Filters
Add ad-hoc filters without creating a file:
./hardnormly.sh run-pipeline \
-v sample.vcf.gz \
-f reference.fasta \
--filters "lowDP e FORMAT/DP<10" \
--filters "lowQUAL e QUAL<30" \
--filters "highVAF i FORMAT/VAF>0.2" \
-o sample.filtered.vcf.gz
Strip VEP Annotations Before Filtering
Clean up INFO field bloat from VEP or SnpEff before filtering:
./hardnormly.sh run-pipeline \
-v annotated.vcf.gz \
-f reference.fasta \
--strip-annotations INFO/CSQ,INFO/ANN \
--caller gatk \
-o clean.filtered.vcf.gz
Keep Only PASS Variants
Remove all filtered variants, outputting only those that passed every filter:
./hardnormly.sh run-pipeline \
-v sample.vcf.gz \
-f reference.fasta \
--caller gatk \
--only-pass \
-o sample.pass_only.vcf.gz
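A quick sanity check on the result is to count records whose FILTER value is something other than PASS. The helper below is a hypothetical sketch, not part of hardnormly. It also treats "." (no filters applied) as acceptable, which is an assumption about what --only-pass keeps.

```shell
# Hypothetical check: count records whose FILTER (column 7) is neither PASS nor "."
non_pass_count() {
    gzip -cdf "$1" | awk -F'\t' '
        !/^#/ && $7 != "PASS" && $7 != "." { n++ }
        END { print n + 0 }   # n + 0 prints 0 when nothing matched
    '
}

# Toy input standing in for sample.pass_only.vcf.gz
toy_pass="$(mktemp)"
{
    printf '##fileformat=VCFv4.2\n'
    printf '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n'
    printf '1\t100\t.\tA\tG\t50\tPASS\t.\n'
    printf '1\t150\t.\tT\tC\t60\tPASS\t.\n'
} > "$toy_pass"

non_pass_count "$toy_pass"
```

A non-zero count on a real --only-pass output would suggest that filtered records slipped through.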
Generate Stats and Plots
Produce QC summaries after filtering:
./hardnormly.sh run-pipeline \
-v sample.vcf.gz \
-f reference.fasta \
--caller gatk \
--generate-stats \
--plot-stats \
--plot-output-dir qc_plots/ \
-o sample.filtered.vcf.gz
Creates sample.filtered.stats.txt and visual plots in qc_plots/.
Generate Exclusion BED from Public Sources
Create a comprehensive exclusion BED file from ENCODE blacklist, segmental duplications, low complexity, and centromere/telomere regions:
bash scripts/generate_exclusion_bed.sh -b hg19 -o ref/exclude_files/hg19_exclusion.bed -v
bash scripts/generate_exclusion_bed.sh -b hg38 -o ref/exclude_files/hg38_exclusion.bed -v
Use the BED matching your reference build. UCSC sources use chr contig names; if your VCF/reference uses 1, 2, etc., strip the prefix before passing the BED to hardnormly.
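Stripping the prefix can be done with a one-line sed. This is a minimal sketch: `strip_chr_prefix` is a hypothetical helper name, the toy BED stands in for the generated exclusion file, and it assumes contig names differ only by the leading chr. A mitochondrial contig named chrM in one file and MT in the other would still need a separate rename.

```shell
# Hypothetical helper: drop a leading "chr" from the contig column of a BED file.
# Only the start of each line is touched, so coordinates and other columns are unchanged.
strip_chr_prefix() {
    sed 's/^chr//' "$1"
}

# Toy input standing in for ref/exclude_files/hg19_exclusion.bed
workdir="$(mktemp -d)"
printf 'chr1\t100\t200\nchrX\t5\t50\n' > "$workdir/exclusion.chr.bed"

strip_chr_prefix "$workdir/exclusion.chr.bed" > "$workdir/exclusion.nochr.bed"
cat "$workdir/exclusion.nochr.bed"
```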
Merge BED Files (Subcommands)
Pre-merge BED files for reuse across multiple samples:
# Merge inclusion BEDs with 50bp padding
./hardnormly.sh generate-inclusion-bed \
-b kit_v1.bed \
-b kit_v2.bed \
-g hg19.genome \
--slop 50 \
-o merged_targets.bed \
-v
# Merge exclusion BEDs
./hardnormly.sh generate-exclusion-bed \
-e blacklist.bed \
-e segdups.bed \
-o merged_exclusions.bed \
-v
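Once the merged files exist, they can be reused in a loop over samples. The sketch below only uses flags that appear elsewhere on this page. It assumes run-pipeline accepts -b without -g/--slop when the padding has already been applied, which this page does not confirm. `run_sample` and the `RUNNER` variable are hypothetical, and the sample names are placeholders.

```shell
# RUNNER defaults to "echo" so the loop only prints the commands (dry run).
# Run with RUNNER= (empty) to actually execute them.
RUNNER="${RUNNER-echo}"
HARDNORMLY="${HARDNORMLY:-./hardnormly.sh}"

# Hypothetical wrapper: filter one sample against the pre-merged BED files.
run_sample() {
    vcf="$1"
    out="${vcf%.vcf.gz}.filtered.vcf.gz"   # sampleA.vcf.gz -> sampleA.filtered.vcf.gz
    $RUNNER "$HARDNORMLY" run-pipeline \
        -v "$vcf" \
        -f reference.fasta \
        -b merged_targets.bed \
        -e merged_exclusions.bed \
        --caller gatk \
        -o "$out"
}

# Placeholder sample names
for vcf in sampleA.vcf.gz sampleB.vcf.gz; do
    run_sample "$vcf"
done
```

Printing the commands first makes it easy to check the derived output names before committing to a long run.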
Debug a Failed Run
When something goes wrong, use debug mode and preserve temp files:
./hardnormly.sh run-pipeline \
-v sample.vcf.gz \
-f reference.fasta \
--caller gatk \
--debug \
--no-cleanup \
--tmp-dir /tmp/hardnormly-debug \
--log-file debug.log \
-o sample.filtered.vcf.gz
--debug enables set -x tracing (every command is printed)
--no-cleanup preserves intermediate files in the temp directory
--log-file writes all log messages to a file for later review
--tmp-dir uses a predictable path so you can find the files