Architecture
hardnormly is a modular bash application. The main script (hardnormly.sh) orchestrates the pipeline and subcommands by sourcing eight focused library modules.
Module Structure
| Module | Lines | Purpose | Key Functions |
|---|---|---|---|
| lib/logging.sh | ~100 | Logging and command execution | log_msg, debug_msg, error_msg, run_cmd, run_cmd_with_retry |
| lib/cli.sh | ~310 | Argument parsing and validation | parse_args, validate_args, parse_filter_args, show_help |
| lib/genome.sh | ~50 | Genome file creation | create_genome_file |
| lib/bed.sh | ~95 | BED file processing | normalize_bed, merge_include_beds, merge_exclude_beds, compress_index_bed, create_header_file |
| lib/annotate.sh | ~35 | VCF region annotation | annotate_vcf_with_regions, strip_vcf_annotations |
| lib/normalize.sh | ~45 | VCF normalization | normalize_vcf |
| lib/filter.sh | ~90 | Filter pipeline construction | init_filter_pipeline, apply_filter_stages, write_filtered_output |
| lib/stats.sh | ~45 | Stats generation and plotting | generate_stats, plot_stats_output |
Source Order
Modules are sourced in dependency order — logging.sh first because other modules depend on log_msg, error_msg, and run_cmd:
source "${_SCRIPT_DIR}/lib/logging.sh" # Must be first
source "${_SCRIPT_DIR}/lib/cli.sh"
source "${_SCRIPT_DIR}/lib/genome.sh"
source "${_SCRIPT_DIR}/lib/bed.sh"
source "${_SCRIPT_DIR}/lib/annotate.sh"
source "${_SCRIPT_DIR}/lib/normalize.sh"
source "${_SCRIPT_DIR}/lib/filter.sh"
source "${_SCRIPT_DIR}/lib/stats.sh"
Each module uses an include guard to prevent double-sourcing:
[[ -n "${_LIB_X_LOADED:-}" ]] && return 0
readonly _LIB_X_LOADED=1
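To see the guard in action, here is a minimal sketch that writes a throwaway module and sources it twice. The file name demo.sh, the variable _LIB_DEMO_LOADED, and the load_count counter are made up for illustration and are not part of hardnormly.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Write a throwaway module (hypothetical name) that uses the guard pattern.
tmp_dir=$(mktemp -d)
cat >"${tmp_dir}/demo.sh" <<'EOF'
[[ -n "${_LIB_DEMO_LOADED:-}" ]] && return 0
readonly _LIB_DEMO_LOADED=1
load_count=$((${load_count:-0} + 1))
EOF

load_count=0
source "${tmp_dir}/demo.sh" # first source: the module body runs
source "${tmp_dir}/demo.sh" # second source: the guard returns early
echo "module body ran ${load_count} time(s)" # prints: module body ran 1 time(s)
rm -rf "$tmp_dir"
```

Because the guard returns before the readonly line is reached again, the second source does not trip over the already read-only variable.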
Error Handling
The script runs with set -Eeuo pipefail:
- -E: ERR traps are inherited by functions
- -e: Exit on error
- -u: Error on undefined variables
- -o pipefail: Pipeline fails if any command in the pipe fails
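The effect of two of these options is easiest to see in isolation. The sketch below is not taken from hardnormly. It compares a pipeline's exit status with and without pipefail, then shows -u aborting a subshell on an unset variable (the variable name is deliberately hypothetical).

```shell
#!/usr/bin/env bash

# Without pipefail, a pipeline's status is that of its last command.
set +o pipefail
false | true
status_default=$?

# With pipefail, a failing stage makes the whole pipeline fail.
set -o pipefail
status_pipefail=0
false | true || status_pipefail=$?

echo "default=${status_default} pipefail=${status_pipefail}" # prints: default=0 pipefail=1

# With -u, expanding an unset variable is an error; the subshell exits non-zero.
status_nounset=0
(
	set -u
	: "$hypothetical_undefined_name"
) 2>/dev/null || status_nounset=$?
echo "nounset subshell exit status: ${status_nounset}"
```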
Patterns
Single external commands use run_cmd (captures stderr, reports command name on failure):
run_cmd bgzip -f "$bed_file"
run_cmd tabix -p bed "${bed_file}.gz"
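As a rough illustration of what such a wrapper can look like, the sketch below captures stderr to a temp file and names the failing command. This is an assumption about the shape of run_cmd, not the code in lib/logging.sh; the error_msg stand-in and the message wording are hypothetical.

```shell
#!/usr/bin/env bash

# Stand-in for the project's error_msg (assumed to write to stderr).
error_msg() { printf 'ERROR: %s\n' "$*" >&2; }

# Hypothetical run_cmd: run one command, capture its stderr, and report
# the command name if it fails. The real implementation may differ.
run_cmd() {
	local err_file status=0
	err_file=$(mktemp)
	"$@" 2>"$err_file" || status=$?
	if ((status != 0)); then
		error_msg "command failed (exit ${status}): $1"
		if [[ -s "$err_file" ]]; then
			error_msg "stderr: $(<"$err_file")"
		fi
	fi
	rm -f "$err_file"
	return "$status"
}

# Usage: the caller still sees the original exit status.
run_cmd sh -c 'echo "demo failure" >&2; exit 3' || echo "caller saw exit status $?"
```

Returning the original status keeps the wrapper compatible with set -e and with explicit || guards at the call site.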
Pipelines use || { error_msg ...; return 1; } guards, because run_cmd takes a single command and cannot wrap a pipe:
awk ... "$bed_file" | bedtools sort -i - >"$output_file" \
	|| {
		error_msg "normalize_bed: pipeline failed for $bed_file"
		return 1
	}
Cleanup is handled by a single EXIT trap:
trap cleanup_handler EXIT
The handler preserves the original exit code and removes the temp directory (unless --no-cleanup is set). In debug mode, files are listed before deletion.
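A handler with this behavior might look like the following sketch. The variable names TEMP_DIR, NO_CLEANUP, and DEBUG are assumptions made for the example; the real script may track these settings differently.

```shell
#!/usr/bin/env bash
set -Eeuo pipefail

# Hypothetical globals; the real script's variable names may differ.
TEMP_DIR=$(mktemp -d)
NO_CLEANUP=0 # 1 when --no-cleanup is given
DEBUG=0      # 1 in debug mode

cleanup_handler() {
	local exit_code=$? # must come first: captures the script's exit status
	if ((NO_CLEANUP)); then
		echo "Keeping temp directory: ${TEMP_DIR}" >&2
	else
		if ((DEBUG)); then
			ls -la "$TEMP_DIR" >&2 || true # list files before deletion
		fi
		rm -rf "$TEMP_DIR"
	fi
	exit "$exit_code" # re-raise the original status
}
trap cleanup_handler EXIT
```

Capturing $? on the first line matters: any later command in the handler would overwrite it, and the script would then exit with the status of rm rather than of the step that failed.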
Development
Lint and Format
make lint # ShellCheck + shfmt check
make format # Apply shfmt formatting
make setup-hooks # Install pre-commit hooks
Formatting uses tabs, binary ops at line start, case body indent (shfmt -i 0 -bn -ci).
Testing
make test # Run full BATS test suite
make test-debug # Verbose output, keep temp dirs
Test structure:
| File | Tests | What it covers |
|---|---|---|
| 01-smoke.bats | Smoke | --help, --version, missing args |
| 02-gatk-filters.bats | Filter | Each GATK filter tags correctly |
| 03-freebayes-filters.bats | Filter | Each Freebayes filter tags correctly |
| 04-integration.bats | Integration | Full pipeline on real data |
| 05-lib-logging.bats | Unit | log_msg, debug_msg, error_msg, run_cmd |
| 06-lib-cli.bats | Unit | parse_args, validate_args, parse_filter_args |
| 07-lib-modules.bats | Unit | All other lib/ module functions |
CI
GitHub Actions runs on every push and PR:
- ShellCheck on all shell scripts
- shfmt format check
- BATS test suite
Adding a New Module
- Create lib/newmodule.sh with an include guard
- Source it in hardnormly.sh (after logging.sh)
- Add tests in tests/ (unit tests for functions, integration tests for pipeline changes)
- The Makefile auto-discovers lib/*.sh, so no manual edits are needed
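The first step could start from a skeleton like the one below. The function count_bed_records is invented for illustration; a real module would report failures through error_msg from lib/logging.sh rather than a bare echo.

```shell
#!/usr/bin/env bash
# lib/newmodule.sh (hypothetical example module)

# Include guard, following the pattern used by the other modules.
[[ -n "${_LIB_NEWMODULE_LOADED:-}" ]] && return 0
readonly _LIB_NEWMODULE_LOADED=1

# count_bed_records FILE
# Prints the number of data lines in a BED file, skipping empty lines
# and lines starting with "#", "track", or "browser".
count_bed_records() {
	local bed_file=$1
	if [[ ! -r "$bed_file" ]]; then
		# A real module would call error_msg here.
		echo "count_bed_records: cannot read $bed_file" >&2
		return 1
	fi
	awk 'NF && $1 !~ /^(#|track|browser)/ { n++ } END { print n + 0 }' "$bed_file"
}
```

Keeping the module to function definitions, with no top-level side effects beyond the guard, keeps it safe to source from both hardnormly.sh and the unit tests.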