CLI Reference
Complete command-line interface documentation for MucOneUp.
muconeup
MucOneUp - MUC1 VNTR diploid reference simulator.
Philosophy: Each command does ONE thing (Unix philosophy)
  simulate - Generate haplotypes ONLY
  reads    - Simulate reads from FASTA (supports multiple files)
  analyze  - Analyze FASTA (ORFs, stats) (supports multiple files)
Examples:
  # Generate haplotypes
  muconeup --config X simulate --out-base Y
  # Process single file
  muconeup --config X reads illumina Y.001.simulated.fa --out-base reads
  # Process multiple files (batch)
  muconeup --config X reads illumina Y.*.simulated.fa
  muconeup --config X analyze orfs Y.*.simulated.fa
  # Shell composition (Unix philosophy)
  muconeup --config X simulate --fixed-lengths 20-40 --simulate-series 1
  for f in Y.*.fa; do muconeup --config X reads illumina "$f"; done
Usage:
Options:
--version                  Show the version and exit.
--config FILE              Path to JSON configuration file. [required]
--log-level [DEBUG|INFO|WARNING|ERROR|CRITICAL|NONE]
                           Set logging level.
-v, --verbose              Enable verbose output (sets log level to DEBUG).
--help                     Show this message and exit.
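The examples above always place the global --config option before the subcommand and put subcommand options after the input files. A minimal Python sketch of that ordering, for scripting batch runs, is shown here. The helper name build_command is hypothetical and not part of MucOneUp; it only assembles an argument list and does not run anything.

```python
import shlex


def build_command(config, subcommand, inputs=(), **options):
    """Assemble a muconeup argument list without executing it.

    Hypothetical helper: global options (--config) go before the
    subcommand, subcommand options go after the input files.
    """
    argv = ["muconeup", "--config", str(config), *subcommand.split()]
    argv += [str(path) for path in inputs]
    for name, value in options.items():
        flag = "--" + name.replace("_", "-")
        if value is True:
            # Boolean switch such as --output-structure
            argv.append(flag)
        elif value is not None and value is not False:
            argv += [flag, str(value)]
    return argv


cmd = build_command("X", "reads illumina", ["Y.001.simulated.fa"], out_base="reads")
print(shlex.join(cmd))
```

With MucOneUp installed, the returned list can be passed to subprocess.run(cmd, check=True); keeping construction separate from execution makes a dry run trivial.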
muconeup analyze
Analysis utilities.
Single Responsibility: Analyze ANY FASTA file. Works with MucOneUp outputs or external sequences.
muconeup analyze orfs
Predict ORFs and detect toxic protein features from one or more FASTA files.
Supports batch processing following Unix philosophy:
- Single file: muconeup --config X analyze orfs file.fa --out-base analysis
- Multiple files: muconeup --config X analyze orfs file1.fa file2.fa file3.fa
- Glob pattern: muconeup --config X analyze orfs *.simulated.fa
When processing multiple files, --out-base is auto-generated from the input filenames. If --out-base is provided explicitly, it applies to all files.
Examples:
  # Single file with custom output name
  muconeup --config X analyze orfs sample.001.fa --out-base my_analysis
  # Multiple files (auto-generated output names)
  muconeup --config X analyze orfs sample.001.fa sample.002.fa
  # Glob pattern (shell expands)
  muconeup --config X analyze orfs sample.*.simulated.fa
Usage:
Options:
--out-dir DIRECTORY   Output folder. [default: .]
--out-base TEXT       Base name for output files (auto-generated if processing multiple files).
--orf-min-aa INTEGER  Minimum ORF length in amino acids. [default: 100]
--orf-aa-prefix TEXT  Filter ORFs by prefix (e.g., MTSSV).
--help                Show this message and exit.
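To illustrate how --orf-min-aa and --orf-aa-prefix narrow a set of predicted peptides, here is a small Python sketch. It is not MucOneUp's implementation: the function filter_orfs is hypothetical, ORF prediction itself is not shown, and treating the minimum length as inclusive is an assumption.

```python
def filter_orfs(peptides, min_aa=100, aa_prefix=None):
    """Keep peptides of at least min_aa residues that start with aa_prefix.

    Hypothetical illustration of the two filters; the inclusive length
    threshold is an assumption, not documented behaviour.
    """
    kept = []
    for peptide in peptides:
        if len(peptide) < min_aa:
            continue  # too short
        if aa_prefix and not peptide.startswith(aa_prefix):
            continue  # wrong N-terminal prefix
        kept.append(peptide)
    return kept


peptides = [
    "MTSSV" + "A" * 120,  # long enough, matching prefix
    "MTSSV" + "A" * 10,   # matching prefix, too short
    "MKLLP" + "G" * 200,  # long enough, other prefix
]
print(len(filter_orfs(peptides, min_aa=100, aa_prefix="MTSSV")))
```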
muconeup analyze snapshot-validate
Validate SNaPshot assay for MUC1 VNTR mutations.
Simulates complete SNaPshot workflow: PCR → MwoI digest → extension → detection.
Examples:
  # Validate dupC mutation in a sample
  muconeup --config config.json analyze snapshot-validate sample.fa --mutation dupC
  # Save results to JSON
  muconeup --config config.json analyze snapshot-validate sample.fa --mutation dupC --output results.json
Usage:
Options:
--mutation TEXT  Mutation name to validate (e.g., 'dupC'). [required]
--output PATH    Output JSON file for validation results (prints to stdout if not specified).
--help           Show this message and exit.
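The MwoI digest step of the workflow can be pictured as a search for the enzyme's recognition sequence, GCNNNNNNNGC (GC, seven arbitrary bases, GC). The Python sketch below only locates such sites. It is an illustration under stated assumptions, not the tool's code: the function names are hypothetical, and primers, cut positions, extension and detection are not modelled.

```python
import re

# MwoI recognition sequence: GC, seven arbitrary bases, GC.
# The lookahead lets overlapping sites be reported as well.
MWOI_SITE = re.compile(r"(?=GC[ACGT]{7}GC)")


def mwoi_sites(sequence):
    """Return the 0-based start position of every MwoI recognition site."""
    return [match.start() for match in MWOI_SITE.finditer(sequence.upper())]


def survives_digest(amplicon):
    """An amplicon without any recognition site is left uncut."""
    return not mwoi_sites(amplicon)


with_site = "AAGCAAAAAAAGCTT"      # seven bases between the GC anchors
without_site = "AAGCAAAAAAAAGCTT"  # eight bases: spacing no longer matches
print(mwoi_sites(with_site), survives_digest(without_site))
```

The second sequence shows why a one-base insertion between the GC anchors removes the site: the spacer length is part of the recognition sequence.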
muconeup analyze stats
Generate basic sequence statistics from one or more FASTA files.
Supports batch processing following Unix philosophy:
- Single file: muconeup --config X analyze stats file.fa --out-base stats
- Multiple files: muconeup --config X analyze stats file1.fa file2.fa file3.fa
- Glob pattern: muconeup --config X analyze stats *.simulated.fa
When processing multiple files, --out-base is auto-generated from the input filenames. If --out-base is provided explicitly, it applies to all files.
Examples:
  # Single file with custom output name
  muconeup --config X analyze stats sample.001.fa --out-base my_stats
  # Multiple files (auto-generated output names)
  muconeup --config X analyze stats sample.001.fa sample.002.fa
  # Glob pattern (shell expands)
  muconeup --config X analyze stats sample.*.simulated.fa
Usage:
Options:
--out-dir DIRECTORY  Output folder. [default: .]
--out-base TEXT      Base name for output files (auto-generated if processing multiple files).
--help               Show this message and exit.
muconeup analyze vntr-stats
Analyze VNTR structures and compute transition probabilities.
Processes a CSV/TSV file containing VNTR structures, calculates statistics (min/max/mean/median repeat units), and builds a transition probability matrix showing the likelihood of each repeat unit following another.
The analysis removes duplicate VNTR structures and includes an "END" state representing sequence termination. Unknown repeat tokens (not in config) trigger warnings but don't cause failure.
Examples:
  # Analyze example VNTR database
  muconeup --config X analyze vntr-stats data/examples/vntr_database.tsv --header
  # Use custom column and save to file
  muconeup --config X analyze vntr-stats data.csv \
    --delimiter "," --structure-column "sequence" --output stats.json
  # Column index without header
  muconeup --config X analyze vntr-stats data.tsv --structure-column 3
  # Pipe to jq for filtering
  muconeup --config X analyze vntr-stats data/examples/vntr_database.tsv \
    --header | jq '.mean_repeats'
Output JSON contains:
- Statistics: min/max/mean/median repeat counts
- Probabilities: State transition matrix (including END state)
- Repeats: Known repeat dictionary from config
Usage:
Options:
--structure-column TEXT  Column name (if header) or 0-based index containing VNTR structure. [default: vntr]
--delimiter TEXT         Field delimiter for input file. [default: tab]
--header                 Specify if input file has header row.
-o, --output PATH        Output JSON file (default: stdout).
--help                   Show this message and exit.
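The statistics and the transition matrix described for this command can be sketched in a few lines of Python. This is a simplified illustration, not MucOneUp's code: the function name, the result keys, and the assumption that repeat tokens are joined by "-" are all hypothetical.

```python
from collections import Counter, defaultdict
from statistics import mean, median


def vntr_stats(structures, sep="-"):
    """Summarise VNTR structures and estimate transition probabilities.

    Assumes each structure is a string of repeat tokens joined by `sep`
    (an assumption about the input format).
    """
    unique = list(dict.fromkeys(structures))  # drop duplicates, keep order
    chains = [structure.split(sep) for structure in unique]
    lengths = [len(chain) for chain in chains]

    counts = defaultdict(Counter)
    for chain in chains:
        # Pair every token with its successor; the last token goes to END.
        for current, following in zip(chain, chain[1:] + ["END"]):
            counts[current][following] += 1

    probabilities = {
        current: {nxt: n / sum(c.values()) for nxt, n in c.items()}
        for current, c in counts.items()
    }
    return {
        "min": min(lengths),
        "max": max(lengths),
        "mean": mean(lengths),
        "median": median(lengths),
        "probabilities": probabilities,
    }


result = vntr_stats(["1-2-3", "1-2-3", "1-2-2-3"])
print(result["probabilities"]["2"])
```

Each row of the matrix sums to 1 because the counts for a token are normalised by that token's total number of outgoing transitions, END included.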
muconeup reads
Read simulation utilities.
Single Responsibility: Simulate reads from ANY FASTA file. Works with MucOneUp outputs or external sequences.
muconeup reads illumina
Simulate Illumina short reads from one or more FASTA files.
Supports batch processing following Unix philosophy:
- Single file: muconeup --config X reads illumina file.fa --out-base reads
- Multiple files: muconeup --config X reads illumina file1.fa file2.fa file3.fa
- Glob pattern: muconeup --config X reads illumina *.simulated.fa
When processing multiple files, --out-base is auto-generated from the input filenames. If --out-base is provided explicitly, it applies to all files.
Examples:
  # Single file with custom output name
  muconeup --config X reads illumina sample.001.fa --out-base my_reads
  # Multiple files (auto-generated output names)
  muconeup --config X reads illumina sample.001.fa sample.002.fa
  # Glob pattern (shell expands)
  muconeup --config X reads illumina sample.*.simulated.fa
  # Compose with shell (Unix philosophy)
  for f in *.fa; do muconeup --config X reads illumina "$f"; done
Usage:
Options:
--out-dir DIRECTORY  Output folder. [default: .]
--out-base TEXT      Base name for output files (auto-generated if processing multiple files).
--coverage INTEGER   Target sequencing coverage. [default: 30]
--threads INTEGER    Number of threads. [default: 8]
--seed INTEGER       Random seed for reproducibility (same seed = identical reads).
--help               Show this message and exit.
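As a rough guide to what --coverage means in practice, coverage is the total number of sequenced bases divided by the reference length. The sketch below turns a target coverage into a read-pair count. The function is hypothetical, and the 150 bp paired-end read length is an assumption for illustration; the read length actually used depends on the configuration.

```python
import math


def read_pairs_for_coverage(reference_length, coverage=30, read_length=150):
    """Read pairs needed so that total bases / reference length >= coverage.

    Each pair contributes 2 * read_length bases (paired-end assumption).
    """
    if reference_length <= 0 or read_length <= 0:
        raise ValueError("lengths must be positive")
    return math.ceil(coverage * reference_length / (2 * read_length))


# A 10 kb reference at the default 30x coverage
print(read_pairs_for_coverage(10_000))
```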
muconeup reads ont
Simulate Oxford Nanopore long reads from one or more FASTA files.
Supports batch processing following Unix philosophy:
- Single file: muconeup --config X reads ont file.fa --out-base reads
- Multiple files: muconeup --config X reads ont file1.fa file2.fa file3.fa
- Glob pattern: muconeup --config X reads ont *.simulated.fa
When processing multiple files, --out-base is auto-generated from the input filenames. If --out-base is provided explicitly, it applies to all files.
Examples:
  # Single file with custom output name
  muconeup --config X reads ont sample.001.fa --out-base my_reads
  # Multiple files (auto-generated output names)
  muconeup --config X reads ont sample.001.fa sample.002.fa
  # Glob pattern (shell expands)
  muconeup --config X reads ont sample.*.simulated.fa
Usage:
Options:
--out-dir DIRECTORY        Output folder. [default: .]
--out-base TEXT            Base name for output files (auto-generated if processing multiple files).
--coverage INTEGER         Target coverage. [default: 30]
--min-read-length INTEGER  Minimum read length. [default: 100]
--seed INTEGER             Random seed for reproducibility (same seed = identical reads).
--help                     Show this message and exit.
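The "same seed = identical reads" guarantee and the --min-read-length cutoff can be illustrated with a toy sampler. This Python sketch is not the read simulator that MucOneUp runs: the function is hypothetical, and the exponential length distribution and 5000 bp mean are assumptions chosen only to make the example concrete.

```python
import random


def sample_read_lengths(n, mean_length=5000, min_read_length=100, seed=None):
    """Draw n read lengths, discarding any below min_read_length.

    A private random.Random instance keeps results reproducible for a
    given seed without touching the global random state.
    """
    rng = random.Random(seed)
    lengths = []
    while len(lengths) < n:
        length = int(rng.expovariate(1 / mean_length))
        if length >= min_read_length:
            lengths.append(length)
    return lengths


first = sample_read_lengths(5, seed=42)
second = sample_read_lengths(5, seed=42)
print(first == second)
```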
muconeup reads pacbio
Simulate PacBio HiFi reads from one or more FASTA files.
Supports batch processing following Unix philosophy:
- Single file: muconeup --config X reads pacbio file.fa --model-file X.model --out-base reads
- Multiple files: muconeup --config X reads pacbio file1.fa file2.fa --model-file X.model
- Glob pattern: muconeup --config X reads pacbio *.simulated.fa --model-file X.model
When processing multiple files, --out-base is auto-generated from the input filenames. If --out-base is provided explicitly, it applies to all files.
Workflow:
1. Multi-pass CLR simulation (pbsim3)
2. HiFi consensus generation (CCS)
3. Read alignment (minimap2 with map-hifi preset)
Examples:
  # Single file with standard HiFi settings (Q20)
  muconeup --config X reads pacbio sample.001.fa \
    --model-file /models/QSHMM-SEQUEL.model \
    --out-base my_hifi
  # Multiple files with high-accuracy HiFi (Q30)
  muconeup --config X reads pacbio sample.*.fa \
    --model-file /models/QSHMM-SEQUEL.model \
    --min-rq 0.999 --min-passes 5
  # Ultra-deep coverage simulation
  muconeup --config X reads pacbio sample.fa \
    --model-file /models/QSHMM-SEQUEL.model \
    --coverage 100 --pass-num 5
Model Files:
Download from: https://github.com/yukiteruono/pbsim3/tree/master/data
- QSHMM-SEQUEL.model: Sequel II chemistry
- QSHMM-RSII.model: RS II chemistry
- ERRHMM-SEQUEL.model: Alternative error model
Quality Control:
- pass_num ≥2 required for multi-pass (≥3 recommended)
- min_passes controls CCS stringency (higher = better quality, lower yield)
- min_rq=0.99 is Q20 (standard HiFi threshold)
- min_rq=0.999 is Q30 (ultra-high accuracy)
Usage:
Options:
--out-dir DIRECTORY          Output folder. [default: .]
--out-base TEXT              Base name for output files (auto-generated if processing multiple files).
--coverage INTEGER           Target coverage (overrides config if provided).
--pass-num INTEGER           Number of passes per molecule for multi-pass CLR simulation (≥2, overrides config if provided).
--min-passes INTEGER         Minimum passes required for CCS HiFi consensus (≥1, overrides config if provided).
--min-rq FLOAT               Minimum predicted accuracy for HiFi reads (0.0-1.0, overrides config if provided). 0.99 = Q20 (standard HiFi).
--model-type [qshmm|errhmm]  pbsim3 model type (overrides config if provided).
--model-file FILE            Path to pbsim3 model file (overrides config if provided).
--threads INTEGER            Number of threads. [default: 4]
--seed INTEGER               Random seed for reproducibility (same seed = identical reads).
--help                       Show this message and exit.
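The correspondence between --min-rq values and Q scores quoted for this command follows from the standard Phred relation Q = -10 * log10(1 - accuracy). A short Python sketch of the conversion in both directions is given below; the function names are illustrative and not part of MucOneUp.

```python
import math


def accuracy_to_phred(rq):
    """Convert a predicted accuracy in [0, 1) to a Phred quality score."""
    if not 0.0 <= rq < 1.0:
        raise ValueError("accuracy must be in [0, 1)")
    return -10 * math.log10(1 - rq)


def phred_to_accuracy(q):
    """Convert a Phred quality score back to a predicted accuracy."""
    return 1 - 10 ** (-q / 10)


for rq in (0.99, 0.999):
    print(rq, round(accuracy_to_phred(rq), 2))
```

Rounding is used when printing because 1 - rq is not exactly representable in floating point.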
muconeup simulate
Generate MUC1 VNTR diploid haplotypes.
Single Responsibility: ONLY generates haplotype FASTA files. Does NOT run read simulation or ORF prediction. Use 'pipeline' command or chain commands manually for full workflow.
Output:
- {out_base}.{iteration}.simulated.fa (haplotype sequences)
- {out_base}.{iteration}.vntr_structure.txt (if --output-structure)
- {out_base}.{iteration}.simulation_stats.json (statistics)
Example:
  muconeup --config config.json simulate --out-base output
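When chaining simulate with later steps, it helps to compute the expected output paths from the naming pattern above. The Python sketch below does that. The helper is hypothetical, and the three-digit zero-padded iteration is an assumption inferred from example names such as Y.001.simulated.fa elsewhere in this reference.

```python
from pathlib import Path


def expected_outputs(out_base="muc1_simulated", iteration=1, out_dir=".",
                     output_structure=False):
    """List the files one simulate iteration is expected to write.

    Assumes the iteration number is zero-padded to three digits.
    """
    stem = f"{out_base}.{iteration:03d}"
    names = [f"{stem}.simulated.fa"]
    if output_structure:
        # Only written when --output-structure is given
        names.append(f"{stem}.vntr_structure.txt")
    names.append(f"{stem}.simulation_stats.json")
    return [Path(out_dir) / name for name in names]


for path in expected_outputs("output", output_structure=True):
    print(path.name)
```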
Usage:
Options:
--out-base TEXT                 Base name for output files. [default: muc1_simulated]
--out-dir DIRECTORY             Output folder. [default: .]
--num-haplotypes INTEGER        Number of haplotypes to simulate. [default: 2]
--seed INTEGER                  Random seed for reproducibility.
--reference-assembly [hg19|hg38]
                                Reference assembly (overrides config).
--output-structure              Write VNTR structure file.
--fixed-lengths TEXT            Fixed VNTR lengths or ranges (e.g., '60' or '20-40').
--input-structure FILE          Predefined VNTR structure file.
--simulate-series INTEGER       Series step size for fixed-length ranges.
--mutation-name TEXT            Mutation name. Use 'normal,mutation' for dual simulation.
--mutation-targets TEXT         Mutation targets as 'hap_idx,rep_idx' pairs (1-based).
--snp-input-file FILE           TSV file with predefined SNPs.
--random-snps                   Enable random SNP generation.
--random-snp-density FLOAT      SNP density per 1000 bp.
--random-snp-output-file TEXT   Output file for random SNPs.
--random-snp-region [all|constants_only|vntr_only]
                                Region for random SNPs. [default: constants_only]
--random-snp-haplotypes [all|1|2]
                                Haplotypes for random SNPs. [default: all]
--help                          Show this message and exit.
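One way to picture how --fixed-lengths and --simulate-series combine is as a range expanded with a step. The Python sketch below shows that reading. It is an assumption about the semantics rather than the tool's actual parser: the function name is hypothetical, and including the upper bound of the range is a guess.

```python
def series_lengths(spec, step):
    """Expand a --fixed-lengths value into a list of VNTR lengths.

    '60' yields [60]; '20-40' with step 5 yields 20, 25, ..., 40.
    The inclusive upper bound is an assumption.
    """
    if step < 1:
        raise ValueError("step must be a positive integer")
    if "-" not in spec:
        return [int(spec)]
    low, high = (int(part) for part in spec.split("-", 1))
    if low > high:
        raise ValueError("range start must not exceed range end")
    return list(range(low, high + 1, step))


print(series_lengths("20-40", 5))
```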
Examples
See Quick Start for complete usage examples.