CLI Reference

Complete command-line interface documentation for MucOneUp.

muconeup

MucOneUp - MUC1 VNTR diploid reference simulator.

Philosophy: Each command does ONE thing (Unix philosophy).

  - simulate: Generate haplotypes ONLY
  - reads: Simulate reads from FASTA (supports multiple files)
  - analyze: Analyze FASTA (ORFs, stats) (supports multiple files)

Examples:

  # Generate haplotypes
  muconeup --config X simulate --out-base Y

  # Process single file
  muconeup --config X reads illumina Y.001.simulated.fa --out-base reads

  # Process multiple files (batch)
  muconeup --config X reads illumina Y.*.simulated.fa
  muconeup --config X analyze orfs Y.*.simulated.fa

  # Shell composition (Unix philosophy)
  muconeup --config X simulate --fixed-lengths 20-40 --simulate-series 1
  for f in Y.*.fa; do muconeup --config X reads illumina "$f"; done

Usage:

muconeup [OPTIONS] COMMAND [ARGS]...

Options:

  --version                       Show the version and exit.
  --config FILE                   Path to JSON configuration file.  [required]
  --log-level [DEBUG|INFO|WARNING|ERROR|CRITICAL|NONE]
                                  Set logging level.
  -v, --verbose                   Enable verbose output (sets log level to
                                  DEBUG).
  --help                          Show this message and exit.

muconeup analyze

Analysis utilities.

Single Responsibility: Analyze ANY FASTA file. Works with MucOneUp outputs or external sequences.

Usage:

muconeup analyze [OPTIONS] COMMAND [ARGS]...

Options:

  --help  Show this message and exit.

muconeup analyze orfs

Predict ORFs and detect toxic protein features from one or more FASTA files.

Supports batch processing following Unix philosophy:

  - Single file: muconeup --config X analyze orfs file.fa --out-base analysis
  - Multiple files: muconeup --config X analyze orfs file1.fa file2.fa file3.fa
  - Glob pattern: muconeup --config X analyze orfs *.simulated.fa

When processing multiple files, --out-base is auto-generated from the input filenames unless you provide it explicitly, in which case the given value applies to all files.

Examples:

  # Single file with custom output name
  muconeup --config X analyze orfs sample.001.fa --out-base my_analysis

  # Multiple files (auto-generated output names)
  muconeup --config X analyze orfs sample.001.fa sample.002.fa

  # Glob pattern (shell expands)
  muconeup --config X analyze orfs sample.*.simulated.fa

Usage:

muconeup analyze orfs [OPTIONS] INPUT_FASTAS...

Options:

  --out-dir DIRECTORY   Output folder.  [default: .]
  --out-base TEXT       Base name for output files (auto-generated if
                        processing multiple files).
  --orf-min-aa INTEGER  Minimum ORF length in amino acids.  [default: 100]
  --orf-aa-prefix TEXT  Filter ORFs by prefix (e.g., MTSSV).
  --help                Show this message and exit.
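
To make the effect of --orf-min-aa and --orf-aa-prefix concrete, here is a minimal Python sketch of ORF filtering. It is not MucOneUp's implementation: the function name find_orfs, the forward-strand-only scan, and the requirement that an ORF end at a stop codon are assumptions made for illustration.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def find_orfs(seq, min_aa=100, aa_prefix=None):
    """Return protein sequences of forward-strand ORFs passing both filters."""
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        codons = (seq[i:i + 3] for i in range(frame, len(seq) - 2, 3))
        # Unknown codons (e.g. containing N) are translated as X.
        protein = "".join(CODON_TABLE.get(c, "X") for c in codons)
        # Drop the last segment: it is not terminated by a stop codon.
        for segment in protein.split("*")[:-1]:
            start = segment.find("M")
            if start == -1:
                continue
            orf = segment[start:]
            if len(orf) < min_aa:
                continue  # shorter than the minimum length in amino acids
            if aa_prefix and not orf.startswith(aa_prefix):
                continue  # does not begin with the requested prefix
            orfs.append(orf)
    return orfs


if __name__ == "__main__":
    toy = "ATGACCTCATCTGTTTAA"  # toy sequence, not real MUC1 data
    print(find_orfs(toy, min_aa=5, aa_prefix="MTSSV"))
```

With the default minimum of 100 amino acids the toy sequence would yield nothing; lowering min_aa shows how the two filters combine.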

muconeup analyze snapshot-validate

Validate SNaPshot assay for MUC1 VNTR mutations.

Simulates complete SNaPshot workflow: PCR → MwoI digest → extension → detection.

Examples:

  # Validate dupC mutation in a sample
  muconeup --config config.json analyze snapshot-validate sample.fa --mutation dupC

  # Save results to JSON
  muconeup --config config.json analyze snapshot-validate sample.fa --mutation dupC --output results.json

Usage:

muconeup analyze snapshot-validate [OPTIONS] INPUT_FASTA

Options:

  --mutation TEXT  Mutation name to validate (e.g., 'dupC').  [required]
  --output PATH    Output JSON file for validation results (prints to stdout
                   if not specified).
  --help           Show this message and exit.

muconeup analyze stats

Generate basic sequence statistics from one or more FASTA files.

Supports batch processing following Unix philosophy:

  - Single file: muconeup --config X analyze stats file.fa --out-base stats
  - Multiple files: muconeup --config X analyze stats file1.fa file2.fa file3.fa
  - Glob pattern: muconeup --config X analyze stats *.simulated.fa

When processing multiple files, --out-base is auto-generated from the input filenames unless you provide it explicitly, in which case the given value applies to all files.

Examples:

  # Single file with custom output name
  muconeup --config X analyze stats sample.001.fa --out-base my_stats

  # Multiple files (auto-generated output names)
  muconeup --config X analyze stats sample.001.fa sample.002.fa

  # Glob pattern (shell expands)
  muconeup --config X analyze stats sample.*.simulated.fa

Usage:

muconeup analyze stats [OPTIONS] INPUT_FASTAS...

Options:

  --out-dir DIRECTORY  Output folder.  [default: .]
  --out-base TEXT      Base name for output files (auto-generated if
                       processing multiple files).
  --help               Show this message and exit.

muconeup analyze vntr-stats

Analyze VNTR structures and compute transition probabilities.

Processes a CSV/TSV file containing VNTR structures, calculates statistics (min/max/mean/median repeat units), and builds a transition probability matrix showing the likelihood of each repeat unit following another.

The analysis removes duplicate VNTR structures and includes an "END" state representing sequence termination. Unknown repeat tokens (not in config) trigger warnings but don't cause failure.
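
The transition-matrix idea described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not MucOneUp's code: structures are assumed to be strings of repeat tokens joined by "-", and the function name and sample tokens are hypothetical.

```python
from collections import Counter, defaultdict


def transition_probabilities(structures, sep="-"):
    """Estimate P(next token | current token), with a terminal END state."""
    # Remove duplicate structures while keeping first-seen order.
    unique = list(dict.fromkeys(structures))
    counts = defaultdict(Counter)
    for structure in unique:
        tokens = structure.split(sep) + ["END"]  # END marks sequence termination
        for current, following in zip(tokens, tokens[1:]):
            counts[current][following] += 1
    # Normalize each row so its probabilities sum to 1.
    return {
        current: {nxt: n / sum(row.values()) for nxt, n in row.items()}
        for current, row in counts.items()
    }


if __name__ == "__main__":
    # Toy input; the second entry is a duplicate and is counted once.
    sample = ["1-2-9", "1-2-9", "1-3-9"]
    for state, row in transition_probabilities(sample).items():
        print(state, row)
```

Each row of the result is the distribution over what follows a given repeat unit; the row for the last unit of every structure puts its mass on END.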

Examples:

  # Analyze example VNTR database
  muconeup --config X analyze vntr-stats data/examples/vntr_database.tsv --header

  # Use custom column and save to file
  muconeup --config X analyze vntr-stats data.csv \
    --delimiter "," --structure-column "sequence" --output stats.json

  # Column index without header
  muconeup --config X analyze vntr-stats data.tsv --structure-column 3

  # Pipe to jq for filtering
  muconeup --config X analyze vntr-stats data/examples/vntr_database.tsv \
    --header | jq '.mean_repeats'

Output JSON contains:

  - Statistics: min/max/mean/median repeat counts
  - Probabilities: State transition matrix (including END state)
  - Repeats: Known repeat dictionary from config

Usage:

muconeup analyze vntr-stats [OPTIONS] INPUT_FILE

Options:

  --structure-column TEXT  Column name (if header) or 0-based index containing
                           VNTR structure.  [default: vntr]
  --delimiter TEXT         Field delimiter for input file.  [default: tab]
  --header                 Specify if input file has header row.
  -o, --output PATH        Output JSON file (default: stdout).
  --help                   Show this message and exit.

muconeup reads

Read simulation utilities.

Single Responsibility: Simulate reads from ANY FASTA file. Works with MucOneUp outputs or external sequences.

Usage:

muconeup reads [OPTIONS] COMMAND [ARGS]...

Options:

  --help  Show this message and exit.

muconeup reads illumina

Simulate Illumina short reads from one or more FASTA files.

Supports batch processing following Unix philosophy:

  - Single file: muconeup --config X reads illumina file.fa --out-base reads
  - Multiple files: muconeup --config X reads illumina file1.fa file2.fa file3.fa
  - Glob pattern: muconeup --config X reads illumina *.simulated.fa

When processing multiple files, --out-base is auto-generated from the input filenames unless you provide it explicitly, in which case the given value applies to all files.

Examples:

  # Single file with custom output name
  muconeup --config X reads illumina sample.001.fa --out-base my_reads

  # Multiple files (auto-generated output names)
  muconeup --config X reads illumina sample.001.fa sample.002.fa

  # Glob pattern (shell expands)
  muconeup --config X reads illumina sample.*.simulated.fa

  # Compose with shell (Unix philosophy)
  for f in *.fa; do muconeup --config X reads illumina "$f"; done
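
The same per-file composition can be driven from Python instead of a shell loop. The sketch below only assembles command lines from options documented on this page; build_read_commands and run_all are hypothetical helper names, and it defaults to a dry run so nothing is executed unless requested.

```python
import shlex
import subprocess
from pathlib import Path


def build_read_commands(config, fastas, platform="illumina", extra_args=()):
    """Return one muconeup argument list per input FASTA (hypothetical helper)."""
    return [
        ["muconeup", "--config", str(config), "reads", platform, str(fasta), *extra_args]
        for fasta in fastas
    ]


def run_all(commands, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for cmd in commands:
        print(shlex.join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)  # raises if muconeup exits non-zero


if __name__ == "__main__":
    fastas = sorted(Path(".").glob("*.fa"))
    run_all(build_read_commands("X", fastas, extra_args=["--seed", "42"]))
```

Passing the arguments as a list avoids shell quoting problems with unusual filenames, which is the same concern the quoted "$f" addresses in the shell loop.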

Usage:

muconeup reads illumina [OPTIONS] INPUT_FASTAS...

Options:

  --out-dir DIRECTORY  Output folder.  [default: .]
  --out-base TEXT      Base name for output files (auto-generated if
                       processing multiple files).
  --coverage INTEGER   Target sequencing coverage.  [default: 30]
  --threads INTEGER    Number of threads.  [default: 8]
  --seed INTEGER       Random seed for reproducibility (same seed = identical
                       reads).
  --help               Show this message and exit.

muconeup reads ont

Simulate Oxford Nanopore long reads from one or more FASTA files.

Supports batch processing following Unix philosophy:

  - Single file: muconeup --config X reads ont file.fa --out-base reads
  - Multiple files: muconeup --config X reads ont file1.fa file2.fa file3.fa
  - Glob pattern: muconeup --config X reads ont *.simulated.fa

When processing multiple files, --out-base is auto-generated from the input filenames unless you provide it explicitly, in which case the given value applies to all files.

Examples:

  # Single file with custom output name
  muconeup --config X reads ont sample.001.fa --out-base my_reads

  # Multiple files (auto-generated output names)
  muconeup --config X reads ont sample.001.fa sample.002.fa

  # Glob pattern (shell expands)
  muconeup --config X reads ont sample.*.simulated.fa

Usage:

muconeup reads ont [OPTIONS] INPUT_FASTAS...

Options:

  --out-dir DIRECTORY        Output folder.  [default: .]
  --out-base TEXT            Base name for output files (auto-generated if
                             processing multiple files).
  --coverage INTEGER         Target coverage.  [default: 30]
  --min-read-length INTEGER  Minimum read length.  [default: 100]
  --seed INTEGER             Random seed for reproducibility (same seed =
                             identical reads).
  --help                     Show this message and exit.

muconeup reads pacbio

Simulate PacBio HiFi reads from one or more FASTA files.

Supports batch processing following Unix philosophy:

  - Single file: muconeup --config X reads pacbio file.fa --model-file X.model --out-base reads
  - Multiple files: muconeup --config X reads pacbio file1.fa file2.fa --model-file X.model
  - Glob pattern: muconeup --config X reads pacbio *.simulated.fa --model-file X.model

When processing multiple files, --out-base is auto-generated from the input filenames unless you provide it explicitly, in which case the given value applies to all files.

Workflow:

  1. Multi-pass CLR simulation (pbsim3)
  2. HiFi consensus generation (CCS)
  3. Read alignment (minimap2 with map-hifi preset)

Examples:

  # Single file with standard HiFi settings (Q20)
  muconeup --config X reads pacbio sample.001.fa \
    --model-file /models/QSHMM-SEQUEL.model \
    --out-base my_hifi

  # Multiple files with high-accuracy HiFi (Q30)
  muconeup --config X reads pacbio sample.*.fa \
    --model-file /models/QSHMM-SEQUEL.model \
    --min-rq 0.999 --min-passes 5

  # Ultra-deep coverage simulation
  muconeup --config X reads pacbio sample.fa \
    --model-file /models/QSHMM-SEQUEL.model \
    --coverage 100 --pass-num 5

Model Files:

  Download from: https://github.com/yukiteruono/pbsim3/tree/master/data

  - QSHMM-SEQUEL.model: Sequel II chemistry
  - QSHMM-RSII.model: RS II chemistry
  - ERRHMM-SEQUEL.model: Alternative error model

Quality Control:

  - pass_num ≥2 required for multi-pass (≥3 recommended)
  - min_passes controls CCS stringency (higher = better quality, lower yield)
  - min_rq=0.99 is Q20 (standard HiFi threshold)
  - min_rq=0.999 is Q30 (ultra-high accuracy)
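
The correspondence between min_rq and Q values follows the standard Phred relation Q = -10 * log10(1 - accuracy). The short Python sketch below shows the conversion in both directions; the function names are illustrative and are not part of MucOneUp.

```python
import math


def rq_to_phred(rq):
    """Convert a predicted read accuracy (0 <= rq < 1) to a Phred Q value."""
    if not 0.0 <= rq < 1.0:
        raise ValueError("rq must be in the range [0.0, 1.0)")
    return -10.0 * math.log10(1.0 - rq)


def phred_to_rq(q):
    """Convert a Phred Q value back to a predicted accuracy."""
    return 1.0 - 10.0 ** (-q / 10.0)


if __name__ == "__main__":
    for rq in (0.99, 0.999):
        # Rounded for display; floating-point results land just below 20 and 30.
        print(f"min_rq={rq} corresponds to about Q{rq_to_phred(rq):.0f}")
```

Each additional 9 in min_rq adds 10 to the Q value, which is why moving from 0.99 to 0.999 is a tenfold reduction in expected error rate.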

Usage:

muconeup reads pacbio [OPTIONS] INPUT_FASTAS...

Options:

  --out-dir DIRECTORY          Output folder.  [default: .]
  --out-base TEXT              Base name for output files (auto-generated if
                               processing multiple files).
  --coverage INTEGER           Target coverage (overrides config if provided).
  --pass-num INTEGER           Number of passes per molecule for multi-pass
                               CLR simulation (≥2, overrides config if
                               provided).
  --min-passes INTEGER         Minimum passes required for CCS HiFi consensus
                               (≥1, overrides config if provided).
  --min-rq FLOAT               Minimum predicted accuracy for HiFi reads
                               (0.0-1.0, overrides config if provided). 0.99 =
                               Q20 (standard HiFi).
  --model-type [qshmm|errhmm]  pbsim3 model type (overrides config if
                               provided).
  --model-file FILE            Path to pbsim3 model file (overrides config if
                               provided).
  --threads INTEGER            Number of threads.  [default: 4]
  --seed INTEGER               Random seed for reproducibility (same seed =
                               identical reads).
  --help                       Show this message and exit.

muconeup simulate

Generate MUC1 VNTR diploid haplotypes.

Single Responsibility: ONLY generates haplotype FASTA files. Does NOT run read simulation or ORF prediction. Use 'pipeline' command or chain commands manually for full workflow.

Output:

  - {out_base}.{iteration}.simulated.fa (haplotype sequences)
  - {out_base}.{iteration}.vntr_structure.txt (if --output-structure)
  - {out_base}.{iteration}.simulation_stats.json (statistics)

Example:

  muconeup --config config.json simulate --out-base output
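
When scripting around simulate, it can help to compute the expected output paths from the naming pattern above. The Python sketch below does that; expected_outputs is a hypothetical helper, and the three-digit zero padding of the iteration is an assumption inferred from example names such as Y.001.simulated.fa elsewhere on this page.

```python
from pathlib import Path


def expected_outputs(out_base, iteration, out_dir=".", output_structure=False):
    """Return the output paths simulate is expected to write for one iteration."""
    # Assumption: iterations are zero-padded to three digits (e.g. 001).
    prefix = f"{out_base}.{iteration:03d}"
    out = Path(out_dir)
    files = {
        "fasta": out / f"{prefix}.simulated.fa",
        "stats": out / f"{prefix}.simulation_stats.json",
    }
    if output_structure:
        # Only written when --output-structure is given.
        files["structure"] = out / f"{prefix}.vntr_structure.txt"
    return files


if __name__ == "__main__":
    for kind, path in expected_outputs("output", 1, output_structure=True).items():
        print(kind, path)
```

A wrapper script can check these paths after a run to confirm that every expected file was produced before starting read simulation.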

Usage:

muconeup simulate [OPTIONS]

Options:

  --out-base TEXT                 Base name for output files.  [default:
                                  muc1_simulated]
  --out-dir DIRECTORY             Output folder.  [default: .]
  --num-haplotypes INTEGER        Number of haplotypes to simulate.  [default:
                                  2]
  --seed INTEGER                  Random seed for reproducibility.
  --reference-assembly [hg19|hg38]
                                  Reference assembly (overrides config).
  --output-structure              Write VNTR structure file.
  --fixed-lengths TEXT            Fixed VNTR lengths or ranges (e.g., '60' or
                                  '20-40').
  --input-structure FILE          Predefined VNTR structure file.
  --simulate-series INTEGER       Series step size for fixed-length ranges.
  --mutation-name TEXT            Mutation name. Use 'normal,mutation' for
                                  dual simulation.
  --mutation-targets TEXT         Mutation targets as 'hap_idx,rep_idx' pairs
                                  (1-based).
  --snp-input-file FILE           TSV file with predefined SNPs.
  --random-snps                   Enable random SNP generation.
  --random-snp-density FLOAT      SNP density per 1000 bp.
  --random-snp-output-file TEXT   Output file for random SNPs.
  --random-snp-region [all|constants_only|vntr_only]
                                  Region for random SNPs.  [default:
                                  constants_only]
  --random-snp-haplotypes [all|1|2]
                                  Haplotypes for random SNPs.  [default: all]
  --help                          Show this message and exit.
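
The interaction of --fixed-lengths and --simulate-series can be pictured as expanding a range into a series of lengths. The Python sketch below is one plausible reading, not the tool's confirmed behavior: it assumes both endpoints of the range are included, and expand_series is a hypothetical name.

```python
def expand_series(spec, step):
    """Expand a fixed-length spec such as '60' or '20-40' into a list of lengths."""
    if step < 1:
        raise ValueError("step must be a positive integer")
    if "-" not in spec:
        return [int(spec)]  # a single fixed length has nothing to expand
    low_text, high_text = spec.split("-", 1)
    low, high = int(low_text), int(high_text)
    if low > high:
        raise ValueError("range start must not exceed range end")
    # Assumption: the upper bound is inclusive.
    return list(range(low, high + 1, step))


if __name__ == "__main__":
    print(expand_series("20-40", 10))
    print(len(expand_series("20-40", 1)), "lengths with step 1")
```

Under this reading, each length in the series would correspond to one simulated iteration and therefore one set of output files.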

Examples

See Quick Start for complete usage examples.