CLI Usage Guide
Phentrieve provides a comprehensive command-line interface (CLI) built with Typer for accessing all its functionality. This guide covers the main commands and their options.
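Command Structure
Phentrieve commands follow the general structure phentrieve <command> [<subcommand>] [arguments] [options], for example phentrieve index build --all-models.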
Global Options
Available for all commands:
- --debug: Enable verbose logging for debugging purposes
- --version: Show version information and exit
- --help: Show help message for any command
Available Commands
Data Management
# Download and process HPO data
phentrieve data prepare
# Clean all data directories (use with caution)
phentrieve data clean
Index Management
# Build index for a specific model
phentrieve index build --model-name "FremyCompany/BioLORD-2023-M"
# Build indexes for all supported models
phentrieve index build --all-models
# Clean indexes (removes all vector stores)
phentrieve index clean
Interactive Querying
The query command allows you to find HPO terms that match a given text:
# Launch interactive query mode
phentrieve query --interactive
# Query with specific text
phentrieve query --text "The patient shows microcephaly and seizures"
Query Options
- --text: Text to process (if not provided, runs in interactive mode)
- --similarity-threshold: Minimum similarity score (0-1) to show results (default: 0.3)
- --num-results: Maximum number of results to display (default: 5)
- --model-name: Embedding model to use (default: "FremyCompany/BioLORD-2023-M")
- --enable-reranker: Enable cross-encoder reranking for improved precision
- --reranker-model: Cross-encoder model (default: "BAAI/bge-reranker-v2-m3")
- --rerank-count: Number of candidates to pass to reranker (default: 50)
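To make the interplay of --similarity-threshold and --num-results concrete, the sketch below filters and truncates a list of scored candidates. It is an illustration only: the function name filter_matches, the (HPO ID, score) tuple layout, the toy scores, and the order of operations (threshold first, then truncation) are assumptions, not Phentrieve's actual code.

```python
from typing import List, Tuple

# Hypothetical layout: (HPO ID, similarity score in the range 0-1).
Candidate = Tuple[str, float]


def filter_matches(
    candidates: List[Candidate],
    similarity_threshold: float = 0.3,
    num_results: int = 5,
) -> List[Candidate]:
    """Keep candidates at or above the threshold, best first, at most num_results."""
    kept = [c for c in candidates if c[1] >= similarity_threshold]
    kept.sort(key=lambda c: c[1], reverse=True)
    return kept[:num_results]


if __name__ == "__main__":
    # Toy scores invented for illustration only.
    toy = [("HP:0001166", 0.89), ("HP:0002650", 0.92), ("HP:0001250", 0.25)]
    for hpo_id, score in filter_matches(toy, num_results=2):
        print(hpo_id, score)
```

Under these assumptions, raising the threshold can return fewer than --num-results rows, because the threshold is applied before the list is cut.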
Text Processing
Process clinical text to extract HPO terms with the advanced pipeline:
# Basic processing with default strategy
phentrieve text process "Patient has arachnodactyly but no scoliosis"
# Process with specific chunking strategy
phentrieve text process "..." --strategy sliding_window_punct_conj_cleaned
# Output as JSON Lines for machine parsing (useful for pipelines)
phentrieve text process "..." --output-format json_lines
# Process from file and save to file
phentrieve text process --input-file notes.txt --output-file results.jsonl \
    --output-format json_lines
# Override sliding window parameters for fine-tuning
phentrieve text process "..." \
    --strategy sliding_window_punct_conj_cleaned \
    --window-size 10 \
    --step-size 2 \
    --threshold 0.4 \
    --min-segment 5
Available Chunking Strategies
- simple: Paragraph → Sentence splitting (fastest, least granular)
- sliding_window: Semantic sliding window only
- sliding_window_punct_conj_cleaned (Default): Full pipeline with:
  - Paragraph splitting
  - Sentence splitting
  - Fine-grained punctuation splitting
  - Conjunction splitting
  - Semantic sliding window
  - Final chunk cleaning
Output Formats
- json_lines (Default): JSON Lines format, one JSON object per line (machine-readable)
- rich_json_summary: Rich JSON with complete metadata (human and machine readable)
- csv_hpo_list: CSV format with HPO IDs and labels (spreadsheet-friendly)
Example JSON Lines output:
{"chunk_index":0,"chunk_text":"Patient has arachnodactyly","assertion":"affirmed","matches":[{"hpo_id":"HP:0001166","label":"Arachnodactyly","score":0.89}]}
{"chunk_index":1,"chunk_text":"no scoliosis","assertion":"negated","matches":[{"hpo_id":"HP:0002650","label":"Scoliosis","score":0.92}]}
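Because each output line is an independent JSON object, the results can be consumed with a few lines of Python. The sketch below groups HPO IDs by assertion status. It assumes every record carries the fields shown in the sample above (assertion, matches, hpo_id); the helper name group_by_assertion is made up for this example.

```python
import json
from typing import Dict, Iterable, List

# The two sample records from this guide, one JSON object per line.
SAMPLE = '''{"chunk_index":0,"chunk_text":"Patient has arachnodactyly","assertion":"affirmed","matches":[{"hpo_id":"HP:0001166","label":"Arachnodactyly","score":0.89}]}
{"chunk_index":1,"chunk_text":"no scoliosis","assertion":"negated","matches":[{"hpo_id":"HP:0002650","label":"Scoliosis","score":0.92}]}'''


def group_by_assertion(lines: Iterable[str]) -> Dict[str, List[str]]:
    """Map each assertion status to the HPO IDs matched in chunks with that status."""
    grouped: Dict[str, List[str]] = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines between records
        record = json.loads(line)
        ids = grouped.setdefault(record["assertion"], [])
        for match in record["matches"]:
            ids.append(match["hpo_id"])
    return grouped


if __name__ == "__main__":
    print(group_by_assertion(SAMPLE.splitlines()))
```

To read a results file written with --output-file, pass the open file object instead of SAMPLE.splitlines(), since iterating a text file also yields one line at a time.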
Text Processing Options
- --min-confidence: Minimum similarity score threshold (0.0-1.0, default: 0.3)
- --top-term-per-chunk: Return only the highest-scoring HPO term per chunk (boolean)
- --strategy: Chunking strategy (see above)
- --language: Text language for accurate processing (en, de, es, fr, nl)
- --output-format: Output format (json_lines, rich_json_summary, csv_hpo_list)
Sliding Window Parameters (override config for all strategies using sliding window):
- --window-size: Window size in tokens (default: 7)
- --step-size: Step size in tokens (default: 1)
- --threshold: Semantic similarity threshold for splitting (default: 0.5)
- --min-segment: Minimum segment length in words (default: 3)
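As a rough picture of what --window-size and --step-size control, the sketch below enumerates overlapping token windows. It is a generic sliding window, not Phentrieve's implementation: whitespace tokenization and the function name sliding_windows are assumptions, and the embedding comparison that decides where to split (governed by --threshold) is omitted entirely.

```python
from typing import List


def sliding_windows(
    tokens: List[str], window_size: int = 7, step_size: int = 1
) -> List[List[str]]:
    """Return overlapping windows of window_size tokens, advancing by step_size."""
    if window_size <= 0 or step_size <= 0:
        raise ValueError("window_size and step_size must be positive")
    if not tokens:
        return []
    if len(tokens) <= window_size:
        # Text shorter than one window yields a single window with all tokens.
        return [list(tokens)]
    last_start = len(tokens) - window_size
    return [tokens[i : i + window_size] for i in range(0, last_start + 1, step_size)]


if __name__ == "__main__":
    # Whitespace tokenization is a simplification for this example.
    tokens = "Patient has arachnodactyly but no scoliosis".split()
    for window in sliding_windows(tokens, window_size=3, step_size=2):
        print(" ".join(window))
```

A larger step size produces fewer windows and therefore fewer candidate split points, which is the trade-off the override example above adjusts.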
Advanced Options:
- --input-file, -i: Read text from file instead of argument
- --output-file, -f: Save results to file instead of stdout
- --cross-language-hpo-retrieval: Enable retrieval of HPO terms in a different language
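When the CLI is driven from a script, it can help to assemble the argument list programmatically. The sketch below builds a text process invocation from the file and format options documented above. The helper name build_text_process_command is hypothetical, and actually running the command (the commented subprocess.run call) assumes Phentrieve is installed and on the PATH.

```python
import shlex
from typing import List, Optional


def build_text_process_command(
    input_file: str,
    output_file: str,
    output_format: str = "json_lines",
    strategy: Optional[str] = None,
) -> List[str]:
    """Assemble an argument list for `phentrieve text process` using documented flags."""
    cmd = [
        "phentrieve", "text", "process",
        "--input-file", input_file,
        "--output-file", output_file,
        "--output-format", output_format,
    ]
    if strategy is not None:
        cmd += ["--strategy", strategy]
    return cmd


if __name__ == "__main__":
    cmd = build_text_process_command("notes.txt", "results.jsonl")
    # Print the shell-quoted command; nothing is executed here.
    print(shlex.join(cmd))
    # To execute it (requires Phentrieve on PATH):
    # import subprocess; subprocess.run(cmd, check=True)
```

Passing a list rather than a single string avoids shell quoting problems when file names contain spaces.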
HPO Term Similarity
Calculate semantic similarity between two specific HPO terms:
# Calculate similarity between two HPO terms
phentrieve similarity calculate HP:0001250 HP:0001251 --formula hybrid
Similarity Options
- --formula: Similarity formula to use (hybrid, resnik, lin, jc, ic)
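For orientation, the sketch below shows the textbook information-content (IC) definitions behind three of the formula names: Resnik, Lin, and Jiang-Conrath (jc). It assumes the IC values of both terms and of their most informative common ancestor (MICA) are already known; the numbers are toy values. Whether Phentrieve computes these formulas exactly this way, and how the hybrid and ic options are defined, is not stated in this guide.

```python
def resnik(ic_mica: float) -> float:
    """Resnik similarity: the IC of the most informative common ancestor."""
    return ic_mica


def lin(ic_a: float, ic_b: float, ic_mica: float) -> float:
    """Lin similarity: shared IC normalized by the IC of both terms."""
    denominator = ic_a + ic_b
    if denominator == 0:
        return 0.0  # both terms carry no information (e.g. the ontology root)
    return 2.0 * ic_mica / denominator


def jiang_conrath_distance(ic_a: float, ic_b: float, ic_mica: float) -> float:
    """Jiang-Conrath distance: IC not shared by the two terms (0 means identical)."""
    return ic_a + ic_b - 2.0 * ic_mica


if __name__ == "__main__":
    # Toy IC values for illustration only; not taken from the HPO.
    ic_a, ic_b, ic_mica = 4.0, 6.0, 3.0
    print("resnik:", resnik(ic_mica))
    print("lin:", lin(ic_a, ic_b, ic_mica))
    print("jc distance:", jiang_conrath_distance(ic_a, ic_b, ic_mica))
```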
Benchmarking
Run benchmarks to evaluate model performance:
# Run a benchmark with default settings
phentrieve benchmark run
# Run a benchmark with a specific model
phentrieve benchmark run --model-name "FremyCompany/BioLORD-2023-M"
Benchmarking Options
- --model-name: Model to benchmark
- --test-file: Path to test cases file
- --output-dir: Directory to save benchmark results
- --enable-reranker: Include reranking in the benchmark
- --gpu: Use GPU acceleration if available
Getting Help
For any command, add --help to see its available options, for example phentrieve query --help or phentrieve text process --help.
Environment Variables
Phentrieve's behavior can be configured through environment variables:
- PHENTRIEVE_DATA_DIR: Base directory for all Phentrieve data
- PHENTRIEVE_HPO_DATA_DIR: Directory for HPO data files
- PHENTRIEVE_INDEX_DIR: Directory for vector indexes
- PHENTRIEVE_RESULTS_DIR: Directory for benchmark results
- PHENTRIEVE_TRANSLATIONS_DIR: Directory for translation files (if used)
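One way to apply these variables from a script is to build a modified copy of the environment and hand it to the child process. The sketch below does that for two of the variables. The helper name phentrieve_env and the example paths are placeholders, and how Phentrieve resolves the more specific directories when only PHENTRIEVE_DATA_DIR is set is not covered here.

```python
import os
from typing import Dict, Mapping, Optional


def phentrieve_env(
    data_dir: str,
    index_dir: Optional[str] = None,
    base_env: Optional[Mapping[str, str]] = None,
) -> Dict[str, str]:
    """Return a copy of the environment with Phentrieve directory variables set."""
    env = dict(os.environ if base_env is None else base_env)
    env["PHENTRIEVE_DATA_DIR"] = data_dir
    if index_dir is not None:
        env["PHENTRIEVE_INDEX_DIR"] = index_dir
    return env


if __name__ == "__main__":
    # Placeholder paths; substitute directories that exist on your system.
    env = phentrieve_env("/tmp/phentrieve_data", index_dir="/tmp/phentrieve_indexes")
    print(env["PHENTRIEVE_DATA_DIR"], env["PHENTRIEVE_INDEX_DIR"])
    # To run a command with these settings (requires Phentrieve on PATH):
    # import subprocess
    # subprocess.run(["phentrieve", "data", "prepare"], env=env, check=True)
```

Copying the environment instead of mutating os.environ keeps the settings scoped to the one child process.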