Benchmarking Guide
This page shows the concrete benchmark commands for retrieval and LLM full-text validation.
Retrieval Benchmark
phentrieve benchmark run \
--test-file tests/data/benchmarks/german/tiny_v1.json \
--model-name "sentence-transformers/LaBSE"
LLM Full-Text Benchmark
The primary LLM benchmark workflow uses the converted PhenoBERT full-text
corpus under tests/data/en/phenobert/. The benchmark instantiates the LLM
pipeline directly and does not go through the FastAPI quota layer.
phentrieve benchmark llm \
--test-file tests/data/en/phenobert \
--dataset GeneReviews \
--llm-model gemini-2.5-flash
The converted corpus contains these dataset subsets:
- GSC_plus
- ID_68
- GeneReviews
- all
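To run the benchmark once per subset, the command can be assembled programmatically. The following is a minimal sketch, not part of the project: the helper name `build_llm_benchmark_command` is hypothetical, the flags are copied from the command shown in this guide, and it assumes that every subset name listed above is accepted verbatim by `--dataset` (only `GeneReviews` is demonstrated here).

```python
import shlex

# Subset names as listed in this guide. Passing each of them to --dataset
# is an assumption; only GeneReviews appears in the documented command.
SUBSETS = ["GSC_plus", "ID_68", "GeneReviews", "all"]


def build_llm_benchmark_command(
    dataset,
    corpus_dir="tests/data/en/phenobert",
    llm_model="gemini-2.5-flash",
):
    """Return the argv list for one `phentrieve benchmark llm` run."""
    return [
        "phentrieve", "benchmark", "llm",
        "--test-file", corpus_dir,
        "--dataset", dataset,
        "--llm-model", llm_model,
    ]


if __name__ == "__main__":
    # Print the commands instead of executing them, so they can be reviewed first.
    for subset in SUBSETS:
        print(shlex.join(build_llm_benchmark_command(subset)))
```

Printing the commands rather than running them keeps the sketch free of side effects; each argv list can be passed to `subprocess.run` once the commands look right.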
The output JSON includes:
- cases
- dataset
- llm_model
- llm_mode
- dataset_metadata
- metrics
- results
- output_path
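A quick sanity check on a benchmark result is to confirm that these top-level fields are present. The sketch below assumes the output is a single JSON object with the fields at the top level; the helper names `load_report` and `missing_report_keys` are hypothetical, and the types of the individual values are not checked because this guide does not specify them.

```python
import json
from pathlib import Path

# Top-level fields this guide says the output JSON includes.
EXPECTED_KEYS = {
    "cases",
    "dataset",
    "llm_model",
    "llm_mode",
    "dataset_metadata",
    "metrics",
    "results",
    "output_path",
}


def load_report(path):
    """Read a benchmark output file and return it as a dict."""
    return json.loads(Path(path).read_text(encoding="utf-8"))


def missing_report_keys(report):
    """Return the expected top-level keys absent from `report`, sorted."""
    return sorted(EXPECTED_KEYS - set(report))
```

For example, `missing_report_keys(load_report(path))` returns an empty list when all documented fields are present, where `path` points at a benchmark output file.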
Corpus Acquisition And Conversion
If you need to rebuild the corpus, use the reproducible PhenoBERT download and conversion workflow already documented in this repo:
- scripts/PHENOBERT-DOWNLOAD-GUIDE.md
- scripts/README.md
- scripts/convert_phenobert_data.py
Typical conversion flow:
python scripts/convert_phenobert_data.py \
--phenobert-data /path/to/PhenoBERT/phenobert/data \
--output tests/data/en/phenobert \
--hpo-data data/hpo_core_data
Check out a specific upstream PhenoBERT commit for reproducibility, and keep
the generated conversion_report.json alongside the converted corpus.
Legacy Smoke Datasets
The small JSON files under tests/data/benchmarks/ remain useful for quick
smoke validation, but they are not the primary full-text benchmark workflow.
phentrieve benchmark llm \
--test-file tests/data/benchmarks/german/tiny_v1.json \
--llm-model gemini-2.5-flash
Example CLI LLM Run
phentrieve text process clinical_note.txt \
--extraction-backend llm \
--llm-model gemini-3.1-flash-lite-preview
API Quota Environment
These variables matter for API and frontend validation. They do not gate the direct benchmark command above.