API Usage

This page shows the concrete API calls for full-text extraction, including the LLM-backed path added for end-to-end validation.

Base URL

Local API development runs on http://localhost:8734.

Full-Text Extraction

CLI example

phentrieve text process --extraction-backend llm note.txt

Standard backend

curl -X POST "http://localhost:8734/api/v1/text/process" \
  -H "Content-Type: application/json" \
  -d '{
    "text": "The patient exhibits microcephaly and frequent seizures.",
    "extraction_backend": "standard"
  }'

LLM backend

curl -X POST "http://localhost:8734/api/v1/text/process" \
  -H "Content-Type: application/json" \
  -d '{
    "text": "The patient exhibits microcephaly and frequent seizures.",
    "extraction_backend": "llm",
    "llm_mode": "two_phase"
  }'

The LLM response keeps the same top-level shape as the standard-backend response and includes metadata such as extraction_backend, llm_model, and llm_mode.

Public REST clients cannot select the LLM provider, model, or base URL. The server owns the public LLM target and currently routes LLM extraction to gemini/gemini-3.1-flash-lite-preview. Requests that include llm_model, llm_provider, or llm_base_url are rejected; omit those fields and use llm_mode: "two_phase" when requesting LLM extraction.
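For scripted clients, the same request can be assembled in Python. The following is a minimal sketch using only the standard library, not an official client: the helper name build_process_request and the constant SERVER_OWNED_FIELDS are hypothetical, and the client-side check simply mirrors the rejection rule described above so that a bad request fails before it is sent.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8734"  # local development URL from this page

# Fields the public REST API rejects, per the paragraph above.
SERVER_OWNED_FIELDS = {"llm_model", "llm_provider", "llm_base_url"}


def build_process_request(text, backend="standard", **extra):
    """Build (but do not send) a POST request for /api/v1/text/process.

    Hypothetical helper for illustration; it is not part of Phentrieve.
    """
    rejected = SERVER_OWNED_FIELDS.intersection(extra)
    if rejected:
        raise ValueError(f"server-owned fields not allowed: {sorted(rejected)}")

    payload = {"text": text, "extraction_backend": backend, **extra}
    if backend == "llm":
        # The page asks LLM requests to use llm_mode "two_phase".
        payload.setdefault("llm_mode", "two_phase")

    return urllib.request.Request(
        f"{BASE_URL}/api/v1/text/process",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Sending requires a running API, so it is left commented out here:
# req = build_process_request("The patient exhibits microcephaly.", backend="llm")
# with urllib.request.urlopen(req) as resp:
#     result = json.load(resp)
```

Building the request separately from sending it keeps the payload easy to inspect or log before any text leaves the machine.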

LLM extraction uses defense-in-depth prompt-injection controls: submitted text is treated as untrusted data, prompt templates separate data from instructions, and backend validation checks structured output before returning suggestions. These controls reduce risk but do not make the service appropriate for clinical decision support. Use the public REST API for research workflows only, and do not submit identifiable patient data to public deployments.

Production Environment

The FastAPI layer uses these environment variables for production LLM handling:

export PHENTRIEVE_ENV=production
export PHENTRIEVE_TRUSTED_PROXY_CIDRS="127.0.0.1/32,10.0.0.0/8"
export PHENTRIEVE_LLM_DAILY_LIMIT=3
export PHENTRIEVE_LLM_QUOTA_DB_PATH="../data/app/llm_quota.db"

PHENTRIEVE_ENV controls whether the API is running in development or production mode.
PHENTRIEVE_TRUSTED_PROXY_CIDRS defines which proxy networks are allowed to forward client IPs for quota tracking.
PHENTRIEVE_LLM_DAILY_LIMIT sets the number of successful anonymous LLM API analyses allowed per UTC day.
PHENTRIEVE_LLM_QUOTA_DB_PATH points to the SQLite database used for API quota persistence.
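To make the roles of these variables concrete, here is a rough sketch of how a service could read them, decide whether a connecting peer is a trusted proxy, and derive a per-UTC-day quota key. It is an illustration only and not Phentrieve's actual implementation: the function names are hypothetical, and the fallback values (including "development" as the default mode) are assumptions taken from the example exports above.

```python
import ipaddress
import os
from datetime import datetime, timezone


def load_llm_settings(env=os.environ):
    """Parse the four variables; fallback values are assumptions, not documented defaults."""
    cidrs = env.get("PHENTRIEVE_TRUSTED_PROXY_CIDRS", "")
    return {
        "env": env.get("PHENTRIEVE_ENV", "development"),
        "trusted_proxies": [
            ipaddress.ip_network(part.strip())
            for part in cidrs.split(",")
            if part.strip()
        ],
        "daily_limit": int(env.get("PHENTRIEVE_LLM_DAILY_LIMIT", "3")),
        "quota_db_path": env.get(
            "PHENTRIEVE_LLM_QUOTA_DB_PATH", "../data/app/llm_quota.db"
        ),
    }


def is_trusted_proxy(peer_ip, networks):
    """Return True if the connecting peer falls inside any trusted CIDR."""
    addr = ipaddress.ip_address(peer_ip)
    return any(addr in net for net in networks)


def utc_day_key(now=None):
    """Return the UTC calendar day (YYYY-MM-DD) used to bucket quota counts."""
    now = now or datetime.now(timezone.utc)
    return now.astimezone(timezone.utc).strftime("%Y-%m-%d")
```

Under this sketch, a forwarded client IP would only be honored when is_trusted_proxy returns True for the direct peer, and the quota counter would reset whenever utc_day_key changes.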

API Documentation

When the API is running, the OpenAPI pages are available at:

Swagger UI: http://localhost:8734/docs
ReDoc: http://localhost:8734/redoc

Profiles vs API

The HTTP API does not accept the CLI's --profile flag; request fields are explicit. If you are moving a workflow from the CLI to the API, copy the relevant fields from your profile into the request body. See Configuration Profiles for the profile schema.

Adaptive Re-Chunking

TextProcessingRequest accepts an optional adaptive_rechunking field that mirrors the YAML extraction.adaptive_rechunking block. When provided with enabled: true, poor-quality chunks are subdivided at sentence boundaries and re-queried. The full schema and trigger semantics are described in Adaptive Re-Chunking.

Request example:

curl -X POST "http://localhost:8734/api/v1/text/process" \
  -H "Content-Type: application/json" \
  -d '{
    "text": "Patient with intellectual disability, microcephaly, seizures, and short stature.",
    "adaptive_rechunking": {
      "enabled": true,
      "quality_threshold": 0.55,
      "margin_threshold": 0.03,
      "max_depth": 2
    }
  }'

All adaptive_rechunking fields are optional; omitted fields fall back to server-side defaults. The full set of fields is:

enabled (bool, default false)
quality_threshold (float, default 0.55)
margin_threshold (float, default 0.03)
max_depth (int, default 2)
min_chunk_chars (int, default 30)
max_sentences_per_subchunk (int, default 3)
overlap_sentences (int, default 1)
score_improvement_gate (float, default 0.05)
use_ontology_coherence (bool, default false; reserved, inert in v1)
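The fallback behavior for omitted fields can be pictured with a small Python sketch. The class and function names below are hypothetical and this is not the server's schema code; the defaults are copied from the list above, and raising on unknown keys is an assumption about how a strict client might behave rather than documented server behavior.

```python
from dataclasses import asdict, dataclass, fields


@dataclass(frozen=True)
class AdaptiveRechunkingOptions:
    """Client-side mirror of the documented fields and their defaults."""

    enabled: bool = False
    quality_threshold: float = 0.55
    margin_threshold: float = 0.03
    max_depth: int = 2
    min_chunk_chars: int = 30
    max_sentences_per_subchunk: int = 3
    overlap_sentences: int = 1
    score_improvement_gate: float = 0.05
    use_ontology_coherence: bool = False  # reserved, inert in v1


def resolve_options(requested=None):
    """Overlay the fields a caller supplied on top of the documented defaults."""
    requested = requested or {}
    known = {f.name for f in fields(AdaptiveRechunkingOptions)}
    unknown = set(requested) - known
    if unknown:
        # Assumption: treat unrecognized keys as an error on the client side.
        raise ValueError(f"unknown adaptive_rechunking fields: {sorted(unknown)}")
    return AdaptiveRechunkingOptions(**requested)


# A request that only sets two fields still resolves to a complete option set.
effective = asdict(resolve_options({"enabled": True, "max_depth": 1}))
```

Here effective contains all nine fields, with enabled and max_depth overridden and the remaining seven at their listed defaults.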

When the feature is enabled and triggered, the response carries a meta.adaptive_rechunking block summarizing what happened:

{
  "meta": {
    "extraction_backend": "standard",
    "adaptive_rechunking": {
      "enabled": true,
      "trigger_count": 3,
      "subdivided_count": 2,
      "reverted_count": 1,
      "max_depth_reached": 1,
      "extra_chunks_added": 4
    }
  }
}

The block is omitted when adaptive_rechunking is disabled or omitted from the request.
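Because the block may be absent, client code should read it defensively. The following minimal sketch, with a hypothetical helper name, formats a one-line summary from a parsed response and assumes only the key names shown in the example above.

```python
def summarize_rechunking(response):
    """Return a one-line summary of meta.adaptive_rechunking, if present."""
    block = response.get("meta", {}).get("adaptive_rechunking")
    if block is None:
        # Block is omitted when the feature is disabled or not requested.
        return "adaptive re-chunking: not reported"
    return (
        f"adaptive re-chunking: {block['trigger_count']} triggered, "
        f"{block['subdivided_count']} subdivided, "
        f"{block['reverted_count']} reverted, "
        f"{block['extra_chunks_added']} extra chunks"
    )


# The example response from this page, as a parsed dict.
example = {
    "meta": {
        "extraction_backend": "standard",
        "adaptive_rechunking": {
            "enabled": True,
            "trigger_count": 3,
            "subdivided_count": 2,
            "reverted_count": 1,
            "max_depth_reached": 1,
            "extra_chunks_added": 4,
        },
    }
}

print(summarize_rechunking(example))
# prints: adaptive re-chunking: 3 triggered, 2 subdivided, 1 reverted, 4 extra chunks
```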