Ontology Similarity¶
Phentrieve includes functionality to calculate semantic similarity between HPO terms using the ontology graph structure. This is useful for understanding relationships between different phenotypes and evaluating the precision of term mapping.
Semantic Similarity in Ontologies¶
Semantic similarity in ontologies measures how close two terms are in meaning, based on their positions in the ontology graph and their information content. Several established methods exist for calculating this similarity:
Information Content (IC)¶
Information Content (IC) is a measure of how specific a term is within the ontology. The more specific a term, the higher its IC value.
IC is calculated based on the frequency of the term in a reference corpus or based on the structure of the ontology. In the structural approach, terms with many descendants (more general terms) have lower IC values than terms with few descendants (more specific terms).
Similarity Metrics¶
Phentrieve supports several similarity metrics:
Resnik Similarity¶
Measures similarity based on the Information Content (IC) of the Most Informative Common Ancestor (MICA) of two terms:
Lin Similarity¶
Normalizes Resnik similarity by the information content of the two terms:
Jiang-Conrath Similarity¶
Considers both the information content of the terms and their MICA:
Hybrid Similarity (Default)¶
Phentrieve's default hybrid approach combines elements of multiple metrics for more balanced results.
Using Ontology Similarity in Phentrieve¶
CLI Usage¶
Calculate semantic similarity between two specific HPO terms:
# Calculate similarity between two HPO terms using the default 'hybrid' formula
phentrieve similarity calculate HP:0001250 HP:0001251
# Specify a different similarity formula
phentrieve similarity calculate HP:0001250 HP:0001251 --formula resnik
Available formulas:
hybrid(default): Phentrieve's custom hybrid approachresnik: Resnik similaritylin: Lin similarityjc: Jiang-Conrath similarityic: Information Content only
Programmatic Usage¶
from phentrieve.similarity import calculate_similarity
# Calculate similarity between two HPO terms
similarity = calculate_similarity("HP:0001250", "HP:0001251", formula="hybrid")
print(f"Similarity: {similarity}")
Applications of Ontology Similarity¶
Ontology similarity in Phentrieve serves several purposes:
- Result Evaluation: Assess how close a retrieved HPO term is to the ground truth term
- Term Grouping: Group similar HPO terms for more concise reporting
- Query Expansion: Find related terms to expand search queries
- Hierarchical Navigation: Navigate the HPO hierarchy to find more general or specific terms
Precomputed Graph Data
Phentrieve precomputes and caches graph data (HPO term depths, ancestor counts, etc.) to make similarity calculations more efficient.