Gene Annotation Engine
The annotation engine enriches gene panels with genomic coordinates, transcript information, and gene descriptions.
Overview
The GeneAnnotator class provides the following annotation functionality:
- HGNC standardization - Validates and standardizes gene symbols
- Genomic coordinates - Retrieves current coordinates from Ensembl
- Transcript information - Identifies canonical and MANE transcripts
- Gene descriptions - Adds descriptive information for each gene
- Quality control - Validates data consistency and completeness
API Reference
custom_panel.engine.annotator
Gene annotation engine.
This module provides functionality to annotate genes with genomic information using HGNC and Ensembl APIs.
GeneAnnotator
Gene annotation engine using HGNC and Ensembl APIs.
Initialize the gene annotator.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
`config` | `dict[str, Any] \| None` | Configuration dictionary | `None` |
annotate_genes
Annotate genes with genomic information.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
`gene_df` | `DataFrame` | DataFrame with unique genes to annotate | required |
Returns:
Type | Description |
---|---|
`DataFrame` | DataFrame with annotation columns added |
get_annotation_summary
Generate summary statistics for annotations.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
`annotated_df` | `DataFrame` | Annotated DataFrame | required |
Returns:
Type | Description |
---|---|
`dict[str, Any]` | Summary statistics dictionary |
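The reference above does not list which statistics the summary contains. As a rough illustration of what such a summary might compute, the sketch below counts how many genes carry coordinates and a MANE transcript. The record layout, the column names (`chromosome`, `mane_transcript`), the summary keys, and the `summarize` helper are all assumptions made for this example, not the library's actual schema, and plain dictionaries stand in for DataFrame rows.

```python
from __future__ import annotations

from typing import Any


def summarize(records: list[dict[str, Any]]) -> dict[str, Any]:
    """Hypothetical summary over annotated gene records (not the library's code)."""
    total = len(records)
    # Count records where the assumed annotation columns are populated.
    with_coords = sum(1 for r in records if r.get("chromosome") is not None)
    with_mane = sum(1 for r in records if r.get("mane_transcript") is not None)
    return {
        "total_genes": total,
        "with_coordinates": with_coords,
        "with_mane_transcript": with_mane,
        # Guard against division by zero for an empty panel.
        "coordinate_coverage": with_coords / total if total else 0.0,
    }


# Illustrative records; the transcript values are placeholders, not real IDs.
records = [
    {"symbol": "BRCA1", "chromosome": "17", "mane_transcript": "PLACEHOLDER_1"},
    {"symbol": "TP53", "chromosome": "17", "mane_transcript": "PLACEHOLDER_2"},
    {"symbol": "ERBB2", "chromosome": "17", "mane_transcript": None},
    {"symbol": "NOT_A_GENE", "chromosome": None, "mane_transcript": None},
]

summary = summarize(records)
```

With a real annotated DataFrame, the same counts could be derived from the non-null entries of whichever columns `annotate_genes` adds.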
standardize_gene_symbols
Standardize gene symbols using parallel HGNC batch API calls.
This method is public to allow standardization before merging.
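The pattern of parallel batch calls can be sketched as follows. This is a minimal illustration, not the library's implementation: `lookup_batch` is a hypothetical stand-in that makes no network request, and the batch size, worker count, and the use of `None` for unrecognized symbols are assumptions.

```python
from __future__ import annotations

from concurrent.futures import ThreadPoolExecutor


def chunked(items: list[str], size: int) -> list[list[str]]:
    """Split a list into consecutive batches of at most `size` items."""
    return [items[i:i + size] for i in range(0, len(items), size)]


def lookup_batch(batch: list[str]) -> dict[str, dict[str, str | None]]:
    """Hypothetical stand-in for one HGNC batch request (no network access)."""
    known = {"BRCA1": "HGNC:1100", "TP53": "HGNC:11998"}
    return {
        symbol: {
            "approved_symbol": symbol if symbol in known else None,
            "hgnc_id": known.get(symbol),
        }
        for symbol in batch
    }


def standardize_in_parallel(
    symbols: list[str], batch_size: int = 2, max_workers: int = 4
) -> dict[str, dict[str, str | None]]:
    # Deduplicate while preserving order so each symbol is looked up once.
    unique = list(dict.fromkeys(symbols))
    results: dict[str, dict[str, str | None]] = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Each batch runs in its own worker; results are merged as they return.
        for partial in pool.map(lookup_batch, chunked(unique, batch_size)):
            results.update(partial)
    return results


result = standardize_in_parallel(["BRCA1", "TP53", "BRCA1", "NOT_A_GENE"])
```

Threads suit this kind of work because each batch spends most of its time waiting on I/O rather than computing.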
Parameters:
Name | Type | Description | Default |
---|---|---|---|
`gene_symbols` | `list[str]` | List of gene symbols to standardize | required |
Returns:
Type | Description |
---|---|
`dict[str, dict[str, str \| None]]` | Dictionary mapping each original symbol to a dict containing `approved_symbol` and `hgnc_id` |
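To show how a mapping of this shape might be consumed before merging, the sketch below separates symbols that were renamed from symbols that could not be resolved. The mapping is written by hand rather than returned by the API, the `split_by_resolution` helper is hypothetical, and treating a `None` value for `approved_symbol` as "unresolved" is an assumption based only on the `str | None` return type.

```python
from __future__ import annotations


def split_by_resolution(
    mapping: dict[str, dict[str, str | None]],
) -> tuple[dict[str, str], list[str]]:
    """Return (renamed, unresolved) from a symbol standardization mapping."""
    renamed: dict[str, str] = {}
    unresolved: list[str] = []
    for original, info in mapping.items():
        approved = info.get("approved_symbol")
        if approved is None:
            # Assumption: None means no approved symbol was found.
            unresolved.append(original)
        elif approved != original:
            # The input was an alias or previous symbol.
            renamed[original] = approved
    return renamed, unresolved


# Hand-written example in the documented return shape.
standardized = {
    "BRCA1": {"approved_symbol": "BRCA1", "hgnc_id": "HGNC:1100"},
    "HER2": {"approved_symbol": "ERBB2", "hgnc_id": "HGNC:3430"},
    "NOT_A_GENE": {"approved_symbol": None, "hgnc_id": None},
}

renamed, unresolved = split_by_resolution(standardized)
```

Applying the `renamed` map to each source before merging helps keep the same gene from appearing under two symbols in the combined panel.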