Gene Annotation Engine

The annotation engine enriches gene panels with genomic coordinates, transcript information, and gene descriptions.

Overview

The GeneAnnotator class provides comprehensive gene annotation functionality:

  • HGNC standardization - Validates and standardizes gene symbols
  • Genomic coordinates - Retrieves current coordinates from Ensembl
  • Transcript information - Identifies canonical and MANE transcripts
  • Gene descriptions - Adds descriptive information for each gene
  • Quality control - Validates data consistency and completeness

API Reference

custom_panel.engine.annotator

Gene annotation engine.

This module provides functionality to annotate genes with genomic information using HGNC and Ensembl APIs.

GeneAnnotator

GeneAnnotator(config=None)

Gene annotation engine using HGNC and Ensembl APIs.

Initialize the gene annotator.

Parameters:

  • config (dict[str, Any] | None, default: None) - Configuration dictionary

annotate_genes

annotate_genes(gene_df)

Annotate genes with genomic information.

Parameters:

  • gene_df (DataFrame, required) - DataFrame with unique genes to annotate

Returns:

  • DataFrame - DataFrame with annotation columns added

clear_caches

clear_caches()

Clear API client caches.

get_annotation_summary

get_annotation_summary(annotated_df)

Generate summary statistics for annotations.

Parameters:

  • annotated_df (DataFrame, required) - Annotated DataFrame

Returns:

  • dict[str, Any] - Summary statistics dictionary
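
The two methods documented so far can be chained: annotate a table of unique genes, then summarize the result. The following is a minimal sketch under stated assumptions, not the package's own code. AnnotatorLike, annotate_and_summarize, and _StubAnnotator are hypothetical names introduced here. The stub stands in for GeneAnnotator so that the snippet runs without the package or network access, and it uses a list of dicts where the real methods take and return a DataFrame. The chromosome column and the total_genes key are likewise invented for illustration.

```python
from typing import Any, Protocol


class AnnotatorLike(Protocol):
    """Hypothetical protocol mirroring two documented GeneAnnotator methods."""

    def annotate_genes(self, gene_df: Any) -> Any: ...

    def get_annotation_summary(self, annotated_df: Any) -> dict[str, Any]: ...


def annotate_and_summarize(
    annotator: AnnotatorLike, gene_df: Any
) -> tuple[Any, dict[str, Any]]:
    """Annotate the genes, then compute summary statistics on the result."""
    annotated = annotator.annotate_genes(gene_df)
    summary = annotator.get_annotation_summary(annotated)
    return annotated, summary


class _StubAnnotator:
    """Stand-in for GeneAnnotator (assumption): no API calls, rows are dicts."""

    def annotate_genes(self, gene_df: list[dict[str, Any]]) -> list[dict[str, Any]]:
        # Add a placeholder annotation column to a copy of every row.
        return [{**row, "chromosome": None} for row in gene_df]

    def get_annotation_summary(
        self, annotated_df: list[dict[str, Any]]
    ) -> dict[str, Any]:
        # Invented statistic: the number of annotated rows.
        return {"total_genes": len(annotated_df)}


genes = [{"approved_symbol": "BRCA1"}, {"approved_symbol": "TP53"}]
annotated, summary = annotate_and_summarize(_StubAnnotator(), genes)
print(summary)
```

With the real package, a GeneAnnotator instance would take the place of _StubAnnotator() and gene_df would be a DataFrame. The keys of the returned summary are not documented on this page, so inspect the dictionary before relying on any of them.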

get_cache_stats

get_cache_stats()

Get cache statistics from API clients.

standardize_gene_symbols

standardize_gene_symbols(gene_symbols)

Standardize gene symbols using parallel HGNC batch API calls.

This method is public to allow standardization before merging.

Parameters:

  • gene_symbols (list[str], required) - List of gene symbols to standardize

Returns:

  • dict[str, dict[str, str | None]] - Dictionary mapping each original symbol to a dict containing approved_symbol and hgnc_id
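
Because the result is keyed by the original symbol, it can be used to rewrite the symbols in each source list before the lists are merged. The following is a minimal sketch that assumes a hand-written mapping in the documented shape, not output from a live call. The helper name to_approved is hypothetical, the mapping values are illustrative, and treating a None approved_symbol as "unresolved, keep the original" is an assumption, not documented behavior.

```python
from typing import Optional

# Hand-written example in the documented return shape; not real API output.
standardized: dict[str, dict[str, Optional[str]]] = {
    "BRCA1": {"approved_symbol": "BRCA1", "hgnc_id": "HGNC:1100"},
    "HER2": {"approved_symbol": "ERBB2", "hgnc_id": "HGNC:3430"},
    "NOT_A_GENE": {"approved_symbol": None, "hgnc_id": None},
}


def to_approved(
    symbols: list[str], mapping: dict[str, dict[str, Optional[str]]]
) -> list[str]:
    """Replace each symbol with its approved symbol.

    The original symbol is kept when the mapping has no entry for it or
    no approved symbol (an assumed policy, not documented behavior).
    """
    result = []
    for symbol in symbols:
        approved = mapping.get(symbol, {}).get("approved_symbol")
        result.append(approved or symbol)
    return result


source_a = to_approved(["BRCA1", "HER2"], standardized)
source_b = to_approved(["ERBB2", "NOT_A_GENE"], standardized)

# Merge both sources on the standardized symbol, keeping first-seen order.
merged = list(dict.fromkeys(source_a + source_b))
print(merged)
```

Standardizing first means that an alias in one source and the approved symbol in another collapse into a single entry after the merge, instead of appearing as two different genes.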