Ensembl Client

The Ensembl client provides access to genomic data from the Ensembl REST API.

Overview

The EnsemblClient class handles:

  • Gene coordinate lookup - Retrieves current genomic coordinates
  • Transcript information - Finds canonical and MANE transcripts
  • Batch processing - Efficiently processes multiple genes
  • Error handling - Retry and fallback mechanisms for API failures
  • Caching - Caches lookup results to reduce API load

Features

  • Multiple assemblies - Support for GRCh38 and GRCh37
  • MANE transcripts - Identifies MANE Select and Plus Clinical transcripts
  • Exon data - Retrieves detailed exon coordinates for BED file generation
  • Rate limiting - Respects Ensembl API guidelines
  • Retry logic - Handles transient network issues

API Reference

custom_panel.core.ensembl_client

Ensembl client for gene annotation and genomic coordinate retrieval.

This module provides a client for interacting with the Ensembl REST API to retrieve gene coordinates, transcript information, and other genomic data.

EnsemblClient

EnsemblClient(timeout=30, max_retries=3, retry_delay=1.0, transcript_batch_size=50, cache_manager=None)

Client for interacting with the Ensembl REST API.

Initialize the Ensembl client.

Parameters:

Name                   Type                 Description                        Default
timeout                int                  Request timeout in seconds         30
max_retries            int                  Maximum number of retry attempts   3
retry_delay            float                Delay between retries in seconds   1.0
transcript_batch_size  int                  Batch size for transcript queries  50
cache_manager          CacheManager | None  Optional cache manager instance    None
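The max_retries and retry_delay parameters suggest a retry loop along the following lines. This is a minimal sketch, not the client's actual implementation: call_with_retries is a hypothetical helper, and a fixed delay between attempts (rather than backoff) is an assumption.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def call_with_retries(func: Callable[[], T], max_retries: int = 3,
                      retry_delay: float = 1.0) -> T:
    """Call func, retrying on exceptions (hypothetical helper, not part of EnsemblClient)."""
    last_error: Optional[Exception] = None
    # Assumption: one initial attempt plus up to max_retries retries.
    for attempt in range(max_retries + 1):
        try:
            return func()
        except Exception as exc:  # a real client would catch narrower network errors
            last_error = exc
            if attempt < max_retries:
                time.sleep(retry_delay)  # fixed delay before the next attempt
    if last_error is None:
        raise ValueError("max_retries must be >= 0")
    raise last_error
```

Any zero-argument callable can be wrapped this way, for example a lambda around a single HTTP request.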

calculate_gene_coverage

calculate_gene_coverage(gene_start, gene_end, padding=0)

Calculate gene coverage including padding.

Parameters:

Name        Type  Description                                   Default
gene_start  int   Gene start position                           required
gene_end    int   Gene end position                             required
padding     int   Padding to add on both sides (in base pairs)  0

Returns:

Type        Description
int | None  Total gene coverage in base pairs, or None if the calculation fails
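The arithmetic can be sketched as follows, assuming 1-based inclusive coordinates (as Ensembl reports them) and padding applied once on each side. The function name gene_coverage and the validation rules are illustrative assumptions, not the library's code.

```python
from typing import Optional


def gene_coverage(gene_start: int, gene_end: int, padding: int = 0) -> Optional[int]:
    """Length of a 1-based inclusive interval plus padding on both sides (sketch)."""
    if gene_start < 1 or gene_end < gene_start or padding < 0:
        return None  # treat invalid input as a failed calculation
    gene_size = gene_end - gene_start + 1  # inclusive interval length
    return gene_size + 2 * padding


# A 1000 bp gene with 50 bp of padding on each side.
print(gene_coverage(1000, 1999, padding=50))
```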

calculate_gene_size

calculate_gene_size(gene_symbol, species='human')

Calculate the genomic size of a gene (end - start + 1).

Parameters:

Name         Type  Description                      Default
gene_symbol  str   Gene symbol                      required
species      str   Species name (default: "human")  'human'

Returns:

Type        Description
int | None  Gene size in base pairs, or None if not found

calculate_transcript_coverage

calculate_transcript_coverage(transcript_data, padding=0)

Calculate transcript coverage including exons and padding.

Parameters:

Name             Type            Description                                   Default
transcript_data  dict[str, Any]  Transcript data with exon information         required
padding          int             Padding to add on both sides (in base pairs)  0

Returns:

Type        Description
int | None  Total coverage in base pairs, or None if the calculation fails
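One plausible reading of this calculation is to pad every exon and count the covered bases, merging padded exons that overlap so no base is counted twice. The sketch below follows that reading. The "Exon", "start", and "end" keys and the merging behaviour are assumptions, and the real method may compute coverage differently.

```python
from typing import Any, Optional


def transcript_coverage(transcript_data: dict[str, Any], padding: int = 0) -> Optional[int]:
    """Count bases covered by padded exons, merging overlaps (illustrative sketch)."""
    exons = transcript_data.get("Exon") or []  # assumed key for the exon list
    if not exons:
        return None
    # Pad each 1-based inclusive exon interval, then sort by start position.
    intervals = sorted((e["start"] - padding, e["end"] + padding) for e in exons)
    total = 0
    cur_start, cur_end = intervals[0]
    for start, end in intervals[1:]:
        if start <= cur_end + 1:  # overlapping or adjacent: extend the current block
            cur_end = max(cur_end, end)
        else:  # gap: close the current block and start a new one
            total += cur_end - cur_start + 1
            cur_start, cur_end = start, end
    return total + (cur_end - cur_start + 1)
```

Merging matters mainly for short introns, where padded exons would otherwise be double-counted.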

clear_cache

clear_cache()

Clear all cached results.

get_cache_info

get_cache_info()

Get cache statistics for monitoring performance.

Returns:

Type            Description
dict[str, Any]  Dictionary with cache statistics
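How the client caches internally is not documented here. As a rough illustration of the kind of statistics such a method could report, the sketch below uses functools.lru_cache from the standard library. The lookup function is a stand-in that performs no network access, and the keys of the stats dictionary are made up for the example.

```python
from functools import lru_cache

upstream_calls = 0  # counts simulated API requests


@lru_cache(maxsize=128)
def lookup(gene_symbol: str, species: str = "human") -> tuple:
    """Stand-in for a cached lookup; returns its arguments instead of calling an API."""
    global upstream_calls
    upstream_calls += 1
    return (gene_symbol, species)


lookup("BRCA1")
lookup("BRCA1")  # second call is served from the cache

info = lookup.cache_info()
stats = {"hits": info.hits, "misses": info.misses, "size": info.currsize}
print(stats)

lookup.cache_clear()  # analogous in spirit to clear_cache()
```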

get_gene_annotation

get_gene_annotation(gene_symbol, species='human')

Get comprehensive gene annotation information.

Parameters:

Name         Type  Description                      Default
gene_symbol  str   Gene symbol                      required
species      str   Species name (default: "human")  'human'

Returns:

Type            Description
dict[str, Any]  Dictionary with comprehensive gene annotation

get_gene_coordinates (cached)

get_gene_coordinates(gene_symbol, species='human')

Get genomic coordinates for a gene symbol.

Parameters:

Name         Type  Description                      Default
gene_symbol  str   Gene symbol                      required
species      str   Species name (default: "human")  'human'

Returns:

Type                   Description
dict[str, Any] | None  Dictionary with gene coordinates, or None if not found

get_gene_exons_by_transcript_type

get_gene_exons_by_transcript_type(gene_data, transcript_type='canonical')

Get exon coordinates for a gene using a specific transcript type.

Parameters:

Name             Type            Description                                                       Default
gene_data        dict[str, Any]  Gene data from batch annotation (with transcript info)            required
transcript_type  str             Type of transcript ("canonical", "mane_select", "mane_clinical")  'canonical'

Returns:

Type                  Description
list[dict[str, Any]]  List of exon dictionaries with coordinates and gene info

get_genes_coordinates

get_genes_coordinates(gene_symbols, species='human')

Get genomic coordinates for multiple gene symbols using batch request.

This method is maintained for backward compatibility. For new code, use get_symbols_data_batch() instead.

Parameters:

Name          Type       Description                      Default
gene_symbols  list[str]  List of gene symbols             required
species       str        Species name (default: "human")  'human'

Returns:

Type                              Description
dict[str, dict[str, Any] | None]  Dictionary mapping gene symbols to coordinate information

get_symbols_data_batch

get_symbols_data_batch(gene_symbols, species='human', expand=False)

Get genomic data for multiple gene symbols using optimized batch requests.

Parameters:

Name          Type       Description                                        Default
gene_symbols  list[str]  List of gene symbols                               required
species       str        Species name (default: "human")                    'human'
expand        bool       Whether to fetch transcript data (default: False)  False

Returns:

Type                              Description
dict[str, dict[str, Any] | None]  Dictionary mapping gene symbols to complete gene data, including transcripts if expanded
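Batch lookups of this kind map naturally onto the Ensembl REST endpoint POST /lookup/symbol/:species, which accepts a JSON body of the form {"symbols": [...]}. The sketch below only builds the request descriptions and sends nothing. build_batch_requests and its batch_size argument are hypothetical, and whether the client chunks symbol lookups this way is an assumption.

```python
import json
from typing import Any, Iterator

ENSEMBL_REST = "https://rest.ensembl.org"


def chunked(items: list[str], size: int) -> Iterator[list[str]]:
    """Yield consecutive slices of at most `size` items."""
    for i in range(0, len(items), size):
        yield items[i:i + size]


def build_batch_requests(gene_symbols: list[str], species: str = "human",
                         expand: bool = False, batch_size: int = 50) -> list[dict[str, Any]]:
    """Describe one POST request per batch of symbols (hypothetical helper)."""
    batch_requests = []
    for batch in chunked(gene_symbols, batch_size):
        batch_requests.append({
            "url": f"{ENSEMBL_REST}/lookup/symbol/{species}",
            # expand=1 asks Ensembl to include transcript data in the response
            "params": {"expand": 1} if expand else {},
            "headers": {"Content-Type": "application/json",
                        "Accept": "application/json"},
            "body": json.dumps({"symbols": batch}),
        })
    return batch_requests
```

Each entry can then be passed to an HTTP library of choice. Keeping request construction separate from sending makes the batching logic easy to test offline.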

get_transcript_exons

get_transcript_exons(transcript_id, species='human')

Get exon coordinates for a specific transcript.

Parameters:

Name           Type  Description                      Default
transcript_id  str   Ensembl transcript ID            required
species        str   Species name (default: "human")  'human'

Returns:

Type                  Description
list[dict[str, Any]]  List of exon dictionaries with coordinates
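When exon coordinates feed BED file generation, the coordinate systems differ: Ensembl reports 1-based inclusive positions, while BED uses 0-based starts and exclusive ends. A minimal conversion sketch follows. The "chromosome", "start", and "end" keys are assumed for illustration and may not match the dictionaries this method returns.

```python
from typing import Any


def exons_to_bed_lines(exons: list[dict[str, Any]], name: str) -> list[str]:
    """Convert 1-based inclusive exon intervals to BED lines (illustrative sketch)."""
    lines = []
    # Sort by chromosome, then start, so the output is position-ordered.
    for exon in sorted(exons, key=lambda e: (str(e["chromosome"]), e["start"])):
        bed_start = exon["start"] - 1  # BED starts are 0-based
        bed_end = exon["end"]          # BED ends are exclusive, so the value is unchanged
        lines.append(f"{exon['chromosome']}\t{bed_start}\t{bed_end}\t{name}")
    return lines
```

An exon spanning positions 100 to 150 therefore becomes the BED interval 99 to 150, which still covers 51 bases.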

rsid_to_coordinates (cached)

rsid_to_coordinates(rsid, species='human')

Convert rsID to genomic coordinates.

Parameters:

Name     Type  Description                      Default
rsid     str   dbSNP rsID (e.g., "rs1234567")   required
species  str   Species name (default: "human")  'human'

Returns:

Type                   Description
dict[str, Any] | None  Dictionary with variant coordinates, or None if not found
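The Ensembl REST endpoint GET /variation/:species/:id returns a JSON object whose "mappings" list carries the genomic location of a variant. The sketch below shows how such a payload might be reduced to a coordinate dictionary. parse_variant_mapping, the output keys, and the choice of the first mapping are assumptions, and the example payload uses made-up values rather than real coordinates.

```python
import re
from typing import Any, Optional

RSID_PATTERN = re.compile(r"^rs\d+$")


def parse_variant_mapping(rsid: str, payload: dict[str, Any]) -> Optional[dict[str, Any]]:
    """Extract coordinates from a variation payload (hypothetical helper)."""
    if not RSID_PATTERN.match(rsid):
        return None  # not a well-formed dbSNP rsID
    mappings = payload.get("mappings") or []
    if not mappings:
        return None  # variant has no genomic mapping
    first = mappings[0]  # assumption: the first mapping is the one of interest
    return {
        "rsid": rsid,
        "chromosome": first["seq_region_name"],
        "start": first["start"],
        "end": first["end"],
        "strand": first.get("strand"),
    }


# Made-up payload for illustration only; these are not real coordinates.
example_payload = {
    "mappings": [{"seq_region_name": "1", "start": 12345, "end": 12345, "strand": 1}]
}
print(parse_variant_mapping("rs1234567", example_payload))
```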