Ensembl Client

The Ensembl client provides access to genomic data from the Ensembl REST API.

Overview

The EnsemblClient class handles:

  • Gene coordinate lookup - Retrieves current genomic coordinates
  • Transcript information - Finds canonical and MANE transcripts
  • Batch processing - Looks up multiple genes in batched requests
  • Error handling - Robust fallback mechanisms for API failures
  • Caching - Reduces API load by caching results, optionally through an external cache manager

Features

  • Multiple assemblies - Support for GRCh38 and GRCh37
  • MANE transcripts - Identifies MANE Select and Plus Clinical transcripts
  • Exon data - Retrieves detailed exon coordinates for BED file generation
  • Rate limiting - Respects Ensembl API guidelines
  • Retry logic - Retries failed requests (up to max_retries attempts, retry_delay seconds apart) to handle transient network issues

API Reference

custom_panel.core.ensembl_client

Ensembl client for gene annotation and genomic coordinate retrieval.

This module provides a client for interacting with the Ensembl REST API to retrieve gene coordinates, transcript information, and other genomic data.

EnsemblClient

EnsemblClient(
    timeout=30,
    max_retries=3,
    retry_delay=1.0,
    transcript_batch_size=50,
    cache_manager=None,
)

Client for interacting with the Ensembl REST API.

Initialize the Ensembl client.

Parameters:

  • timeout (int, default: 30) - Request timeout in seconds
  • max_retries (int, default: 3) - Maximum number of retry attempts
  • retry_delay (float, default: 1.0) - Delay between retries in seconds
  • transcript_batch_size (int, default: 50) - Batch size for transcript queries
  • cache_manager (CacheManager | None, default: None) - Optional cache manager instance
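
The interaction between max_retries and retry_delay can be pictured with a small sketch. This is not the client's actual implementation: call_with_retries is a hypothetical helper, and the choice of one initial attempt plus up to max_retries retries with a fixed delay is an assumption.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def call_with_retries(
    fetch: Callable[[], T],
    max_retries: int = 3,
    retry_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fetch() and retry on failure (hypothetical helper, not the client's code)."""
    last_error: Optional[Exception] = None
    # Assumption: one initial attempt plus up to max_retries retries.
    attempts = max(1, max_retries + 1)
    for attempt in range(attempts):
        try:
            return fetch()
        except Exception as error:  # a real client would catch narrower error types
            last_error = error
            if attempt < attempts - 1:
                sleep(retry_delay)  # fixed delay between attempts
    assert last_error is not None
    raise last_error
```

Passing sleep as a parameter keeps the sketch easy to exercise without real waiting.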

calculate_gene_coverage

calculate_gene_coverage(gene_start, gene_end, padding=0)

Calculate gene coverage including padding.

Parameters:

  • gene_start (int, required) - Gene start position
  • gene_end (int, required) - Gene end position
  • padding (int, default: 0) - Padding to add on both sides (in base pairs)

Returns:

  • int | None - Total gene coverage in base pairs, or None if the calculation fails
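
As a worked example of the arithmetic, the sketch below assumes coverage is the inclusive span length (the end - start + 1 convention documented for calculate_gene_size) plus the padding counted once on each side. The function name gene_coverage and the conditions under which it returns None are illustrative assumptions, not the library's code.

```python
from typing import Optional


def gene_coverage(gene_start: int, gene_end: int, padding: int = 0) -> Optional[int]:
    """Inclusive span length plus padding on both sides (illustrative only)."""
    # Assumed failure cases: non-positive start, reversed interval, negative padding.
    if gene_start < 1 or gene_end < gene_start or padding < 0:
        return None
    return (gene_end - gene_start + 1) + 2 * padding


print(gene_coverage(1000, 1999))              # 1000
print(gene_coverage(1000, 1999, padding=50))  # 1100
print(gene_coverage(2000, 1000))              # None
```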

calculate_gene_size

calculate_gene_size(gene_symbol, species='human')

Calculate the genomic size of a gene (end - start + 1).

Parameters:

  • gene_symbol (str, required) - Gene symbol
  • species (str, default: 'human') - Species name

Returns:

  • int | None - Gene size in base pairs, or None if the gene is not found

calculate_transcript_coverage

calculate_transcript_coverage(transcript_data, padding=0)

Calculate transcript coverage including exons and padding.

Parameters:

  • transcript_data (dict[str, Any], required) - Transcript data with exon information
  • padding (int, default: 0) - Padding to add on both sides (in base pairs)

Returns:

  • int | None - Total coverage in base pairs, or None if the calculation fails
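
One plausible reading of this calculation is sketched below: pad every exon on both sides, merge padded exons that overlap or touch, and sum the merged lengths so that no base is counted twice. The "Exon" key with "start" and "end" fields mirrors the shape of Ensembl REST transcript objects, but whether the client applies padding per exon or per transcript is not stated here, so treat both the key names and the merging step as assumptions.

```python
from typing import Any, Optional


def transcript_coverage(transcript_data: dict[str, Any], padding: int = 0) -> Optional[int]:
    """Sum of padded exon lengths with overlaps merged (illustrative sketch)."""
    exons = transcript_data.get("Exon") or []  # assumed key name
    if not exons:
        return None
    # Pad each exon on both sides, then sort by start position.
    intervals = sorted((exon["start"] - padding, exon["end"] + padding) for exon in exons)
    total = 0
    current_start, current_end = intervals[0]
    for start, end in intervals[1:]:
        if start <= current_end + 1:  # overlapping or directly adjacent
            current_end = max(current_end, end)
        else:
            total += current_end - current_start + 1
            current_start, current_end = start, end
    total += current_end - current_start + 1
    return total


example = {"Exon": [{"start": 100, "end": 199}, {"start": 300, "end": 399}]}
print(transcript_coverage(example))              # 200
print(transcript_coverage(example, padding=60))  # 420 (padded exons overlap and merge)
```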

clear_cache

clear_cache()

Clear all cached results.

extract_coordinates_from_variation

extract_coordinates_from_variation(
    variation_data, preferred_assembly="GRCh38"
)

Extract coordinate information from Ensembl variation data.

Parameters:

  • variation_data (dict[str, Any], required) - Variation data from the Ensembl API
  • preferred_assembly (str, default: 'GRCh38') - Preferred genome assembly

Returns:

  • dict[str, Any] | None - Dictionary with coordinate information, or None if not found
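
The selection step can be sketched as follows. Ensembl variation records carry a "mappings" list whose entries include fields such as assembly_name, seq_region_name, start, end, and strand. The function name pick_mapping, the output key names, and the fallback to the first mapping when the preferred assembly is absent are assumptions made for illustration.

```python
from typing import Any, Optional


def pick_mapping(
    variation_data: dict[str, Any], preferred_assembly: str = "GRCh38"
) -> Optional[dict[str, Any]]:
    """Choose one mapping from an Ensembl variation record (illustrative sketch)."""
    mappings = variation_data.get("mappings") or []
    if not mappings:
        return None
    # Prefer the requested assembly; falling back to the first mapping is an assumption.
    chosen = next(
        (m for m in mappings if m.get("assembly_name") == preferred_assembly),
        mappings[0],
    )
    return {
        "chromosome": chosen.get("seq_region_name"),  # output keys are hypothetical
        "start": chosen.get("start"),
        "end": chosen.get("end"),
        "strand": chosen.get("strand"),
        "assembly": chosen.get("assembly_name"),
    }


# Made-up record in the shape of an Ensembl variation response.
record = {
    "name": "rs0000001",
    "mappings": [
        {"assembly_name": "GRCh37", "seq_region_name": "1", "start": 90, "end": 90, "strand": 1},
        {"assembly_name": "GRCh38", "seq_region_name": "1", "start": 100, "end": 100, "strand": 1},
    ],
}
print(pick_mapping(record)["start"])  # 100
```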

get_cache_info

get_cache_info()

Get cache statistics for monitoring performance.

Returns:

  • dict[str, Any] - Dictionary with cache statistics

get_gene_annotation

get_gene_annotation(gene_symbol, species='human')

Get comprehensive gene annotation information.

Parameters:

  • gene_symbol (str, required) - Gene symbol
  • species (str, default: 'human') - Species name

Returns:

  • dict[str, Any] - Dictionary with comprehensive gene annotation

get_gene_coordinates (cached)

get_gene_coordinates(gene_symbol, species='human')

Get genomic coordinates for a gene symbol.

Parameters:

  • gene_symbol (str, required) - Gene symbol
  • species (str, default: 'human') - Species name

Returns:

  • dict[str, Any] | None - Dictionary with gene coordinates, or None if not found
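
For orientation, a single-symbol lookup against the public Ensembl REST service could be addressed as in the sketch below. The /lookup/symbol/:species/:symbol endpoint and the separate GRCh37 host are part of the public Ensembl REST API. That the client uses exactly this endpoint is an assumption, and symbol_lookup_url is a hypothetical helper. The sketch only builds the URL and sends no request.

```python
from urllib.parse import quote

# Public Ensembl REST hosts per assembly.
SERVERS = {
    "GRCh38": "https://rest.ensembl.org",
    "GRCh37": "https://grch37.rest.ensembl.org",
}


def symbol_lookup_url(gene_symbol: str, species: str = "human", assembly: str = "GRCh38") -> str:
    """Build a symbol lookup URL (hypothetical helper; no request is sent)."""
    base = SERVERS[assembly]
    path = f"/lookup/symbol/{quote(species, safe='')}/{quote(gene_symbol, safe='')}"
    return f"{base}{path}?content-type=application/json"


print(symbol_lookup_url("BRCA2"))
print(symbol_lookup_url("BRCA2", assembly="GRCh37"))
```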

get_gene_exons_by_transcript_type

get_gene_exons_by_transcript_type(
    gene_data, transcript_type="canonical"
)

Get exon coordinates for a gene using a specific transcript type.

Parameters:

  • gene_data (dict[str, Any], required) - Gene data from batch annotation (with transcript info)
  • transcript_type (str, default: 'canonical') - Type of transcript: "canonical", "mane_select", or "mane_clinical"

Returns:

  • list[dict[str, Any]] - List of exon dictionaries with coordinates and gene info

get_genes_coordinates

get_genes_coordinates(gene_symbols, species='human')

Get genomic coordinates for multiple gene symbols using batch request.

This method is maintained for backward compatibility. For new code, use get_symbols_data_batch() instead.

Parameters:

  • gene_symbols (list[str], required) - List of gene symbols
  • species (str, default: 'human') - Species name

Returns:

  • dict[str, dict[str, Any] | None] - Dictionary mapping gene symbols to coordinate information

get_symbols_data_batch

get_symbols_data_batch(
    gene_symbols, species="human", expand=False
)

Get genomic data for multiple gene symbols using optimized batch requests.

Parameters:

  • gene_symbols (list[str], required) - List of gene symbols
  • species (str, default: 'human') - Species name
  • expand (bool, default: False) - Whether to fetch transcript data

Returns:

  • dict[str, dict[str, Any] | None] - Dictionary mapping gene symbols to complete gene data, including transcripts if expand is True
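
The return type implies that every requested symbol appears in the result, with None for symbols that could not be resolved. The sketch below shows one way such a result could be assembled from a batch response. The POST body shape ({"symbols": [...]}) and the expand parameter follow the public Ensembl lookup/symbol endpoint, while both helper names and the assumption that unresolved symbols are simply missing from the response are illustrative.

```python
import json
from typing import Any, Optional


def build_symbol_batch_request(
    gene_symbols: list[str], species: str = "human", expand: bool = False
) -> dict[str, Any]:
    """Describe a batch symbol lookup without sending it (hypothetical helper)."""
    return {
        "path": f"/lookup/symbol/{species}",
        "params": {"expand": 1} if expand else {},
        "body": json.dumps({"symbols": list(gene_symbols)}),
    }


def align_batch_response(
    gene_symbols: list[str], response: dict[str, Any]
) -> dict[str, Optional[dict[str, Any]]]:
    """Map every requested symbol to its record, or None when it is absent."""
    return {symbol: response.get(symbol) for symbol in gene_symbols}


# Made-up response in which one symbol was not resolved.
fake_response = {"GENE1": {"start": 100, "end": 200}}
print(align_batch_response(["GENE1", "NOT_A_GENE"], fake_response))
```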

get_transcript_exons

get_transcript_exons(transcript_id, species='human')

Get exon coordinates for a specific transcript.

Parameters:

  • transcript_id (str, required) - Ensembl transcript ID
  • species (str, default: 'human') - Species name

Returns:

  • list[dict[str, Any]] - List of exon dictionaries with coordinates
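
Since exon data is intended for BED file generation, the coordinate conversion is worth spelling out: Ensembl coordinates are 1-based and inclusive, while BED uses a 0-based start and an exclusive end, so only the start shifts by one. The sketch below assumes exon dictionaries with seq_region_name, start, end, strand, and id keys (the shape of Ensembl REST exon objects). The function name, the "chr" prefix, and the name column format are illustrative choices.

```python
from typing import Any


def exons_to_bed_lines(
    exons: list[dict[str, Any]], gene_symbol: str = "", chrom_prefix: str = "chr"
) -> list[str]:
    """Convert exon dictionaries to six-column BED lines (illustrative sketch)."""
    lines = []
    ordered = sorted(exons, key=lambda e: (str(e["seq_region_name"]), e["start"]))
    for exon in ordered:
        chrom = f"{chrom_prefix}{exon['seq_region_name']}"
        bed_start = exon["start"] - 1  # 1-based inclusive start to 0-based start
        bed_end = exon["end"]          # inclusive end equals the exclusive BED end
        name = f"{gene_symbol}_{exon['id']}" if gene_symbol else str(exon["id"])
        strand = "+" if exon.get("strand", 1) == 1 else "-"
        lines.append("\t".join([chrom, str(bed_start), str(bed_end), name, "0", strand]))
    return lines


# Made-up exons; the IDs are placeholders, not real Ensembl identifiers.
exons = [
    {"seq_region_name": "1", "start": 301, "end": 400, "strand": 1, "id": "exon2"},
    {"seq_region_name": "1", "start": 101, "end": 200, "strand": 1, "id": "exon1"},
]
for line in exons_to_bed_lines(exons, gene_symbol="GENE1"):
    print(line)
```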

get_variations_batch

get_variations_batch(
    rsids, species="homo_sapiens", batch_size=25
)

Get variation information for multiple rsIDs using batched requests.

This method automatically splits large requests into smaller batches to stay under the Ensembl API limit of 200 items per request.

Parameters:

  • rsids (list[str], required) - List of rsID strings (e.g., ["rs123456", "rs789012"])
  • species (str, default: 'homo_sapiens') - Species name
  • batch_size (int, default: 25) - Maximum number of rsIDs per batch

Returns:

  • dict[str, dict[str, Any]] - Dictionary mapping rsID to variation information

Raises:

  • RequestException - If the request fails
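
The batching described above can be sketched as a simple chunking step. iter_batches is a hypothetical helper, and capping the batch size at the 200-item limit quoted above is an assumption about how an oversized batch_size might be handled. The {"ids": [...]} payload shape follows the public Ensembl variation POST endpoint.

```python
from typing import Iterator

ENSEMBL_MAX_IDS_PER_POST = 200  # limit quoted in the description above


def iter_batches(rsids: list[str], batch_size: int = 25) -> Iterator[list[str]]:
    """Yield consecutive slices of rsids, each no longer than batch_size."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    size = min(batch_size, ENSEMBL_MAX_IDS_PER_POST)  # assumed safety cap
    for offset in range(0, len(rsids), size):
        yield rsids[offset:offset + size]


rsids = [f"rs{n}" for n in range(1, 61)]  # 60 made-up rsIDs
batches = list(iter_batches(rsids))
print([len(batch) for batch in batches])  # [25, 25, 10]

# One POST body per batch.
payloads = [{"ids": batch} for batch in batches]
```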

rsid_to_coordinates (cached)

rsid_to_coordinates(rsid, species='human')

Convert rsID to genomic coordinates.

Parameters:

  • rsid (str, required) - dbSNP rsID (e.g., "rs1234567")
  • species (str, default: 'human') - Species name

Returns:

  • dict[str, Any] | None - Dictionary with variant coordinates, or None if not found