Ensembl Client
The Ensembl client provides access to genomic data from the Ensembl REST API.
Overview
The EnsemblClient class handles:
- Gene coordinate lookup - Retrieves current genomic coordinates
- Transcript information - Finds canonical and MANE transcripts
- Batch processing - Efficiently processes multiple genes
- Error handling - Robust fallback mechanisms for API failures
- Caching - Reduces API load with intelligent caching
Features
- Multiple assemblies - Support for GRCh38 and GRCh37
- MANE transcripts - Identifies MANE Select and Plus Clinical transcripts
- Exon data - Retrieves detailed exon coordinates for BED file generation
- Rate limiting - Respects Ensembl API guidelines
- Retry logic - Handles transient network issues
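
As a rough illustration of assembly support, the sketch below builds a symbol-lookup URL against the public Ensembl REST servers for GRCh38 and GRCh37. The helper name `build_lookup_url` and the way the assembly is selected are assumptions made for this example; the page does not show how EnsemblClient chooses a server internally.

```python
from urllib.parse import quote, urlencode

# Public Ensembl REST servers, one per human assembly.
ENSEMBL_REST_SERVERS = {
    "GRCh38": "https://rest.ensembl.org",
    "GRCh37": "https://grch37.rest.ensembl.org",
}


def build_lookup_url(symbol: str, species: str = "human",
                     assembly: str = "GRCh38", expand: bool = False) -> str:
    """Build a symbol-lookup URL (hypothetical helper, not part of EnsemblClient)."""
    if assembly not in ENSEMBL_REST_SERVERS:
        raise ValueError(f"Unsupported assembly: {assembly!r}")
    params = {"content-type": "application/json"}
    if expand:
        params["expand"] = "1"  # ask Ensembl to include transcripts and exons
    path = f"/lookup/symbol/{quote(species, safe='')}/{quote(symbol, safe='')}"
    return f"{ENSEMBL_REST_SERVERS[assembly]}{path}?{urlencode(params)}"


if __name__ == "__main__":
    print(build_lookup_url("BRCA2", assembly="GRCh37", expand=True))
```

Keeping the server choice in a single mapping means an unsupported assembly fails early with a clear error instead of producing a request against the wrong coordinate system.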
API Reference
custom_panel.core.ensembl_client
Ensembl client for gene annotation and genomic coordinate retrieval.
This module provides a client for interacting with the Ensembl REST API to retrieve gene coordinates, transcript information, and other genomic data.
EnsemblClient
EnsemblClient(timeout=30, max_retries=3, retry_delay=1.0, transcript_batch_size=50, cache_manager=None)
Client for interacting with the Ensembl REST API.
Initialize the Ensembl client.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
timeout | int | Request timeout in seconds | 30 |
max_retries | int | Maximum number of retry attempts | 3 |
retry_delay | float | Delay between retries in seconds | 1.0 |
transcript_batch_size | int | Batch size for transcript queries | 50 |
cache_manager | CacheManager \| None | Optional cache manager instance | None |
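
To make the interplay of `max_retries` and `retry_delay` concrete, here is a minimal retry sketch. It assumes a fixed delay between attempts and that `max_retries` counts retries after the first attempt; the client's real policy (for example backoff, or which errors count as transient) is not documented here. `call_with_retries` is a hypothetical helper, not part of the module.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def call_with_retries(func: Callable[[], T], max_retries: int = 3,
                      retry_delay: float = 1.0,
                      sleep: Callable[[float], None] = time.sleep) -> T:
    """Call func, retrying on ConnectionError (hypothetical helper).

    Assumption: one initial attempt plus up to max_retries retries,
    with a constant retry_delay between attempts.
    """
    if max_retries < 0:
        raise ValueError("max_retries must be >= 0")
    last_error: Optional[Exception] = None
    for attempt in range(max_retries + 1):
        try:
            return func()
        except ConnectionError as exc:
            last_error = exc
            if attempt < max_retries:
                sleep(retry_delay)  # wait before the next attempt
    assert last_error is not None
    raise last_error
```

Passing `sleep` as a parameter keeps the helper easy to test without real waiting.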
calculate_gene_coverage
Calculate gene coverage including padding.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
gene_start | int | Gene start position | required |
gene_end | int | Gene end position | required |
padding | int | Padding to add on both sides (in base pairs) | 0 |
Returns:
Type | Description |
---|---|
int \| None | Total gene coverage in base pairs or None if calculation fails |
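
The arithmetic behind this calculation can be sketched as follows. The formula combines the inclusive gene size (end - start + 1, as stated for calculate_gene_size) with padding on both sides; treating inverted coordinates or negative padding as a failed calculation is an assumption of this example, and `gene_coverage` is a standalone stand-in rather than the method itself.

```python
from typing import Optional


def gene_coverage(gene_start: int, gene_end: int, padding: int = 0) -> Optional[int]:
    """Inclusive gene span plus padding on both sides (illustrative sketch)."""
    # Assumption: invalid input yields None instead of raising.
    if gene_start > gene_end or padding < 0:
        return None
    return (gene_end - gene_start + 1) + 2 * padding


if __name__ == "__main__":
    print(gene_coverage(100, 199))              # 100 bases, no padding
    print(gene_coverage(100, 199, padding=50))  # 100 bases + 2 * 50
```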
calculate_gene_size
Calculate the genomic size of a gene (end - start + 1).
Parameters:
Name | Type | Description | Default |
---|---|---|---|
gene_symbol | str | Gene symbol | required |
species | str | Species name (default: "human") | 'human' |
Returns:
Type | Description |
---|---|
int \| None | Gene size in base pairs or None if not found |
calculate_transcript_coverage
Calculate transcript coverage including exons and padding.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
transcript_data | dict[str, Any] | Transcript data with exon information | required |
padding | int | Padding to add on both sides (in base pairs) | 0 |
Returns:
Type | Description |
---|---|
int \| None | Total coverage in base pairs or None if calculation fails |
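
One plausible reading of exon-based coverage is sketched below: pad each exon on both sides, merge padded intervals that overlap or touch, and sum the merged lengths. Whether the real method applies padding per exon or merges overlaps is not stated on this page, so both are assumptions, as are the `exons`, `start`, and `end` keys used for the input.

```python
from typing import Any, Optional


def transcript_coverage(transcript_data: dict[str, Any], padding: int = 0) -> Optional[int]:
    """Sum padded exon lengths, counting overlapping bases once (sketch)."""
    exons = transcript_data.get("exons") or []  # hypothetical key name
    if not exons or padding < 0:
        return None
    # Pad every exon, then sort so neighbours can be merged in one pass.
    intervals = sorted((e["start"] - padding, e["end"] + padding) for e in exons)
    total = 0
    cur_start, cur_end = intervals[0]
    for start, end in intervals[1:]:
        if start <= cur_end + 1:  # overlapping or directly adjacent
            cur_end = max(cur_end, end)
        else:
            total += cur_end - cur_start + 1
            cur_start, cur_end = start, end
    total += cur_end - cur_start + 1
    return total
```

Merging matters once the padding exceeds half the intron length: without it, bases between two close exons would be counted twice.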
get_cache_info
Get cache statistics for monitoring performance.
Returns:
Type | Description |
---|---|
dict[str, Any] | Dictionary with cache statistics |
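
The exact keys of the returned dictionary are not listed here. As a point of comparison, the sketch below shows the kind of statistics that Python's standard `functools.lru_cache` exposes and how they could be folded into a dictionary. The function `lookup_gene` is a stand-in for a network call, and the key names are illustrative assumptions, not the client's actual output.

```python
from functools import lru_cache
from typing import Any


@lru_cache(maxsize=1024)
def lookup_gene(symbol: str, species: str = "human") -> tuple[str, str]:
    """Stand-in for an expensive API lookup (no network access here)."""
    return (species, symbol.upper())


def cache_stats() -> dict[str, Any]:
    """Summarise lru_cache counters as a plain dictionary (illustrative keys)."""
    info = lookup_gene.cache_info()
    total = info.hits + info.misses
    return {
        "hits": info.hits,
        "misses": info.misses,
        "size": info.currsize,
        "max_size": info.maxsize,
        "hit_rate": info.hits / total if total else 0.0,
    }
```

A low hit rate during a large panel build would suggest that repeated symbols are not reaching the cache, for example because of inconsistent casing in the input.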
get_gene_annotation
Get comprehensive gene annotation information.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
gene_symbol | str | Gene symbol | required |
species | str | Species name (default: "human") | 'human' |
Returns:
Type | Description |
---|---|
dict[str, Any] | Dictionary with comprehensive gene annotation |
get_gene_coordinates (cached)
Get genomic coordinates for a gene symbol.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
gene_symbol | str | Gene symbol | required |
species | str | Species name (default: "human") | 'human' |
Returns:
Type | Description |
---|---|
dict[str, Any] \| None | Dictionary with gene coordinates or None if not found |
get_gene_exons_by_transcript_type
Get exon coordinates for a gene using a specific transcript type.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
gene_data | dict[str, Any] | Gene data from batch annotation (with transcript info) | required |
transcript_type | str | Type of transcript ("canonical", "mane_select", "mane_clinical") | 'canonical' |
Returns:
Type | Description |
---|---|
list[dict[str, Any]] | List of exon dictionaries with coordinates and gene info |
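
The selection step can be pictured roughly as below. The layout of `gene_data` is not documented on this page, so every key in the sketch (`canonical_transcript`, `mane_select_transcript`, `mane_clinical_transcript`, `exons`, and the output fields) is a hypothetical placeholder chosen for illustration; only the three transcript type names come from the parameter description.

```python
from typing import Any

# Hypothetical mapping from the documented type names to gene_data keys.
TRANSCRIPT_KEYS = {
    "canonical": "canonical_transcript",
    "mane_select": "mane_select_transcript",
    "mane_clinical": "mane_clinical_transcript",
}


def exons_for_transcript_type(gene_data: dict[str, Any],
                              transcript_type: str = "canonical") -> list[dict[str, Any]]:
    """Return exon records for the requested transcript type (sketch)."""
    key = TRANSCRIPT_KEYS.get(transcript_type)
    if key is None:
        raise ValueError(f"Unknown transcript type: {transcript_type!r}")
    transcript = gene_data.get(key)
    if not transcript:
        return []  # the gene has no transcript of this type
    return [
        {
            "gene_symbol": gene_data.get("gene_symbol"),
            "transcript_id": transcript.get("id"),
            "chromosome": gene_data.get("chromosome"),
            "start": exon["start"],
            "end": exon["end"],
            "rank": rank,
        }
        for rank, exon in enumerate(transcript.get("exons", []), start=1)
    ]
```

Returning an empty list for a missing transcript type lets a caller fall back, for instance from "mane_select" to "canonical", without exception handling.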
get_genes_coordinates
Get genomic coordinates for multiple gene symbols using batch request.
This method is maintained for backward compatibility. For new code, use get_symbols_data_batch() instead.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
gene_symbols | list[str] | List of gene symbols | required |
species | str | Species name (default: "human") | 'human' |
Returns:
Type | Description |
---|---|
dict[str, dict[str, Any] \| None] | Dictionary mapping gene symbols to coordinate information |
get_symbols_data_batch
Get genomic data for multiple gene symbols using optimized batch requests.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
gene_symbols | list[str] | List of gene symbols | required |
species | str | Species name (default: "human") | 'human' |
expand | bool | Whether to fetch transcript data (default: False) | False |
Returns:
Type | Description |
---|---|
dict[str, dict[str, Any] \| None] | Dictionary mapping gene symbols to complete gene data including transcripts if expanded |
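
Batching of this kind usually comes down to de-duplicating the input and splitting it into fixed-size chunks, one request body per chunk. The sketch below prepares JSON bodies in the `{"symbols": [...]}` shape accepted by the Ensembl `POST lookup/symbol/:species` endpoint. The batch size of 50 is borrowed from `transcript_batch_size` purely for illustration, and the helpers are hypothetical rather than the client's internals.

```python
import json
from typing import Iterator


def chunked(items: list[str], size: int) -> Iterator[list[str]]:
    """Yield consecutive slices of at most `size` items."""
    if size < 1:
        raise ValueError("size must be >= 1")
    for i in range(0, len(items), size):
        yield items[i:i + size]


def build_batch_payloads(gene_symbols: list[str], batch_size: int = 50) -> list[str]:
    """Return one JSON request body per batch (no network access)."""
    # dict.fromkeys drops duplicates while preserving input order.
    unique = list(dict.fromkeys(gene_symbols))
    return [json.dumps({"symbols": batch}) for batch in chunked(unique, batch_size)]


if __name__ == "__main__":
    for body in build_batch_payloads(["BRCA1", "BRCA2", "TP53", "BRCA1"], batch_size=2):
        print(body)
```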
get_transcript_exons
Get exon coordinates for a specific transcript.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
transcript_id | str | Ensembl transcript ID | required |
species | str | Species name (default: "human") | 'human' |
Returns:
Type | Description |
---|---|
list[dict[str, Any]] | List of exon dictionaries with coordinates |
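
Because exon coordinates feed BED file generation, the coordinate conversion is worth spelling out: Ensembl positions are 1-based and inclusive, while BED uses a 0-based start and an exclusive end. The sketch below assumes exon dictionaries with `chromosome`, `start`, `end`, and an optional `name` key; these key names are placeholders, since the actual dictionary layout is not documented on this page.

```python
from typing import Any


def exons_to_bed_lines(exons: list[dict[str, Any]], padding: int = 0) -> list[str]:
    """Convert 1-based inclusive exon coordinates to BED lines (sketch)."""
    lines = []
    for exon in sorted(exons, key=lambda e: (str(e["chromosome"]), e["start"])):
        # BED start is 0-based, so subtract 1; never go below position 0.
        bed_start = max(0, exon["start"] - 1 - padding)
        # BED end is exclusive, which equals the 1-based inclusive end.
        bed_end = exon["end"] + padding
        name = exon.get("name", ".")
        lines.append(f"{exon['chromosome']}\t{bed_start}\t{bed_end}\t{name}")
    return lines


if __name__ == "__main__":
    demo = [{"chromosome": "17", "start": 101, "end": 200, "name": "exon1"}]
    print("\n".join(exons_to_bed_lines(demo, padding=10)))
```

Chromosome names are written as given; tools that expect a `chr` prefix would need an additional renaming step.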
rsid_to_coordinates (cached)
Convert rsID to genomic coordinates.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
rsid | str | dbSNP rsID (e.g., "rs1234567") | required |
species | str | Species name (default: "human") | 'human' |
Returns:
Type | Description |
---|---|
dict[str, Any] \| None | Dictionary with variant coordinates or None if not found |
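
Before any request is made, an rsID can be checked and turned into a request URL for the Ensembl `GET variation/:species/:id` endpoint. The sketch below does only that and performs no network access; the normalisation rules (trimming whitespace, lower-casing the prefix) and the helper name `build_variation_url` are assumptions of this example, not documented behaviour of rsid_to_coordinates.

```python
import re
from urllib.parse import urlencode

# "rs" followed by a positive integer without leading zeros.
RSID_PATTERN = re.compile(r"rs[1-9]\d*")


def build_variation_url(rsid: str, species: str = "human",
                        base_url: str = "https://rest.ensembl.org") -> str:
    """Validate an rsID and build the variation lookup URL (sketch)."""
    normalized = rsid.strip().lower()
    if not RSID_PATTERN.fullmatch(normalized):
        raise ValueError(f"Not a valid rsID: {rsid!r}")
    query = urlencode({"content-type": "application/json"})
    return f"{base_url}/variation/{species}/{normalized}?{query}"


if __name__ == "__main__":
    print(build_variation_url("rs1234567"))
```

Rejecting malformed identifiers locally avoids spending rate-limited requests on lookups that cannot succeed.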