Ensembl Client
The Ensembl client provides access to genomic data from the Ensembl REST API.
Overview
The EnsemblClient class handles:
- Gene coordinate lookup - Retrieves current genomic coordinates
- Transcript information - Finds canonical and MANE transcripts
- Batch processing - Efficiently processes multiple genes
- Error handling - Retry and fallback mechanisms for API failures
- Caching - Reduces API load by caching results through an optional cache manager
Features
- Multiple assemblies - Support for GRCh38 and GRCh37
- MANE transcripts - Identifies MANE Select and Plus Clinical transcripts
- Exon data - Retrieves detailed exon coordinates for BED file generation
- Rate limiting - Respects Ensembl API guidelines
- Retry logic - Handles transient network issues
API Reference
custom_panel.core.ensembl_client
Ensembl client for gene annotation and genomic coordinate retrieval.
This module provides a client for interacting with the Ensembl REST API to retrieve gene coordinates, transcript information, and other genomic data.
EnsemblClient
EnsemblClient(
    timeout=30,
    max_retries=3,
    retry_delay=1.0,
    transcript_batch_size=50,
    cache_manager=None,
)
Client for interacting with the Ensembl REST API.
Initialize the Ensembl client.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| timeout | int | Request timeout in seconds | 30 |
| max_retries | int | Maximum number of retry attempts | 3 |
| retry_delay | float | Delay between retries in seconds | 1.0 |
| transcript_batch_size | int | Batch size for transcript queries | 50 |
| cache_manager | CacheManager \| None | Optional cache manager instance | None |
calculate_gene_coverage
Calculate gene coverage including padding.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| gene_start | int | Gene start position | required |
| gene_end | int | Gene end position | required |
| padding | int | Padding to add on both sides (in base pairs) | 0 |

Returns:
| Type | Description |
|---|---|
| int \| None | Total gene coverage in base pairs or None if calculation fails |
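The arithmetic behind a padded gene coverage can be sketched as follows. The function below is a standalone illustration, not the method itself: it assumes 1-based inclusive coordinates (matching the "end - start + 1" size formula used elsewhere in this reference), and the conditions under which it returns None are assumptions.

```python
from __future__ import annotations


def gene_coverage(gene_start: int, gene_end: int, padding: int = 0) -> int | None:
    """Return the padded span of a gene in base pairs.

    Illustrative sketch: inclusive coordinates are assumed, and invalid
    input (end before start, negative padding) is treated as a failure.
    """
    if gene_end < gene_start or padding < 0:
        return None
    gene_size = gene_end - gene_start + 1
    # Padding is added once upstream and once downstream.
    return gene_size + 2 * padding
```

For example, a gene spanning positions 100 to 200 with 10 bp of padding covers 101 + 20 = 121 bp under these assumptions.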
calculate_gene_size
Calculate the genomic size of a gene (end - start + 1).
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| gene_symbol | str | Gene symbol | required |
| species | str | Species name (default: "human") | 'human' |

Returns:
| Type | Description |
|---|---|
| int \| None | Gene size in base pairs or None if not found |
calculate_transcript_coverage
Calculate transcript coverage including exons and padding.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| transcript_data | dict[str, Any] | Transcript data with exon information | required |
| padding | int | Padding to add on both sides (in base pairs) | 0 |

Returns:
| Type | Description |
|---|---|
| int \| None | Total coverage in base pairs or None if calculation fails |
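One plausible way to compute exon-based coverage is sketched below. Several things here are assumptions rather than documented behavior: that transcript_data carries its exons under an "Exon" key with "start" and "end" fields (the layout Ensembl uses for expanded lookups), that padding is applied to each exon, and that overlapping padded exons are merged so no base is counted twice.

```python
from __future__ import annotations

from typing import Any


def transcript_coverage(transcript_data: dict[str, Any], padding: int = 0) -> int | None:
    """Sum the padded exon lengths of a transcript, merging overlaps.

    Illustrative sketch; the "Exon" key and per-exon padding are assumptions.
    """
    exons = transcript_data.get("Exon") or []
    try:
        intervals = sorted(
            (int(exon["start"]) - padding, int(exon["end"]) + padding)
            for exon in exons
        )
    except (KeyError, TypeError, ValueError):
        return None
    if not intervals:
        return None

    total = 0
    current_start, current_end = intervals[0]
    for start, end in intervals[1:]:
        if start <= current_end:
            # Padded exons overlap: extend the current interval.
            current_end = max(current_end, end)
        else:
            total += current_end - current_start + 1
            current_start, current_end = start, end
    total += current_end - current_start + 1
    return total
```

Merging matters mainly for short introns: with large padding, two neighboring exons can grow into each other, and a plain sum would overstate the covered region.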
extract_coordinates_from_variation
Extract coordinate information from Ensembl variation data.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| variation_data | dict[str, Any] | Variation data from Ensembl API | required |
| preferred_assembly | str | Preferred genome assembly (default: "GRCh38") | 'GRCh38' |

Returns:
| Type | Description |
|---|---|
| dict[str, Any] \| None | Dictionary with coordinate information or None if not found |
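The extraction step can be pictured with the sketch below. It relies on the "mappings" list that the Ensembl REST variation endpoint returns, where each entry carries fields such as assembly_name, seq_region_name, start, end, strand, and allele_string. The output key names and the fallback to the first mapping when the preferred assembly is absent are assumptions, not the documented behavior of this method.

```python
from __future__ import annotations

from typing import Any


def extract_coordinates(
    variation_data: dict[str, Any],
    preferred_assembly: str = "GRCh38",
) -> dict[str, Any] | None:
    """Pick one mapping from Ensembl variation data and flatten it.

    Illustrative sketch; output keys and fallback rule are assumptions.
    """
    mappings = variation_data.get("mappings") or []
    if not mappings:
        return None
    # Prefer the requested assembly; otherwise fall back to the first mapping.
    chosen = next(
        (m for m in mappings if m.get("assembly_name") == preferred_assembly),
        mappings[0],
    )
    return {
        "chromosome": chosen.get("seq_region_name"),
        "start": chosen.get("start"),
        "end": chosen.get("end"),
        "strand": chosen.get("strand"),
        "allele_string": chosen.get("allele_string"),
        "assembly": chosen.get("assembly_name"),
    }
```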
get_cache_info
Get cache statistics for monitoring performance.
Returns:
| Type | Description |
|---|---|
| dict[str, Any] | Dictionary with cache statistics |
get_gene_annotation
Get comprehensive gene annotation information.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| gene_symbol | str | Gene symbol | required |
| species | str | Species name (default: "human") | 'human' |

Returns:
| Type | Description |
|---|---|
| dict[str, Any] | Dictionary with comprehensive gene annotation |
get_gene_coordinates (cached)
Get genomic coordinates for a gene symbol.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| gene_symbol | str | Gene symbol | required |
| species | str | Species name (default: "human") | 'human' |

Returns:
| Type | Description |
|---|---|
| dict[str, Any] \| None | Dictionary with gene coordinates or None if not found |
get_gene_exons_by_transcript_type
Get exon coordinates for a gene using a specific transcript type.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| gene_data | dict[str, Any] | Gene data from batch annotation (with transcript info) | required |
| transcript_type | str | Type of transcript ("canonical", "mane_select", "mane_clinical") | 'canonical' |

Returns:
| Type | Description |
|---|---|
| list[dict[str, Any]] | List of exon dictionaries with coordinates and gene info |
get_genes_coordinates
Get genomic coordinates for multiple gene symbols using batch request.
This method is maintained for backward compatibility. For new code, use get_symbols_data_batch() instead.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| gene_symbols | list[str] | List of gene symbols | required |
| species | str | Species name (default: "human") | 'human' |

Returns:
| Type | Description |
|---|---|
| dict[str, dict[str, Any] \| None] | Dictionary mapping gene symbols to coordinate information |
get_symbols_data_batch
Get genomic data for multiple gene symbols using optimized batch requests.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| gene_symbols | list[str] | List of gene symbols | required |
| species | str | Species name (default: "human") | 'human' |
| expand | bool | Whether to fetch transcript data (default: False) | False |

Returns:
| Type | Description |
|---|---|
| dict[str, dict[str, Any] \| None] | Dictionary mapping gene symbols to complete gene data including transcripts if expanded |
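To make the batch lookup more concrete, the sketch below builds the kind of request the Ensembl REST API accepts for symbol lookups (a POST to lookup/symbol/:species with a JSON "symbols" list) and maps a response back onto the requested symbols. Both helper names are hypothetical, and whether the client constructs its requests this way, including how it passes expand, is an assumption.

```python
from __future__ import annotations

import json
from typing import Any

ENSEMBL_REST = "https://rest.ensembl.org"


def build_symbol_lookup_request(
    gene_symbols: list[str],
    species: str = "human",
    expand: bool = False,
) -> dict[str, Any]:
    """Describe a batch symbol lookup without sending it (hypothetical helper)."""
    return {
        "url": f"{ENSEMBL_REST}/lookup/symbol/{species}",
        # expand=1 asks Ensembl to include transcript (and exon) records.
        "params": {"expand": 1} if expand else {},
        "headers": {"Content-Type": "application/json", "Accept": "application/json"},
        "data": json.dumps({"symbols": list(gene_symbols)}),
    }


def map_symbols_to_results(
    gene_symbols: list[str],
    response_json: dict[str, Any],
) -> dict[str, dict[str, Any] | None]:
    """Return one entry per requested symbol, with None for symbols not found."""
    return {symbol: response_json.get(symbol) for symbol in gene_symbols}
```

The returned dictionary could be passed to an HTTP library, for example requests.post(req["url"], params=req["params"], headers=req["headers"], data=req["data"]). Filling in None for missing symbols mirrors the dict[str, dict[str, Any] | None] return type listed above.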
get_transcript_exons
Get exon coordinates for a specific transcript.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| transcript_id | str | Ensembl transcript ID | required |
| species | str | Species name (default: "human") | 'human' |

Returns:
| Type | Description |
|---|---|
| list[dict[str, Any]] | List of exon dictionaries with coordinates |
get_variations_batch
Get variation information for multiple rsIDs using batched requests.
This method automatically splits large requests into smaller batches to stay under the Ensembl API limit of 200 items per request.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| rsids | list[str] | List of rsID strings (e.g., ["rs123456", "rs789012"]) | required |
| species | str | Species name (default: "homo_sapiens") | 'homo_sapiens' |
| batch_size | int | Maximum number of rsIDs per batch (default: 25) | 25 |

Returns:
| Type | Description |
|---|---|
| dict[str, dict[str, Any]] | Dictionary mapping rsID to variation information |

Raises:
| Type | Description |
|---|---|
| RequestException | If request fails |
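The batch splitting described above can be sketched with a small generator. The helper name iter_rsid_batches and the clamping of batch_size to the 200-item limit are assumptions for illustration; the client's own splitting logic may differ.

```python
from __future__ import annotations

from typing import Iterator

# Per-request item limit stated in the method description above.
ENSEMBL_MAX_ITEMS_PER_REQUEST = 200


def iter_rsid_batches(rsids: list[str], batch_size: int = 25) -> Iterator[list[str]]:
    """Yield consecutive slices of rsids, each at most batch_size long.

    Hypothetical helper: batch_size is clamped to the API limit so an
    oversized value cannot produce an invalid request.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    size = min(batch_size, ENSEMBL_MAX_ITEMS_PER_REQUEST)
    for offset in range(0, len(rsids), size):
        yield rsids[offset : offset + size]
```

With the default batch size of 25, a list of 60 rsIDs would be sent as three requests of 25, 25, and 10 items, and the per-batch results merged into one dictionary.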
rsid_to_coordinates (cached)
Convert rsID to genomic coordinates.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| rsid | str | dbSNP rsID (e.g., "rs1234567") | required |
| species | str | Species name (default: "human") | 'human' |

Returns:
| Type | Description |
|---|---|
| dict[str, Any] \| None | Dictionary with variant coordinates or None if not found |