Profiles
ReSeq2 stores sequencing statistics and estimated probabilities as profile files. Understanding the two available formats helps you choose the right one for your workflow.
Binary vs Text Format
| | Binary (default) | Text (--textFormat) |
|---|---|---|
| File size | ~65% smaller | Larger |
| Load speed | ~2-3x faster | Slower |
| Portable | No | Yes |
| Use case | Local workflows on a single machine | Sharing, archiving, cross-platform use |
Binary portability
Binary profiles are Boost serialization archives. They are not portable across different CPU architectures or compilers. A profile generated on x86 with GCC may not load on ARM or with Clang. Always use text format when sharing profiles with collaborators or publishing them.
Both formats are auto-detected on load, so you never need to specify which format a file is in when reading.
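Although ReSeq2 detects the format for you, it can be handy to check a file yourself before sharing it. The following is a minimal sketch of such a check, not ReSeq2's own detection logic: the helper name `looks_like_text` is hypothetical, and it assumes that text profiles are plain ASCII while binary profiles contain NUL or non-ASCII bytes near the start of the file.

```python
from pathlib import Path


def looks_like_text(path, sample_size=4096):
    """Guess whether a file is plain text by inspecting its first bytes.

    Heuristic only (an assumption, not ReSeq2's detection code): the file
    counts as text if the leading sample is non-empty, pure ASCII, and
    free of NUL bytes.
    """
    with Path(path).open("rb") as handle:
        sample = handle.read(sample_size)
    if not sample or b"\x00" in sample:
        return False  # empty file or NUL bytes: not plain text
    try:
        sample.decode("ascii")
    except UnicodeDecodeError:
        return False  # bytes outside the ASCII range
    return True
```

A check like this can flag a binary profile before it is uploaded or attached to a publication. It is not needed when loading profiles, since ReSeq2 handles both formats automatically.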
Converting Between Formats
Use the convertProfile command to convert existing profile files. With --textFormat, the input file is overwritten with its text-format version; without --textFormat, the default binary format is used.
Probability files (.reseq.ipf) can be converted the same way using -p and -P.
Generating Both Formats at Once
During stats collection, use --bothFormats to produce both binary and text profiles simultaneously. The alternate format gets a .text or .bin suffix. For a mapping file named my_mappings.bam, this creates:
- my_mappings.bam.reseq (binary, default)
- my_mappings.bam.reseq.text (text, portable)
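For scripts that need to locate both outputs, the naming rule above can be sketched as a small helper. The function name `expected_profiles` and its `text_is_default` parameter are hypothetical. The .bin case is an assumption: it supposes that when text is the default format, the binary copy is written as the default name plus .bin, mirroring the .text case.

```python
def expected_profiles(mapping_file, text_is_default=False):
    """Return (default_format_path, alternate_format_path) for --bothFormats.

    Assumption: the alternate file is the default profile name with
    ".text" appended (binary default) or ".bin" appended (text default).
    """
    default_path = f"{mapping_file}.reseq"
    alternate_suffix = ".bin" if text_is_default else ".text"
    return default_path, default_path + alternate_suffix


print(expected_profiles("my_mappings.bam"))
# prints ('my_mappings.bam.reseq', 'my_mappings.bam.reseq.text')
```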
Using Pre-built Profiles
The ReSeq-profiles repository provides curated profiles with detailed metadata about the original datasets. These are useful when:
- You need a wide variety of sequencing conditions for benchmarking
- You do not have access to a closely matching real dataset
- You want to quickly test a pipeline without running the full stats collection
Best practice
For the most realistic simulations, create your own profiles from a dataset that closely matches your target sequencer, chemistry, fragmentation protocol, adapters, and PCR cycles. Pre-built profiles are a good fallback but may not capture the specific characteristics of your experiment.
Cross-Platform Sharing Advice
When distributing profiles (e.g., alongside a publication or benchmark):
- Always share text format, which loads on any platform regardless of compiler or architecture
- Include the .reseq.ipf file alongside the .reseq stats file so recipients can skip probability estimation
- Document the original dataset: species, sequencer model, chemistry version, and read length help users assess whether a profile is appropriate for their use case
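One way to document the original dataset is a small machine-readable file shipped next to the profile. The sketch below is only a suggestion: ReSeq2 does not read or require such a file, and the function name `write_profile_metadata`, the `.meta.json` suffix, and the JSON keys are all hypothetical. The fields mirror the items listed above.

```python
import json
from pathlib import Path


def write_profile_metadata(profile_path, species, sequencer_model,
                           chemistry_version, read_length):
    """Write a JSON sidecar describing the dataset behind a profile.

    The sidecar name (<profile>.meta.json) and the keys are illustrative
    conventions, not part of ReSeq2.
    """
    metadata = {
        "profile": Path(profile_path).name,
        "species": species,
        "sequencer_model": sequencer_model,
        "chemistry_version": chemistry_version,
        "read_length": read_length,
    }
    sidecar = Path(f"{profile_path}.meta.json")
    sidecar.write_text(json.dumps(metadata, indent=2) + "\n", encoding="utf-8")
    return sidecar
```

Keeping the description in a separate file means it travels with the profile without altering the profile itself, and recipients can read it without any ReSeq2 tooling.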