A versatile Bash script that generates genomic windows of specified sizes from FASTA files. Can also automatically download genomes from Ensembl or UCSC databases. This tool is part of the MethylSense package preprocessing pipeline.
- Direct FASTA input - Process local genome files with
--genomeflag (v1.1) - Automatic genome downloading from Ensembl or UCSC databases
- Flexible window sizes with sensible defaults
- macOS compatible - Works on both Linux and macOS (v1.1)
- System tools fallback - Use system bedtools/samtools with
--no-micromamba(v1.1) - Progress tracking with visual progress bars
- Dry-run mode to preview commands before execution
- Bash (3.2+, macOS compatible)
- bedtools and samtools (via system install OR micromamba)
- curl (for downloading genomes)
git clone https://github.com/markusdrag/GenomeToWindows.git
cd GenomeToWindows
chmod +x window_those_genomes.shIf you have bedtools and samtools installed, you're ready to go with --no-micromamba.
For automatic dependency management:
./install.sh# Process your local genome file
./window_those_genomes.sh \
--genome /path/to/genome.fna \
--windows 200,400,500,750,1000 \
--output ./my_windows \
--no-micromamba# Download human genome from Ensembl and generate default windows
./window_those_genomes.sh --species Homo_sapiens
# Download chicken genome with benchmarking window sizes
./window_those_genomes.sh --species Gallus_gallus --windows 200,400,500,750,1000
# Download mouse genome with custom window sizes
./window_those_genomes.sh --species Mus_musculus --windows 1000,5000,10000
# Use UCSC database instead of Ensembl
./window_those_genomes.sh --species Homo_sapiens --database ucsc
# Preview download commands without executing (dry-run)
./window_those_genomes.sh --species Gallus_gallus --dry-run./window_those_genomes.sh --input ./my_genomes --output ./my_windowsOptions:
-g, --genome FILE Direct path to a FASTA file (.fna/.fa/.fasta)
This is the PREFERRED option for local files
--fai FILE Direct path to a FAI index file (skips samtools indexing)
-s, --species SPECIES Scientific name (e.g., "Homo_sapiens") - downloads genome
-d, --database DB Database: ensembl or ucsc (default: ensembl)
-i, --input DIR Input directory with .fna/.fa files (default: ./genomes)
-o, --output DIR Output directory for BED files (default: ./windowed_genomes)
-w, --windows SIZES Comma-separated window sizes in bp
Example: -w 200,400,500,750,1000
--no-micromamba Use system bedtools/samtools instead of micromamba
--dry-run Preview commands without executing
-h, --help Show help message
-v, --version Show version
The easiest way to get started is to let the script automatically download a genome from Ensembl. Just specify the species name and your desired window sizes:
# Download human genome from Ensembl and generate 1kb, 5kb, and 10kb windows
./window_those_genomes.sh --species Homo_sapiens --windows 1000,5000,10000
# Download chicken genome with MethylSense benchmarking sizes (200bp to 1000bp)
./window_those_genomes.sh --species Gallus_gallus --windows 200,400,500,750,1000
# Download mouse genome from UCSC instead of Ensembl
./window_those_genomes.sh --species Mus_musculus --database ucscThe script will:
- Download the genome FASTA from the selected database
- Index it with
samtools faidx - Generate BED files with non-overlapping windows for each specified size
If you already have a genome FASTA file on your system, use --genome to process it directly. This is faster and avoids re-downloading:
./window_those_genomes.sh \
--genome /path/to/Gallus_gallus.GRCg7b.dna.toplevel.fna \
--windows 200,400,500,750,1000 \
--output ./benchmark_windows \
--no-micromambaTip: Use
--no-micromambaif you have bedtools and samtools already installed on your system.
If you already have a .fai index file, you can skip the indexing step entirely by providing it with --fai:
./window_those_genomes.sh \
--genome /path/to/genome.fna \
--fai /path/to/genome.fna.fai \
--windows 1000,5000,10000This is useful when working with very large genomes where indexing can take several minutes.
All species in Ensembl (100+ species). Common examples:
Homo_sapiens,Mus_musculus,Rattus_norvegicusDanio_rerio,Gallus_gallus,Bos_taurus
Pre-configured mapping:
Homo_sapiens→ hg38Mus_musculus→ mm39Gallus_gallus→ galGal6- And more...
BED files in ./windowed_genomes/ (or custom --output):
{genome_name}_{window_size}bp.bed- Example:
Gallus_gallus_500bp.bed
- NEW:
--genomeflag for direct FASTA input - NEW:
--faiflag for direct FAI index input - NEW:
--no-micromambato use system tools - FIX: macOS compatibility (removed GNU grep -P)
- FIX: Bash 3.x compatibility (removed associative arrays)
- Initial release
MIT License - see LICENSE file for details
Drag, M., et al. (2025). New high accuracy diagnostics for avian Aspergillus fumigatus
infection using Nanopore methylation sequencing of host cell-free DNA and machine
learning prediction. bioRxiv. https://doi.org/10.1101/2025.04.11.648151
- Issues: GitHub Issues
- Email: markusdrag@gmail.com