nextflow
Description
Nextflow is a bioinformatics workflow manager that enables the development of portable and reproducible workflows. It supports deploying workflows on a variety of execution platforms including local machines, HPC schedulers, AWS Batch, Google Cloud Life Sciences, and Kubernetes. Additionally, it lets you manage workflow dependencies through built-in support for Conda, Spack, Docker, Podman, Singularity, environment modules, and more.
Versions
- Negishi: 21.10.0, 22.10.1, 23.04.1
- Anvil: 21.10.0, 22.10.1
- Bell: 21.10.0, 22.10.4
- Scholar: 21.10.0
- Gautschi: 21.10.0
Module
You can load the module by running:
module load nextflow
Note: Docker is not available on Purdue clusters, so use "-profile singularity", environment modules, or Conda for running Nextflow pipelines.
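For example, a published pipeline could be launched with the Singularity profile along these lines (nf-core/rnaseq, the test profile, and the --outdir value are only placeholders; substitute your own pipeline and options):
module load nextflow
nextflow run nf-core/rnaseq -profile singularity,test --outdir results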
Running Nextflow can be compute- or memory-intensive. Please do not run it on login nodes, as this may affect other users sharing the same login node.
Wrap nextflow into slurm jobscript
The easiest way to use Nextflow on the clusters is to place the nextflow run command in a batch script and submit it to Slurm with sbatch. The manager process will run on the allocated compute node, and by default all tasks use the local executor, so they run within the same allocation.
#!/bin/bash
#SBATCH -A myQueue
#SBATCH -t 1:00:00
#SBATCH -N 1
#SBATCH -n 12
#SBATCH --job-name=nextflow
#SBATCH --mail-type=FAIL,BEGIN,END
#SBATCH --error=%x-%J-%u.err
#SBATCH --output=%x-%J-%u.out
module load nextflow
nextflow run main.nf -profile singularity
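Assuming the script above is saved as nextflow_job.sh (the file name is only an example), submit it from a login node with:
sbatch nextflow_job.sh
The Nextflow manager and all of its tasks will then run within the resources requested by the #SBATCH directives above.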
Nextflow submits tasks as slurm jobs
Nextflow can also submit its tasks to Slurm instead of running them on the local host. Place the following file named nextflow.config in your Nextflow working directory:
process {
    executor = 'slurm'
}

executor {
    queueSize = 50
    pollInterval = '1 min'
    queueStatInterval = '5 min'
    submitRateLimit = '10 sec'
}
Please do not change the above default configuration. The Nextflow workflow manager process can generate a disruptive amount of communication requests to Slurm, and this configuration is used to reduce the frequency of those requests.
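Nextflow reads nextflow.config automatically from the directory where you launch the workflow, so the run command itself does not change; for example:
nextflow run main.nf -profile singularity
The difference is that each task is now submitted as its own Slurm job, so the session running the manager process only needs a modest allocation.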
clusterOptions
Inside the individual process definitions in your workflow scripts, you will need to set the clusterOptions directive to specify the queue and computing resources appropriate for that task. This can be done by adding something in the pattern of clusterOptions='-A standby -N1 -n1 -c12 -t 1:00:00' to the top of your process blocks.
Below is a simple example that runs FastQC:
nextflow.enable.dsl=2

process FASTQC {
    clusterOptions='-A standby -N1 -n1 -c4 -t 00:30:00'

    input:
    path reads

    script:
    """
    mkdir -p fastqc_out
    module load biocontainers fastqc
    fastqc -o fastqc_out ${reads}
    """
}

reads_ch = Channel.fromPath( 'reads/fastq/*.fastq.gz' )

workflow {
    FASTQC(reads_ch)
}
Using clusterOptions='-A standby -N1 -n1 -c4 -t 00:30:00', each Nextflow task will be submitted to the standby queue, requesting 4 cores and 30 minutes of walltime.
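If you prefer to keep scheduler details out of the workflow script, the same options can instead be attached from nextflow.config using a process selector. A minimal sketch, assuming the FASTQC process name from the example above and extending the process block shown earlier:
process {
    executor = 'slurm'
    withName: 'FASTQC' {
        clusterOptions = '-A standby -N1 -n1 -c4 -t 00:30:00'
    }
}
This keeps the pipeline code portable, since per-cluster account and walltime settings live in the configuration file rather than in each process definition.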