nextflow
Description
Nextflow is a bioinformatics workflow manager that enables the development of portable and reproducible workflows. It supports deploying workflows on a variety of execution platforms including local machines, HPC schedulers, AWS Batch, Google Cloud Life Sciences, and Kubernetes. Additionally, it lets you manage workflow dependencies through built-in support for Conda, Spack, Docker, Podman, Singularity, environment modules, and more.
Versions
- Negishi: 21.10.0, 22.10.1, 23.04.1
- Anvil: 21.10.0, 22.10.1
- Bell: 21.10.0, 22.10.4
- Scholar: 21.10.0
- Gautschi: 21.10.0
Module
You can load the module by running:
module load nextflow
Note: Docker is not available on Purdue clusters, so use "-profile singularity", environment modules, or Conda for running Nextflow pipelines.
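For example, a published pipeline could be launched with the Singularity profile along these lines (nf-core/rnaseq, the test profile, and the --outdir value are only placeholders; substitute your own pipeline and options):
module load nextflow
nextflow run nf-core/rnaseq -profile singularity,test --outdir results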
Running Nextflow can be compute- or memory-intensive. Please do not run it on login nodes, as this may affect other users sharing the same login node.
Wrap nextflow into slurm jobscript
The easiest way to use Nextflow on the clusters is to place the nextflow run command in a batch script and submit it to Slurm with sbatch. The manager process will run on the allocated compute node, and by default all tasks use the local executor, so they run within the same allocation.
#!/bin/bash
#SBATCH -A myQueue
#SBATCH -t 1:00:00
#SBATCH -N 1
#SBATCH -n 12
#SBATCH --job-name=nextflow
#SBATCH --mail-type=FAIL,BEGIN,END
#SBATCH --error=%x-%J-%u.err
#SBATCH --output=%x-%J-%u.out
module load nextflow
nextflow run main.nf -profile singularity
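Assuming the script above is saved as nextflow_job.sh (the file name is only an example), submit it from a login node with:
sbatch nextflow_job.sh
The Nextflow manager and all of its tasks will then run within the resources requested by the #SBATCH directives above.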
Nextflow submits tasks as slurm jobs
Nextflow can also submit its tasks to Slurm instead of running them on the local host. Place the following file named nextflow.config in your Nextflow working directory:
process {
    executor = 'slurm'
}

executor {
    queueSize = 50
    pollInterval = '1 min'
    queueStatInterval = '5 min'
    submitRateLimit = '10 sec'
}
Please do not change the above default configuration. The Nextflow workflow manager process can generate a disruptive amount of communication requests to Slurm, and this configuration is used to reduce the frequency of those requests.
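Nextflow reads nextflow.config automatically from the directory where you launch the workflow, so the run command itself does not change; for example:
nextflow run main.nf -profile singularity
The difference is that each task is now submitted as its own Slurm job, so the session running the manager process only needs a modest allocation.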
clusterOptions
Inside the individual process definitions in your workflow scripts, you will need to set the clusterOptions directive to specify the queue and computing resources appropriate for that task. This can be done by adding something in the pattern of clusterOptions='-A standby -N1 -n1 -c12 -t 1:00:00' to the top of your process blocks.
Below is a simple example that runs FastQC:
nextflow.enable.dsl=2

process FASTQC {
    clusterOptions='-A standby -N1 -n1 -c4 -t 00:30:00'

    input:
    path reads

    script:
    """
    mkdir -p fastqc_out
    module load biocontainers fastqc
    fastqc -o fastqc_out ${reads}
    """
}

reads_ch = Channel.fromPath( 'reads/fastq/*.fastq.gz' )

workflow {
    FASTQC(reads_ch)
}
Using clusterOptions='-A standby -N1 -n1 -c4 -t 00:30:00', each Nextflow task will be submitted to the standby queue, requesting 4 cores and 30 minutes of walltime.
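If you prefer to keep scheduler details out of the workflow script, the same options can instead be attached from nextflow.config using a process selector. A minimal sketch, assuming the FASTQC process name from the example above and extending the process block shown earlier:
process {
    executor = 'slurm'
    withName: 'FASTQC' {
        clusterOptions = '-A standby -N1 -n1 -c4 -t 00:30:00'
    }
}
This keeps the pipeline code portable, since per-cluster account and walltime settings live in the configuration file rather than in each process definition.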