Parallelizing pipeline runs on HPC systems¶
This guide shows how to parallelize pipeline runs on HPC systems that use job schedulers supported by Nipoppy.
Currently, we have built-in support for the Slurm and SGE job schedulers. However, it is possible to manually add another job scheduler.
Important
Although the default template job script is designed to work with minimal user configuration, each HPC system is different, and some may require different/additional parameters to be set. See the Further customization section for how deeper configuration can be achieved.
If the default Slurm/SGE configurations do not work for you, please consider opening an issue on our GitHub repository so that we can improve our HPC support.
Configuring main HPC options¶
Global settings¶
The default global configuration file has two HPC-related fields that should be updated as needed:
{
    "SUBSTITUTIONS": {
        "_comment": "Self-references like NIPOPPY_DPATH_CONTAINERS are resolved from the layout at runtime, making them layout-aware",
        "[[NIPOPPY_DPATH_CONTAINERS]]": "[[NIPOPPY_DPATH_CONTAINERS]]",
        "[[HPC_ACCOUNT_NAME]]": ""
    },
    "DICOM_DIR_PARTICIPANT_FIRST": true,
    "CONTAINER_CONFIG": {
        "COMMAND": "apptainer",
        "ARGS": [
            "--cleanenv"
        ],
        "ENV_VARS": {
            "PYTHONUNBUFFERED": "1"
        }
    },
    "HPC_PREAMBLE": [
        "# (These lines can all be removed if not using HPC functionality.)",
        "# ========== Activate Python environment ==========",
        "# Here we need the command to activate your Python environment in an ",
        "# HPC job, for example:",
        "# - venv: source <PATH_TO_VENV>/bin/activate",
        "# - conda: source ~/.bashrc; conda activate <ENV_NAME>",
        "# ========== Set environment variables ==========",
        "export PYTHONUNBUFFERED=1"
    ],
    "PIPELINE_VARIABLES": {
        "BIDSIFICATION": {},
        "PROCESSING": {},
        "EXTRACTION": {}
    },
    "CUSTOM": {}
}
HPC_PREAMBLE¶
HPC_PREAMBLE is a list of Bash commands that are executed at the beginning of every job.
Importantly, it should include a command for activating the Python environment in which Nipoppy is installed.
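For instance, with a virtual environment (the path below is a placeholder to adapt to your system), HPC_PREAMBLE might reduce to:

```json
"HPC_PREAMBLE": [
    "source /path/to/venv/bin/activate",
    "export PYTHONUNBUFFERED=1"
]
```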
[[HPC_ACCOUNT_NAME]]¶
The value for the [[HPC_ACCOUNT_NAME]] field in the SUBSTITUTIONS dictionary should be set to the account name/ID the job will be associated with.
By default this will be passed as --account in Slurm systems and -q in SGE systems during job submission.
This can be left blank if these options are not needed.
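As an illustration, on a cluster where jobs are billed to an allocation, the SUBSTITUTIONS entry in the global configuration file might look like the following (the account name "def-mylab" is a placeholder):

```json
"SUBSTITUTIONS": {
    "[[HPC_ACCOUNT_NAME]]": "def-mylab"
}
```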
Attention
If your HPC system requires flags other than --account or -q to be set, you will have to modify the template job submission script: see the Further customization section for more information.
Pipeline-specific settings¶
The job time limit and the CPU and memory requests can be configured separately for each pipeline via the HPC config file.
Look for this file inside the pipeline config directory at <NIPOPPY_PROJECT_ROOT>/pipelines/<PIPELINE_TYPE>/<PIPELINE_NAME>/<PIPELINE_VERSION> – it is most likely called hpc.json or hpc_config.json (see the pipeline’s config.json file for the exact name).
The HPC config file should look similar to this:
{
    "ACCOUNT": "[[HPC_ACCOUNT_NAME]]",
    "TIME": "1:00:00",
    "CORES": "1",
    "MEMORY": "16G",
    "ARRAY_CONCURRENCY_LIMIT": ""
}
If the pipeline config directory has no HPC config file
You can create an HPC config file manually by copying the content above into a new file called (for example) hpc.json.
You will also need to add an "HPC_CONFIG_FILE" field for each step in the pipeline’s config.json file:
"STEPS": [
    {
        "INVOCATION_FILE": "invocation.json",
        "DESCRIPTOR_FILE": "descriptor.json",
        "HPC_CONFIG_FILE": "hpc.json"
    }
],
Set the fields in the HPC config file as needed. Leave a field as an empty string if it is not needed.
- ACCOUNT: do not modify this field – the account name should be set in the global configuration file.
- TIME: time limit. Passed as --time in Slurm jobs and -l h_rt in SGE jobs.
- CORES: number of CPUs requested. Passed as --cpus-per-task in Slurm jobs; ignored in SGE jobs.
- MEMORY: amount of memory requested. Passed as --mem in Slurm jobs and -l h_vmem in SGE jobs.
- ARRAY_CONCURRENCY_LIMIT: maximum number of jobs in the array that can run at the same time. Set as part of the --array specification in Slurm jobs and passed as -tc in SGE jobs.
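As an illustration, an HPC config file requesting a 12-hour, 4-core, 32 GB job with at most 10 array tasks running concurrently might look like this (the resource values are placeholders to adjust for your pipeline and system):

```json
{
    "ACCOUNT": "[[HPC_ACCOUNT_NAME]]",
    "TIME": "12:00:00",
    "CORES": "4",
    "MEMORY": "32G",
    "ARRAY_CONCURRENCY_LIMIT": "10"
}
```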
Submitting HPC jobs via nipoppy commands¶
To run a pipeline on an HPC, use the --hpc option to specify the HPC job scheduler when running the nipoppy bidsify, nipoppy process, or nipoppy extract commands:
$ nipoppy <SUBCOMMAND> \
--dataset <NIPOPPY_PROJECT_ROOT> \
--pipeline <PIPELINE_NAME> \
--hpc slurm
# other desired options
# ...
This will submit a job array (one job per participant/session to run) through the requested job scheduler.
Currently, only 'slurm' and 'sge' have built-in support, but it is possible to add a different job scheduler.
Tip
We recommend submitting a single job (i.e. by specifying both --participant-id and --session-id) the first time you launch jobs on an HPC.
This will make it easier to troubleshoot if any problem occurs.
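For example, a first test run for a single participant/session might look like the sketch below (the dataset path, pipeline name, and IDs are placeholders). The command is built as a bash array so it can be printed and double-checked before actually submitting anything:

```shell
# Placeholders: adapt the dataset path, pipeline name, and IDs to your study.
CMD=(nipoppy process
    --dataset /path/to/study
    --pipeline fmriprep
    --participant-id 01
    --session-id 1
    --hpc slurm)
echo "${CMD[@]}"   # inspect the full command first
# "${CMD[@]}"      # uncomment to actually submit the job
```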
Troubleshooting¶
Below are some troubleshooting tips that may help if your jobs are submitted successfully but fail before pipeline processing begins.
Slurm/SGE log files are written to <NIPOPPY_PROJECT_ROOT>/logs/hpc.
If you see an error message complaining about the nipoppy command not existing, it is likely that your HPC_PREAMBLE does not have the right command(s) for activating your Nipoppy Python environment.
By default, the job script generated by Nipoppy is deleted upon successful job submission.
If you suspect that there is something wrong with the job script, rerun the nipoppy command you used to submit the job(s) with the --keep-workdir flag.
Then, the script can be found at <NIPOPPY_PROJECT_ROOT>/scratch/work/<PIPELINE_NAME>-<PIPELINE_VERSION>/<PIPELINE_NAME>-<PIPELINE_VERSION>/run_queue.sh.
Attention
Modifying <NIPOPPY_PROJECT_ROOT>/scratch/work/<PIPELINE_NAME>-<PIPELINE_VERSION>/<PIPELINE_NAME>-<PIPELINE_VERSION>/run_queue.sh will not have an effect on future job submissions.
Instead, you will need to modify the template job script itself.
Further customization¶
All fields in the HPC config file are passed to the Jinja template job script, which can be found at <NIPOPPY_PROJECT_ROOT>/code/hpc/job_script_template.sh.
The default template job script
#!/bin/bash

{#-
# This is a template for generating a job script that will be run on an HPC
# cluster. It is written using the Jinja templating language (see
# https://jinja.palletsprojects.com for more information).

# All variables starting with the "NIPOPPY_" prefix are set internally by
# Nipoppy and cannot be changed. Other (optional) variables can be defined in a
# pipeline's HPC config file (i.e., hpc.json). Additional variables can also be
# defined in the HPC config file for further customization.

# Lines surrounded by { # and # } (without spaces) are Jinja comments and will
# not be included in the final job script.
#}

{#-
# ----------------------------
# JOB SCHEDULER CONFIGURATIONS
# ----------------------------
# Below sections are for the Slurm and SGE job schedulers respectively.
# Depending on the value of the --hpc argument, only one of these will be used.
# Existing lines should not be modified unless you know what you are doing.
# New lines can be added to hardcode extra settings that are to be constant for
# every HPC job (no matter which pipeline).
#}
{%- if NIPOPPY_HPC == 'slurm' %}
{% set NIPOPPY_ARRAY_VAR = 'SLURM_ARRAY_TASK_ID' %}
# ===== Slurm configs =====
#SBATCH --job-name={{ NIPOPPY_JOB_NAME }}
#SBATCH --output={{ NIPOPPY_DPATH_LOGS }}/%x-%A_%a.out
#SBATCH --array=1-{{ NIPOPPY_COMMANDS | length }}
{%- if ARRAY_CONCURRENCY_LIMIT -%}
%{{ ARRAY_CONCURRENCY_LIMIT }}
{%- endif %}
{% if TIME -%}
#SBATCH --time={{ TIME }}
{%- endif -%}
{% if MEMORY %}
#SBATCH --mem={{ MEMORY }}
{%- endif -%}
{% if CORES %}
#SBATCH --cpus-per-task={{ CORES }}
{%- endif -%}
{% if ACCOUNT %}
#SBATCH --account={{ ACCOUNT }}
{%- endif %}
{% if PARTITION %}
#SBATCH --partition={{ PARTITION }}
{%- endif %}

{%- elif NIPOPPY_HPC == 'sge' %}
{% set NIPOPPY_ARRAY_VAR = 'SGE_TASK_ID' %}
# ===== SGE configs =====
#$ -N {{ NIPOPPY_JOB_NAME }}
#$ -o {{ NIPOPPY_DPATH_LOGS }}/$JOB_NAME_$JOB_ID_$TASK_ID.out
#$ -j y
#$ -t 1-{{ NIPOPPY_COMMANDS | length }}
{% if ARRAY_CONCURRENCY_LIMIT -%}
#$ -tc {{ ARRAY_CONCURRENCY_LIMIT }}
{%- endif -%}
{% if TIME %}
#$ -l h_rt={{ TIME }}
{%- endif -%}
{% if MEMORY %}
#$ -l h_vmem={{ MEMORY }}
{%- endif -%}
{% if ACCOUNT %}
#$ -q {{ ACCOUNT }}
{%- endif %}
{% endif %}

# for custom scripting
DPATH_ROOT="{{ NIPOPPY_DPATH_ROOT }}"
PIPELINE_NAME="{{ NIPOPPY_PIPELINE_NAME }}"
PIPELINE_VERSION="{{ NIPOPPY_PIPELINE_VERSION }}"
PIPELINE_STEP="{{ NIPOPPY_PIPELINE_STEP }}"
PARTICIPANT_IDS=({% for participant_id in NIPOPPY_PARTICIPANT_IDS %} "{{ participant_id }}"{% endfor %} )
SESSION_IDS=({% for session_id in NIPOPPY_SESSION_IDS %} "{{ session_id }}"{% endfor %} )
{#
# -------------------
# START OF JOB SCRIPT
# -------------------
# Below lines should not be modified unless you know what you are doing.
#}
{% if NIPOPPY_HPC_PREAMBLE_STRINGS -%}
# HPC_PREAMBLE from global config file
{% for NIPOPPY_HPC_PREAMBLE_STRING in NIPOPPY_HPC_PREAMBLE_STRINGS -%}
{{ NIPOPPY_HPC_PREAMBLE_STRING }}
{% endfor %}
{%- endif %}
# Nipoppy-generated list of commands to be run in job array
COMMANDS=( \
{% for command in NIPOPPY_COMMANDS -%}
    "{{ command }}" \
{% endfor -%}
)

# get command from list
# note that COMMANDS is zero-indexed (bash array)
# but the job array is one-indexed for compatibility with SGE
I_JOB=$(({{NIPOPPY_ARRAY_VAR}}-1))
COMMAND=${COMMANDS[$I_JOB]}

# for custom scripting
PARTICIPANT_ID=${PARTICIPANT_IDS[$I_JOB]}
SESSION_ID=${SESSION_IDS[$I_JOB]}

# print/run command
echo $COMMAND
eval $COMMAND
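The indexing logic at the end of the template can be checked locally. The sketch below mimics a rendered job script with two commands and a Slurm task ID of 2; the echo commands stand in for real pipeline invocations:

```shell
# Mimic the rendered job script: the scheduler's 1-indexed array task ID
# is mapped onto the 0-indexed bash COMMANDS array.
COMMANDS=( \
    "echo running participant 01" \
    "echo running participant 02" \
)
SLURM_ARRAY_TASK_ID=2          # set by Slurm in a real job array
I_JOB=$((SLURM_ARRAY_TASK_ID-1))
COMMAND=${COMMANDS[$I_JOB]}
echo $COMMAND
eval $COMMAND
```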
This template can be modified to hardcode job submission settings or to expose additional pipeline-specific configurations.
As an example, let’s say we are interested in specifying the --nice option in Slurm jobs.
- To hardcode the same --nice value for all jobs/pipelines, add e.g. #SBATCH --nice=10 on a new line near the beginning of the template script (outside of any if block).
- To expose --nice as a parameter that can be set independently for each pipeline, instead add the following block:

  {% if NICE %}
  #SBATCH --nice={{ NICE }}
  {%- endif %}

  Then set "NICE" in a new field (alongside "TIME", "CORES", etc.) in a pipeline’s HPC config file.
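Continuing the hypothetical --nice example, the pipeline’s HPC config file would then gain a "NICE" field (the value 10 is just an illustration):

```json
{
    "ACCOUNT": "[[HPC_ACCOUNT_NAME]]",
    "TIME": "1:00:00",
    "CORES": "1",
    "MEMORY": "16G",
    "ARRAY_CONCURRENCY_LIMIT": "",
    "NICE": "10"
}
```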
Support for other job schedulers¶
Job scheduling support in the Nipoppy package relies on the pysqa package, which can handle several other job schedulers in addition to Slurm and SGE.
To add support for another job scheduler supported by pysqa (e.g., Flux), follow these steps:
1. Navigate to <NIPOPPY_PROJECT_ROOT>/code/hpc.
2. Create a flux.yaml file. Refer to the existing slurm.yaml and sge.yaml for what the content of that file should be.
3. Update clusters.yaml to add flux as an additional cluster.
4. Update job_script_template.sh to add a section for Flux configs.
5. You should now be able to run nipoppy bidsify/process/extract with --hpc flux.
See also the pysqa documentation for more information.
Important
If you have configured the Nipoppy HPC functionalities to work on a job scheduler other than Slurm/SGE, please consider opening an issue on our GitHub repository and contributing your additions back to the codebase.