Parallelizing pipeline runs on HPC systems¶
This guide shows how to parallelize pipeline runs on HPC systems that use job schedulers supported by Nipoppy.
Currently, we have built-in support for the Slurm and SGE job schedulers. However, it is possible to manually add another job scheduler.
Important
Although the default template job script is designed to work with minimal user configuration, each HPC system is different, and some may require different/additional parameters to be set. See the Further customization section for how deeper configuration can be achieved.
If the default Slurm/SGE configurations do not work for you, please consider opening an issue on our GitHub repository so that we can improve our HPC support.
Configuring main HPC options¶
Global settings¶
The default global configuration file has two HPC-related fields that should be updated as needed:
{
    "SUBSTITUTIONS": {
        "_comment": "Self-references like NIPOPPY_DPATH_CONTAINERS are resolved from the layout at runtime, making them layout-aware",
        "[[NIPOPPY_DPATH_CONTAINERS]]": "[[NIPOPPY_DPATH_CONTAINERS]]",
        "[[HPC_ACCOUNT_NAME]]": ""
    },
    "DICOM_DIR_PARTICIPANT_FIRST": true,
    "CONTAINER_CONFIG": {
        "COMMAND": "apptainer",
        "ARGS": [
            "--cleanenv"
        ],
        "ENV_VARS": {
            "PYTHONUNBUFFERED": "1"
        }
    },
    "HPC_PREAMBLE": [
        "# (These lines can all be removed if not using HPC functionality.)",
        "# ========== Activate Python environment ==========",
        "# Here we need the command to activate your Python environment in an ",
        "# HPC job, for example:",
        "# - venv: source <PATH_TO_VENV>/bin/activate",
        "# - conda: source ~/.bashrc; conda activate <ENV_NAME>",
        "# ========== Set environment variables ==========",
        "export PYTHONUNBUFFERED=1"
    ],
    "PIPELINE_VARIABLES": {
        "BIDSIFICATION": {},
        "PROCESSING": {},
        "EXTRACTION": {}
    },
    "CUSTOM": {}
}
HPC_PREAMBLE¶
HPC_PREAMBLE is a list of Bash commands that are executed at the beginning of every job.
Importantly, it should include a command for activating the Python environment in which Nipoppy is installed.
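For instance, with a virtual environment (the path below is a placeholder to adapt to your system), HPC_PREAMBLE might reduce to:

```json
"HPC_PREAMBLE": [
    "source /path/to/venv/bin/activate",
    "export PYTHONUNBUFFERED=1"
]
```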
[[HPC_ACCOUNT_NAME]]¶
The value for the [[HPC_ACCOUNT_NAME]] field in the SUBSTITUTIONS dictionary should be set to the account name/ID the job will be associated with.
By default this will be passed as --account in Slurm systems and -q in SGE systems during job submission.
This can be left blank if these options are not needed.
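As an illustration, on a cluster where jobs are billed to an allocation, the SUBSTITUTIONS entry in the global configuration file might look like the following (the account name "def-mylab" is a placeholder):

```json
"SUBSTITUTIONS": {
    "[[HPC_ACCOUNT_NAME]]": "def-mylab"
}
```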
Attention
If your HPC system requires flags other than --account or -q to be set, you will have to modify the template job submission script: see the Further customization section for more information.
Pipeline-specific settings¶
The job time limit and the CPU and memory requests can be configured separately for each pipeline via the HPC config file.
Look for this file inside the pipeline config directory at <NIPOPPY_PROJECT_ROOT>/pipelines/<PIPELINE_TYPE>/<PIPELINE_NAME>/<PIPELINE_VERSION> – it is most likely called hpc.json or hpc_config.json (see the pipeline’s config.json file for the exact name).
The HPC config file should look similar to this:
{
    "ACCOUNT": "[[HPC_ACCOUNT_NAME]]",
    "TIME": "1:00:00",
    "CORES": "1",
    "MEMORY": "16G",
    "ARRAY_CONCURRENCY_LIMIT": ""
}
If the pipeline config directory has no HPC config file
You can create an HPC config file manually by copying the content above into a new file called (for example) hpc.json.
You will also need to add an "HPC_CONFIG_FILE" field for each step in the pipeline’s config.json file:
"STEPS": [
    {
        "INVOCATION_FILE": "invocation.json",
        "DESCRIPTOR_FILE": "descriptor.json",
        "HPC_CONFIG_FILE": "hpc.json"
    }
],
Set the fields in the HPC config file as needed. Leave a field as an empty string if it is not needed.
- ACCOUNT: do not modify this field – the account name should be set in the global configuration file.
- TIME: time limit. Passed as --time in Slurm jobs and -l h_rt in SGE jobs.
- CORES: number of CPUs requested. Passed as --cpus-per-task in Slurm jobs; ignored in SGE jobs.
- MEMORY: amount of memory requested. Passed as --mem in Slurm jobs and -l h_vmem in SGE jobs.
- ARRAY_CONCURRENCY_LIMIT: maximum number of jobs in the array that can run at the same time. Set as part of the --array specification in Slurm jobs and passed as -tc in SGE jobs.
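As an illustration, an HPC config file requesting a 12-hour, 4-core, 32 GB job with at most 10 array tasks running concurrently might look like this (the resource values are placeholders to adjust for your pipeline and system):

```json
{
    "ACCOUNT": "[[HPC_ACCOUNT_NAME]]",
    "TIME": "12:00:00",
    "CORES": "4",
    "MEMORY": "32G",
    "ARRAY_CONCURRENCY_LIMIT": "10"
}
```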
Submitting HPC jobs via nipoppy commands¶
To run a pipeline on an HPC, use the --hpc option to specify the HPC job scheduler when running the nipoppy bidsify, nipoppy process, or nipoppy extract commands:
$ nipoppy <SUBCOMMAND> \
--dataset <NIPOPPY_PROJECT_ROOT> \
--pipeline <PIPELINE_NAME> \
--hpc slurm
# other desired options
# ...
This will submit a job array (one job per participant/session to run) through the requested job scheduler.
Currently, only 'slurm' and 'sge' have built-in support, but it is possible to add a different job scheduler.
Tip
We recommend submitting a single job (i.e. by specifying both --participant-id and --session-id) the first time you launch jobs on an HPC.
This will make it easier to troubleshoot if any problem occurs.
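For example, a first test run for a single participant/session might look like the sketch below (the dataset path, pipeline name, and IDs are placeholders). The command is built as a bash array so it can be printed and double-checked before actually submitting anything:

```shell
# Placeholders: adapt the dataset path, pipeline name, and IDs to your study.
CMD=(nipoppy process
    --dataset /path/to/study
    --pipeline fmriprep
    --participant-id 01
    --session-id 1
    --hpc slurm)
echo "${CMD[@]}"   # inspect the full command first
# "${CMD[@]}"      # uncomment to actually submit the job
```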
Troubleshooting¶
Below are some troubleshooting tips that may help if your jobs are submitted successfully but fail before pipeline processing begins.
Slurm/SGE log files are written to <NIPOPPY_PROJECT_ROOT>/logs/hpc.
If you see an error message complaining about the nipoppy command not existing, it is likely that your HPC_PREAMBLE does not have the right command(s) for activating your Nipoppy Python environment.
By default, the job script generated by Nipoppy is deleted upon successful job submission.
If you suspect that there is something wrong with the job script, rerun the nipoppy command you used to submit the job(s) with the --keep-workdir flag.
Then, the script can be found at <NIPOPPY_PROJECT_ROOT>/scratch/work/<PIPELINE_NAME>-<PIPELINE_VERSION>/<PIPELINE_NAME>-<PIPELINE_VERSION>/run_queue.sh.
Attention
Modifying <NIPOPPY_PROJECT_ROOT>/scratch/work/<PIPELINE_NAME>-<PIPELINE_VERSION>/<PIPELINE_NAME>-<PIPELINE_VERSION>/run_queue.sh will not have an effect on future job submissions.
Instead, you will need to modify the template job script itself.
Further customization¶
All fields in the HPC config file are passed to the Jinja template job script, which can be found at <NIPOPPY_PROJECT_ROOT>/code/hpc/job_script_template.sh.
The default template job script
#!/bin/bash

{#-
# This is a template for generating a job script that will be run on an HPC
# cluster. It is written using the Jinja templating language (see
# https://jinja.palletsprojects.com for more information).

# All variables starting with the "NIPOPPY_" prefix are set internally by
# Nipoppy and cannot be changed. Other (optional) variables can be defined in a
# pipeline's HPC config file (i.e., hpc.json). Additional variables can also be
# defined in the HPC config file for further customization.

# Lines surrounded by { # and # } (without spaces) are Jinja comments and will
# not be included in the final job script.
#}

{#-
# ----------------------------
# JOB SCHEDULER CONFIGURATIONS
# ----------------------------
# Below sections are for the Slurm and SGE job schedulers respectively.
# Depending on the value of the --hpc argument, only one of these will be used.
# Existing lines should not be modified unless you know what you are doing.
# New lines can be added to hardcode extra settings that are to be constant for
# every HPC job (no matter which pipeline).
#}
{%- if NIPOPPY_HPC == 'slurm' %}
{% set NIPOPPY_ARRAY_VAR = 'SLURM_ARRAY_TASK_ID' %}
# ===== Slurm configs =====
#SBATCH --job-name={{ NIPOPPY_JOB_NAME }}
#SBATCH --output={{ NIPOPPY_DPATH_LOGS }}/%x-%A_%a.out
#SBATCH --array=1-{{ NIPOPPY_COMMANDS | length }}
{%- if ARRAY_CONCURRENCY_LIMIT -%}
%{{ ARRAY_CONCURRENCY_LIMIT }}
{%- endif %}
{% if TIME -%}
#SBATCH --time={{ TIME }}
{%- endif -%}
{% if MEMORY %}
#SBATCH --mem={{ MEMORY }}
{%- endif -%}
{% if CORES %}
#SBATCH --cpus-per-task={{ CORES }}
{%- endif -%}
{% if ACCOUNT %}
#SBATCH --account={{ ACCOUNT }}
{%- endif %}
{% if PARTITION %}
#SBATCH --partition={{ PARTITION }}
{%- endif %}

{%- elif NIPOPPY_HPC == 'sge' %}
{% set NIPOPPY_ARRAY_VAR = 'SGE_TASK_ID' %}
# ===== SGE configs =====
#$ -N {{ NIPOPPY_JOB_NAME }}
#$ -o {{ NIPOPPY_DPATH_LOGS }}/$JOB_NAME_$JOB_ID_$TASK_ID.out
#$ -j y
#$ -t 1-{{ NIPOPPY_COMMANDS | length }}
{% if ARRAY_CONCURRENCY_LIMIT -%}
#$ -tc {{ ARRAY_CONCURRENCY_LIMIT }}
{%- endif -%}
{% if TIME %}
#$ -l h_rt={{ TIME }}
{%- endif -%}
{% if MEMORY %}
#$ -l h_vmem={{ MEMORY }}
{%- endif -%}
{% if ACCOUNT %}
#$ -q {{ ACCOUNT }}
{%- endif %}
{% endif %}

# for custom scripting
DPATH_ROOT="{{ NIPOPPY_DPATH_ROOT }}"
PIPELINE_NAME="{{ NIPOPPY_PIPELINE_NAME }}"
PIPELINE_VERSION="{{ NIPOPPY_PIPELINE_VERSION }}"
PIPELINE_STEP="{{ NIPOPPY_PIPELINE_STEP }}"
PARTICIPANT_IDS=({% for participant_id in NIPOPPY_PARTICIPANT_IDS %} "{{ participant_id }}"{% endfor %} )
SESSION_IDS=({% for session_id in NIPOPPY_SESSION_IDS %} "{{ session_id }}"{% endfor %} )
{#
# -------------------
# START OF JOB SCRIPT
# -------------------
# Below lines should not be modified unless you know what you are doing.
#}
{% if NIPOPPY_HPC_PREAMBLE_STRINGS -%}
# HPC_PREAMBLE from global config file
{% for NIPOPPY_HPC_PREAMBLE_STRING in NIPOPPY_HPC_PREAMBLE_STRINGS -%}
{{ NIPOPPY_HPC_PREAMBLE_STRING }}
{% endfor %}
{%- endif %}
# Nipoppy-generated list of commands to be run in job array
COMMANDS=( \
{% for command in NIPOPPY_COMMANDS -%}
    "{{ command }}" \
{% endfor -%}
)

# get command from list
# note that COMMANDS is zero-indexed (bash array)
# but the job array is one-indexed for compatibility with SGE
I_JOB=$(({{NIPOPPY_ARRAY_VAR}}-1))
COMMAND=${COMMANDS[$I_JOB]}

# for custom scripting
PARTICIPANT_ID=${PARTICIPANT_IDS[$I_JOB]}
SESSION_ID=${SESSION_IDS[$I_JOB]}

# print/run command
echo $COMMAND
eval $COMMAND
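The indexing logic at the end of the template can be checked locally. The sketch below mimics a rendered job script with two commands and a Slurm task ID of 2; the echo commands stand in for real pipeline invocations:

```shell
# Mimic the rendered job script: the scheduler's 1-indexed array task ID
# is mapped onto the 0-indexed bash COMMANDS array.
COMMANDS=( \
    "echo running participant 01" \
    "echo running participant 02" \
)
SLURM_ARRAY_TASK_ID=2          # set by Slurm in a real job array
I_JOB=$((SLURM_ARRAY_TASK_ID-1))
COMMAND=${COMMANDS[$I_JOB]}
echo $COMMAND
eval $COMMAND
```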
This template can be modified to hardcode job submission settings or to expose additional pipeline-specific configurations.
As an example, let’s say we are interested in specifying the --nice option in Slurm jobs.
- To hardcode the same --nice value for all jobs/pipelines, add e.g. #SBATCH --nice=10 on a new line near the beginning of the template script (outside of any if block).
- To expose --nice as a parameter that can be set independently for each pipeline, instead add the following block:

  {% if NICE %}
  #SBATCH --nice={{ NICE }}
  {%- endif %}

  Then set "NICE" in a new field (alongside "TIME", "CORES", etc.) in a pipeline’s HPC config file.
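Continuing the hypothetical --nice example, the pipeline’s HPC config file would then gain a "NICE" field (the value 10 is just an illustration):

```json
{
    "ACCOUNT": "[[HPC_ACCOUNT_NAME]]",
    "TIME": "1:00:00",
    "CORES": "1",
    "MEMORY": "16G",
    "ARRAY_CONCURRENCY_LIMIT": "",
    "NICE": "10"
}
```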
Support for other job schedulers¶
Job scheduling support in the Nipoppy package relies on the pysqa package, which can handle several other job schedulers in addition to Slurm and SGE.
To add support for another job scheduler supported by pysqa (e.g., Flux), follow these steps:
1. Navigate to <NIPOPPY_PROJECT_ROOT>/code/hpc.
2. Create a flux.yaml file. Refer to the existing slurm.yaml and sge.yaml for what the content of that file should be.
3. Update clusters.yaml to add flux as an additional cluster.
4. Update job_script_template.sh to add a section for Flux configs.
5. You should now be able to run nipoppy bidsify/process/extract with --hpc flux.
See also the pysqa documentation for more information.
Important
If you have configured the Nipoppy HPC functionalities to work on a job scheduler other than Slurm/SGE, please consider opening an issue on our GitHub repository and contributing your additions back to the codebase.