Conversation

@mandresm (Contributor) commented Jun 11, 2025

Refactors the hetjob logic and options, and fixes the incorrect .run script generated when SLURM hetjobs are used (see #1340).

It deprecates the computer.taskset option, which is replaced by computer.hetjob_strategy. The hetjob_strategy values can be taskset, hetjob, or srunsteps. The different options result in these different .run scripts:
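For reference, selecting a strategy would then look roughly like this in the user's yaml (a sketch; the key name comes from this PR, the surrounding layout is illustrative):

```yaml
# Hypothetical runscript excerpt: replaces the deprecated computer.taskset
computer:
    hetjob_strategy: hetjob   # or: taskset, srunsteps
```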

hetjob (default, allows for heterogeneous compute resources)

...
#SBATCH --nodes=10
#SBATCH --partition=cpu-clx:test
#SBATCH hetjob
#SBATCH --nodes=2
#SBATCH --partition=cpu-clx:test
#SBATCH hetjob
#SBATCH --nodes=1
#SBATCH --partition=cpu-clx:test
#SBATCH hetjob
#SBATCH --nodes=2
#SBATCH --partition=cpu-clx:test
...

time srun --mpi=pmix -l --kill-on-bad-exit=1 --cpu_bind=none \
--nodes=10 --ntasks=960 --ntasks-per-node=96 --cpus-per-task=1 --export=ALL,OMP_NUM_THREADS=1 ./fesom \
: --nodes=2 --ntasks=192 --ntasks-per-node=96 --cpus-per-task=1 --export=ALL,OMP_NUM_THREADS=1 ./oifs -v ecmwf -e awi3 \
: --nodes=1 --ntasks=1 --ntasks-per-node=1 --cpus-per-task=96 --export=ALL,OMP_NUM_THREADS=96 ./rnfma \
: --nodes=2 --ntasks=4 --ntasks-per-node=2 --cpus-per-task=32 --export=ALL,OMP_NUM_THREADS=32 ./xios.x  2>&1 &

taskset (#SBATCH --nodes still needs fixing, see #1148 (comment))

...
#SBATCH --nodes=15
...

#Creating hostlist for MPI + MPI&OMP heterogeneous parallel job
rm -f ./hostlist
export SLURM_HOSTFILE=/Users/mandresm/Work///hetjob_rework/run_20000101-20000131/work//hostlist
IFS=$'\n'; set -f
listnodes=($(< <( scontrol show hostnames $SLURM_JOB_NODELIST )))
unset IFS; set +f
rank=0
current_core=0
current_core_mpi=0
mpi_tasks_fesom=960
omp_threads_fesom=1
mpi_tasks_oifs=192
omp_threads_oifs=1
mpi_tasks_rnfmap=1
omp_threads_rnfmap=96
mpi_tasks_xios=4
omp_threads_xios=32
for model in fesom oifs rnfmap oasis3mct xios ;do
    eval nb_of_cores=\${mpi_tasks_${model}}
    eval nb_of_cores=$((${nb_of_cores}-1))
    for nb_proc_mpi in `seq 0 ${nb_of_cores}`; do
        (( index_host = current_core / 96 ))
        host_value=${listnodes[${index_host}]}
        (( slot =  current_core % 96 ))
        echo $host_value >> hostlist
        (( current_core = current_core + omp_threads_${model} ))
    done
done

time srun --mpi=pmix -l --kill-on-bad-exit=1 --cpu_bind=none --multi-prog hostfile_srun 2>&1 &
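The hostlist construction above boils down to mapping a running core counter onto a node index (current_core / 96). A self-contained sketch of that mapping, with a hypothetical 2-node machine and 4 cores per node instead of 96 (node names and model sizes are made up):

```python
# Simplified sketch of the hostlist loop in the generated script above,
# with a hypothetical 2-node, 4-cores-per-node machine instead of 96 cores.
CORES_PER_NODE = 4
nodes = ["node01", "node02"]          # stand-in for `scontrol show hostnames`
models = [                            # (name, mpi_tasks, omp_threads)
    ("modelA", 4, 1),                 # 4 pure-MPI tasks
    ("modelB", 2, 2),                 # 2 hybrid MPI+OpenMP tasks
]

hostlist = []
current_core = 0
for _name, mpi_tasks, omp_threads in models:
    for _ in range(mpi_tasks):
        # Same arithmetic as the shell loop: index_host = current_core / cores
        hostlist.append(nodes[current_core // CORES_PER_NODE])
        current_core += omp_threads   # hybrid tasks occupy several cores

print(hostlist)
# → ['node01', 'node01', 'node01', 'node01', 'node02', 'node02']
```

Each line of the resulting hostlist pins one MPI rank to a node via SLURM_HOSTFILE; the hostfile_srun file passed to --multi-prog (content not shown in the excerpt) is a separate file mapping rank ranges to executables.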

srunsteps (equivalent to how it's done with PBS+aprun on Aleph; does not allow for heterogeneous compute resources)

...
#SBATCH --nodes=15
...

time srun --mpi=pmix -l --kill-on-bad-exit=1 --cpu_bind=none \
--nodes=10 --ntasks=960 --ntasks-per-node=96 --cpus-per-task=1 --export=ALL,OMP_NUM_THREADS=1 ./fesom \
: --nodes=2 --ntasks=192 --ntasks-per-node=96 --cpus-per-task=1 --export=ALL,OMP_NUM_THREADS=1 ./oifs -v ecmwf -e awi3 \
: --nodes=1 --ntasks=1 --ntasks-per-node=1 --cpus-per-task=96 --export=ALL,OMP_NUM_THREADS=96 ./rnfma \
: --nodes=2 --ntasks=4 --ntasks-per-node=2 --cpus-per-task=32 --export=ALL,OMP_NUM_THREADS=32 ./xios.x  2>&1 &

TODO

  • Test in Lise (blogin)
  • Test in Levante
  • Clean modified functions
  • Write docstrings
  • Write documentation
  • Detect the deprecated taskset option and exit with an error clearly stating what the user needs to change in their yaml
  • Remove unset SLURM_* from any option other than taskset
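The deprecation check from the TODO list could look roughly like this (a sketch only; the function name and config layout are hypothetical, not the actual ESM-Tools code):

```python
# Hypothetical sketch of the planned deprecation check; not the actual
# ESM-Tools implementation.
import sys


def check_deprecated_taskset(computer_config: dict) -> None:
    """Abort with a message telling the user how to update their yaml."""
    if "taskset" in computer_config:
        sys.exit(
            "ERROR: `computer.taskset` is deprecated. Remove it from your "
            "yaml and set `computer.hetjob_strategy` to one of: "
            "taskset, hetjob, srunsteps."
        )


check_deprecated_taskset({"hetjob_strategy": "hetjob"})  # passes silently
```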

Closes #1340

Thanks to @ufukozkan for finding this problem and further investigating and to @christgau for investigating and providing the solution.

@mandresm (Contributor, Author)

Correcting myself here: I doubt that the hetjob option allows for a shared MPI_COMM_WORLD across different srun binaries, so I'm looking into srunmix as the default option.

@mandresm (Contributor, Author)

@JanStreffing, regarding my post above, I am assuming the errors I am seeing in the hetjob approach occur because these models share the same MPI_COMM_WORLD. But is that really the case, or does each of them have its own MPI_COMM_WORLD and use MPI_Comm_connect?

@JanStreffing (Contributor) commented Jun 12, 2025

I'm not sure how hetjob works. But I believe for taskset we let XIOS init MPI_COMM_WORLD for all models; XIOS then knows that OASIS is running and needs to split MPI_COMM_WORLD into the local communicators.

Each model then uses its local communicator instead of MPI_COMM_WORLD.

@mandresm (Contributor, Author)

Thanks @JanStreffing, that answers my question. I need to do a bit more exploration of the hetjob and srunmix options. Once I achieve some clarity I'll let you know. The new srunmix and taskset options work on Levante in this branch.

@mandresm (Contributor, Author)

This is a nice description of how hetjob can be used together with a single srun command, where each part separated by : in the srun command is allocated to a different het group. So in principle, the old ESM-Tools logic for hetjob should still work.

https://apps.fz-juelich.de/jsc/hps/juwels/modular-jobs.html
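That page also shows how to address individual components of a heterogeneous allocation from separate srun steps; a sketch (the --het-group flag comes from Slurm's heterogeneous-job support, where older releases called it --pack-group; the numbers echo the hetjob example above):

```
# Run a job step only on het group 1 of the allocation
# (the oifs component in the example above)
srun --het-group=1 --ntasks=192 ./oifs -v ecmwf -e awi3
```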

@mandresm (Contributor, Author)

> This is a nice description of how hetjob can be used together with a single srun command, where each part separated by : in the srun command is allocated to a different het group. So in principle, the old ESM-Tools logic for hetjob should still work.

And in fact, it still works on Levante.

@mandresm (Contributor, Author)

@ufukozkan, I am managing to get simulations started, but they are failing because I don't have read permissions on the inputs. Can you grant read access to them?

@mandresm (Contributor, Author)

Related to #1148

@ufukozkan (Collaborator)

Hi all,

> @ufukozkan, I am managing to get simulations started, but they are failing because I don't have read permissions on the inputs. Can you grant read access to them?

I gave the permissions on NHR. However, I only have the Arc01-TL255 coupling. I can upload CORE2 if you need it.

@christgau (Contributor)

> I gave the permissions on NHR.

Nitpicking, but maybe helpful for clarification: NHR is the national HPC alliance. There is no system named NHR, nor are the systems of the alliance unified. There are nine NHR centers offering individual HPC services under the NHR umbrella. What used to be HLRN is now NHR@ZIB and NHR@GWDG, and they operate independently, including their software stacks, such as Slurm and its configuration.

Apologies for possibly repeating already known stuff.


Development

Successfully merging this pull request may close these issues.

Slurm issue with different taskset options
