BLAS: bad multi-threaded performance for herk and hemm routines with OpenBLAS in the 2023a toolchain #287

as noted by @casparvl here: https://github.com/EESSI/test-suite/pull/268#issuecomment-3431874223
 
- the `herk` and `hemm` routines have much worse multi-threaded performance than the other routines.
- their multi-threaded performance is the same as their single-threaded performance, while for the other routines performance scales with the number of threads.
- this bad performance seems to happen only with 2023a; 2022b and 2024a are fine.
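
to make "does not scale with threads" a bit more concrete, here is a minimal sketch of how one could flag such routines from timing data. the function names, the `min_speedup` threshold, and the timings are all assumptions made up for illustration; they are not taken from the test suite or from the linked comments.

```python
def parallel_speedup(t_single, t_multi):
    """Speedup of a multi-threaded run over a single-threaded run (both in seconds)."""
    if t_single <= 0 or t_multi <= 0:
        raise ValueError("timings must be positive")
    return t_single / t_multi


def routines_not_scaling(timings, min_speedup=1.5):
    """Return the routines whose speedup stays below min_speedup.

    timings maps a routine name to (single-threaded seconds, multi-threaded seconds).
    The default threshold of 1.5 is an arbitrary choice for this sketch.
    """
    return sorted(
        name
        for name, (t_single, t_multi) in timings.items()
        if parallel_speedup(t_single, t_multi) < min_speedup
    )


# Hypothetical timings, for illustration only (NOT measurements from this issue).
example = {
    "gemm": (8.0, 1.1),
    "herk": (4.0, 3.9),
    "hemm": (8.0, 8.2),
}
print(routines_not_scaling(example))  # prints ['hemm', 'herk']
```

a routine that behaves like `herk` and `hemm` in this issue would show a speedup close to 1, regardless of the number of threads used.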

the output in https://github.com/EESSI/test-suite/pull/268#issuecomment-3432154150 suggests that the issue is with larger matrices.
initially the performance increases with increasing matrix size, but from 1400 x 1400 onwards it suddenly breaks down.
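
a rough sketch of how the breakdown point could be located automatically in a matrix-size sweep. the helper name, the `drop_fraction` threshold, and the numbers are hypothetical; the only thing taken from this issue is that the drop shows up around 1400 x 1400.

```python
def first_breakdown(sweep, drop_fraction=0.5):
    """Return the first matrix size where performance collapses, or None.

    sweep is a list of (matrix size, performance) pairs, e.g. in GFLOP/s.
    A size counts as a breakdown when its performance falls below
    drop_fraction times the best performance seen at smaller sizes.
    """
    best = 0.0
    for size, rate in sorted(sweep):
        if best > 0 and rate < drop_fraction * best:
            return size
        best = max(best, rate)
    return None


# Hypothetical sweep, for illustration only (NOT measurements from this issue).
sweep = [(200, 20.0), (600, 55.0), (1000, 80.0), (1200, 90.0), (1400, 12.0), (2000, 12.5)]
print(first_breakdown(sweep))  # prints 1400
```

running the same check on the 2022b and 2024a results would be a quick way to confirm that only 2023a shows a breakdown size.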

it would be good to check the OpenBLAS repo to see if this is a known regression.


