This repository contains a tool for benchmarking different implementations of the BLAS and LAPACK libraries.
This tool has the following dependencies:
- meson, version 1.3.2 or newer.
- ninja
- A Fortran compiler
  - The build has been tested with gfortran 11.4.0 and ifx 2025.3.0. The project does not currently build with flang.
- FlexiBLAS
  - All development and testing was done with version 3.4.5.
- CMake (for finding dependencies)
To install the tool:
- Clone this repository
- Navigate to the green-computing-benchmarks directory
- Run the following commands:
meson setup build --optimization=3
meson compile -C build
To specify the compiler you want to use, replace the first command with:
FC=compiler meson setup build --optimization=3
Where compiler is replaced with your chosen compiler executable.
The tool has support for running on GPUs using CUDA and HIP.
To enable CUDA benchmarks, use the meson option -Dcuda=true. For example:
meson setup build -Dcuda=true --optimization=3
You must have an Nvidia GPU and the CUDA and CUBLAS libraries installed.
To enable HIP benchmarks, use the meson option -Dhip=true. For example:
meson setup build -Dhip=true --optimization=3
You must have HIPFort, HIP and HIPBLAS installed. HIPFort must be on your CMAKE_PREFIX_PATH, e.g. CMAKE_PREFIX_PATH=/opt/hipfort/lib/fortran/f95/cmake/. Currently this tool only supports HIP on AMD GPUs.
You can specify the benchmarks you wish to run using a TOML configuration file. To add a function to benchmark, use the syntax:
[[benchmarks]]
name = "NAME"
m-sizes = [10, 20, 30]
n-sizes = [10, 50, 30]
k-sizes = [10, 50, 30]
where "NAME" is the function you wish to benchmark. The list of available functions is below; unknown names will be ignored. m-sizes, n-sizes and k-sizes give the sizes of the matrices and/or vectors the function will be benchmarked with.
An example configuration, example.toml, is included at the top level of this repository.
Benchmarks are run on every combination (the Cartesian product) of these arrays; for example, the config above runs on 27 differently sized problems (3 × 3 × 3). The problems are randomly generated.
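To make the count of 27 concrete, here is a minimal Python sketch that enumerates the combinations for the example entry. It only illustrates the arithmetic; the order in which the tool actually visits the problems is an assumption, not something this README states.

```python
from itertools import product

# Size arrays copied from the example [[benchmarks]] entry.
m_sizes = [10, 20, 30]
n_sizes = [10, 50, 30]
k_sizes = [10, 50, 30]

# Each (m, n, k) combination corresponds to one benchmark problem.
# The iteration order here is itertools' own, not necessarily the tool's.
problems = list(product(m_sizes, n_sizes, k_sizes))

print(len(problems))  # 27
print(problems[0])    # (10, 10, 10)
print(problems[-1])   # (30, 30, 30)
```

Adding one more value to any of the three arrays would add 9 problems, so the number of runs grows quickly with the array lengths.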
To run the benchmarks, run build/benchmark_blas config.toml, where config.toml is replaced with the path to your own configuration file.
A separate CSV file will be created for each routine in the directory you run the tool from.
Alternatively, you can run the run_blas_benchmarks.sh script, which runs your chosen benchmarks with every BLAS implementation available to FlexiBLAS and collects all results files in a timestamped directory. We do not recommend this script for GPU runs: changing the BLAS implementation has no effect on them, but multiple runs will still be performed.
The same is true of the NAIVE benchmark option.
To manually change the BLAS backend that the benchmarks are run with, use:
FLEXIBLAS="YOUR_BLAS" ./build/benchmark_blas config.toml
Available BLAS backends are shown with flexiblas list.
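If you want to drive a few chosen backends from a script rather than typing the command repeatedly, something along these lines may work. This is a sketch, not part of the repository: the helper names, the results directory layout, and the default binary path are hypothetical, and it relies only on the FLEXIBLAS environment variable and the command line shown above.

```python
import os
import subprocess

def backend_invocation(backend, config_path, binary="build/benchmark_blas"):
    """Build (argv, env) for one run with the given FlexiBLAS backend.

    The default binary path assumes the build directory used in this README.
    """
    env = dict(os.environ)
    env["FLEXIBLAS"] = backend  # same as FLEXIBLAS="..." on the command line
    # Absolute paths, so the run can use a different working directory.
    argv = [os.path.abspath(binary), os.path.abspath(config_path)]
    return argv, env

def run_backends(backends, config_path, results_root="results"):
    """Run the benchmark once per backend, each in its own directory.

    The CSV files are written to the working directory, so a separate
    directory per backend keeps the results apart (hypothetical layout).
    """
    for backend in backends:
        argv, env = backend_invocation(backend, config_path)
        out_dir = os.path.join(results_root, backend)
        os.makedirs(out_dir, exist_ok=True)
        subprocess.run(argv, env=env, cwd=out_dir, check=True)

# Inspect what would be executed without running anything.
argv, env = backend_invocation("OPENBLAS", "config.toml")
print(argv, env["FLEXIBLAS"])
```

Calling run_backends(["NETLIB", "OPENBLAS"], "config.toml") would then perform the runs; those backend names are placeholders and should be replaced with names reported by flexiblas list.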
Available functions:
- DAXPY
  - Double precision $\alpha x + y$
  - Required options:
    - n-sizes (array)
- DASUM
  - Double precision sum of the absolute values of a vector
  - Required options:
    - n-sizes (array)
- DGEMV
  - Double precision $\alpha A x + \beta y$
  - Required options:
    - m-sizes (array)
    - n-sizes (array)
- DGEMM
  - Double precision $\alpha A B + \beta C$
  - Required options:
    - m-sizes (array)
    - n-sizes (array)
    - k-sizes (array)
- CUBLAS_DGEMM (CUDA only)
  - Double precision $\alpha A B + \beta C$ on Nvidia GPU
  - Required options:
    - m-sizes (array)
    - n-sizes (array)
    - k-sizes (array)
- HIPBLAS_DGEMM (HIP only)
  - Double precision $\alpha A B + \beta C$ on AMD GPU
  - Required options:
    - m-sizes (array)
    - n-sizes (array)
    - k-sizes (array)
- DSYRK
  - Double precision symmetric rank-k update $\alpha A A^T + \beta C$
  - Required options:
    - n-sizes (array)
    - k-sizes (array)
- DSY2RK
  - Double precision symmetric rank-2k update $\alpha A B^T + \alpha B A^T + \beta C$
  - Required options:
    - n-sizes (array)
    - k-sizes (array)
- NAIVE
  - Naive non-BLAS matrix multiply $\alpha A B + \beta C$
  - Required options:
    - m-sizes (array)
    - n-sizes (array)
    - k-sizes (array)
- DGESV
  - Double precision solution to $Ax = B$
  - Required options:
    - n-sizes (array)
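The required options above can be turned into a small pre-flight check for a configuration entry. The following Python sketch is illustrative only: the table is transcribed from the list, while the function names and the assumption that the problem count is the product of the required array lengths are ours, not taken from the tool's source.

```python
from math import prod

# Required size arrays per routine, transcribed from the list above.
REQUIRED_OPTIONS = {
    "DAXPY": ["n-sizes"],
    "DASUM": ["n-sizes"],
    "DGEMV": ["m-sizes", "n-sizes"],
    "DGEMM": ["m-sizes", "n-sizes", "k-sizes"],
    "CUBLAS_DGEMM": ["m-sizes", "n-sizes", "k-sizes"],
    "HIPBLAS_DGEMM": ["m-sizes", "n-sizes", "k-sizes"],
    "DSYRK": ["n-sizes", "k-sizes"],
    "DSY2RK": ["n-sizes", "k-sizes"],
    "NAIVE": ["m-sizes", "n-sizes", "k-sizes"],
    "DGESV": ["n-sizes"],
}

def missing_options(entry):
    """Return the required options absent from one [[benchmarks]] entry.

    Returns None for a name that is not in the table, mirroring the
    documented behaviour that unknown names are ignored.
    """
    required = REQUIRED_OPTIONS.get(entry.get("name"))
    if required is None:
        return None
    return [opt for opt in required if opt not in entry]

def problem_count(entry):
    """Problem count, assuming one problem per combination of required arrays."""
    required = REQUIRED_OPTIONS[entry["name"]]
    return prod(len(entry[opt]) for opt in required)

entry = {"name": "DGEMV", "m-sizes": [10, 20, 30], "n-sizes": [10, 50]}
print(missing_options(entry))  # []
print(problem_count(entry))    # 6
print(missing_options({"name": "DGEMM", "m-sizes": [10]}))  # ['n-sizes', 'k-sizes']
```

Each entry is represented here as a plain dictionary with the same keys as the TOML table, so the check can be applied to whatever a TOML parser returns for the benchmarks array.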