- Memory: 32 GiB of RAM or more recommended
- OS: Ubuntu 24.04 recommended
- Verilator: 5.034 (build from source and add to PATH)
- CIRCT firtool: install firtool 1.62.0 and add to PATH
- Other system dependencies:
apt-get install \
mold ccache ninja-build cmake clang clangd clang-format gdb \
help2man perl perl-doc flex bison libfl2 libfl-dev zlib1g zlib1g-dev libgoogle-perftools-dev numactl \
libfmt-dev libspdlog-dev libelf-dev libyaml-cpp-dev nlohmann-json3-dev \
device-tree-compiler bsdmainutils ruby default-jdk python3-tqdm
A Dockerfile that bundles all dependencies is provided below.
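Before building, it can be useful to confirm that the required tools are reachable on PATH. The sketch below is a hypothetical helper, not part of ventus-env: the tool list is an illustrative subset of the requirements above, and it only checks that each executable exists (via shutil.which). It does not verify versions such as Verilator 5.034 or firtool 1.62.0.

```python
import shutil

# Illustrative subset of the executables implied by the requirements above
# (an assumption; extend as needed). Package and binary names differ in places:
# ninja-build installs "ninja", device-tree-compiler installs "dtc",
# and default-jdk provides "java".
REQUIRED_TOOLS = [
    "verilator", "firtool",  # installed manually; must be on PATH
    "mold", "ccache", "ninja", "cmake", "clang", "gdb",
    "flex", "bison", "dtc", "ruby", "java",
]

def missing_tools(tools):
    """Return the entries of `tools` that shutil.which cannot locate."""
    return [tool for tool in tools if shutil.which(tool) is None]

if __name__ == "__main__":
    missing = missing_tools(REQUIRED_TOOLS)
    print("Missing from PATH:", ", ".join(missing) if missing else "none")
```

Run it from the same shell you will build in, so that it sees the same PATH.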
After cloning this repository to a local path (e.g., ventus-env/):
cd ventus-env/
make init # Fetches all required repositories and data from GitHub; this can take a while, so ensure a stable connection.
The dataset used by gpu-rodinia is downloaded and extracted automatically from: http://dspdev.ime.tsinghua.edu.cn/images/ventus_dataset/ventus_rodinia_data.tar.xz
You may also download rodinia_data.tar.xz manually from the mirror:
https://cloud.tsinghua.edu.cn/d/ad60a4502fbb43daa45e/
and extract it with:
tar -xf rodinia_data.tar.xz
Build all projects and install them under ventus-env/install/:
bash build-ventus.sh
If you have updated or modified any sub-repositories, use this script to rebuild them as well. Use --build XXX to build a specific sub-repository; see --help for details.
Before using Ventus, set environment variables:
source env.sh
Run OpenCL programs with different simulators as the execution backend:
cd rodinia/opencl/gaussian
make
./run # Uses spike by default
VENTUS_BACKEND=spike ./run # Same as above
VENTUS_BACKEND=rtlsim ./run # Verilator-based Chisel RTL simulation
VENTUS_BACKEND=cyclesim ./run # Cycle-accurate simulator
The following environment variables adjust simulation behavior:
- VENTUS_BACKEND=XXX: Select the device/backend: spike|isa, rtl|rtlsim|gpgpu, or cyclesim|systemc|simulator.
- VENTUS_WAVEFORM=1: Enable waveform dump (rtlsim → FST, cyclesim → VCD).
- VENTUS_WAVEFORM_BEGIN / VENTUS_WAVEFORM_END: Dump only a selected simulation interval for rtlsim (speeds up simulation). Not supported by cyclesim.
- VENTUS_DUMP_RESULT=filename.json: Save all device→host copies made by OpenCL programs, along with their device addresses, to a JSON file (useful for debugging).
- VENTUS_TIMING_DDR=0: Disable DDR timing in cyclesim (enabled by default). The current RTL simulation does not support DDR timing.
- NUM_THREAD=32: Number of threads per warp reported by the POCL device. For rtlsim/cyclesim, this should match the hardware configuration; for spike, any value is acceptable.
- NUM_WARP=8: Maximum warps per thread block reported by the POCL device. For rtlsim/cyclesim, this should match the hardware configuration; for spike, any value is acceptable.
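The alias groups for VENTUS_BACKEND map naturally onto a small lookup table. The following is a hypothetical wrapper helper (the function names and the choice of canonical name per group are assumptions, not part of Ventus) that normalizes an alias and builds an environment for launching ./run, using only the variable names listed above. The unit of the VENTUS_WAVEFORM_BEGIN/END values is not specified here, so they are passed through unchanged.

```python
import os

# Alias groups from the VENTUS_BACKEND description above. Using the first
# group member as the canonical name is an illustrative choice.
BACKEND_ALIASES = {
    "spike": ("spike", "isa"),
    "rtlsim": ("rtl", "rtlsim", "gpgpu"),
    "cyclesim": ("cyclesim", "systemc", "simulator"),
}

def canonical_backend(name):
    """Map any documented alias to its group's canonical name."""
    for canonical, aliases in BACKEND_ALIASES.items():
        if name in aliases:
            return canonical
    raise ValueError(f"unknown VENTUS_BACKEND alias: {name!r}")

def ventus_env(backend="spike", waveform=False, waveform_range=None,
               num_thread=None, num_warp=None, base=None):
    """Return a copy of `base` (default: os.environ) with Ventus variables set."""
    backend = canonical_backend(backend)
    env = dict(os.environ if base is None else base)
    env["VENTUS_BACKEND"] = backend
    if waveform:
        env["VENTUS_WAVEFORM"] = "1"
    if waveform_range is not None:
        # Interval-limited dumps are documented for rtlsim only.
        if backend != "rtlsim":
            raise ValueError("VENTUS_WAVEFORM_BEGIN/END is only supported by rtlsim")
        begin, end = waveform_range
        env["VENTUS_WAVEFORM_BEGIN"] = str(begin)
        env["VENTUS_WAVEFORM_END"] = str(end)
    if num_thread is not None:
        env["NUM_THREAD"] = str(num_thread)
    if num_warp is not None:
        env["NUM_WARP"] = str(num_warp)
    return env
```

For example, from a shell where env.sh has been sourced, subprocess.run(["./run"], cwd="rodinia/opencl/gaussian", env=ventus_env("rtl", waveform=True), check=True) would launch the gaussian example under RTL simulation with an FST waveform dump.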
This repository includes the GPU-Rodinia test suite (rodinia/opencl) and Ventus’s own OpenCL tests (testcases/).
The regression-test.py script runs a subset of the above as regression tests:
python3 ./regression-test.py # Uses spike by default
VENTUS_BACKEND=spike python3 ./regression-test.py
VENTUS_BACKEND=rtlsim python3 ./regression-test.py # Verilator-based Chisel RTL
VENTUS_BACKEND=cyclesim python3 ./regression-test.py # Cycle-accurate simulator
Before running, we recommend tuning these options:
- -t TIMEOUT_SCALE: Scale timeouts to your machine's speed (increase it on slower systems).
- -j JOBS: Number of parallel test processes. With RTL simulation, each test process is itself multi-threaded (8 threads by default), so adjust JOBS to your machine.
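As a rough rule of thumb (an assumption, not a project guideline), -j can be chosen so that jobs × threads per job does not exceed the machine's logical CPU count. The hypothetical sketch below uses the default of 8 threads per RTL test stated above and treats spike and cyclesim tests as single-threaded, which is itself an assumption.

```python
import os

RTL_THREADS_PER_TEST = 8  # default thread count per RTL test, as stated above

def suggest_jobs(backend, cpus=None):
    """Suggest a value for regression-test.py -j on this machine."""
    cpus = cpus or os.cpu_count() or 1
    # Assumption: only RTL simulation runs multi-threaded per test.
    per_test = RTL_THREADS_PER_TEST if backend in ("rtl", "rtlsim", "gpgpu") else 1
    return max(1, cpus // per_test)

if __name__ == "__main__":
    backend = os.environ.get("VENTUS_BACKEND", "spike")
    print(f"VENTUS_BACKEND={backend} python3 ./regression-test.py -j {suggest_jobs(backend)}")
```

This leaves -t untouched; on a slow machine you may still need to raise the timeout scale, and memory can become the limit before CPU count does.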
To run a single test case manually, change to its directory and:
- Run make to build the test.
- Use the ./run script to execute it. Many tests require command-line arguments, and the run script includes example invocations. Every test supported by regression-test.py provides a run script, except that tests taking no arguments may omit it.
source env.sh
cd rodinia/opencl/backprop # for example
make
VENTUS_BACKEND=rtl ./run
Build the OpenCL CTS test suite:
bash build-ventus.sh --build cts
Run all tests under a topic (e.g., compiler):
cd OpenCL-CTS/build/test_conformance/compiler
# Run all 'compiler' tests and save output to a log file
./test_compiler |& tee output.log
Run a specific test case:
cd OpenCL-CTS/build/test_conformance/basic
./test_basic --help # List available testcases
./test_basic intmath_int4 # Testcase name from the previous command
Batch-run helper (runs all or many tests in parallel):
cd OpenCL-CTS
python3 run_test_parallel.py --json test_list_new.json --max-workers 20
- --json: Path to the test list (example: test_list_new.json).
- --max-workers: Degree of parallelism; set it based on core count and system load.
- The runner prints “Preparing to execute N test tasks, max concurrency = K”, followed by per-test result lines (e.g., [ OK ] basic_intmath_long2).
- For high concurrency, also log the output (stdout and stderr) to disk: python3 run_test_parallel.py ... |& tee cts_parallel.log
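Once the runner's output has been saved with tee, a short script can tally the per-test result lines. This is a hypothetical helper: only the [ OK ] status appears above, so it groups any line of the form "[ STATUS ] name" by whatever status text it finds, rather than assuming specific failure labels.

```python
import re
from collections import Counter

# Per-test result lines look like "[ OK ] basic_intmath_long2". Only the OK
# status is documented; any other bracketed status is collected verbatim.
RESULT_LINE = re.compile(r"\[\s*(?P<status>[^\]]+?)\s*\]\s+(?P<name>\S+)")

def summarize(lines):
    """Return (Counter of statuses, list of test names whose status is not OK)."""
    counts = Counter()
    not_ok = []
    for line in lines:
        match = RESULT_LINE.match(line.strip())
        if match is None:
            continue  # e.g. the "Preparing to execute N test tasks ..." banner
        status, name = match.group("status"), match.group("name")
        counts[status] += 1
        if status != "OK":
            not_ok.append(name)
    return counts, not_ok
```

For example: with open("cts_parallel.log", errors="replace") as log: counts, not_ok = summarize(log). Output from the tests themselves that happens to start with a bracket would also be counted, so treat the totals as a quick overview.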
Notes
- OpenCL-CTS is a heavy workload for simulators, and runs take a long time. We run CTS only on spike, to verify software-stack correctness.
- The spike simulator produces instruction-level logs by default, so running CTS can generate huge logs. Before large CTS runs, consider disabling these logs in build-ventus.sh by removing --enable-commitlog:
--- a/build-ventus.sh
+++ b/build-ventus.sh
@@ -183,7 +183,7 @@ build_spike() {
# rm -rf ${SPIKE_BUILD_DIR} || true
mkdir -p ${SPIKE_BUILD_DIR}
cd ${SPIKE_BUILD_DIR}
- ../configure --prefix=${VENTUS_INSTALL_PREFIX} --enable-commitlog
+ ../configure --prefix=${VENTUS_INSTALL_PREFIX}
make -j${BUILD_PARALLEL}
make install
}
For the Chisel RTL, use a Scala development environment. You can import the mill config (build.sc) into VS Code via the Scala Metals plugin, or run make idea and open the project in IntelliJ IDEA. See each sub-repo’s README for details.
All other projects are written in C++. Export compile_commands.json with CMake (-DCMAKE_EXPORT_COMPILE_COMMANDS=ON) or Bear to enable proper language tooling in VS Code or other IDEs.
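If your editor does not pick up the per-project databases (for example, because build directories live outside the source trees), one option is to merge them into a single compile_commands.json at the top of ventus-env/. The sketch below is a hypothetical helper; it relies only on the compilation-database format being a JSON array of entries with "directory" and "file" keys, and it drops later duplicates of the same (directory, file) pair.

```python
import json
from pathlib import Path

def merge_compile_commands(inputs, output):
    """Concatenate several compile_commands.json files into one.

    Each input is a JSON array of {"directory", "file", "command" or
    "arguments"} entries. The first entry seen for a given (directory, file)
    pair wins; later duplicates are skipped. Returns the number of entries.
    """
    merged, seen = [], set()
    for path in inputs:
        for entry in json.loads(Path(path).read_text()):
            key = (entry.get("directory"), entry.get("file"))
            if key in seen:
                continue
            seen.add(key)
            merged.append(entry)
    Path(output).write_text(json.dumps(merged, indent=2))
    return len(merged)
```

For example, merge_compile_commands(Path(".").glob("**/build/compile_commands.json"), "compile_commands.json"); the glob pattern is an assumption about where the sub-projects place their build directories.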
Build the images:
docker build --target ventus-dev -t ventus-dev:latest .
docker build --target ventus -t ventus:latest .
- ventus-dev contains all repositories, build artifacts, and test suites; best for development.
- ventus includes only the final build artifacts and a subset of tests; best for a quick tryout.
- Verilator RTL simulation error like:
%Error: /opt/verilator/5.034/share/verilator/include/verilated.cpp:2729: VerilatedContext has 8 threads but model 'Vdut' (instantiated as 'TOP') was Verilated with --threads 11.
The Verilated model was built with too much parallelism (possibly exceeding your CPU’s logical thread count). Reduce VLIB_NPROC_DUT in ventus-env/gpgpu/sim-verilator/verilate.mk and ventus-env/gpgpu/sim-verilator-nocache/verilate.mk, then rebuild with: bash build-ventus.sh --build gpgpu
- No characters echo in the terminal after running spike or regression tests: try typing blindly and run stty echo.
- Verilator internal error like:
%Error: Internal Error: ../V3FuncOpt.cpp:162: Inconsitent terms
This appears to be an occasional Verilator issue. In most cases, simply re-running the build or simulation resolves it.
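For the thread-count mismatch in the first troubleshooting item, both numbers can be read straight from the error message. The hypothetical helper below extracts them and suggests a value for VLIB_NPROC_DUT, assuming that variable is what ends up as the model's --threads value. Capping it at the smaller of the runtime thread count and os.cpu_count() is a heuristic based on the explanation above, not a documented rule.

```python
import os
import re

# Pattern for the "VerilatedContext has N threads but model ... was Verilated
# with --threads M" message quoted above.
MISMATCH = re.compile(
    r"VerilatedContext has (?P<ctx>\d+) threads but model '[^']+'"
    r".*?was Verilated with --threads (?P<built>\d+)"
)

def suggest_nproc_dut(error_text, cpus=None):
    """Return (runtime threads, built threads, suggested VLIB_NPROC_DUT), or None.

    The suggestion never exceeds the runtime thread count or the logical CPU
    count; this is a heuristic, not a documented rule.
    """
    match = MISMATCH.search(error_text)
    if match is None:
        return None
    ctx, built = int(match.group("ctx")), int(match.group("built"))
    cpus = cpus or os.cpu_count() or 1
    return ctx, built, max(1, min(ctx, cpus))
```

After setting the suggested value in both verilate.mk files, rebuild with bash build-ventus.sh --build gpgpu.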