Division or multiplication

Overview

On many CPUs, floating-point division is much slower than multiplication, see discussion here. Thus, when dividing by the same denominator in a loop, one might consider replacing x / y with a x * z where z = 1 / y is computed outside of the loop. This sacrifices some accuracy (due to the round-off error from computing z) in order to replace the per-iteration divide with a multiply. The question is, does this actually have an impact?

To build, use the usual CMake process:

cmake -S . -B build -DCMAKE_BUILD_TYPE=Release
cmake --build build

Basic test

We test the speed of iterating over an array and dividing its contents by a constant. This is a pretty common pattern in tatami, and while usually more operations are involved, we'll keep things simple here. We compare to the alternative, i.e., compute the reciprocal and use it to multiply each array element within the loop.

On an Intel i7-8850H, we get:

$ ./build/basic
Division:      	0.0137299 ± 1.95084e-05
Multiplication:	0.0136978 ± 3.16613e-05

On a Mac M2, we get:

$ ./build/basic
Division:      	0.00203902 ± 0.000123369
Multiplication:	0.00200791 ± 8.85522e-05

Not much effect, it seems. I'd guess that the CPU is bottlenecked by reading the array from main memory, such that the speed of the actual operation doesn't matter.

Log-normalization

A more robust test involves log-normalization of a count matrix in RNA-seq data. This involves computing $\log_2(x / f + 1)$ where $x$ is the array of data and $f$ is a constant size factor. We consider several permutations of this approach using log1p instead of log2 and/or multiplying by the reciprocal ($r = f^{-1}$, $M = \log 2$, $N = (\log 2)^{-1}$).

On an Intel i7-8850H, we get:

$ ./build/normalize
log1p(x / f) / M:   0.150602 ± 0.00142801
log1p(x * r) * N:   0.133567 ± 0.00120476
log2(x / f + 1):    0.101818 ± 0.00119196
log2(x * r + 1):    0.0837252 ± 0.00128969
log(x / f + 1) / M: 0.0727775 ± 0.00114991
log(x * r + 1) * N: 0.062309 ± 0.00119778

On a Mac M2, we get:

$ ./build/normalize
log1p(x / f) / M:   0.0603784 ± 0.000133809
log1p(x * r) * N:   0.0546123 ± 0.000144244
log2(x / f + 1):    0.0523682 ± 5.85398e-05
log2(x * r + 1):    0.0448984 ± 6.94794-05
log(x / f + 1) / M: 0.0518799 ± 0.000129268
log(x * r + 1) * N: 0.0447853 ± 7.38861e-05

Division is significantly faster here. Now that we're doing a decent amount of work with the log-transformations, compute is now the bottleneck and the extra time spent in the divide is no longer masked.

Name		Name	Last commit message	Last commit date
Latest commit History 6 Commits
.gitignore		.gitignore
CMakeLists.txt		CMakeLists.txt
LICENSE		LICENSE
README.md		README.md
basic.cpp		basic.cpp
normalize.cpp		normalize.cpp

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Division or multiplication

Overview

Basic test

Log-normalization

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

Division or multiplication

Overview

Basic test

Log-normalization

About

Topics

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages