We continue running benchmark tests on the new LinuxVixion workstations based on NVIDIA Ampere family GPUs.
After testing GROMACS on a 2*GPU system (RTX-3080), it is now the turn of the Relion 3.1 benchmark tests on an LVX SILENT 2*RTX-3090 and an LVX SILENT 4*RTX-3090, with 2 and 4 NVIDIA GeForce RTX 3090 GPUs, respectively.
These NVIDIA Ampere GeForce RTX 3090 GPUs, with 24 GB of RAM, 10496 CUDA cores and a 1395 MHz clock speed, have proved to be faster than the previous Pascal, Turing and Volta NVIDIA GPU series.
Relion is one of the most popular software packages for CryoEM analysis. We would like to compare our tests with those carried out in the past on different platforms and different GPUs.
The data from our Relion 3.1 benchmark tests show how much this technique has improved in recent years, compared with the original Titan X (Pascal) tests carried out at the MRC Laboratory of Molecular Biology (LMB) and even with the latest NVIDIA V100. You can find the original data on the web page Benchmarks & computer hardware, by Sjors Scheres.
Results are for Plasmodium Ribosome Classification (2D, 3D) using Relion 3.1.
2 GPUs Workstation results
3D benchmark
time mpirun -n 3 `which relion_refine_mpi` --i Particles/shiny_2sets.star --ref emd_2660.map:mrc --firstiter_cc --ini_high 60 --ctf --ctf_corrected_ref --iter 25 --tau2_fudge 4 --particle_diameter 360 --K 6 --flatten_solvent --zero_mask --oversampling 1 --healpix_order 2 --offset_range 5 --offset_step 2 --sym C1 --norm --scale --random_seed 0 --o class3d --scratch_dir /data --gpu 0:1 --pool 100 --j 4 --dont_combine_weights_via_disc
Output: 71 m 14.239 s → 1h 11m
Comparing our time, 1h 11m, with the reference on the original benchmark website, 4h 29m, our result is about 3.8 times faster.
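The comparison above is simple arithmetic on the two wall-clock times. As a minimal sketch (the helper name `to_minutes` is hypothetical and not part of Relion), the following converts both time strings to minutes and computes the ratio:

```python
import re

def to_minutes(text):
    """Convert strings such as '71 m 14.239 s' or '4h 29m' to minutes.

    Hypothetical helper for illustration only; it understands the
    h/hr, m/min and s suffixes used in this article.
    """
    units = {"h": 60.0, "hr": 60.0, "m": 1.0, "min": 1.0, "s": 1.0 / 60.0}
    total = 0.0
    # Longer suffixes come first in the alternation so 'hr' is not read as 'h'.
    for value, unit in re.findall(r"([\d.]+)\s*(hr|h|min|m|s)", text):
        total += float(value) * units[unit]
    return total

ours = to_minutes("71 m 14.239 s")   # 2*RTX-3090 result reported above
reference = to_minutes("4h 29m")     # reference time from the original benchmark page
speedup = reference / ours

print(f"{ours:.1f} min vs {reference:.0f} min -> {speedup:.1f}x faster")
```

The same two calls can be reused for the other results in this article by swapping in the corresponding time strings.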
2D benchmark
time mpirun -n 3 `which relion_refine_mpi` --i Particles/shiny_2sets.star --ctf --iter 25 --tau2_fudge 2 --particle_diameter 360 --K 200 --zero_mask --oversampling 1 --psi_step 6 --offset_range 5 --offset_step 2 --norm --scale --random_seed 0 --o class2d --scratch_dir /data --gpu 0:1 --pool 100 --j 4 --dont_combine_weights_via_disc
Output: 361 m 33.032 s → 6h 1m.
Compared with the original benchmark (11h 2m), the GeForce RTX 3090 system is about 1.8 times faster.

4 GPUs Workstation results
We run basically the same example, but using the options mpirun -n 5 and --j 6. (Note that there are no results for 2D classification on 4*V100 in the original tests.)
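The rank counts used in this article (3 for the 2-GPU system, 5 for the 4-GPU system) are consistent with the usual Relion layout of one coordinating MPI rank plus one worker rank per GPU. The sketch below only illustrates that relationship; the function names are hypothetical, and the thread count passed to --j is simply taken from the commands in this article rather than derived from any rule.

```python
def mpi_ranks(n_gpus, workers_per_gpu=1):
    """Number of MPI ranks, assuming one coordinating rank plus workers.

    Assumption for illustration: one worker rank per GPU, as in the
    commands shown in this article.
    """
    return 1 + n_gpus * workers_per_gpu

def refine_prefix(n_gpus, threads):
    """Build the start of the command line as a list of arguments."""
    return ["mpirun", "-n", str(mpi_ranks(n_gpus)),
            "relion_refine_mpi", "--j", str(threads)]

# 2-GPU and 4-GPU workstations with the thread counts used here.
print(" ".join(refine_prefix(2, 4)))
print(" ".join(refine_prefix(4, 6)))
```

The shell commands in this article resolve the full path of the executable with which; the sketch uses the bare program name to stay independent of the local installation.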
3D benchmark
time mpirun -n 5 `which relion_refine_mpi` --i Particles/shiny_2sets.star --ref emd_2660.map:mrc --firstiter_cc --ini_high 60 --ctf --ctf_corrected_ref --iter 25 --tau2_fudge 4 --particle_diameter 360 --K 6 --flatten_solvent --zero_mask --oversampling 1 --healpix_order 2 --offset_range 5 --offset_step 2 --sym C1 --norm --scale --random_seed 0 --o class3d --scratch_dir /data --gpu --pool 100 --j 6 --dont_combine_weights_via_disc
Output: 42 m 58.354 s
Again, the time obtained in our benchmark is shorter than the original one (72 min), by a factor of about 1.7.
2D benchmark
time mpirun -n 5 `which relion_refine_mpi` --i Particles/shiny_2sets.star --ctf --iter 25 --tau2_fudge 2 --particle_diameter 360 --K 200 --zero_mask --oversampling 1 --psi_step 6 --offset_range 5 --offset_step 2 --norm --scale --random_seed 0 --o class2d --scratch_dir /scratch --gpu --pool 100 --j 6 --dont_combine_weights_via_disc
Output: 200 m 4.513 s → 3h 20m. (The original tests include no 2D classification result for a 4*V100 system.)
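With results for both workstations, it is also possible to estimate how well the runs scale from 2 to 4 GPUs. The sketch below uses only the wall-clock times reported in this article; the function name `scaling` and the notion of "parallel efficiency" (speedup divided by the increase in GPU count) are added here for illustration. The comparison is not strictly controlled, since the 4-GPU runs also used a different --j value.

```python
# Wall-clock times in minutes, taken from the outputs reported above.
runs = {
    "3D": {"2 GPUs": 71 + 14.239 / 60, "4 GPUs": 42 + 58.354 / 60},
    "2D": {"2 GPUs": 361 + 33.032 / 60, "4 GPUs": 200 + 4.513 / 60},
}

def scaling(t_small, t_large, gpu_ratio=2.0):
    """Return (speedup, parallel efficiency) when GPUs grow by gpu_ratio."""
    speedup = t_small / t_large
    return speedup, speedup / gpu_ratio

for name, t in runs.items():
    s, e = scaling(t["2 GPUs"], t["4 GPUs"])
    print(f"{name}: {s:.2f}x speedup, {e:.0%} parallel efficiency")
```

On these numbers, doubling the GPU count gives roughly a 1.66x speedup for 3D classification and 1.81x for 2D classification, so neither benchmark scales perfectly linearly.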
