A computer model of production wells was used to compare the parallel computing speed on CPUs and GPUs.
The test hardware consisted of widely available consumer components: an Intel Core i7-3770 CPU and an Nvidia GeForce GTX Titan graphics card.
| Specification | Intel Core i7-3770 | Nvidia GeForce GTX Titan |
|---|---|---|
| Cores | 4 | 2,688 (CUDA cores) |
| Base clock | 3.4 GHz | 836 MHz |
| Boost clock | 3.9 GHz | 876 MHz |
| Thermal design power (TDP) | 77 W | 250 W |
| Recommended price | $250 | $1,300 |
The three-dimensional model was discretized with different spatial steps, yielding meshes of approximately 2, 4, 8 and 16 million cells. Each mesh was computed on 1 core of the Intel Core i7, on all 4 cores of the Intel Core i7, and on the GeForce GTX Titan graphics card. The results below are for a two-year simulation forecast.
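The spatial steps themselves are not given, but the mesh sizes imply how they relate. Assuming (for illustration only) that each refinement shrinks a uniform cell size $h$ equally in all three directions over a fixed model volume $V$, the cell count $N_k$ of mesh $k$ scales with the inverse cube of the step:

$$
N_k \approx \frac{V}{h_k^{3}} \quad\Longrightarrow\quad \frac{h_{k+1}}{h_k} = \left(\frac{N_k}{N_{k+1}}\right)^{1/3} = \left(\frac{1}{2}\right)^{1/3} \approx 0.794
$$

Under that assumption, going from 2 to 16 million cells (a factor of 8) halves the step, since $(1/8)^{1/3} = 1/2$.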
| Number of cells | Processing time: 1 core of Intel Core i7 (single-core CPU version) | Processing time: 4 cores of Intel Core i7 (multi-core CPU version) | Processing time: GeForce GTX Titan (GPU version) | Speedup: 4 cores of Intel Core i7 vs. 1 core | Speedup: GeForce GTX Titan vs. 4 cores of Intel Core i7 | Speedup: GeForce GTX Titan vs. 1 core of Intel Core i7 |
|---|---|---|---|---|---|---|
| 2,000,000 | 9.62 h (34,632 s) | 5.97 h (21,504 s) | 34.11 min (2,047 s) | 1.61x | 10.50x | 16.91x |
| 4,000,000 | 18.16 h (65,388 s) | 10.63 h (38,287 s) | 57.65 min (3,459 s) | 1.70x | 11.06x | 18.90x |
| 8,000,000 | 34.33 h (123,600 s) | 19.22 h (69,221 s) | 1.62 h (5,844 s) | 1.78x | 11.84x | 21.14x |
| 16,000,000 | 61.14 h (220,104 s) | 32.98 h (118,736 s) | 2.62 h (9,456 s) | 1.85x | 12.55x | 23.27x |
Speedup factors are ratios of processing times, with 1 core of the Intel Core i7 taken as the 1x baseline.
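The speedup columns can be re-derived from the processing times. The following is a minimal Python check, assuming the factors are plain time ratios truncated (not rounded) to two decimals, which is what the published values appear to be. It also prints the parallel efficiency of the 4-core run (speedup divided by 4), a derived quantity that the table does not report. The helper names are illustrative, not part of any tool used in the study.

```python
import math

# Processing times in seconds, copied from the results table:
# (1 core of Intel Core i7, 4 cores of Intel Core i7, GeForce GTX Titan)
TIMES_S = {
    2_000_000: (34_632, 21_504, 2_047),
    4_000_000: (65_388, 38_287, 3_459),
    8_000_000: (123_600, 69_221, 5_844),
    16_000_000: (220_104, 118_736, 9_456),
}
CPU_CORES = 4  # cores used in the multi-core CPU runs


def speedup(t_ref: float, t_new: float) -> float:
    """Speedup factor = reference time / new time (1 core of the i7 is 1x)."""
    return t_ref / t_new


def truncate2(x: float) -> float:
    """Cut to two decimals; assumes the table truncates rather than rounds."""
    return math.floor(x * 100) / 100


def speedup_row(cells: int) -> tuple[float, float, float]:
    """Return (4 cores vs 1 core, GPU vs 4 cores, GPU vs 1 core) for one mesh."""
    t1, t4, tgpu = TIMES_S[cells]
    return (speedup(t1, t4), speedup(t4, tgpu), speedup(t1, tgpu))


if __name__ == "__main__":
    for cells in TIMES_S:
        s41, sg4, sg1 = speedup_row(cells)
        efficiency = s41 / CPU_CORES  # parallel efficiency of the 4-core run
        print(f"{cells:>10,} cells: {truncate2(s41):.2f}x  {truncate2(sg4):.2f}x  "
              f"{truncate2(sg1):.2f}x  (4-core efficiency {efficiency:.0%})")
```

Run as a script, this reproduces every speedup in the table and shows a 4-core efficiency of about 40% for the 2-million-cell mesh, rising to about 46% at 16 million cells.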
When comparing computational speed on multi-core architectures, note that the following model parameters significantly affect the achievable speedup:
– the number of materials;
– the number of boundary conditions;
– mesh uniformity;
– whether the number of mesh cells is a multiple of the number of computational cores;
– the similarity of the materials' thermophysical properties.
Consequently, the maximum speedup on parallel architectures is achieved with the simplest models: a uniform computational mesh and the minimum number of materials and boundary conditions. Real computational models are more complex, which is why we based our speed analysis on the production wells simulation model, to obtain more representative results.
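The item on cell and core counts can be illustrated with a simple block partition. This is a minimal sketch of an even split, not the solver's actual domain decomposition (which is not described here); the functions `partition` and `load_imbalance` are hypothetical. When the cell count is not a multiple of the number of workers, some workers receive one extra cell and the rest wait for them.

```python
def partition(n_cells: int, n_workers: int) -> list[int]:
    """Split n_cells as evenly as possible; the first n_cells % n_workers
    workers each receive one extra cell."""
    base, extra = divmod(n_cells, n_workers)
    return [base + 1 if w < extra else base for w in range(n_workers)]


def load_imbalance(n_cells: int, n_workers: int) -> float:
    """Busiest worker's load divided by the average load (1.0 = perfectly balanced)."""
    parts = partition(n_cells, n_workers)
    return max(parts) / (n_cells / n_workers)


if __name__ == "__main__":
    for cells in (10, 8_000_000, 2_000_003):
        print(cells, partition(cells, 4)[:4], f"imbalance {load_imbalance(cells, 4):.7f}")
```

For 10 cells on 4 workers the split is `[3, 3, 2, 2]` and the busiest worker carries 1.2 times the average load. With millions of cells per core, the remainder alone shifts the imbalance by less than one part in a million, so its effect grows only as the number of cells per worker shrinks.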
Conclusions:
- Computational algorithms with a low degree of parallelism run inefficiently on both multi-core processors and GPU accelerators.
- The major engineering analysis software packages on the market contain a large share of serial code, which significantly limits the speedup achievable through parallel computing. This is largely because they rely on older mathematical solver algorithms, developed before technologies such as CUDA existed and therefore not designed to exploit parallel hardware.
- Mathematical algorithms in the latest generation of CAE software are designed around parallel processing. Moving the computation from CPU cores to a GPU accelerator yields a speedup of more than an order of magnitude: in the tests above, from 10.50x to 12.55x relative to 4 cores of the Intel Core i7, and from 16.91x to 23.27x relative to a single core.
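The link between serial code and limited speedup is commonly quantified with Amdahl's law. This is a standard model swapped in here for illustration, not the analysis used in the study. The sketch inverts it to estimate the effective parallel fraction p implied by the measured 4-core speedups, then computes the ceiling 1/(1 - p) that such code could not exceed on any number of cores. It attributes all lost speedup to serial work; in practice, memory bandwidth and other effects also limit multi-core scaling, so p here is only an effective value.

```python
# Amdahl's law, used here as an illustrative model only.

def amdahl_speedup(p: float, n: int) -> float:
    """Speedup on n cores when a fraction p of the single-core runtime
    parallelizes perfectly and the rest stays serial."""
    return 1.0 / ((1.0 - p) + p / n)


def parallel_fraction(measured_speedup: float, n: int) -> float:
    """Invert Amdahl's law: the parallel fraction p implied by a measured speedup on n cores."""
    return (1.0 - 1.0 / measured_speedup) / (1.0 - 1.0 / n)


def speedup_ceiling(p: float) -> float:
    """Limit of amdahl_speedup(p, n) as n grows without bound: 1 / (1 - p)."""
    return float("inf") if p >= 1.0 else 1.0 / (1.0 - p)


if __name__ == "__main__":
    # Measured 4-core vs 1-core speedups from the results table (2M to 16M cells).
    for s in (1.61, 1.70, 1.78, 1.85):
        p = parallel_fraction(s, n=4)
        print(f"measured {s:.2f}x on 4 cores -> p = {p:.2f}, ceiling = {speedup_ceiling(p):.2f}x")
```

For the measured 1.61x to 1.85x, this gives p of roughly 0.51 to 0.61 and a ceiling of roughly 2.0x to 2.6x. The GPU runs exceed that ceiling by a wide margin, which is consistent with the GPU version not being bound by the same serial fraction as the multi-core CPU code.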