Frost 3D Universal

A computer model of production wells was used to compare the parallel computing speed on CPUs and GPUs.

Soil thermal field distribution over 5 years in the XZ plane

The hardware was chosen from widely available consumer components: an Intel Core i7 CPU and an Nvidia GeForce GTX Titan graphics card.

| Specification | Intel Core i7-3770 (CPU) | Nvidia GeForce GTX Titan (video card) |
|---|---|---|
| Cores | 4 | 2688 |
| Base clock | 3.4 GHz | 836 MHz |
| Boost clock | 3.9 GHz | 876 MHz |
| Power | 77 W | 250 W |
| Recommended price | $250 | $1300 |

The three-dimensional model was discretized with different spatial steps, producing meshes of approximately 2 million, 4 million, 8 million and 16 million cells. Each mesh was computed on 1 core of the Intel Core i7, on 4 cores of the Intel Core i7, and on the GeForce GTX Titan video card. The results for the two-year simulation forecast are given below.
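As a rough illustration of how the spatial step relates to the cell count, the sketch below assumes a uniform mesh with the same step along all three axes, so that the number of cells scales as the inverse cube of the step. The source does not state the domain size or whether the meshes were uniform, so the function name and the uniform-mesh assumption are hypothetical.

```python
def step_scale_for_cell_ratio(cell_ratio: float, dims: int = 3) -> float:
    """Factor by which a uniform spatial step must be multiplied so that
    the cell count changes by `cell_ratio`.

    Assumption (not from the source): cells ~ 1 / step**dims.
    """
    if cell_ratio <= 0:
        raise ValueError("cell_ratio must be positive")
    return cell_ratio ** (-1.0 / dims)


# Each mesh in the series has twice as many cells as the previous one.
scale = step_scale_for_cell_ratio(2.0)
print(f"step multiplier per doubling of the cell count: {scale:.3f}")
```

Under this assumption, each doubling of the cell count corresponds to shrinking the step by roughly a fifth, and going from 2 million to 16 million cells corresponds to halving it.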

| Number of cells | Processing time, 1 core of Intel Core i7 (single-core CPU version) | Processing time, 4 cores of Intel Core i7 (multi-core CPU version) | Processing time, GeForce GTX Titan (GPU version) | Speedup, 4 cores vs. 1 core of Intel Core i7 | Speedup, GeForce GTX Titan vs. 4 cores of Intel Core i7 | Speedup, GeForce GTX Titan vs. 1 core of Intel Core i7 |
|---|---|---|---|---|---|---|
| 2 000 000 | 9.62 h (34,632 s) | 5.97 h (21,504 s) | 34.11 min (2,047 s) | 1.61x | 10.50x | 16.91x |
| 4 000 000 | 18.16 h (65,388 s) | 10.63 h (38,287 s) | 57.65 min (3,459 s) | 1.70x | 11.06x | 18.90x |
| 8 000 000 | 34.33 h (123,600 s) | 19.22 h (69,221 s) | 1.62 h (5,844 s) | 1.78x | 11.84x | 21.14x |
| 16 000 000 | 61.14 h (220,104 s) | 32.98 h (118,736 s) | 2.62 h (9,456 s) | 1.85x | 12.55x | 23.27x |
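The speedup factors can be recomputed from the processing times reported in seconds. The following is a minimal sketch: the timing values are copied from the results above, while the names `TIMES_S` and `speedups` are introduced here purely for illustration.

```python
# Processing times in seconds, copied from the reported results.
# Tuple order: (1 core of Core i7, 4 cores of Core i7, GeForce GTX Titan)
TIMES_S = {
    2_000_000: (34_632, 21_504, 2_047),
    4_000_000: (65_388, 38_287, 3_459),
    8_000_000: (123_600, 69_221, 5_844),
    16_000_000: (220_104, 118_736, 9_456),
}


def speedups(t_1core: float, t_4core: float, t_gpu: float) -> tuple:
    """Return (4 cores vs. 1 core, GPU vs. 4 cores, GPU vs. 1 core)."""
    return (t_1core / t_4core, t_4core / t_gpu, t_1core / t_gpu)


for cells, times in TIMES_S.items():
    s_4_vs_1, s_gpu_vs_4, s_gpu_vs_1 = speedups(*times)
    print(f"{cells:>10,d} cells: "
          f"{s_4_vs_1:.2f}x  {s_gpu_vs_4:.2f}x  {s_gpu_vs_1:.2f}x")
```

The printed values can differ from the reported factors by 0.01 in the last digit, because the reported figures appear to be truncated rather than rounded.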

Computation acceleration chart

The performance of 1 core of the Intel Core i7 corresponds to a speedup factor of 1x.

It should be noted that, when comparing the computational speed on multi-core architectures, the following model parameters have a significant impact on the acceleration:
- the number of materials;
- the number of boundary conditions;
- mesh uniformity;
- multiplicity of mesh cells and computational cores;
- conformity of thermo-physical properties of materials.
This means that the maximum acceleration on parallel architectures is achieved on the simplest models, with a uniform computational mesh and a minimum number of materials and boundary conditions. In practice, however, computational models are more complicated, which is why our speed analysis was based on the production wells simulation model, to give more objective results.
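One of the listed factors, the multiplicity of mesh cells and computational cores, can be illustrated with a simplified load-balance estimate. The sketch below reads that factor as "does the cell count divide evenly among the cores", which is an interpretation rather than something the source spells out. The scheduling model (cells processed in passes of one cell per core) and the function name are hypothetical.

```python
def load_balance_efficiency(n_cells: int, n_cores: int) -> float:
    """Fraction of core-time spent on useful work when cells are processed
    in passes of `n_cores` cells each (hypothetical scheduling model).

    If n_cells is not a multiple of n_cores, some cores sit idle
    during the last pass.
    """
    if n_cells <= 0 or n_cores <= 0:
        raise ValueError("n_cells and n_cores must be positive")
    passes = (n_cells + n_cores - 1) // n_cores  # integer ceiling division
    return n_cells / (passes * n_cores)


for n_cells, n_cores in [(8, 4), (10, 4), (2_000_000, 2688)]:
    eff = load_balance_efficiency(n_cells, n_cores)
    print(f"{n_cells} cells on {n_cores} cores: efficiency {eff:.4f}")
```

In this simplified model the loss is visible only when the number of cells is small relative to the number of cores.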

Conclusions:

  1. The use of computational algorithms with a low degree of parallelization is inefficient on multi-core processors and video accelerators.
  2. The major engineering analysis software packages on the market contain a large share of serial code, which significantly limits the acceleration achievable with parallel computing. This is largely because they rely on dated mathematical solver algorithms, developed before technologies such as CUDA existed and therefore not designed to take advantage of them.
  3. Mathematical algorithms in the latest generation of CAE software are designed around parallel processing. This allows speedups of an order of magnitude or more when computation is moved from a single CPU core to a multi-core graphics accelerator.
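The effect of serial code on achievable acceleration is commonly described by Amdahl's law, which the source does not mention and which is used here only as a textbook model. The sketch below evaluates the law and inverts it for the reported 4-core speedups. Measured speedups also depend on memory bandwidth and other hardware limits, so the implied fractions are illustrative and not a measurement of any solver's serial share. The function names are hypothetical.

```python
def amdahl_speedup(parallel_fraction: float, n_workers: int) -> float:
    """Amdahl's law: ideal speedup on `n_workers` when only
    `parallel_fraction` of the run time can be parallelized."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be in [0, 1]")
    if n_workers < 1:
        raise ValueError("n_workers must be at least 1")
    return 1.0 / ((1.0 - parallel_fraction) + parallel_fraction / n_workers)


def implied_parallel_fraction(speedup: float, n_workers: int) -> float:
    """Invert Amdahl's law: the parallel fraction that would produce
    `speedup` on `n_workers` workers under this idealized model."""
    if speedup <= 0 or n_workers < 2:
        raise ValueError("need speedup > 0 and n_workers >= 2")
    return (1.0 - 1.0 / speedup) / (1.0 - 1.0 / n_workers)


# Reported speedups of 4 cores vs. 1 core of the Intel Core i7.
for s in (1.61, 1.70, 1.78, 1.85):
    p = implied_parallel_fraction(s, 4)
    print(f"speedup {s:.2f}x on 4 cores -> implied parallel fraction {p:.2f}")
```

Under this model the speedup can never exceed 1 / (1 - parallel_fraction), however many cores are added, which is why a large share of serial code caps the benefit of parallel hardware.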