As promised, today we take a closer look at the details of the new Fermi architecture. NVIDIA claims it is the largest and most important architectural update since the release of G80 (GeForce 8800):
- Stream processors (CUDA cores). There are 512 of them. Moreover, unlike GT200, single-precision (FP32) and double-precision (FP64) calculations are now handled by the same units. In FP64 mode throughput is halved, which still gives 256 double-precision operations per clock. For comparison, GT200 had only 30 dedicated units for such calculations. In addition to its floating-point unit, each CUDA core also contains a separate integer unit with 64-bit precision, and all of these calculations are performed simultaneously. However, whereas the SP-to-SFU ratio used to be 4 to 1, each SFU now serves 8 SPs, i.e., twice as many. On the other hand, SFU performance has grown roughly fourfold, so the net per-SP gain can be estimated at about twofold;
- The hierarchical organization of the chip has also changed. Previously the basic unit was the TPC (texture processor cluster), which contained eight texture units (TMUs) and three streaming multiprocessors (SMs). Now the SM has effectively taken over the role of the TPC, and it has grown from 8 stream processors to 32. Thus GF100 (GT300) comprises 16 SMs, each consisting of 2×16 CUDA cores, 16 load/store units (LSUs) and 4 SFUs;
- Two instruction streams (warp schedulers) on each SM effectively provide a GPU analogue of Hyper-Threading technology, which improves the utilization of the execution units and, therefore, performance;
- The GPU contains optimized first-level caches with a combined capacity of 1 MB and a 768 KB second-level cache.
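The headline figures in the list can be cross-checked with simple arithmetic. The following is our own illustrative sketch in Python, not NVIDIA code: the per-SM counts (32 CUDA cores, 4 SFUs, 16 SMs, half-rate FP64) come from the text, the GT200 SM layout of 8 SPs and 2 SFUs is inferred from the 4:1 ratio quoted above, and the 64 KB of configurable L1/shared memory per SM is an assumption taken from NVIDIA's published Fermi specifications rather than from this article. The helper names (`SMConfig`, `derived_figures`) are hypothetical.

```python
# Back-of-the-envelope check of the GF100 (Fermi) figures quoted in the article.
# Assumption: 64 KB of on-chip L1/shared memory per SM (not stated in the article).
from dataclasses import dataclass


@dataclass(frozen=True)
class SMConfig:
    """Hypothetical description of one streaming multiprocessor."""
    cuda_cores: int  # stream processors (CUDA cores) per SM
    sfus: int        # special function units per SM


def derived_figures(num_sms: int, sm: SMConfig, fp64_rate: float, l1_kb_per_sm: int) -> dict:
    """Derive chip-wide totals from per-SM counts."""
    total_cores = num_sms * sm.cuda_cores
    return {
        "total_cores": total_cores,
        # FP64 runs on the same cores, at a fraction (fp64_rate) of the FP32 rate
        "fp64_ops_per_clock": int(total_cores * fp64_rate),
        # how many stream processors share one SFU
        "sp_per_sfu": sm.cuda_cores // sm.sfus,
        # combined first-level cache across all SMs, in KB
        "total_l1_kb": num_sms * l1_kb_per_sm,
    }


gf100 = derived_figures(num_sms=16, sm=SMConfig(cuda_cores=32, sfus=4),
                        fp64_rate=0.5, l1_kb_per_sm=64)
gt200_sm = SMConfig(cuda_cores=8, sfus=2)  # inferred from the 4:1 ratio

print(gf100)
# {'total_cores': 512, 'fp64_ops_per_clock': 256, 'sp_per_sfu': 8, 'total_l1_kb': 1024}
print("GT200 SP:SFU =", gt200_sm.cuda_cores // gt200_sm.sfus)
# GT200 SP:SFU = 4
```

Halving the rate of 512 cores reproduces the 256 double-precision operations per clock, 32/4 versus 8/2 shows the doubling of the SP-to-SFU ratio, and 16 × 64 KB matches the 1 MB of combined first-level cache.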
It is already clear that many of the changes are aimed at increasing efficiency in general-purpose computing on the GPU. Still, we hope that the architectural updates will also benefit 3D applications.