New information about the Fermi architecture and the GF100 chip appeared today. The new architecture brings many changes, both revolutionary and evolutionary; let us examine them using GF100 as an example. The chip consists of the following top-level blocks:
- PCI Express bus interface and image controllers;
- Thread distribution block: the GigaThread engine;
- Four graphics processing clusters (GPC);
- Six 64-bit memory controllers with the second-level (L2) cache and ROP blocks.
The PCI Express interface, the image controllers and the GigaThread engine need no introduction, so let us examine the other blocks in more detail, starting with the graphics processing cluster. It is effectively an independent graphics processor, and GF100 has four of them, which removes a whole series of limitations and widens bottlenecks in the graphics pipeline. According to NVIDIA, Fermi was designed with all the features of DirectX 11 in mind, and its main difference from the previous architecture is a severalfold increase in geometry performance. Each GPC consists of one raster engine (not to be confused with the ROP) and four SMs (streaming multiprocessors).

Each SM, in turn, consists of:
- Configurable first-level cache (64 KB) and texture cache (12 KB);
- Two schedulers;
- Register file (128 KB);
- Two arrays of scalar processors with 16 CUDA cores each (512 in total in GF100);
- Four SFU blocks for special function operations (64 in total in GF100);
- Sixteen LSU blocks for loading and storing data (256 in total in GF100);
- Four TMU blocks for texture sampling and filtering (64 in total in GF100);
- PolyMorph engine.
Let us look at the PolyMorph engine in more detail. This block is responsible for five stages of the graphics pipeline: vertex fetch, tessellation, transformation into screen coordinates, attribute setup and stream output.
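The per-SM counts listed above multiply out to the chip-wide totals given in parentheses, and also to the sixteen PolyMorph engines mentioned in the tessellation discussion. A minimal Python sketch of that arithmetic, assuming the 4 GPC × 4 SM hierarchy described earlier; `PER_SM` and `chip_totals` are illustrative names, not part of any NVIDIA tool.

```python
# Hierarchy assumed from the article: 4 GPCs per chip, 4 SMs per GPC.
GPC_PER_CHIP = 4
SM_PER_GPC = 4

# Per-SM unit counts taken from the SM list (hypothetical helper table).
PER_SM = {
    "cuda_cores": 2 * 16,   # two arrays of 16 CUDA cores each
    "sfu": 4,
    "lsu": 16,
    "tmu": 4,
    "polymorph_engines": 1,
}


def chip_totals(gpcs: int = GPC_PER_CHIP, sms_per_gpc: int = SM_PER_GPC) -> dict[str, int]:
    """Scale the per-SM unit counts up to a whole chip."""
    sm_count = gpcs * sms_per_gpc
    return {unit: count * sm_count for unit, count in PER_SM.items()}


totals = chip_totals()
print(totals)
# {'cuda_cores': 512, 'sfu': 64, 'lsu': 256, 'tmu': 64, 'polymorph_engines': 16}
```

Passing `gpcs=2` gives the halved configuration rumored for the smaller card (256 CUDA cores), under the same per-SM assumptions.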

The first important conclusion: unlike Cypress, GF100 does not have a single dedicated tessellation block; instead, each GPC has four such blocks, which allows this operation to be parallelized to a considerable degree. So which is better, one large tessellation block (AMD) or sixteen small ones (NVIDIA)? NVIDIA's own test results offer an answer: in them, the NVIDIA solution outperforms the Radeon HD 5870 by a factor of 2 to 6.
Of course, this advantage applies to synthetic and semi-synthetic tessellation tests, not to the overall performance of the chip. In a comparison of the Radeon HD 5870 and GF100 in the popular Unigine Heaven benchmark, the latter is approximately 1.6 times faster than its rival.
In the popular game Far Cry 2, the future NVIDIA card shows similar figures: it was one and a half times faster than AMD's top single-chip solution. In 3DMark Vantage with the Extreme preset, GF100 is approximately 80% faster than the Radeon HD 5870. All in all, the first test results suggest that GF100 is considerably faster than AMD's single-chip card and is actually comparable to the dual-chip Radeon HD 5970.
The total number of texture sampling and filtering units is only 64, fewer than in Cypress and GT200 (80 each). NVIDIA asserts that it has considerably increased their efficiency, which is 40-60% higher than in GT200.
Some independent specialists even suggest that the TMUs now run at the shader frequency. Speaking of frequencies: solutions based on G80 had two independent clock domains, core and shader, and the shader clock was always more than double the core clock. In Fermi, the shader frequency, which in GF100 lies in the 1400-1500 MHz range, is now considered the chip's base frequency, and the remaining blocks run at a frequency rigidly tied to it by a divider of 1/2.
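With a fixed 1/2 divider, the 1400-1500 MHz shader range quoted above pins the rest of the chip to 700-750 MHz. A small sketch of that relationship; `clock_domains` is a hypothetical helper, and the only input taken from the article is the 1/2 ratio.

```python
def clock_domains(shader_mhz: float, divider: float = 0.5) -> tuple[float, float]:
    """Return (shader_clock, other_blocks_clock) in MHz.

    Assumes, as described for Fermi, that the other blocks are derived
    from the shader clock by a fixed divider rather than set independently.
    """
    return shader_mhz, shader_mhz * divider


for shader in (1400, 1500):
    shader_clk, core_clk = clock_domains(shader)
    print(f"shader {shader_clk:.0f} MHz -> other blocks {core_clk:.0f} MHz")
```

Unlike the G80-era scheme, there is no second free parameter here: changing the shader clock moves the rest of the chip with it.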
The last component of the Fermi architecture is the memory controllers, ROP blocks and second-level cache. The shared L2 cache (768 KB) is divided into six 128 KB slices, which communicate directly with the memory controllers and ROP blocks. The GF100 memory controllers support GDDR5 memory and have a 384-bit bus, which raises memory bandwidth more than one and a half times compared with GT200. The raster operation blocks (ROP) have also changed compared with GT200. Their number per channel has doubled (8 per memory controller, 48 in total in GF100 versus 32 in GT200), and their efficiency has increased, which lets GF100 outperform the GeForce GTX 285 by more than a factor of two with 8x antialiasing.
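Each of the six memory partitions pairs a 64-bit controller with a 128 KB L2 slice and 8 ROPs, so the chip-wide figures are simple sums. A minimal sketch under that reading of the text; `MemoryPartition` and `chip_memory_totals` are illustrative names, and the GT200 comparison assumes its 32 ROPs were spread over eight 64-bit channels.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MemoryPartition:
    """One GF100 memory partition: a 64-bit controller, its L2 slice and its ROPs."""
    bus_bits: int = 64
    l2_kb: int = 128
    rops: int = 8


def chip_memory_totals(partitions: list[MemoryPartition]) -> dict[str, int]:
    """Sum bus width, L2 size and ROP count over all partitions."""
    return {
        "bus_bits": sum(p.bus_bits for p in partitions),
        "l2_kb": sum(p.l2_kb for p in partitions),
        "rops": sum(p.rops for p in partitions),
    }


gf100 = chip_memory_totals([MemoryPartition() for _ in range(6)])
print(gf100)  # {'bus_bits': 384, 'l2_kb': 768, 'rops': 48}

# Assumption for comparison: GT200 spread its 32 ROPs over eight 64-bit channels.
gt200_rops_per_channel = 32 // 8
print(gf100["rops"] // 6, "ROPs per channel vs", gt200_rops_per_channel)  # 8 vs 4
```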
NVIDIA has done two interesting things at once: it has considerably improved the architecture for GPGPU and, at the same time, introduced a number of revolutionary and evolutionary changes to increase performance in 3D applications.
In conclusion, we can share fresh rumors about the new NVIDIA cards. NVIDIA plans to release two solutions, with the tentative names GeForce GTX 380 and GeForce GTX 360. The first will be a high-end product competing with the Radeon HD 5970, with power consumption of approximately 280 W and a price of approximately $500-550. The lower-end solution will have fewer blocks and will slightly outperform the Radeon HD 5870 ($399). Supposedly, it will have half as many execution units as GF100 and will consist of two GPC blocks. Its clock frequencies may be slightly higher than those of GF100, which would let it deliver 50-55% of the performance of the high-end card. GF104 is expected to appear at the same time as GF100 or shortly afterwards.