Tesla V100 and T4: Dedicated GPUs for the Data Center

Although graphics processors were originally intended for gamers, computer enthusiasts know that they are also extremely valuable in other areas. Now that crypto miners are starting to switch to ASICs, GPU prices and supply are stabilizing again. The deep learning community can breathe easier, shop for GPUs to meet new needs, and get back to training models.

NVIDIA has made it clear that gamers are no longer the sole target audience for its products. In September 2018, it released the NVIDIA Tesla T4, a server card dedicated to deep learning inference. The NVIDIA Tesla V100, aimed at training, belongs to the same line of cards built specifically for deep learning. These cards are equipped with Tensor Cores, which accelerate neural network workloads. Similar Tensor Cores are also present in the latest generation of consumer cards such as the RTX 2060, RTX 2070, RTX 2080 and RTX 2080 Ti, and in the new “SUPER” models. If you search online, you'll find that the 2080 Ti is the card most often recommended for machine learning at the moment. In this article, we will look at the full range of factors to weigh when ordering.

Specialized GPU cards for servers
First, there is a profound difference in pricing between the consumer cards and the server cards that NVIDIA sells. Take the Tesla V100 as an example: this is a server GPU based on the Volta GV100 architecture. There is also a consumer card, the Titan V, which is based on the same architecture and has almost identical specifications. Both have 5120 CUDA cores, a TDP of 250 watts and about 15 TFLOPS of single-precision floating-point performance. The V100 has more memory: 16 GB of HBM2, running at a slightly higher clock speed, against the Titan V's 12 GB. The main difference lies in the price: the Titan V sells for around 3,000 USD, while the Tesla V100 costs about 10,000 USD!

What could possibly explain this huge price difference? NVIDIA explains that the Tesla V100 has all the features of a dedicated server card: a 3-year warranty, and a design proven for long periods of use in rack servers. More importantly, the EULA that comes with the drivers for consumer cards such as the Titan V does NOT permit general use in data centers. This is why AWS, Azure, and Google Cloud don't offer Titan V cards. Basically, you need to pay about 7,000 USD more to be able to use the GPU in a data center. If you still want to use a Titan V in your data center (assuming one allows it), no one can guarantee that it will not burn out! If you are a scientific researcher and are serious about your project, you need a solid understanding of the conditions under which a server operates in a data center, and of the factors that keep a GPU running continuously over the long run.
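
To put the price gap in numbers, here is a small Python sketch that uses only the approximate figures quoted above. The dictionary layout and the helper name `cost_per_unit` are our own choices for illustration, and real street prices will vary.

```python
# Approximate prices and specs as quoted in the text above (not official list prices).
cards = {
    "Titan V": {"price_usd": 3000, "fp32_tflops": 15, "memory_gb": 12},
    "Tesla V100": {"price_usd": 10000, "fp32_tflops": 15, "memory_gb": 16},
}


def cost_per_unit(card, key):
    """Return the price in USD divided by the given spec (hypothetical helper)."""
    return card["price_usd"] / card[key]


# Extra amount paid for the server card over the consumer card.
premium = cards["Tesla V100"]["price_usd"] - cards["Titan V"]["price_usd"]

usd_per_tflops = {name: cost_per_unit(c, "fp32_tflops") for name, c in cards.items()}
usd_per_gb = {name: cost_per_unit(c, "memory_gb") for name, c in cards.items()}

print(f"Server premium: {premium} USD")
for name in cards:
    print(f"{name}: {usd_per_tflops[name]:.0f} USD/TFLOPS, {usd_per_gb[name]:.0f} USD/GB")
```

On raw single-precision throughput the two cards are equal, so the server card costs more than three times as much per TFLOPS. The premium pays for the warranty, the server-grade design, and the right to run in a data center.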

NVIDIA Tesla V100 installed in HPE DL380 G10 server
Assembly of PoC installation for NVIDIA Tesla V100 in Server World
Performance
We will look at the performance of the Tesla V100 and T4, as these are the GPU models NVIDIA mainly targets at deep learning.

Deep learning performance: NVIDIA rates the Tesla V100 at 125 TFLOPS for deep learning, compared to a single-precision performance of 15 TFLOPS. This is a huge number! How do they get it? It is based on what NVIDIA calls "mixed-precision performance". Basically, using a number of mathematical tricks, NVIDIA combines the advantages of FP16 and FP32 training: the speed of the former and the accurate convergence of the latter. The 640 Tensor Cores introduced in the Tesla V100 were specially engineered to accelerate half-precision arithmetic, which is what makes these extraordinary numbers possible. Deep learning practitioners will ask: great, but how do I use it? If you are using TensorFlow, you need the “NVIDIA NGC TensorFlow 19.03” container, run with Docker. Then set an environment variable and you're ready to go!
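
To see why pure FP16 training is not enough and an FP32 copy of the weights is kept, consider the minimal sketch below. It is not NVIDIA's implementation: it only simulates FP16 and FP32 rounding on the CPU with Python's standard `struct` module, and the names `to_fp16`, `to_fp32` and `train`, the starting weight and the update size are all assumptions made for the illustration.

```python
import struct


def to_fp16(x):
    """Round a Python float to the nearest IEEE 754 half-precision value."""
    return struct.unpack("e", struct.pack("e", x))[0]


def to_fp32(x):
    """Round a Python float to the nearest IEEE 754 single-precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def train(steps, update, master_in_fp32):
    """Apply the same small weight update `steps` times and return the final weight.

    master_in_fp32=False: the weight is stored and updated in FP16 only.
    master_in_fp32=True:  an FP32 master weight accumulates the updates, and an
                          FP16 copy is derived from it for the fast arithmetic.
    """
    weight = 1.0
    for _ in range(steps):
        if master_in_fp32:
            weight = to_fp32(weight + update)   # accumulate in FP32
            _working_copy = to_fp16(weight)     # FP16 copy used for compute
        else:
            weight = to_fp16(weight + to_fp16(update))  # accumulate in FP16
    return weight


fp16_only = train(1000, 1e-4, master_in_fp32=False)
mixed = train(1000, 1e-4, master_in_fp32=True)

print("FP16-only weight:  ", fp16_only)  # the weight never moves away from 1.0
print("FP32 master weight:", mixed)      # close to 1.1
```

Near 1.0 the gap between neighbouring FP16 values is about 0.001, so an update of 0.0001 is rounded away at every step and the FP16-only weight never changes. The FP32 master weight keeps the small updates, which is the "accurate convergence" half of the trade-off, while the FP16 copy is what Tensor Cores can process quickly. The text above does not name the environment variable. To our knowledge, NVIDIA's release notes for the NGC TensorFlow containers document `TF_ENABLE_AUTO_MIXED_PRECISION=1` for this purpose, but treat that as an assumption and check the notes for your container version.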

Tesla V100 Specifications:


