One of the main issues is that large scale neural networks (something like 8-20 billion neuron units) need performance in the 50-200 TFLOP range. There are a few cards that can do that, like Nvidia's new Titan RTX. But there's a catch: that figure is only at 16-bit. Go to full 32-bit precision and performance drops off.
Neural networks rarely need such large numbers. What they need more is precision. Yet bFloat16 has only 7 bits for the fraction, versus the 10 bits of a standard 16-bit float. Since each fraction bit is a DOUBLING of resolution, those 3 missing bits mean an 8x difference in precision. What gives?
Remember that most neural networks store values between 0 and 1. The need for large numbers doesn't exist.
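To put a number on that, here's a quick check of the smallest step each 16-bit format can represent just above 1.0 (plain NumPy; bfloat16 isn't a native NumPy dtype, so its 7-bit fraction step is simply written out as 2**-7):

```python
# Smallest representable increment above 1.0 in each 16-bit format.
import numpy as np

fp16_eps = float(np.finfo(np.float16).eps)  # 10 fraction bits -> 2**-10
bf16_eps = 2.0 ** -7                        #  7 fraction bits -> 2**-7

print(fp16_eps)             # 0.0009765625
print(bf16_eps)             # 0.0078125
print(bf16_eps / fp16_eps)  # 8.0 -- each missing fraction bit halves the resolution
```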
So why give up fraction bits? It comes down to the speed of moving huge pipelines of 32-bit data through conventional memory into and out of these systems: 16-bit values take half the bandwidth and half the storage. And because bFloat16 spends its bits on an exponent identical to the 32-bit float's, conversion from 32-bit float becomes trivial. You basically just whack the low 16 bits off the end.
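Here's a minimal sketch of that conversion, done by hand with NumPy bit views (real converters usually round to nearest rather than truncate, but the layout argument is the same: the top 16 bits hold the sign, the full 8-bit exponent, and 7 fraction bits):

```python
import numpy as np

def float32_to_bfloat16_bits(x):
    """Drop the low 16 bits of a float32; what's left is the bfloat16 bit pattern."""
    bits = np.float32(x).view(np.uint32)
    return np.uint16(bits >> 16)

def bfloat16_bits_to_float32(b):
    """Widen a bfloat16 bit pattern back to float32 by padding the fraction with zeros."""
    return (np.uint32(b) << np.uint32(16)).view(np.float32)

x = np.float32(0.1)
y = bfloat16_bits_to_float32(float32_to_bfloat16_bits(x))
print(x, y)  # 0.1 -> ~0.0996: same sign and exponent, fraction chopped to 7 bits
```

Note that going the other way (bFloat16 back to 32-bit) is exact, since you're only padding zeros onto the fraction.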
32 bit floating point | sign (1 bit) | exponent (8 bits) | fraction (23 bits) |
bFloat16              | sign (1 bit) | exponent (8 bits) | fraction (7 bits)  |
What is the trade off? Well, it's the loss of precision in the small numbers. So if you are designing a re-entrant, heavily convolutional network, precision might be much more important to you than it is for a vision system that needs huge data pipes into the system. And since these systems use very specialized memory (GDDR6 or HBM), which is more expensive, it's tough to get 16 or even 24 gigabytes of RAM to support the processing. That limits the number of nodes you can express.
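For a rough sense of that memory ceiling, here's a back-of-the-envelope sketch (weight storage only; activations, gradients, and optimizer state would shrink these counts further, and the 16 and 24 GB figures are just the card sizes mentioned above):

```python
# How many parameters fit in a card's memory, counting weight storage only.
GIB = 1024 ** 3

def max_params(memory_gib, bytes_per_param):
    return memory_gib * GIB // bytes_per_param

for mem in (16, 24):
    print(f"{mem} GB card: "
          f"{max_params(mem, 4) / 1e9:.1f}B params at 32-bit, "
          f"{max_params(mem, 2) / 1e9:.1f}B at 16-bit")
# 16 GB card: 4.3B params at 32-bit, 8.6B at 16-bit
# 24 GB card: 6.4B params at 32-bit, 12.9B at 16-bit
```

Halving the bytes per value roughly doubles the model you can fit, before you even count the bandwidth win.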
bFloat16 is the format Google is pushing, but is it right for everyone? No, it's not. The loss of precision will probably limit the kinds of problems you can solve with neural networks. Personally, I'd rather go with precision.