IBM Has An AI Training Advantage
By Paul Teich
This article focuses on the “under the hood” machine learning (ML) work IBM announced last week at Think 2018, which will soon accelerate Watson and PowerAI training performance even further. The work also highlights IBM’s partnership with NVIDIA and its access to NVIDIA’s NVLink interconnect technology for GPUs.
Last year, I discussed an IBM paper that described how to train an ML image classification model in under an hour at 95% scaling efficiency and 75% accuracy, using the same data sets that Facebook used for training. IBM ran that training benchmark in the first half of 2017 using 64 POWER8-based Power System S822LC for High Performance Computing systems. Each of those systems had four NVIDIA Tesla P100 SXM2-connected GPUs and used IBM’s PowerAI software platform with Distributed Deep Learning (DDL).
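To put the 95% figure in perspective, here is a minimal sketch of the arithmetic. It assumes the common definition of scaling efficiency (measured speedup divided by the number of GPUs), which the article does not spell out; the function name is hypothetical, and this is not a reproduction of IBM's benchmark.

```python
def effective_speedup(num_gpus: int, scaling_efficiency: float) -> float:
    """Speedup over a single GPU, assuming efficiency = speedup / num_gpus."""
    return num_gpus * scaling_efficiency


systems = 64           # POWER8-based systems, per the article
gpus_per_system = 4    # Tesla P100 GPUs per system, per the article
total_gpus = systems * gpus_per_system

# 0.95 is the scaling efficiency quoted in the article.
speedup = effective_speedup(total_gpus, 0.95)

print(f"{total_gpus} GPUs deliver roughly {speedup:.1f}x one GPU's throughput")
```

Under that assumption, the cluster behaves like about 243 perfectly scaled GPUs out of 256, so roughly 13 GPUs' worth of capacity is lost to communication and synchronization overhead.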
IBM’s new paper, “Snap Machine Learning,” describes a new IBM ML library that more effectively uses available network, memory and heterogeneous compute resources for ML training tasks. It is also based on a new platform, the IBM Power Systems AC922 server. IBM’s AC922 has four SXM2-connected NVIDIA Tesla V100 GPUs attached to dual-POWER9 processors via NVIDIA’s latest NVLink 2.0 interface.
More => https://www.forbes.com/sites/tiriasresearch/2018/03/26/ibm-has-ai-training-advantage/#7e1ee1d45434