Latest developments on the Evolve platform
During the previous periods, most of our development effort was dedicated to the convergence of the High-Performance Computing (HPC), Big Data (BD) and Machine Learning (ML) domains. High benefits are expected from this cross-fertilization, but it also raises multiple technical challenges. HPC has traditionally been used for modeling and simulation, while BD is dedicated to the ingestion and analysis of data from diverse sources. Both communities are evolving in response to changing user needs and technological landscapes, converging towards the use of ML not only for data analytics but also for modeling and simulation. These joint evolutions, in software and at the hardware level, put considerable pressure on system design. The Evolve platform is the perfect vehicle to design and assess the converged architectures of tomorrow.
Evolve: Conceptual view of a modern workflow running on the heterogeneous platform developed during the project.
Evolve workflow and platform as seen by end users
During the last period, our work at the hardware level has been carried out in several directions:
- Improvement of computing power by upgrading certain CPUs to more recent versions (Intel SKL-X);
- Increased local memory/storage capacity at the compute nodes;
- Enrichment of heterogeneous computing through the introduction of state-of-the-art general-purpose accelerators: the Nvidia V100 GPU and the Intel Stratix-10 FPGA;
- Investigation of the Intel® Vision Accelerator Design with Intel® Movidius™ VPUs for AI workloads.
At the software level, work followed the earlier hardware improvements, with a strong trend towards AI acceleration:
- Support for the CUDA and OpenCL programming models, for the Nvidia GPU and the Intel Stratix-10 general-purpose accelerators respectively;
- Implementation of the Intel OpenVINO™ software layer, dedicated to optimizing inference engines on a variety of processing devices (CPU, Intel GPU, VPU, etc.);
- Porting of the open-source TensorFlow 2/Keras framework to the HPC platform to support the design of artificial neural networks (ANNs);
- Deployment of the latest version of the DDN IME 1.4 burst buffer. This release adds advanced features: better interoperability, such as extended NFS support and quota management; improved performance; and better reliability, for instance integrated memory swap, hot-swappable devices, and hot ejection/insertion of a node in the system.
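Frameworks such as TensorFlow 2/Keras ultimately assemble graphs of simple layer computations. As a purely illustrative sketch (the layer sizes, weights, and function names below are invented, not taken from the project), a dense-layer forward pass can be written in plain Python:

```python
import math

def dense(inputs, weights, biases, activation=None):
    """One fully connected layer: y = activation(W.x + b)."""
    outputs = []
    for w_row, b in zip(weights, biases):
        z = sum(w * x for w, x in zip(w_row, inputs)) + b
        outputs.append(activation(z) if activation else z)
    return outputs

def relu(z):
    return max(0.0, z)

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# A toy 2-2-1 network: the kind of graph a framework like Keras
# builds from stacked Dense layers (weights here are arbitrary).
hidden = dense([1.0, 2.0], [[0.5, -0.5], [0.25, 0.75]], [0.0, 0.1], relu)
output = dense(hidden, [[1.0, -1.0]], [0.0], sigmoid)
```

On an accelerator, the same matrix-vector products are what CUDA, OpenCL, or OpenVINO kernels execute at scale; the framework's role is to map this graph onto the available devices.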
All of the previous developments have been validated with a set of standard ANN benchmarks (VGG16, ResNet50) trained on a reduced number of standard datasets. Validation was also performed on the acceleration of Thales' pilot using FPGAs with the OpenCL programming model. It is worth mentioning that investigative work was also carried out on accelerating workflows through data optimization as well as local data processing (e.g. in-storage computing).
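The appeal of in-storage computing can be sketched in a few lines of Python. This is a conceptual stand-in, not the project's implementation: the "storage side" is simulated by a list, and the point is only that pushing a predicate to where the data lives shrinks what must cross the interconnect.

```python
# Simulated dataset resident on the storage side (hypothetical records).
RECORDS = [{"id": i, "label": i % 3} for i in range(10)]

def scan_then_filter(predicate):
    """Conventional path: ship every record to the host, filter there."""
    shipped = list(RECORDS)                         # full transfer
    return [r for r in shipped if predicate(r)], len(shipped)

def in_storage_filter(predicate):
    """Offloaded path: the storage side applies the predicate first,
    so only matching records are transferred to the host."""
    shipped = [r for r in RECORDS if predicate(r)]  # reduced transfer
    return shipped, len(shipped)

wants = lambda r: r["label"] == 0
host_result, host_moved = scan_then_filter(wants)
isc_result, isc_moved = in_storage_filter(wants)
```

Both paths return identical results, but the offloaded path moves 4 records instead of 10; the same trade-off motivates near-data processing in a real storage stack.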
In fact, these works form the first building blocks for opening up the HPC architecture of the future. In addition to raw computing power, new AI/BD application and data issues are emerging: it is no longer a question of simply multiplying processing units; it becomes mandatory to collaborate more closely than ever with cluster users and programmers. Hardware/software synchronization, custom acceleration, energy efficiency, integrated memory, scalable file systems, etc. are mandatory responses to the growth of AI applications and data. This drives evolutions from temporal to spatial architectures and from data to model parallelism. The platform defined in Evolve is customer-centric: all the technologies, and the architecture in general, are checked against the pilot applications and the 10+ additional proof-of-concept use cases enlisted in recent months.
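The shift from data to model parallelism mentioned above can be illustrated with a toy example (all names and the two-layer "model" are invented for illustration): data parallelism replicates the whole model and splits the batch across workers, while model parallelism splits the layers across devices and streams activations between them.

```python
def layer(xs, scale):
    """Stand-in for one network layer: elementwise scaling."""
    return [scale * x for x in xs]

BATCH = [1.0, 2.0, 3.0, 4.0]

def data_parallel(batch, workers=2):
    """Each worker runs the full model on its own slice of the batch."""
    size = len(batch) // workers
    shards = [batch[i * size:(i + 1) * size] for i in range(workers)]
    outs = [layer(layer(shard, 2.0), 3.0) for shard in shards]
    return [y for out in outs for y in out]  # gather the shard results

def model_parallel(batch):
    """Each layer lives on a different device; activations flow between them."""
    on_device_0 = layer(batch, 2.0)        # first layer on "device 0"
    on_device_1 = layer(on_device_0, 3.0)  # second layer on "device 1"
    return on_device_1

dp = data_parallel(BATCH)
mp = model_parallel(BATCH)
```

Both strategies compute the same function; which one scales better depends on whether the batch or the model is the thing too large for a single device, which is exactly the pressure large AI workloads put on a converged platform.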