Alibaba Cloud EHPC Empowers New Manufacturing — SAIC Simulation Computing Cloud (SSCC)
As the collaboration between SAIC Motor Passenger Vehicle Corporation and Alibaba Cloud progresses, multiple Alibaba Cloud technologies are being applied to SAIC’s core business of auto development and design. With the help of Alibaba Cloud, SAIC Motor has upgraded its engineering development simulation capabilities and improved simulation computing efficiency by 25%, allowing engineers to focus more closely on product design and performance optimization and to deliver world-class products. One example is the MG X-Motion concept car, which debuted at the 2018 Auto China Show (Beijing International Automotive Exhibition). Its production version has been optimized repeatedly for superior performance on SAIC’s simulation computing cloud platform.
A representative new manufacturing project is the simulation computing hybrid cloud co-built by Alibaba Cloud and SAIC Motor Passenger Vehicle Corporation under SAIC Motor Corporation Limited (generally referred to as “SAIC Motor Passenger Vehicle Corporation”). SAIC Motor Passenger Vehicle Corporation, a wholly owned subsidiary of SAIC Motor, is responsible for the R&D, manufacturing and sales of all of SAIC Motor’s proprietary brand vehicles. It owns two vehicle brands, Roewe and MG, and has three technology R&D centers in Shanghai, Nanjing and the UK, as well as three manufacturing sites at Shanghai Lingang, Nanjing Pukou and Longbridge in the UK.
Challenges of Production-Level Engineering Simulations
Because SAIC Motor passenger cars are performing exceptionally well in the market, the company must keep developing and upgrading vehicle models to stay ahead of the curve. However, the computing resources available for engineering simulations fall far short of current demand in the following ways.
Massive Urgent Requirements
Computer-aided engineering (CAE) simulation currently handles critical tasks that span varied computing scenarios, demand quick turnaround, and run at large scale. There is therefore an urgent need to acquire high-performance computing resources quickly.
Lagging Resource Iteration
The on-premises HPC clusters built by SAIC Motor Passenger Vehicle Corporation have been extended and upgraded many times. However, service requirements are still not being met, and R&D simulation progress has been seriously hindered by problems such as severely aging hardware, a high hardware failure rate, low computational performance, and slow resource iteration.
Poor User Experience
Simulation developers generally still use traditional HPC computing centers, which separate offline pre-processing and post-processing from online computing and solving. Because the process is fragmented and data must be moved frequently, a highly immersive, full-service online CAE simulation analysis platform is urgently needed.
SAIC Simulation Computing Cloud (SSCC)
To tackle these problems, SAIC Motor Passenger Vehicle Corporation collaborated with Alibaba Cloud and Fanyun Technologies at the end of 2017 to build the first industrial hybrid IaaS simulation computing service platform in the industry — SAIC Simulation Computing Cloud or SSCC. Launched in early 2018, the SSCC platform demonstrates once again that cloud computing can provide high elasticity, high speed and high efficiency.
SSCC is primarily composed of Alibaba Cloud’s public cloud clusters and the clusters built by SAIC Motor Passenger Vehicle Corporation. Data interconnectivity and unified scheduling of computing resources are delivered over a high-speed leased line. Alibaba Cloud’s public cloud clusters primarily provide the following computing resources.
HPC Computing Clusters
The computing nodes of the HPC clusters are Alibaba Cloud Super Computing Cluster (SCC) instances. SCC builds on the elastic bare metal (X-Dragon) server, providing the proven governance and elastic resources of cloud computing together with performance equivalent to that of physical machines. In addition, it supports high-speed RDMA interconnects, which significantly improve network performance and increase the speedup ratio of large-scale clusters.
NAS Shared Network Attached Storage
NAS functions as the sharing hub for data flow in the cloud. From the input tasks submitted by users, through the solver results, to the input data for post-processing, all computing resources in the VPC can access the data simultaneously. NASplus also provides access to shared data across Windows and Linux platforms to meet the needs of common enterprise business scenarios. NAS incorporates Alibaba Cloud’s Apsara Pangu 2.0 technology and provides high aggregate bandwidth to satisfy the I/O performance requirements of CAE software. NAS also provides data availability of up to 99.99999999% through multiple backups, among other methods. As the business grows, it can be upgraded to the CPFS distributed file system for higher I/O performance when necessary.
Graphics Processing Cluster
Enterprise-grade NVIDIA Tesla series GPUs based on the Pascal architecture handle tasks such as playing demo animations smoothly and rendering models quickly while multiple logged-in users share the graphics server. They do so while providing high availability and ensuring the integrity and reliability of pre-processing and post-processing workflows.
Statistics show that, on average, more than 500 multi-discipline simulation computing jobs run on the SSCC platform each day. These include collision analyses, structural rigidity analyses, fluid analyses and NVH analyses, as well as simulations of the running states of vehicles, engines and other parts. Thanks to the performance improvements brought by Alibaba Cloud super-computing clusters, computing and solving take less time than they did on local clusters, and job queuing time has also been significantly reduced. In addition, most job data flows stay within the closed loop of the Alibaba Cloud public cloud cluster. This reduces the pressure on local storage and retains more historical engineering data, making it easier for engineers to run comparative analyses across multiple scenarios.
With Alibaba Cloud, SAIC Motor Passenger Vehicle Corporation has upgraded its engineering development simulation capabilities and improved simulation computing efficiency by 25%, allowing engineers to focus more closely on product design and performance optimization and to deliver world-class products. The MG X-Motion concept car made its global debut at the 2018 Auto China Show (Beijing International Automotive Exhibition). Its production version has been optimized repeatedly for superior performance on SAIC’s simulation computing cloud platform.
You Jin, a senior manager for engineering application support in the Information System Department at SAIC Motor Passenger Vehicle Corporation, said, “Alibaba Cloud and the HPC clusters built by SAIC deliver excellent performance alongside flexible and scalable resources, dramatically reducing the pressure on R&D and ensuring project development progress.”
Qiang Bin, infrastructure director of SAIC Data and Information Systems, said, “Alibaba Cloud’s public cloud has a proven governance model which complies with SAIC’s security requirements; the flexible resource activation of the cloud also reduces investment and labor cost which would have been incurred by self-built clusters.”
The application of hybrid cloud technologies boosts SAIC’s global research and development and aligns with SAIC Motor’s R&D goals for the internationalization of production. This efficient collaboration model can be extended quickly across the entire product R&D chain, allowing SAIC Motor Passenger Vehicle Corporation to bring its vehicles and travel services to end markets faster, including those targeting the Four New Goals of electrification, smart networking, sharing and globalization.
Features of Alibaba Cloud EHPC Technology
SSCC is built on Alibaba Cloud’s Elastic High Performance Computing (EHPC) solution. With innovations across IaaS, PaaS and SaaS, SSCC provides the following technological advantages.
1. Excellent Performance
The HPC computing nodes deliver powerful performance. The Intel Xeon Gold 6149 CPU, built on the Skylake architecture, provides strong computing performance. The high-performance network architecture uses RoCE interconnects at 2 × 25 Gbps, giving low latency, high bandwidth, and a much higher speedup ratio. NASplus/CPFS shared storage provides aggregate bandwidth that meets the needs of most CAE scenarios, and NAS storage can be upgraded to the CPFS file system when needed.
These clusters provide world-class performance.
2. SLA Assurance
A variety of safeguards, including the public cloud’s mature, reliable governance system and migration during downtime, provide availability of up to 99.95% for a single computing node and keep CAE simulation computing services running continuously.
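To put the 99.95% figure in perspective, the following minimal sketch converts an availability percentage into a downtime budget. The function name and the 30-day month are assumptions chosen for illustration; they are not taken from any Alibaba Cloud SLA definition.

```python
from fractions import Fraction


def downtime_budget_minutes(availability_percent: str, period_days: int) -> Fraction:
    """Return the maximum downtime, in minutes, permitted over a period.

    availability_percent is passed as a string (e.g. "99.95") so the
    arithmetic stays exact instead of picking up float rounding error.
    """
    unavailable_fraction = 1 - Fraction(availability_percent) / 100
    return unavailable_fraction * period_days * 24 * 60


# 99.95% availability over an assumed 30-day month
monthly = downtime_budget_minutes("99.95", 30)
print(float(monthly))  # 21.6
```

Under these assumptions, a single node at 99.95% availability may be unavailable for roughly 21.6 minutes in a 30-day month.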
3. Hybrid Cloud Architecture
The VPC in the cloud is connected to the local clusters through Express Connect (a leased line) and serves as an independent subnet to ensure secure data interconnection. Computing resources in the cloud connect seamlessly to local licenses, schedulers, and SaaS. Temporarily adding public cloud resources is an effective way to respond to unplanned demand for compute resources (e.g. urgent projects).
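When a cloud VPC is attached to local clusters as an independent subnet, its address range generally must not collide with the on-premises ranges, or traffic cannot be routed unambiguously over the leased line. The sketch below shows one way to check this with Python's standard `ipaddress` module. The function name and all CIDR ranges are hypothetical and do not describe SAIC's actual network.

```python
import ipaddress


def overlapping_ranges(vpc_cidr: str, on_prem_cidrs: list[str]) -> list[str]:
    """Return the on-premises CIDR blocks that overlap the cloud VPC block."""
    vpc = ipaddress.ip_network(vpc_cidr)
    return [
        cidr
        for cidr in on_prem_cidrs
        if vpc.overlaps(ipaddress.ip_network(cidr))
    ]


# Hypothetical address plan: two local cluster networks and a candidate VPC range
local_networks = ["10.10.0.0/16", "10.20.0.0/16"]
conflicts = overlapping_ranges("172.16.0.0/16", local_networks)
print(conflicts or "no overlap: the VPC can act as an independent subnet")
```

An empty result means the candidate VPC range is disjoint from every local network in the list.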
4. Automatic Scaling (Supported by E-HPC)
With a suitable cluster load threshold in place, automatic scaling keeps public cloud resource costs to a minimum while still absorbing workloads at peak times, ensuring that CAE simulation computing services run smoothly.
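The threshold idea can be sketched as a small decision function. This is an illustrative model only, not the E-HPC auto-scaling API: the class, the field names and every threshold value below are assumptions.

```python
from dataclasses import dataclass


@dataclass
class ScalingPolicy:
    """Hypothetical policy; all values are illustrative assumptions."""
    scale_out_load: float = 0.80  # add nodes when cluster load reaches 80%
    scale_in_load: float = 0.30   # release nodes when load falls to 30%
    min_cloud_nodes: int = 0
    max_cloud_nodes: int = 64
    step: int = 4                 # nodes added or removed per decision


def desired_cloud_nodes(current: int, load: float, queued_jobs: int,
                        policy: ScalingPolicy) -> int:
    """Return the target number of public cloud nodes for the next interval."""
    if load >= policy.scale_out_load:
        target = current + policy.step
    elif load <= policy.scale_in_load and queued_jobs == 0:
        # Only shrink when nothing is waiting, so queued work is not starved.
        target = current - policy.step
    else:
        target = current
    # Clamp to the configured bounds.
    return max(policy.min_cloud_nodes, min(policy.max_cloud_nodes, target))


policy = ScalingPolicy()
print(desired_cloud_nodes(current=8, load=0.90, queued_jobs=5, policy=policy))  # 12
print(desired_cloud_nodes(current=8, load=0.10, queued_jobs=0, policy=policy))  # 4
```

Keeping a gap between the scale-out and scale-in thresholds avoids repeatedly adding and releasing nodes when load hovers near a single cutoff.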
5. Fast POC
Public cloud resources are activated directly in the Alibaba Cloud console. The entire cluster is delivered in just minutes, after which testing can begin immediately. Time-consuming chores such as waiting for procurement or scheduling installation and deployment in a data center are avoided altogether.
During a POC, you can temporarily activate more resources than originally planned in order to accelerate CAE application verification.
6. Data Sharing and Interconnectivity for Linux/Windows
Alibaba Cloud provides an industry-leading NAS service, which can be mounted on Linux and Windows by using the NFS protocol. This allows users to read computation and solver results directly from shared storage while they perform interactive post-processing in the Windows user interface they are already familiar with.
7. Closed Loop with Elastic Data Volumes
After users upload input task data and the computed results are written to NAS, users can start post-processing on the graphics server in the cloud. This builds a closed loop of data that provides both high security and high reliability. Except under special circumstances, there is no need to download the data back to local storage.
Even if data usage exceeds the planned volume (e.g. a purchased volume package), the 10 PB public cloud NAS storage ensures that data is still written successfully. Computing is therefore largely free from storage capacity limits, which protects business continuity.
8. Collaborative Development Enabled by Comprehensive Account Management
By granting sub-accounts read-only permissions through RAM (Resource Access Management), users can let their partners log on to machines in the cloud for software maintenance, troubleshooting and analysis without having to go on site. A shared VNC link also makes discussion and collaboration easier.
9. Comprehensive SaaS Service Capabilities
This platform features two types of built-in IaaS resource entries: cluster computing and virtual applications. Deployment, integration, scheduling and monitoring are unified and tailored to the features of each engineering software package, so that the platform can provide online engineering software services such as CAD and CAE:
Interactive applications: HyperWorks, EnSight, Converge Studio, Star-CCM+, Fluent, MSC.Adams, Abaqus, NCode
Computing applications: LS-Dyna, Converge, Star-CCM+, Fluent, MSC.Nastran, NX.Nastran, MSC.Adams, NCode, OptiStruct, Abaqus, Star-CD, iSight
10. Granular Business Scheduling Capabilities
Given the resource differences between the on-premises HPC clusters of SAIC Motor Passenger Vehicle Corporation and Alibaba Cloud, as well as data storage consistency requirements, this platform is designed to deliver granular scheduling of simulation computing workloads, including but not limited to:
- Resource Quota Scheduling: This platform imposes quota limits on fixed resources and public resources based on department and project group properties. This meets fixed computing demand at the department and project group level while still allowing elastic resource scheduling at the enterprise level.
- Unified Scheduling of IaaS Resources: With the device grouping policy, this platform schedules on-premises HPC devices and Alibaba Cloud cluster instances in a unified manner, ensuring efficient concurrent execution of a single computing instance as well as fast resource scheduling for large numbers of tasks.
- Unified User Data View: This platform manages both on-premises storage and Alibaba Cloud storage. A unified user data view provides a consistent data management experience. CAE data is matched intelligently with nearby resource nodes before computing or interactive operations start.
- Preemptive Scheduling During Idle Time: This platform includes policies that enable preemptive scheduling during idle time to meet the requirements of specific computing scenarios. Quota limits on predefined resources can be exceeded during a specified period to maximize computing resource usage.
- Advanced License Scheduling Mechanisms: To comply with the features of industrial software licenses, this platform integrates a range of advanced scheduling mechanisms that allow users to control license resources reserved for device node groups and user groups.
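The interaction between group quotas and idle-time scheduling described in the list above can be sketched as a simple admission check. This is a hypothetical model, not the SSCC scheduler: the class, the function names, the core counts and the 22:00 to 06:00 idle window are all assumptions made for illustration.

```python
from dataclasses import dataclass
from datetime import time


@dataclass
class GroupQuota:
    """Hypothetical per-group quota, expressed in CPU cores."""
    fixed_cores: int   # cores reserved for the department or project group
    shared_cores: int  # extra cores the group may draw from the public pool


# Assumed idle window; it wraps past midnight.
IDLE_START = time(22, 0)
IDLE_END = time(6, 0)


def in_idle_window(now: time) -> bool:
    """Return True when the clock time falls inside the idle window."""
    return now >= IDLE_START or now < IDLE_END


def can_start(job_cores: int, group_used: int, quota: GroupQuota,
              free_cores: int, now: time) -> bool:
    """Decide whether a job may start under quota and idle-time rules."""
    if job_cores > free_cores:
        return False  # not enough free cores anywhere in the cluster
    if in_idle_window(now):
        return True   # quota limits may be exceeded during idle time
    return group_used + job_cores <= quota.fixed_cores + quota.shared_cores


quota = GroupQuota(fixed_cores=256, shared_cores=128)
print(can_start(128, 300, quota, free_cores=500, now=time(14, 0)))   # False
print(can_start(128, 300, quota, free_cores=500, now=time(23, 30)))  # True
```

In this sketch the same 128-core job is rejected in the afternoon because it would push the group past its 384-core allowance, but it is admitted at night when otherwise idle cores are available.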
SSCC is the first computer-aided engineering (CAE) simulation computing hybrid cloud in China to be used in production. It provides online services for a simulation analysis team of hundreds of members and runs tens of thousands of simulation computing tasks each month.
Looking back on the journey this project has taken, and ahead to the future of independent industrial research and development in China, we believe SSCC will become an important milestone in the practice of cloud computing, one that shows Chinese enterprises the true capability and great value of an intelligent industrial development cloud.
SAIC Motor has established an elastic supply system for computing resources along with flexible governance mechanisms. It has also implemented granular management of development resources and a secure, reliable closed loop for producing core development data. These achievements should encourage innovative, intelligent development and further raise core development productivity. Alibaba Cloud, for its part, is committed to delivering comprehensive and efficient computing engine services for “intelligent manufacturing in China” and has made a significant contribution to the field of industrial simulation computing.