This layer includes multiple application instances that run within containers. These applications may encompass a variety of machine learning and deep learning tasks, such as facial recognition and handwritten digit recognition.
The container layer is responsible for the full lifecycle management of containers, including creation, startup, stopping, and destruction. Container technology allows applications to run in isolated environments, improving resource utilization and application portability.
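The lifecycle described above (creation, startup, stopping, destruction) can be modeled as a small state machine. The following Python sketch is illustrative only: the `Container` class, the `State` names, and the transition table are hypothetical, and they assume that a running container must be stopped before it can be destroyed. A real container runtime exposes its own API and rules.

```python
from enum import Enum


class State(Enum):
    CREATED = "created"
    RUNNING = "running"
    STOPPED = "stopped"
    DESTROYED = "destroyed"


# Allowed transitions: action -> (states it may be applied in, resulting state).
# This table is an illustrative assumption, not the system's actual rules.
TRANSITIONS = {
    "start": ({State.CREATED, State.STOPPED}, State.RUNNING),
    "stop": ({State.RUNNING}, State.STOPPED),
    "destroy": ({State.CREATED, State.STOPPED}, State.DESTROYED),
}


class Container:
    """Hypothetical container record that tracks only its lifecycle state."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.state = State.CREATED  # creation is the first lifecycle step

    def apply(self, action: str) -> State:
        allowed_from, target = TRANSITIONS[action]
        if self.state not in allowed_from:
            raise ValueError(f"cannot {action} a container in state {self.state.value}")
        self.state = target
        return self.state


if __name__ == "__main__":
    c = Container("digit-recognizer")
    for action in ("start", "stop", "destroy"):
        print(action, "->", c.apply(action).value)
```

Rejecting illegal transitions (for example, destroying a running container) in one place keeps lifecycle rules explicit rather than scattered across callers.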
This layer handles resource allocation and management, including GPU resource management, user/group management, volume management, and permission management. It ensures that resources are used efficiently and that access to them is properly controlled.
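As a rough illustration of how GPU allocation and group-based access control might interact, the sketch below keeps a pool of GPU device IDs and a per-group quota. `GpuPool`, its methods, and the quota policy are all hypothetical; they are not taken from the system described here.

```python
from dataclasses import dataclass, field
from typing import Dict, Set


@dataclass
class GpuPool:
    """Hypothetical pool of GPU device IDs shared among user groups."""

    free: Set[int]                  # GPU IDs not currently allocated
    group_quota: Dict[str, int]     # maximum number of GPUs each group may hold
    allocated: Dict[str, Set[int]] = field(default_factory=dict)

    def allocate(self, group: str, count: int) -> Set[int]:
        # Permission check: groups without a quota entry get no GPUs.
        if group not in self.group_quota:
            raise PermissionError(f"group {group!r} has no GPU quota")
        held = self.allocated.setdefault(group, set())
        # Quota check: the request must not push the group past its limit.
        if len(held) + count > self.group_quota[group]:
            raise PermissionError(f"group {group!r} would exceed its quota")
        if count > len(self.free):
            raise RuntimeError("not enough free GPUs")
        # Hand out the lowest-numbered free devices (an arbitrary policy).
        granted = set(sorted(self.free)[:count])
        self.free -= granted
        held |= granted
        return granted

    def release(self, group: str, gpus: Set[int]) -> None:
        held = self.allocated.get(group, set())
        if not gpus <= held:
            raise ValueError(f"group {group!r} does not hold all of {sorted(gpus)}")
        held -= gpus
        self.free |= gpus
```

For example, `GpuPool(free={0, 1, 2, 3}, group_quota={"vision": 2}).allocate("vision", 2)` grants devices `{0, 1}`, and a further request from the same group is refused until it releases a device.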
This layer manages storage resources, covering both hard disk management and storage management, to ensure data persistence and efficient access.
This layer covers service definition, service deployment, and dynamic scaling, providing service-oriented management and elastic scaling for the entire system.
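Dynamic scaling needs a rule for how many service replicas to run. One widely used rule is the proportional formula applied by the Kubernetes Horizontal Pod Autoscaler: desired = ceil(current_replicas * current_metric / target_metric). The text does not say which rule this system uses, so the sketch below is an assumption; `desired_replicas` and its minimum/maximum bounds are hypothetical, and production autoscalers add refinements (such as tolerances and stabilization windows) that are omitted here.

```python
import math


def desired_replicas(current_replicas: int, current_metric: float,
                     target_metric: float, min_replicas: int = 1,
                     max_replicas: int = 10) -> int:
    """Proportional scaling rule, clamped to [min_replicas, max_replicas].

    current_metric is the observed per-replica load (e.g. average CPU %);
    target_metric is the load each replica should ideally carry.
    """
    if target_metric <= 0:
        raise ValueError("target_metric must be positive")
    ratio = current_metric / target_metric
    desired = math.ceil(current_replicas * ratio)
    return max(min_replicas, min(max_replicas, desired))


if __name__ == "__main__":
    print(desired_replicas(3, 90.0, 60.0))  # 5: overloaded, so scale out
    print(desired_replicas(4, 20.0, 60.0))  # 2: underloaded, so scale in
```

Clamping to explicit bounds keeps a metric spike from launching an unbounded number of containers and keeps at least one replica serving requests.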
This layer is responsible for cluster deployment and system monitoring. It coordinates the work of the management and computing nodes and manages their interconnection over a high-speed 10-gigabit Ethernet network.
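System monitoring on a management node could, for example, track heartbeats from the computing nodes and flag any node that has gone silent for too long. The sketch below assumes such a heartbeat scheme; `NodeMonitor`, the node names, and the 10-second timeout are hypothetical. Timestamps can be passed in explicitly, which keeps the logic easy to test.

```python
import time
from typing import Dict, List, Optional


class NodeMonitor:
    """Hypothetical heartbeat tracker running on a management node."""

    def __init__(self, timeout_s: float = 10.0) -> None:
        self.timeout_s = timeout_s
        self.last_seen: Dict[str, float] = {}  # node name -> last heartbeat time

    def heartbeat(self, node: str, now: Optional[float] = None) -> None:
        # Record when the node last reported in (monotonic clock by default).
        self.last_seen[node] = time.monotonic() if now is None else now

    def unhealthy(self, now: Optional[float] = None) -> List[str]:
        # Flag nodes whose last heartbeat is older than the timeout.
        now = time.monotonic() if now is None else now
        return sorted(n for n, t in self.last_seen.items()
                      if now - t > self.timeout_s)


if __name__ == "__main__":
    monitor = NodeMonitor(timeout_s=10.0)
    monitor.heartbeat("compute-1", now=0.0)
    monitor.heartbeat("compute-2", now=5.0)
    print(monitor.unhealthy(now=12.0))  # ['compute-1']
```

A monotonic clock is used by default because, unlike wall-clock time, it cannot jump backward when the system time is adjusted.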
This layer consists of management nodes and multiple computing nodes: physical or virtual servers that carry out system management and computational tasks.
This layer provides high-speed network connectivity, supporting rapid data transfer between nodes and containers.
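To put a 10-gigabit Ethernet link in perspective, transfer time can be estimated as the number of bits to move divided by the usable link rate. The sketch below assumes decimal units (1 GB = 10^9 bytes) and a hypothetical `efficiency` factor for protocol overhead. At the ideal line rate, moving 100 GB takes 800 Gb / 10 Gb/s = 80 seconds.

```python
def transfer_seconds(size_gb: float, link_gbps: float = 10.0,
                     efficiency: float = 1.0) -> float:
    """Estimated time to move size_gb gigabytes over a link_gbps link.

    Uses decimal units (1 GB = 10**9 bytes, 1 Gb/s = 10**9 bit/s);
    efficiency in (0, 1] is the fraction of the line rate actually usable.
    """
    if not 0 < efficiency <= 1:
        raise ValueError("efficiency must be in (0, 1]")
    bits = size_gb * 8e9  # 8 bits per byte
    return bits / (link_gbps * 1e9 * efficiency)


if __name__ == "__main__":
    print(transfer_seconds(100))                  # 80.0 at the ideal line rate
    print(transfer_seconds(100, efficiency=0.8))  # slower once overhead is counted
```

Real throughput also depends on disk speed, protocol, and contention, so this is a lower bound on transfer time rather than a prediction.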
This layer includes mainstream machine learning frameworks such as TensorFlow, Caffe, Torch, and Keras, as well as a library of deep learning models and algorithms such as LeNet, LSTM, AlexNet, GoogLeNet, ResNet, GAN, and Faster R-CNN.
This layer includes standard datasets such as ImageNet, COCO, PASCAL VOC, CIFAR, Open Images, and YouTube-8M, which provide training and validation data for machine learning models.