Implementing MLOps on Edge-Cloud Systems: A New Paradigm for Training at the Edge

Dave, Ridham

Files

Ridham_Dave.pdf (1.88 MB)

Date

2023-08-18

Authors

Dave, Ridham

Advisor

Fischmeister, Sebastian

Publisher

University of Waterloo

Abstract

Owing to the growth in data from Internet of Things (IoT) devices and the increasing demand for intelligent decision-making at the network's edge, interest in the intersection of edge computing, cloud computing and artificial intelligence (AI) has surged. Various sectors are adopting this integrated approach because it combines the low-latency operation of edge computing, the intelligent decision-making of AI and the scalable computing of the cloud. Because of the low-latency requirements, when the performance of the AI application degrades, it is crucial to adapt and update the edge environment rapidly and independently while keeping its state synchronized with the cloud. This need for rapid adaptation makes personalized Machine Learning (ML) training on the edge necessary. The universal ML model, by contrast, is typically trained in the cloud, which offers greater computing resources and abundant data in central storage. In such a hybrid environment with multiple model sources, it is essential to keep the system consistent and its state synchronized. Conventional Machine Learning Operations (MLOps) manages the efficient deployment and monitoring of machine learning models in a single-tier environment. The challenge of performing MLOps in an edge-cloud environment grows with the number of IoT devices, edge servers and machine learning models. Streamlining the machine learning process, including model training, deployment and performance monitoring, therefore requires a scalable and robust hybrid approach. To address the challenge of multi-tiered MLOps in a hybrid ecosystem, we propose a novel MLOps architecture that orchestrates edge-cloud model training and synchronization. This thesis assesses the proposed architecture against quality attributes including maintainability, reliability, scalability, functional adaptability and robustness. Furthermore, the thesis tests the proposed architecture in a practical case study involving multiple IoT devices, edge servers and a centralized cloud infrastructure. This thesis presents an innovative solution for maintaining ML-enabled edge-cloud systems.