Reinforcement Learning for Scheduling Processes Under Uncertainty in Chemical Engineering Facilities

Advisor

Ricardez-Sandoval, Luis Alberto

Publisher

University of Waterloo

Abstract

Optimal scheduling of chemical systems has gained interest as it provides economic advantages. The development of methodologies for approaching this problem has expanded considerably, bringing multiple options to the process optimization field. Real-world applications deal with uncertainty at multiple stages of the process, from price fluctuations to processing times and material quality. Preventive and reactive scheduling techniques, which adapt to realizations of uncertainty, were developed to approach the scheduling optimization problem under uncertainty. Recently, the use of Deep Neural Networks (DNNs) combined with Reinforcement Learning (RL) algorithms has become an option for generating policies for decision-making processes. In the context of scheduling, a policy is beneficial because it produces the schedule online, responding to the needs of the process and to the realization of uncertainties in real time. This thesis presents a set of methodologies for approaching the scheduling problem under uncertainty for batch systems with Deep Reinforcement Learning (DRL) methods. The state-of-the-art methods in this area are improved in this work through the development of techniques that study the translation of the scheduling problem into the Reinforcement Learning framework.

Contrary to many approaches in the literature, where the process environment is assumed to be a Markov Decision Process (MDP), the methods presented in this work assume partial observability of the process. This setting, called a Partially Observable Markov Decision Process (POMDP), is useful for handling the uncertainty in the process as well as for perceiving the evolution of the process over time. To support this assumption for the scheduling process, Recurrent Neural Networks (RNNs) are implemented and their performance on the scheduling optimization problem is analyzed. To the author's knowledge, these networks have not been used for this purpose before; their implementations in the literature focus on other features of this type of network, for instance their capacity to handle inputs of various lengths. This new perspective is compared with implementations framed as MDPs, showing the advantage of approaching the scheduling problem in this way.

With DRL, the decision space of the scheduling problem is usually discretized into a set of actions that can be taken in the online scheduling process. In this work, we propose the use of hybrid agents that can make simultaneous decisions in the process at every decision step. This expands the applicability of the scheduling agent to more realistic scenarios where more than one decision is required. Moreover, this method extends the nature of the decisions into the continuous space by using DRL algorithms that can operate in this domain. It also allows the integration of scheduling tasks with other levels of hierarchical manufacturing systems, e.g., planning or control tasks. To the author's knowledge, attempts to extend the number of decisions from the agent to these other levels have not been reported in the literature.
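To make the POMDP framing and the hybrid action space concrete, the following is a minimal, hypothetical sketch (in PyTorch; not the thesis implementation) of a recurrent policy that encodes the observation history with a GRU and emits a hybrid action: a discrete task/unit choice together with a continuous decision such as a batch size. All dimensions and names are illustrative assumptions.

```python
# Illustrative sketch only; dimensions, names, and architecture are assumptions,
# not the thesis code.
import torch
import torch.nn as nn


class RecurrentSchedulingPolicy(nn.Module):
    """GRU encoder over the observation history with a hybrid action head:
    a categorical head for the discrete task/unit choice and a Gaussian head
    for a continuous decision (e.g., batch size)."""

    def __init__(self, obs_dim: int, n_discrete_actions: int, hidden: int = 64):
        super().__init__()
        self.gru = nn.GRU(obs_dim, hidden, batch_first=True)
        self.discrete_head = nn.Linear(hidden, n_discrete_actions)
        self.mu_head = nn.Linear(hidden, 1)          # mean of continuous action
        self.log_std = nn.Parameter(torch.zeros(1))  # learned log std-dev

    def forward(self, obs_history: torch.Tensor):
        # obs_history: (batch, time, obs_dim); the final GRU state summarizes
        # the past, standing in for the unobservable true process state (POMDP).
        _, h = self.gru(obs_history)
        h = h.squeeze(0)                              # (batch, hidden)
        logits = self.discrete_head(h)                # discrete action scores
        mu = self.mu_head(h)                          # continuous action mean
        return logits, mu, self.log_std.exp()


# Usage: sample one hybrid action from a history of 5 observations.
policy = RecurrentSchedulingPolicy(obs_dim=10, n_discrete_actions=4)
logits, mu, std = policy(torch.randn(1, 5, 10))
task = torch.distributions.Categorical(logits=logits).sample()
batch_size = torch.distributions.Normal(mu, std).sample()
```

The recurrent hidden state acts as a learned summary of past observations, which is what lets the agent act under partial observability instead of requiring the full Markov state.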
The use of DNNs for modeling policies in the scheduling process raises the issue of the "black box" model, in which the model does not provide any description of its heuristics for humans to understand the logic behind its decisions. This becomes an obstacle when ensuring that the setting of hyperparameters aligns with the current decisions of the agent. In other words, the agent does not provide any insight into its decisions, and the only way to check them is through the agent's final results. To provide the agent with interpretability, attention mechanisms are implemented in this work. They allow an attention matrix to be built at every decision step in the process, which gathers information on the logic behind the decisions. Attention mechanisms have been used in the literature due to their outstanding capacity for building correlations between the elements of the process. To the author's knowledge, the use of this interpretability for inspection and correction of hyperparameters in the scheduling problem has not been reported in the literature.

The DRL algorithms used in the presented methods are: a) a variation of Deep Q-Learning and b) Proximal Policy Optimization (PPO). Deep Q-Learning is a well-known DRL method that specializes in problems with a discrete action space; PPO, on the other hand, is applicable to both discrete and continuous action spaces. These methods are relatively simple to implement and showed good performance in the works presented in this thesis.

The implementations of the developed methods were tested on job shops, flow shops, and State Task Networks, with objective functions related to makespan reduction and product maximization. Results showed that agents trained with the presented methods can generate schedules that account for the uncertain parameters of the system, making this online scheduling agent attractive for industrial-scale applications. The agent's capacity to react is demonstrated with the experiments performed in each study. Moreover, the results were compared with different benchmarks, including alternative DNNs and optimization solvers. The implementation of DRL methods to approach the scheduling problem under uncertainty demonstrated that this alternative has potential as a reactive online scheduler that can provide reliable responses in short turnaround times. The set of methods presented in this thesis illustrates the advantages and limitations of incorporating machine learning into decision-making in the context of chemical engineering.
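As an illustration of the interpretability idea, the sketch below (hypothetical, not the thesis code) shows how a single-head self-attention layer over the process elements returns an attention matrix at each decision step that can be inspected directly.

```python
# Illustrative sketch only; the layer and dimensions are assumptions.
import torch
import torch.nn as nn


class InterpretableAttention(nn.Module):
    """Self-attention over the n process elements in the observation; the
    returned weights form the per-step attention matrix used for inspection."""

    def __init__(self, feat_dim: int):
        super().__init__()
        self.attn = nn.MultiheadAttention(feat_dim, num_heads=1, batch_first=True)

    def forward(self, elements: torch.Tensor):
        # elements: (batch, n_elements, feat_dim), e.g., tasks, units, inventories
        out, weights = self.attn(elements, elements, elements, need_weights=True)
        return out, weights  # weights: (batch, n_elements, n_elements)


layer = InterpretableAttention(feat_dim=8)
_, attn_matrix = layer(torch.randn(1, 6, 8))
print(attn_matrix[0])  # row i: how strongly element i attends to each element
```

Inspecting rows of this matrix over successive decision steps is one way such a mechanism can reveal which process elements drive a given decision, and hence whether the hyperparameter settings are producing sensible behavior.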
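For reference, the objective that standard PPO (Schulman et al., 2017) minimizes can be written in a few lines; the abstract does not state the exact variant used in the thesis, so this is only the textbook clipped surrogate form.

```python
# Standard PPO clipped surrogate loss; the thesis may use a tailored variant.
import torch


def ppo_clip_loss(log_probs, old_log_probs, advantages, eps=0.2):
    """Negative clipped surrogate objective, averaged over a batch."""
    ratio = torch.exp(log_probs - old_log_probs)          # r_t(theta)
    unclipped = ratio * advantages
    clipped = torch.clamp(ratio, 1 - eps, 1 + eps) * advantages
    return -torch.min(unclipped, clipped).mean()          # minimized by SGD
```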
