
Optimal Maintenance of Deteriorating Systems Integrating Deep Reinforcement Learning and Bayesian Inference

ML Tags

Deep Reinforcement Learning

Partially Observable Markov Decision Processes

Q-learning

Actor-Critic

Double Deep Q-Network

Proximal Policy Optimization

Bayesian Model Updating

Topic Tags

Engineering systems maintenance

Decision making optimization

Stochastic degradation

> Software & Plug-ins Used 


Several Python libraries and packages (NumPy, SciPy, PyMC3, Gym, PyTorch, Matplotlib, Seaborn) for implementing and training the models 

> Workflow


Figure provided by author: Problem conceptual breakdown 


Figure provided by author: Framework for the case study (three storey frame) 



Short summary video.


Extended summary video.

> Summary


A key computational challenge in maintenance planning for deteriorating structures is to concurrently secure (i) an optimal sequence of decisions over long planning horizons, and (ii) accurate real-time parameter updates in high-dimensional stochastic spaces. Recall that stochastic spaces are those in which outcomes are driven by random variables. Both tasks are further complicated by the fact that a structure's deterioration evolves continuously in time, so the model describing it is a continuous-state model that has to be discretised. An additional complication is that maintenance planning allows multiple options at every decision step, leading to a combinatorial decision space: the number of possible decision sequences grows exponentially with the length of the planning horizon. Recent advances in Deep Reinforcement Learning (DRL) formulations for inspection and maintenance planning provide powerful frameworks to address these concerns by efficiently handling near-optimal decision-making in immense state and action spaces without the need for complete offline system knowledge. This means DRL can be used to optimize the decision-making process despite the challenges. Bayesian Model Updating (BMU) is the second key ingredient: it estimates uncertain model parameters by minimizing the discrepancy between measured and predicted responses. Combined with advanced sampling methods, BMU allowed the authors to address the dimensionality and accuracy issues associated with the discretised degradation processes. Building upon these concepts, the authors developed a joint framework coupling DRL, more specifically deep Q-learning and actor-critic algorithms, with BMU through Hamiltonian Monte Carlo. 
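
To make the DRL side more concrete, the sketch below (an illustration, not the authors' code) shows a Double-DQN style update in PyTorch for a maintenance decision problem; the state dimension, the three maintenance actions (do nothing, repair, replace), and the network sizes are assumptions made for the example.

```python
# Minimal Double-DQN update sketch for a maintenance decision problem.
# The degradation-state dimension, action set, and architecture are
# illustrative assumptions, not the paper's exact configuration.
import torch
import torch.nn as nn

STATE_DIM, N_ACTIONS, GAMMA = 4, 3, 0.99  # actions: 0 = do nothing, 1 = repair, 2 = replace

class QNetwork(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(STATE_DIM, 64), nn.ReLU(),
            nn.Linear(64, 64), nn.ReLU(),
            nn.Linear(64, N_ACTIONS),        # one Q-value per maintenance action
        )

    def forward(self, s):
        return self.net(s)

q_net, target_net = QNetwork(), QNetwork()
target_net.load_state_dict(q_net.state_dict())
optimizer = torch.optim.Adam(q_net.parameters(), lr=1e-3)

def dqn_update(s, a, r, s_next, done):
    """One Double-DQN update on a batch of transitions (maintenance costs enter as negative rewards)."""
    q_sa = q_net(s).gather(1, a.unsqueeze(1)).squeeze(1)
    with torch.no_grad():
        # Select the next action with the online network, evaluate it with the target network.
        best_a = q_net(s_next).argmax(dim=1, keepdim=True)
        q_target = r + GAMMA * (1 - done) * target_net(s_next).gather(1, best_a).squeeze(1)
    loss = nn.functional.mse_loss(q_sa, q_target)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

An actor-critic variant would replace the single Q-network with a policy network and a value network trained jointly, but the overall training loop has the same structure.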


Hamiltonian Monte Carlo (HMC) is the sampling algorithm used for the Bayesian updating step. To understand how it works, let's review a few concepts. A Markov chain is a model describing a sequence of events in which the probability of each event depends only on the state reached at the previous step. Markov chain Monte Carlo (MCMC) is a method that constructs a Markov chain whose equilibrium distribution is the desired probability distribution, so samples can be obtained by recording states of the chain. HMC is an MCMC algorithm that uses Hamiltonian dynamics to propose new states, producing a sequence of samples that converges to the target distribution; here, that target is the posterior distribution of the uncertain degradation parameters, and the resulting samples provide the updated model used for decision-making. 
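
As a concrete (and assumed, not the authors') illustration of this updating step, the following PyMC3 sketch infers a hypothetical degradation-rate parameter from noisy synthetic measurements; PyMC3's default NUTS sampler is a Hamiltonian Monte Carlo variant.

```python
# Bayesian model updating sketch with PyMC3; the linear degradation model,
# priors, and synthetic data are illustrative assumptions.
import numpy as np
import pymc3 as pm

# Synthetic "measurements" of a quantity degrading linearly over 10 time steps.
t = np.arange(1, 11)
observed = 1.0 - 0.05 * t + np.random.normal(0.0, 0.02, size=t.size)

with pm.Model() as degradation_model:
    rate = pm.HalfNormal("rate", sigma=0.1)      # prior on the uncertain degradation rate
    noise = pm.HalfNormal("noise", sigma=0.05)   # measurement noise
    predicted = 1.0 - rate * t                   # simple model-predicted response
    pm.Normal("likelihood", mu=predicted, sigma=noise, observed=observed)

    # NUTS (a Hamiltonian Monte Carlo variant) draws samples converging to the posterior.
    trace = pm.sample(2000, tune=1000)

print("Posterior mean degradation rate:", trace["rate"].mean())
```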


The paper examines single- and multi-component structural systems, and shows that the proposed methodology yields lower life-cycle maintenance costs, as well as policies of higher fidelity and sophistication, compared to traditional optimized time-based and condition-based maintenance strategies. 


LIMITATIONS: The proposed framework relies on simplifying assumptions about the physical problem. Simplifying the physics made it possible to reduce the computational time, which is a common concern in machine learning applications because training the DRL agent and tuning the model's hyperparameters are time-consuming procedures. It is important to weigh the trade-off between the fidelity of the physical model and the resulting training time. In this case, the issue could be characterized as a disadvantage worth having, since such a tool can yield the optimal maintenance strategy for the whole lifetime of an engineering system, making the runtime seem less important on a relative scale.

> Additional Visuals



Figure provided by author: Neural network architecture for Q-function approximation and actor-critic algorithms


Figure provided by author: Case study geometry - Three storey plane frame


Figure provided by author: Policy realizations for different training episodes for different components

> Possible Applications
